朱晓峰

A stray dog who finds life dull and death tedious

Fixing a docker daemon that will not start

TL;DR: when all else fails, reboot: sudo reboot

A quick aside first: if dockerd was started with sudo, docker-compose has to be run with sudo as well.

Once, while troubleshooting a problem, I restarted docker several times with

sudo systemctl restart docker

and I also deleted two socket files, /run/containerd/containerd.sock and /var/run/containerd/containerd.sock. For reference, this is what the socket looks like when it exists:

sudo ls -l /run/containerd/containerd.sock
srw-rw----. 1 root root 0 Oct 10 13:59 /run/containerd/containerd.sock
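
On many systemd-based distributions /var/run is a symlink to /run, in which case the two paths above would name one socket file rather than two. Whether that holds on this particular machine is not shown in the post, so the following is only a minimal sketch for checking it; the helper name same_file is made up for illustration.

```python
import os

def same_file(path_a, path_b):
    """Return True if both paths resolve to the same location once symlinks are followed."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)

# Prints True when /var/run is a symlink to /run, False otherwise.
print(same_file("/run/containerd/containerd.sock",
                "/var/run/containerd/containerd.sock"))
```

os.path.realpath also resolves paths whose last component does not exist, so the check still works after the socket has been deleted.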

After that, docker would not come up any more >.<

So, time to troubleshoot. First I looked at sudo systemctl status docker -l and sudo journalctl -xe:

● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Active: activating (start) since Thu 2019-10-10 13:48:27 CST; 14s ago
Docs: https://docs.docker.com
Main PID: 25992 (dockerd)
Tasks: 12
Memory: 39.7M
CGroup: /system.slice/docker.service
└─25992 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Oct 10 13:48:27 node-1.local systemd[1]: Starting Docker Application Container Engine...
Oct 10 13:48:27 node-1.local dockerd[25992]: time="2019-10-10T13:48:27.700450359+08:00" level=info msg="Starting up"
Oct 10 13:48:27 node-1.local dockerd[25992]: time="2019-10-10T13:48:27.701786486+08:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Oct 10 13:48:27 node-1.local dockerd[25992]: time="2019-10-10T13:48:27.701818831+08:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Oct 10 13:48:27 node-1.local dockerd[25992]: time="2019-10-10T13:48:27.701856021+08:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
Oct 10 13:48:27 node-1.local dockerd[25992]: time="2019-10-10T13:48:27.701880147+08:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc

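Nothing there points to a cause, so the next step is to run dockerd by hand in debug mode (the DEBU lines in the output below come from debug logging, which can be enabled with dockerd's -D flag or with "debug": true in /etc/docker/daemon.json):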

sudo /usr/bin/dockerd --containerd=/run/containerd/containerd.sock

INFO[2019-10-10T14:35:29.260193095+08:00] Starting up
DEBU[2019-10-10T14:35:29.260691124+08:00] Listener created for HTTP on unix (/var/run/docker.sock)
DEBU[2019-10-10T14:35:29.261151949+08:00] Golang's threads limit set to 57240
INFO[2019-10-10T14:35:29.261477776+08:00] parsed scheme: "unix" module=grpc
INFO[2019-10-10T14:35:29.261496182+08:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2019-10-10T14:35:29.261518855+08:00] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0 <nil>}] <nil>} module=grpc
INFO[2019-10-10T14:35:29.261529702+08:00] ClientConn switching balancer to "pick_first" module=grpc
WARN[2019-10-10T14:35:29.262168877+08:00] grpc: addrConn.createTransport failed to connect to {unix:///run/containerd/containerd.sock 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused". Reconnecting... module=grpc
DEBU[2019-10-10T14:35:29.262315604+08:00] Cleaning up old mountid : start.
failed to start daemon: failed to dial "/run/containerd/containerd.sock": all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused": unavailable

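There is the problem: dockerd cannot connect to /run/containerd/containerd.sock. Sure, I deleted it, but doesn't docker recreate it on its own? Fine, then I will create it myself.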

A socket file can be created with mksock, with nc, or with Python:

sudo mksocket /run/containerd/containerd.sock

sudo nc -lU /run/containerd/containerd.sock

python -c "import socket as s; sock = s.socket(s.AF_UNIX); sock.bind('/run/containerd/containerd.sock')"

I started docker in debug mode again, but got exactly the same error as above.
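
One plausible reason the hand-made socket changed nothing: a Unix socket file only accepts connections while some process is listening on it. In the command line shown earlier, dockerd receives the path through --containerd and then dials it as a client, so the listener would normally be the containerd daemon itself, not docker. The Python one-liner above binds a socket and exits immediately, which leaves a file that nobody listens on. The sketch below reproduces the resulting "connection refused" against a throwaway path in a temporary directory; probe and demo are hypothetical helpers, not part of docker or containerd.

```python
import os
import socket
import tempfile

def probe(path):
    """Try to connect to a Unix socket path and classify the result."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        return "listening"
    except FileNotFoundError:
        return "missing"      # no file at this path
    except ConnectionRefusedError:
        return "refused"      # file exists, but nobody is listening
    finally:
        client.close()

def demo():
    """Probe a throwaway socket path in three states and return the results."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        results["before"] = probe(path)

        # Bind and close without listening, like the one-liner in the post.
        stale = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        stale.bind(path)
        stale.close()
        results["stale"] = probe(path)

        # Replace the stale file with a socket that a live process listens on.
        os.unlink(path)
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)
        results["served"] = probe(path)
        server.close()
    return results

print(demo())
```

On Linux I would expect the three states to come out as missing, refused, and listening, in that order. Even a socket that does listen (for example one held open by nc -lU) would still not speak containerd's protocol, so a placeholder file cannot stand in for the real daemon.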

In the end I even uninstalled and reinstalled docker, and that did not help either. Out of options, I had no choice but to...

Reboot the machine

sudo reboot

After the reboot, start the docker service again. Problem solved.
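
A reboot restarts every service, including whatever owns /run/containerd/containerd.sock. If containerd runs as its own systemd unit named containerd (an assumption: that is how the docker-ce packages usually set it up, but the post never shows it), restarting that unit before docker might have had the same effect without a reboot. This was not tried on the machine in this post, so treat the following as an untested sketch; UNITS, restart_commands, and restart_all are illustrative names.

```python
import subprocess

# Assumed unit names; "containerd" does not appear as a unit anywhere in the post.
UNITS = ["containerd", "docker"]

def restart_commands(units=UNITS):
    """Build one 'systemctl restart' command per unit, in the given order."""
    return [["systemctl", "restart", unit] for unit in units]

def restart_all(units=UNITS, run=subprocess.run):
    """Restart each unit in order; check=True raises on the first failure."""
    for cmd in restart_commands(units):
        run(cmd, check=True)
```

Calling restart_all() requires root, for example by running the script with sudo. containerd comes first so that its socket exists and is being served by the time dockerd tries to dial it.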