TL;DR: rebooting fixes everything. sudo reboot
A quick aside first: if dockerd was started with sudo, docker-compose has to be run with sudo as well.
Once, while troubleshooting something, I restarted docker several times with
sudo systemctl restart docker
and also deleted two socket files, /run/containerd/containerd.sock and /var/run/containerd/containerd.sock.
sudo ls -l /run/containerd/containerd.sock
srw-rw----. 1 root root 0 Oct 10 13:59 /run/containerd/containerd.sock
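The leading `s` in `srw-rw----` marks the path as a socket file. As a rough sketch, the same check can be done from Python with `os.stat` and `stat.S_ISSOCK`. The helper name `is_socket` and the temporary paths are made up for illustration; the demo deliberately does not touch the real containerd socket.

```python
import os
import socket
import stat
import tempfile


def is_socket(path):
    """Return True if path exists and is a Unix domain socket file."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except FileNotFoundError:
        return False


def demo_is_socket():
    """Check a socket file, a regular file, and a missing path in a temp dir."""
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "demo.sock")   # hypothetical path
        file_path = os.path.join(tmp, "plain.txt")   # hypothetical path
        missing = os.path.join(tmp, "missing")

        srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        srv.bind(sock_path)            # bind() creates the socket file
        open(file_path, "w").close()   # an ordinary empty file

        result = (is_socket(sock_path), is_socket(file_path), is_socket(missing))
        srv.close()
    return result


if __name__ == "__main__":
    print(demo_is_socket())
```

Note that this only says the file exists and has the socket type. It says nothing about whether a process is actually listening on it.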
After that, docker would not start anymore >.<
Time to troubleshoot. First, look at the output of sudo systemctl status docker -l and sudo journalctl -xe:
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Active: activating (start) since Thu 2019-10-10 13:48:27 CST; 14s ago
     Docs: https://docs.docker.com
 Main PID: 25992 (dockerd)
    Tasks: 12
   Memory: 39.7M
   CGroup: /system.slice/docker.service
           └─25992 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Oct 10 13:48:27 node-1.local systemd[1]: Starting Docker Application Container Engine...
Oct 10 13:48:27 node-1.local dockerd[25992]: time="2019-10-10T13:48:27.700450359+08:00" level=info msg="Starting up"
Oct 10 13:48:27 node-1.local dockerd[25992]: time="2019-10-10T13:48:27.701786486+08:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Oct 10 13:48:27 node-1.local dockerd[25992]: time="2019-10-10T13:48:27.701818831+08:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Oct 10 13:48:27 node-1.local dockerd[25992]: time="2019-10-10T13:48:27.701856021+08:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
Oct 10 13:48:27 node-1.local dockerd[25992]: time="2019-10-10T13:48:27.701880147+08:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
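Nothing obvious there, so start docker by hand in debug mode and watch the output. (The DEBU lines only show up when debug is enabled, for example with the -D flag or "debug": true in /etc/docker/daemon.json.)
sudo /usr/bin/dockerd --containerd=/run/containerd/containerd.sock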
INFO[2019-10-10T14:35:29.260193095+08:00] Starting up
DEBU[2019-10-10T14:35:29.260691124+08:00] Listener created for HTTP on unix (/var/run/docker.sock)
DEBU[2019-10-10T14:35:29.261151949+08:00] Golang's threads limit set to 57240
INFO[2019-10-10T14:35:29.261477776+08:00] parsed scheme: "unix" module=grpc
INFO[2019-10-10T14:35:29.261496182+08:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2019-10-10T14:35:29.261518855+08:00] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0 <nil>}] <nil>} module=grpc
INFO[2019-10-10T14:35:29.261529702+08:00] ClientConn switching balancer to "pick_first" module=grpc
WARN[2019-10-10T14:35:29.262168877+08:00] grpc: addrConn.createTransport failed to connect to {unix:///run/containerd/containerd.sock 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused". Reconnecting... module=grpc
DEBU[2019-10-10T14:35:29.262315604+08:00] Cleaning up old mountid : start.
failed to start daemon: failed to dial "/run/containerd/containerd.sock": all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused": unavailable
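Output like this is mostly INFO/DEBU noise, and the interesting lines are easy to miss. Here is a small sketch that keeps only the warning-or-worse lines plus the final "failed to start daemon" line. It assumes the `LEVL[timestamp] message` layout seen above; the function name `noteworthy` and the level list are my own choices, not anything docker provides.

```python
import re

# Matches lines such as: WARN[2019-10-10T14:35:29.262168877+08:00] message
LINE_RE = re.compile(r"^(?P<level>[A-Z]{4})\[[^\]]*\]\s*(?P<msg>.*)$")


def noteworthy(lines, levels=("WARN", "ERRO", "FATA")):
    """Return (level, message) pairs for lines worth reading first."""
    hits = []
    for raw in lines:
        line = raw.strip()
        match = LINE_RE.match(line)
        if match:
            if match.group("level") in levels:
                hits.append((match.group("level"), match.group("msg")))
        elif line.startswith("failed to start daemon"):
            # The last line of the output above carries no level prefix.
            hits.append(("FATAL", line))
    return hits


if __name__ == "__main__":
    import sys
    for level, msg in noteworthy(sys.stdin):
        print(level, msg)
```

Saved as, say, `filter_log.py` (a hypothetical name), it could be fed the daemon output with something like `sudo dockerd ... 2>&1 | python filter_log.py`.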
Found the problem: it cannot connect to /run/containerd/containerd.sock. Well, I did delete it, but doesn't docker recreate it on its own? Fine, I'll create it myself.
A socket file can be created with mksock, with nc, or with python:
sudo mksocket /run/containerd/containerd.sock
sudo nc -lU /run/containerd/containerd.sock
python -c "import socket as s; sock = s.socket(s.AF_UNIX); sock.bind('/run/containerd/containerd.sock')"
I started docker in debug mode again, but got exactly the same error as above.
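In hindsight this is not surprising. The error was "connection refused", not "no such file": a socket file on disk is only an address, and connecting to it is refused unless some process is listening on it. A minimal sketch of the three cases, using a throwaway path in a temp directory (all names here are illustrative; the behavior described is what Linux does):

```python
import os
import socket
import tempfile


def try_connect(path):
    """Try to connect to a Unix socket path and describe the outcome."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        return "connected"
    except FileNotFoundError:
        return "no such file"
    except ConnectionRefusedError:
        return "connection refused"
    finally:
        client.close()


def demo_connect():
    """Compare a missing path, an orphaned socket file, and a live listener."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # hypothetical path

        # 1. Nothing at the path at all.
        results["missing"] = try_connect(path)

        # 2. File created by bind(), but no process listening (like a
        #    hand-made socket file whose creator has already exited).
        orphan = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        orphan.bind(path)
        orphan.close()
        results["file_only"] = try_connect(path)

        # 3. A server that is bound and listening.
        os.unlink(path)
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)
        results["listening"] = try_connect(path)
        server.close()
    return results


if __name__ == "__main__":
    print(demo_connect())
```

So what dockerd needs is a running process listening on that path, presumably containerd itself, rather than a file that merely exists there.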
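In the end I even uninstalled and reinstalled docker, and that did not help either. With nothing else left to try, I fell back on...
the reboot cure: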
sudo reboot
After the reboot, restart the docker service, and everything works again.