Hide Forgot
Description of problem: Right now, it is possible to start either docker or docker-latest daemon if there is another docker/docker-latest is running. The second one will take over the first one became the one that is running, i.e., # systemctl start docker # systemctl start docker-latest (Now, docker-latest is running, and docker is dead.) After this happened, it created a mess including systemctl stop docker/docker-latest or docker/docker-latest stop $(docker/docker-latest ps -a -q) unable to clean up thinp resources. For example, # docker-latest run -d nginx # dmsetup info Name: rhel--latest-docker--latest--pool State: ACTIVE Read Ahead: 8192 Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 5 Number of targets: 1 UUID: LVM-11skPfGYN2KRf5gk6ioa79x0edYyxteFMlKw3QbD3wS1O8ueAFHPaVtfTgmnRo4g-tpool ... Name: docker-8:1-8571283-425cb51f6f604209c0dbc26691918c9dfe685f16188a29107352fbb39670ebdf State: ACTIVE Read Ahead: 8192 Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 6 Number of targets: 1 # docker-latest stop $(docker-latest ps -a -q) (No docker process is running after that.) # dmsetup info Name: rhel--latest-docker--latest--pool State: ACTIVE Read Ahead: 8192 Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 5 Number of targets: 1 UUID: LVM-11skPfGYN2KRf5gk6ioa79x0edYyxteFMlKw3QbD3wS1O8ueAFHPaVtfTgmnRo4g-tpool ... Name: docker-8:1-8571283-425cb51f6f604209c0dbc26691918c9dfe685f16188a29107352fbb39670ebdf State: ACTIVE Read Ahead: 8192 Tables present: LIVE Open count: 0 Event number: 0 Major, minor: 253, 6 Number of targets: 1 # systemctl stop docker-latest # dmsetup info (Same output as the above.) As the results, there is no way to destroy the thinp, # lvremove rhel-latest Logical volume rhel-latest/docker-latest-pool is used by another device. until issued one of the followings, 1) "docker-latest rm" 2) reboot 3) manually deleted all docker-8:1-xxx devices using "dmsetup remove -f" How reproducible: always
Vivek any ideas?
Created attachment 1148260 [details] the docker-latest journal file when it is in a mess
Antonio from the log file above it looks like we have the parseint error.
Looks like some error at higher level fails to cleanup container storage hence making thin pool busy.
Dan is that happening with a docker 1.9 daemon? Not sure we merged the fix in rhel 1.9
Which patch?
https://github.com/projectatomic/docker/pull/88 We didn't end up merging this in Fedora branch, nor we merged it for rhel in 1.9 I'll make a PR against the rhel branch for inclusion
OK, it turned out it does not even need docker/docker-latest daemons to take over each other to reproduce this. Just use all legit operations. # rpm -qa | grep docker docker-selinux-1.9.1-28.el7.x86_64 docker-1.9.1-28.el7.x86_64 docker-utils-1.9.1-28.el7.x86_64 docker-forward-journald-1.9.1-28.el7.x86_64 docker-latest-1.10.3-9.el7.x86_64 # cat /etc/sysconfig/docker-storage-setup DEVS=/dev/vda VG=rhel # cat /etc/sysconfig/docker-latest-storage-setup DEVS=/dev/vdb VG=rhel-latest Everything starts from scratch. # systemctl stop docker # systemctl stop docker-latest # rm -rf /var/lib/docker/* # rm -rf /var/lib/docker-latest/* # rm -f /etc/sysconfig/docker-storage # rm -f /etc/sysconfig/docker-latest-storage # lvs (Empty as expected.) # dmsetup info | grep 'docker-8:1' (Empty as expected.) # systemctl start docker # lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert docker-pool rhel twi-a-t--- 3.19g 1.82 0.33 # systemctl status docker ● docker.service - Docker Application Container Engine Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled) Active: active (running) since Mon 2016-04-18 14:11:06 EDT; 5s ago # Easy to reproduce by running a lot of containers. # for i in `seq 1 100`; do docker run -d nginx; done # docker stop $(docker ps -a -q) # dmsetup info | grep 'docker-8:1' (Empty as expected.) # docker info Containers: 100 Images: 8 Server Version: 1.9.1 Storage Driver: devicemapper Pool Name: rhel-docker--pool Pool Blocksize: 524.3 kB Base Device Size: 107.4 GB Backing Filesystem: xfs Data file: Metadata file: Data Space Used: 2.992 GB Data Space Total: 5.931 GB Data Space Available: 2.939 GB Metadata Space Used: 3.027 MB Metadata Space Total: 12.58 MB Metadata Space Available: 9.556 MB Udev Sync Supported: true Deferred Removal Enabled: true Deferred Deletion Enabled: true Deferred Deleted Device Count: 0 Library Version: 1.02.107-RHEL7 (2015-12-01) Execution Driver: native-0.2 Logging Driver: json-file Kernel Version: 3.10.0-327.10.1.el7.x86_64 Operating System: Employee SKU CPUs: 2 Total Memory: 3.702 GiB Name: rhel7-00.localdomain ID: BOU7:OGFB:IV5G:Z5SH:EI6T:NGXN:RDKR:MHWM:IJUE:DYBU:ZATU:GENW WARNING: bridge-nf-call-iptables is disabled WARNING: bridge-nf-call-ip6tables is disabled # systemctl stop docker # systemctl start docker-latest # lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert docker-pool rhel twi-a-t--- 4.60g 59.27 24.06 docker-latest-pool rhel-latest twi-a-t--- 3.19g 0.60 0.33 # systemctl status docker-latest ● docker-latest.service - Docker Application Container Engine Loaded: loaded (/usr/lib/systemd/system/docker-latest.service; disabled; vendor preset: disabled) Active: active (running) since Mon 2016-04-18 14:14:10 EDT; 4s ago # for i in `seq 1 100`; do docker run -d nginx; done # docker-latest stop $(docker-latest ps -a -q) # dmsetup info | grep 'docker-8:1' Name: docker-8:1-41991960-b54610d86248a72e653dbbd34f222352d5690faa0a8464869722db94b88e4466 Name: docker-8:1-41991960-404f2097bb6a53625f708907036d7c00339811939157734c759c826482f84187 Name: docker-8:1-41991960-40fc396d0b2aa5dab237c2f2e76af84b7d48ab37deafd3ac3a2c43fe2f34e2f7 ... # dmsetup info Name: rhel--latest-docker--latest--pool State: ACTIVE Read Ahead: 8192 Tables present: LIVE Open count: 99 # docker-latest info Containers: 100 Running: 0 Paused: 0 Stopped: 100 Images: 1 Server Version: 1.10.3 Storage Driver: devicemapper Pool Name: rhel--latest-docker--latest--pool Pool Blocksize: 524.3 kB Base Device Size: 10.74 GB Backing Filesystem: xfs Data file: Metadata file: Data Space Used: 3.102 GB Data Space Total: 3.427 GB Data Space Available: 324.5 MB Metadata Space Used: 3.039 MB Metadata Space Total: 12.58 MB Metadata Space Available: 9.544 MB Udev Sync Supported: true Deferred Removal Enabled: true Deferred Deletion Enabled: true Deferred Deleted Device Count: 0 Library Version: 1.02.107-RHEL7 (2015-12-01) Execution Driver: native-0.2 Logging Driver: journald Plugins: Volume: local Network: host bridge null Kernel Version: 3.10.0-327.10.1.el7.x86_64 Operating System: Employee SKU OSType: linux Architecture: x86_64 Number of Docker Hooks: 0 CPUs: 2 Total Memory: 3.702 GiB Name: rhel7-00.localdomain ID: BOU7:OGFB:IV5G:Z5SH:EI6T:NGXN:RDKR:MHWM:IJUE:DYBU:ZATU:GENW WARNING: bridge-nf-call-iptables is disabled WARNING: bridge-nf-call-ip6tables is disabled Registries: docker.io (secure)
(In reply to CAI Qian from comment #9) > # for i in `seq 1 100`; do docker run -d nginx; done Also tried docker-latest instead docker, and still same issue. > # docker-latest stop $(docker-latest ps -a -q) > # dmsetup info | grep 'docker-8:1' > Name: > docker-8:1-41991960- > b54610d86248a72e653dbbd34f222352d5690faa0a8464869722db94b88e4466 > Name: > docker-8:1-41991960- > 404f2097bb6a53625f708907036d7c00339811939157734c759c826482f84187 > Name: > docker-8:1-41991960- > 40fc396d0b2aa5dab237c2f2e76af84b7d48ab37deafd3ac3a2c43fe2f34e2f7 > ... # ps aux | grep docker ... root 8868 2.6 1.8 5259952 70544 ? Sl 14:35 0:19 /usr/bin/docker-latest daemon --exec-opt native.cgroupdriver=systemd -g /var/lib/docker-latest --selinux-enabled --log-driver=journald --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/rhel--latest-docker--latest--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true
This time lots of those errors in the journal, Apr 18 14:38:14 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:14.885603306-04:00" level=error msg="devmapper: Error unmounting device 421409e18955c41801c0ea5e9a9c18d08f9d2b7908804d06f86ec93d0bb3215e: invalid argument" Apr 18 14:38:14 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:14.885640183-04:00" level=error msg="Error unmounting container 842e029fbc03ac6ee7cf1b3cdfe79c39ccbab41ded849e1c9c35abb7f3b8bc1a: invalid argument" Apr 18 14:38:14 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:14.887038753-04:00" level=error msg="Failed to get pwuid struct: strconv.ParseInt: parsing \"Ϩ\": invalid syntax" Apr 18 14:38:14 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:14.887059613-04:00" level=info msg="{Action=stop, ID=2d50e31a73c9f60b78f05fc9b87a483c0a2e3ed75577c236e60cf33f21f4207f, LoginUID=1000, PID=14470}" Apr 18 14:38:15 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:15.148741288-04:00" level=error msg="devmapper: Error unmounting device 346562c043c92d1b64982c9a1c20f0491c68192bd42492fd7ccd289be9df424b: invalid argument" Apr 18 14:38:15 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:15.148773849-04:00" level=error msg="Error unmounting container 2d50e31a73c9f60b78f05fc9b87a483c0a2e3ed75577c236e60cf33f21f4207f: invalid argument" Apr 18 14:38:48 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:48.603985368-04:00" level=error msg="Failed to get pwuid struct: strconv.ParseInt: parsing \"Ϩ\": invalid syntax" Apr 18 14:45:01 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:45:01.191656527-04:00" level=error msg="Failed to get pwuid struct: strconv.ParseInt: parsing \"Ϩ\": invalid syntax"
Again, # docker-latest rm $(docker-latest ps -a -q) can clean all the mess and destroyed all the dm-devices.
Occasionally, we saw those errors in the docker-latest journal too, Apr 18 15:27:59 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T15:27:59.407826974-04:00" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker-latest/containers/eb90b7099eaa2a18fe3968551a4cee19515c02160ced417a040748238702e47d/shm: invalid argument" Apr 18 15:27:59 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T15:27:59.407852802-04:00" level=error msg="Error unmounting container eb90b7099eaa2a18fe3968551a4cee19515c02160ced417a040748238702e47d: not mounted" Apr 18 15:27:59 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T15:27:59.408034711-04:00" level=error msg="Handler for POST /v1.22/containers/eb90b7099eaa2a18fe3968551a4cee19515c02160ced417a040748238702e47d/start returned error: Container command not found or does not exist.
Created attachment 1148284 [details] docker-latest daemon journal with debug enabled
Stephen can you take a look at this one.
*** This bug has been marked as a duplicate of bug 1347821 ***