| Summary: | docker/docker-latest daemons unable to clean up thinp resources | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Qian Cai <qcai> | ||||||
| Component: | docker | Assignee: | Stephen Chung <schung> | ||||||
| Status: | CLOSED DUPLICATE | QA Contact: | atomic-bugs <atomic-bugs> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 7.2 | CC: | ajia, amurdaca, jrieden, lsm5, vgoyal | ||||||
| Target Milestone: | rc | Keywords: | Extras | ||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2016-06-20 13:35:39 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Attachments: |
|
||||||||
|
Description
Qian Cai
2016-04-18 17:37:19 UTC
Vivek any ideas? Created attachment 1148260 [details]
the docker-latest journal file when it is in a mess
Antonio from the log file above it looks like we have the parseint error. Looks like some error at higher level fails to cleanup container storage hence making thin pool busy. Dan is that happening with a docker 1.9 daemon? Not sure we merged the fix in rhel 1.9 Which patch? https://github.com/projectatomic/docker/pull/88 We didn't end up merging this in Fedora branch, nor we merged it for rhel in 1.9 I'll make a PR against the rhel branch for inclusion OK, it turned out it does not even need docker/docker-latest daemons to take over each other to reproduce this. Just use all legit operations. # rpm -qa | grep docker docker-selinux-1.9.1-28.el7.x86_64 docker-1.9.1-28.el7.x86_64 docker-utils-1.9.1-28.el7.x86_64 docker-forward-journald-1.9.1-28.el7.x86_64 docker-latest-1.10.3-9.el7.x86_64 # cat /etc/sysconfig/docker-storage-setup DEVS=/dev/vda VG=rhel # cat /etc/sysconfig/docker-latest-storage-setup DEVS=/dev/vdb VG=rhel-latest Everything starts from scratch. # systemctl stop docker # systemctl stop docker-latest # rm -rf /var/lib/docker/* # rm -rf /var/lib/docker-latest/* # rm -f /etc/sysconfig/docker-storage # rm -f /etc/sysconfig/docker-latest-storage # lvs (Empty as expected.) # dmsetup info | grep 'docker-8:1' (Empty as expected.) # systemctl start docker # lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert docker-pool rhel twi-a-t--- 3.19g 1.82 0.33 # systemctl status docker ● docker.service - Docker Application Container Engine Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled) Active: active (running) since Mon 2016-04-18 14:11:06 EDT; 5s ago # Easy to reproduce by running a lot of containers. # for i in `seq 1 100`; do docker run -d nginx; done # docker stop $(docker ps -a -q) # dmsetup info | grep 'docker-8:1' (Empty as expected.) # docker info Containers: 100 Images: 8 Server Version: 1.9.1 Storage Driver: devicemapper Pool Name: rhel-docker--pool Pool Blocksize: 524.3 kB Base Device Size: 107.4 GB Backing Filesystem: xfs Data file: Metadata file: Data Space Used: 2.992 GB Data Space Total: 5.931 GB Data Space Available: 2.939 GB Metadata Space Used: 3.027 MB Metadata Space Total: 12.58 MB Metadata Space Available: 9.556 MB Udev Sync Supported: true Deferred Removal Enabled: true Deferred Deletion Enabled: true Deferred Deleted Device Count: 0 Library Version: 1.02.107-RHEL7 (2015-12-01) Execution Driver: native-0.2 Logging Driver: json-file Kernel Version: 3.10.0-327.10.1.el7.x86_64 Operating System: Employee SKU CPUs: 2 Total Memory: 3.702 GiB Name: rhel7-00.localdomain ID: BOU7:OGFB:IV5G:Z5SH:EI6T:NGXN:RDKR:MHWM:IJUE:DYBU:ZATU:GENW WARNING: bridge-nf-call-iptables is disabled WARNING: bridge-nf-call-ip6tables is disabled # systemctl stop docker # systemctl start docker-latest # lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert docker-pool rhel twi-a-t--- 4.60g 59.27 24.06 docker-latest-pool rhel-latest twi-a-t--- 3.19g 0.60 0.33 # systemctl status docker-latest ● docker-latest.service - Docker Application Container Engine Loaded: loaded (/usr/lib/systemd/system/docker-latest.service; disabled; vendor preset: disabled) Active: active (running) since Mon 2016-04-18 14:14:10 EDT; 4s ago # for i in `seq 1 100`; do docker run -d nginx; done # docker-latest stop $(docker-latest ps -a -q) # dmsetup info | grep 'docker-8:1' Name: docker-8:1-41991960-b54610d86248a72e653dbbd34f222352d5690faa0a8464869722db94b88e4466 Name: docker-8:1-41991960-404f2097bb6a53625f708907036d7c00339811939157734c759c826482f84187 Name: docker-8:1-41991960-40fc396d0b2aa5dab237c2f2e76af84b7d48ab37deafd3ac3a2c43fe2f34e2f7 ... # dmsetup info Name: rhel--latest-docker--latest--pool State: ACTIVE Read Ahead: 8192 Tables present: LIVE Open count: 99 # docker-latest info Containers: 100 Running: 0 Paused: 0 Stopped: 100 Images: 1 Server Version: 1.10.3 Storage Driver: devicemapper Pool Name: rhel--latest-docker--latest--pool Pool Blocksize: 524.3 kB Base Device Size: 10.74 GB Backing Filesystem: xfs Data file: Metadata file: Data Space Used: 3.102 GB Data Space Total: 3.427 GB Data Space Available: 324.5 MB Metadata Space Used: 3.039 MB Metadata Space Total: 12.58 MB Metadata Space Available: 9.544 MB Udev Sync Supported: true Deferred Removal Enabled: true Deferred Deletion Enabled: true Deferred Deleted Device Count: 0 Library Version: 1.02.107-RHEL7 (2015-12-01) Execution Driver: native-0.2 Logging Driver: journald Plugins: Volume: local Network: host bridge null Kernel Version: 3.10.0-327.10.1.el7.x86_64 Operating System: Employee SKU OSType: linux Architecture: x86_64 Number of Docker Hooks: 0 CPUs: 2 Total Memory: 3.702 GiB Name: rhel7-00.localdomain ID: BOU7:OGFB:IV5G:Z5SH:EI6T:NGXN:RDKR:MHWM:IJUE:DYBU:ZATU:GENW WARNING: bridge-nf-call-iptables is disabled WARNING: bridge-nf-call-ip6tables is disabled Registries: docker.io (secure) (In reply to CAI Qian from comment #9) > # for i in `seq 1 100`; do docker run -d nginx; done Also tried docker-latest instead docker, and still same issue. > # docker-latest stop $(docker-latest ps -a -q) > # dmsetup info | grep 'docker-8:1' > Name: > docker-8:1-41991960- > b54610d86248a72e653dbbd34f222352d5690faa0a8464869722db94b88e4466 > Name: > docker-8:1-41991960- > 404f2097bb6a53625f708907036d7c00339811939157734c759c826482f84187 > Name: > docker-8:1-41991960- > 40fc396d0b2aa5dab237c2f2e76af84b7d48ab37deafd3ac3a2c43fe2f34e2f7 > ... # ps aux | grep docker ... root 8868 2.6 1.8 5259952 70544 ? Sl 14:35 0:19 /usr/bin/docker-latest daemon --exec-opt native.cgroupdriver=systemd -g /var/lib/docker-latest --selinux-enabled --log-driver=journald --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/rhel--latest-docker--latest--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true This time lots of those errors in the journal,
Apr 18 14:38:14 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:14.885603306-04:00" level=error msg="devmapper: Error unmounting device 421409e18955c41801c0ea5e9a9c18d08f9d2b7908804d06f86ec93d0bb3215e: invalid argument"
Apr 18 14:38:14 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:14.885640183-04:00" level=error msg="Error unmounting container 842e029fbc03ac6ee7cf1b3cdfe79c39ccbab41ded849e1c9c35abb7f3b8bc1a: invalid argument"
Apr 18 14:38:14 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:14.887038753-04:00" level=error msg="Failed to get pwuid struct: strconv.ParseInt: parsing \"Ϩ\": invalid syntax"
Apr 18 14:38:14 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:14.887059613-04:00" level=info msg="{Action=stop, ID=2d50e31a73c9f60b78f05fc9b87a483c0a2e3ed75577c236e60cf33f21f4207f, LoginUID=1000, PID=14470}"
Apr 18 14:38:15 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:15.148741288-04:00" level=error msg="devmapper: Error unmounting device 346562c043c92d1b64982c9a1c20f0491c68192bd42492fd7ccd289be9df424b: invalid argument"
Apr 18 14:38:15 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:15.148773849-04:00" level=error msg="Error unmounting container 2d50e31a73c9f60b78f05fc9b87a483c0a2e3ed75577c236e60cf33f21f4207f: invalid argument"
Apr 18 14:38:48 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:48.603985368-04:00" level=error msg="Failed to get pwuid struct: strconv.ParseInt: parsing \"Ϩ\": invalid syntax"
Apr 18 14:45:01 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:45:01.191656527-04:00" level=error msg="Failed to get pwuid struct: strconv.ParseInt: parsing \"Ϩ\": invalid syntax"
Again, # docker-latest rm $(docker-latest ps -a -q) can clean all the mess and destroyed all the dm-devices. Occasionally, we saw those errors in the docker-latest journal too, Apr 18 15:27:59 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T15:27:59.407826974-04:00" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker-latest/containers/eb90b7099eaa2a18fe3968551a4cee19515c02160ced417a040748238702e47d/shm: invalid argument" Apr 18 15:27:59 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T15:27:59.407852802-04:00" level=error msg="Error unmounting container eb90b7099eaa2a18fe3968551a4cee19515c02160ced417a040748238702e47d: not mounted" Apr 18 15:27:59 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T15:27:59.408034711-04:00" level=error msg="Handler for POST /v1.22/containers/eb90b7099eaa2a18fe3968551a4cee19515c02160ced417a040748238702e47d/start returned error: Container command not found or does not exist. Created attachment 1148284 [details]
docker-latest daemon journal with debug enabled
Stephen can you take a look at this one. *** This bug has been marked as a duplicate of bug 1347821 *** |