Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1328204

Summary: docker/docker-latest daemons unable to clean up thinp resources
Product: Red Hat Enterprise Linux 7 Reporter: Qian Cai <qcai>
Component: dockerAssignee: Stephen Chung <schung>
Status: CLOSED DUPLICATE QA Contact: atomic-bugs <atomic-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 7.2CC: ajia, amurdaca, jrieden, lsm5, vgoyal
Target Milestone: rcKeywords: Extras
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-06-20 13:35:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
the docker-latest journal file when it is in a mess
none
docker-latest daemon journal with debug enabled none

Description Qian Cai 2016-04-18 17:37:19 UTC
Description of problem:
Right now, it is possible to start either docker or docker-latest daemon if there is another docker/docker-latest is running. The second one will take over the first one became the one that is running, i.e.,

# systemctl start docker
# systemctl start docker-latest
(Now, docker-latest is running, and docker is dead.)

After this happened, it created a mess including systemctl stop docker/docker-latest or docker/docker-latest stop $(docker/docker-latest ps -a -q) unable to clean up thinp resources. For example,

# docker-latest run -d nginx
# dmsetup info
Name:              rhel--latest-docker--latest--pool
State:             ACTIVE
Read Ahead:        8192
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 5
Number of targets: 1
UUID: LVM-11skPfGYN2KRf5gk6ioa79x0edYyxteFMlKw3QbD3wS1O8ueAFHPaVtfTgmnRo4g-tpool
...
Name:              docker-8:1-8571283-425cb51f6f604209c0dbc26691918c9dfe685f16188a29107352fbb39670ebdf
State:             ACTIVE
Read Ahead:        8192
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 6
Number of targets: 1

# docker-latest stop $(docker-latest ps -a -q)
(No docker process is running after that.)
# dmsetup info
Name:              rhel--latest-docker--latest--pool
State:             ACTIVE
Read Ahead:        8192
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 5
Number of targets: 1
UUID: LVM-11skPfGYN2KRf5gk6ioa79x0edYyxteFMlKw3QbD3wS1O8ueAFHPaVtfTgmnRo4g-tpool
...
Name:              docker-8:1-8571283-425cb51f6f604209c0dbc26691918c9dfe685f16188a29107352fbb39670ebdf
State:             ACTIVE
Read Ahead:        8192
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 6
Number of targets: 1

# systemctl stop docker-latest
# dmsetup info
(Same output as the above.)

As the results, there is no way to destroy the thinp,

# lvremove rhel-latest
  Logical volume rhel-latest/docker-latest-pool is used by another device.

until issued one of the followings,

1) "docker-latest rm"
2) reboot
3) manually deleted all docker-8:1-xxx devices using "dmsetup remove -f"

How reproducible:
always

Comment 1 Daniel Walsh 2016-04-18 17:48:32 UTC
Vivek any ideas?

Comment 2 Qian Cai 2016-04-18 17:53:33 UTC
Created attachment 1148260 [details]
the docker-latest journal file when it is in a mess

Comment 4 Daniel Walsh 2016-04-18 17:57:12 UTC
Antonio from the log file above it looks like we have the parseint error.

Comment 5 Vivek Goyal 2016-04-18 17:59:21 UTC
Looks like some error at higher level fails to cleanup container storage hence making thin pool busy.

Comment 6 Antonio Murdaca 2016-04-18 18:12:21 UTC
Dan is that happening with a docker 1.9 daemon? Not sure we merged the fix in rhel 1.9

Comment 7 Daniel Walsh 2016-04-18 18:18:11 UTC
Which patch?

Comment 8 Antonio Murdaca 2016-04-18 18:20:07 UTC
https://github.com/projectatomic/docker/pull/88

We didn't end up merging this in Fedora branch, nor we merged it for rhel in 1.9

I'll make a PR against the rhel branch for inclusion

Comment 9 Qian Cai 2016-04-18 18:41:31 UTC
OK, it turned out it does not even need docker/docker-latest daemons to take over each other to reproduce this. Just use all legit operations.

# rpm -qa | grep docker
docker-selinux-1.9.1-28.el7.x86_64
docker-1.9.1-28.el7.x86_64
docker-utils-1.9.1-28.el7.x86_64
docker-forward-journald-1.9.1-28.el7.x86_64
docker-latest-1.10.3-9.el7.x86_64

# cat /etc/sysconfig/docker-storage-setup
DEVS=/dev/vda
VG=rhel
# cat /etc/sysconfig/docker-latest-storage-setup
DEVS=/dev/vdb
VG=rhel-latest

Everything starts from scratch.
# systemctl stop docker
# systemctl stop docker-latest
# rm -rf /var/lib/docker/*
# rm -rf /var/lib/docker-latest/*
# rm -f /etc/sysconfig/docker-storage
# rm -f /etc/sysconfig/docker-latest-storage
# lvs
(Empty as expected.)
# dmsetup info | grep 'docker-8:1'
(Empty as expected.)

# systemctl start docker
# lvs
  LV          VG   Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool rhel twi-a-t--- 3.19g             1.82   0.33                            
# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-04-18 14:11:06 EDT; 5s ago

# Easy to reproduce by running a lot of containers.
# for i in `seq 1 100`; do docker run -d nginx; done
# docker stop $(docker ps -a -q)
# dmsetup info | grep 'docker-8:1'
(Empty as expected.)

# docker info
Containers: 100
Images: 8
Server Version: 1.9.1
Storage Driver: devicemapper
 Pool Name: rhel-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: xfs
 Data file: 
 Metadata file: 
 Data Space Used: 2.992 GB
 Data Space Total: 5.931 GB
 Data Space Available: 2.939 GB
 Metadata Space Used: 3.027 MB
 Metadata Space Total: 12.58 MB
 Metadata Space Available: 9.556 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2015-12-01)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.10.0-327.10.1.el7.x86_64
Operating System: Employee SKU
CPUs: 2
Total Memory: 3.702 GiB
Name: rhel7-00.localdomain
ID: BOU7:OGFB:IV5G:Z5SH:EI6T:NGXN:RDKR:MHWM:IJUE:DYBU:ZATU:GENW
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

# systemctl stop docker

# systemctl start docker-latest
# lvs
  LV                 VG          Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool        rhel        twi-a-t--- 4.60g             59.27  24.06                           
  docker-latest-pool rhel-latest twi-a-t--- 3.19g             0.60   0.33                            
# systemctl status docker-latest
● docker-latest.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker-latest.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-04-18 14:14:10 EDT; 4s ago

# for i in `seq 1 100`; do docker run -d nginx; done
# docker-latest stop $(docker-latest ps -a -q)
# dmsetup info | grep 'docker-8:1'
Name:              docker-8:1-41991960-b54610d86248a72e653dbbd34f222352d5690faa0a8464869722db94b88e4466
Name:              docker-8:1-41991960-404f2097bb6a53625f708907036d7c00339811939157734c759c826482f84187
Name:              docker-8:1-41991960-40fc396d0b2aa5dab237c2f2e76af84b7d48ab37deafd3ac3a2c43fe2f34e2f7
...

# dmsetup info
Name:              rhel--latest-docker--latest--pool
State:             ACTIVE
Read Ahead:        8192
Tables present:    LIVE
Open count:        99

# docker-latest info
Containers: 100
 Running: 0
 Paused: 0
 Stopped: 100
Images: 1
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: rhel--latest-docker--latest--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: 
 Metadata file: 
 Data Space Used: 3.102 GB
 Data Space Total: 3.427 GB
 Data Space Available: 324.5 MB
 Metadata Space Used: 3.039 MB
 Metadata Space Total: 12.58 MB
 Metadata Space Available: 9.544 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2015-12-01)
Execution Driver: native-0.2
Logging Driver: journald
Plugins: 
 Volume: local
 Network: host bridge null
Kernel Version: 3.10.0-327.10.1.el7.x86_64
Operating System: Employee SKU
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 0
CPUs: 2
Total Memory: 3.702 GiB
Name: rhel7-00.localdomain
ID: BOU7:OGFB:IV5G:Z5SH:EI6T:NGXN:RDKR:MHWM:IJUE:DYBU:ZATU:GENW
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Registries: docker.io (secure)

Comment 10 Qian Cai 2016-04-18 18:48:34 UTC
(In reply to CAI Qian from comment #9)
> # for i in `seq 1 100`; do docker run -d nginx; done
Also tried docker-latest instead docker, and still same issue.
> # docker-latest stop $(docker-latest ps -a -q)
> # dmsetup info | grep 'docker-8:1'
> Name:             
> docker-8:1-41991960-
> b54610d86248a72e653dbbd34f222352d5690faa0a8464869722db94b88e4466
> Name:             
> docker-8:1-41991960-
> 404f2097bb6a53625f708907036d7c00339811939157734c759c826482f84187
> Name:             
> docker-8:1-41991960-
> 40fc396d0b2aa5dab237c2f2e76af84b7d48ab37deafd3ac3a2c43fe2f34e2f7
> ...

# ps aux | grep docker
...
root      8868  2.6  1.8 5259952 70544 ?       Sl   14:35   0:19 /usr/bin/docker-latest daemon --exec-opt native.cgroupdriver=systemd -g /var/lib/docker-latest --selinux-enabled --log-driver=journald --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/rhel--latest-docker--latest--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true

Comment 11 Qian Cai 2016-04-18 19:16:18 UTC
This time lots of those errors in the journal,

Apr 18 14:38:14 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:14.885603306-04:00" level=error msg="devmapper: Error unmounting device 421409e18955c41801c0ea5e9a9c18d08f9d2b7908804d06f86ec93d0bb3215e: invalid argument"
Apr 18 14:38:14 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:14.885640183-04:00" level=error msg="Error unmounting container 842e029fbc03ac6ee7cf1b3cdfe79c39ccbab41ded849e1c9c35abb7f3b8bc1a: invalid argument"
Apr 18 14:38:14 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:14.887038753-04:00" level=error msg="Failed to get pwuid struct: strconv.ParseInt: parsing \"Ϩ\": invalid syntax"
Apr 18 14:38:14 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:14.887059613-04:00" level=info msg="{Action=stop, ID=2d50e31a73c9f60b78f05fc9b87a483c0a2e3ed75577c236e60cf33f21f4207f, LoginUID=1000, PID=14470}"
Apr 18 14:38:15 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:15.148741288-04:00" level=error msg="devmapper: Error unmounting device 346562c043c92d1b64982c9a1c20f0491c68192bd42492fd7ccd289be9df424b: invalid argument"
Apr 18 14:38:15 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:15.148773849-04:00" level=error msg="Error unmounting container 2d50e31a73c9f60b78f05fc9b87a483c0a2e3ed75577c236e60cf33f21f4207f: invalid argument"
Apr 18 14:38:48 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:38:48.603985368-04:00" level=error msg="Failed to get pwuid struct: strconv.ParseInt: parsing \"Ϩ\": invalid syntax"
Apr 18 14:45:01 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T14:45:01.191656527-04:00" level=error msg="Failed to get pwuid struct: strconv.ParseInt: parsing \"Ϩ\": invalid syntax"

Comment 12 Qian Cai 2016-04-18 19:22:29 UTC
Again, # docker-latest rm $(docker-latest ps -a -q) can clean all the mess and destroyed all the dm-devices.

Comment 13 Qian Cai 2016-04-18 19:49:06 UTC
Occasionally, we saw those errors in the docker-latest journal too,

Apr 18 15:27:59 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T15:27:59.407826974-04:00" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker-latest/containers/eb90b7099eaa2a18fe3968551a4cee19515c02160ced417a040748238702e47d/shm: invalid argument"
Apr 18 15:27:59 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T15:27:59.407852802-04:00" level=error msg="Error unmounting container eb90b7099eaa2a18fe3968551a4cee19515c02160ced417a040748238702e47d: not mounted"
Apr 18 15:27:59 rhel7-00.localdomain forward-journal[8869]: time="2016-04-18T15:27:59.408034711-04:00" level=error msg="Handler for POST /v1.22/containers/eb90b7099eaa2a18fe3968551a4cee19515c02160ced417a040748238702e47d/start returned error: Container command not found or does not exist.

Comment 14 Qian Cai 2016-04-18 21:45:53 UTC
Created attachment 1148284 [details]
docker-latest daemon journal with debug enabled

Comment 18 Daniel Walsh 2016-06-03 13:05:21 UTC
Stephen can you take a look at this one.

Comment 19 Qian Cai 2016-06-20 13:35:39 UTC

*** This bug has been marked as a duplicate of bug 1347821 ***