
Bug 1367141

Summary: docker not removing cgroups for exited containers
Product: Red Hat Enterprise Linux 7
Component: docker
Version: 7.2
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Keywords: Extras
Reporter: Seth Jennings <sjenning>
Assignee: Matthew Heon <mheon>
QA Contact: atomic-bugs <atomic-bugs>
CC: dwalsh, jshivers, lsm5, mpatel
Type: Bug
Last Closed: 2017-06-30 15:01:37 UTC
Bug Blocks: 1328913, 1357081, 1496245

Description Seth Jennings 2016-08-15 17:03:27 UTC
Description of problem:

Repeating errors in cadvisor/kubelet/openshift-node process

Either:

Unable to get network stats from pid 128265: couldn't read network stats: failure opening /proc/128265/net/dev: open /proc/128265/net/dev: no such file or directory

or

failed to collect filesystem stats - du command failed on /rootfs/var/lib/docker/containers/a5febc665c196470ec125d7354103b707604476ad8c31255d54a53cdb9352b41 with output stdout: , stderr: du: cannot access '/rootfs/var/lib/docker/containers/a5febc665c196470ec125d7354103b707604476ad8c31255d54a53cdb9352b41': No such file or directory

Either of these follows this error:

Failed to remove orphaned pod "c134c5fe-4d6c-11e6-b1cc-0e227273c3bd" dir; err: remove /var/lib/origin/openshift.local.volumes/pods/c134c5fe-4d6c-11e6-b1cc-0e227273c3bd/volumes/kubernetes.io~secret/deployer-token-0hr8m: device or resource busy

Or this error:

kernel: XFS (dm-95): Unmounting Filesystem
kernel: device-mapper: thin: Deletion of thin device 68560 failed.
forward-journal[18687]: time="2016-07-15T04:55:27.490877319Z" level=error msg="Handler for DELETE /v1.21/containers/5cec00e7355de03fdc8966895223776c1163875b92a7ed05734ea80ce6527436 returned error: Driver devicemapper failed to remove root filesystem 5cec00e7355de03fdc8966895223776c1163875b92a7ed05734ea80ce6527436: Device is Busy"
forward-journal[18687]: time="2016-07-15T04:55:27.491095598Z" level=error msg="HTTP Error" err="Driver devicemapper failed to remove root filesystem 5cec00e7355de03fdc8966895223776c1163875b92a7ed05734ea80ce6527436: Device is Busy" statusCode=500
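
To see which containers are affected, the "Device is Busy" failures can be pulled out of the docker daemon log. A minimal sketch, assuming the log lines have the same shape as the ones quoted above; the function name busy_container_ids is made up for illustration:

```python
import re

# Matches the devicemapper removal failure as it appears in the log lines above.
# Assumption: container IDs are always the full 64-character hex form.
BUSY_RE = re.compile(
    r"Driver devicemapper failed to remove root filesystem "
    r"(?P<cid>[0-9a-f]{64}): Device is Busy"
)


def busy_container_ids(lines):
    """Return the unique container IDs, in first-seen order, whose root
    filesystem removal failed with "Device is Busy"."""
    seen = []
    for line in lines:
        for match in BUSY_RE.finditer(line):
            cid = match.group("cid")
            if cid not in seen:
                seen.append(cid)
    return seen
```

The same container usually shows up twice (once in the handler error, once in the HTTP error), so the IDs are de-duplicated.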

Version-Release number of selected component (if applicable):
OpenShift 3.1

How reproducible:
No reproduction procedure.  Seems to be a race.


Actual results:

Unmounting the device mapper thin device, or removing the pod directories, fails with a "device busy" error.  As a result, the container's cgroup is not cleaned up properly.  cadvisor (and therefore kubelet and openshift-node) then continues to monitor a container whose pid 1 has exited, which leads to the failure messages in the logs.
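
A leftover cgroup of this kind can be spotted by looking for container cgroup directories that no longer contain any processes. A rough sketch, under these assumptions (none of them confirmed in this bug): docker is using the systemd cgroup driver so container cgroups are directories named docker-<id>.scope, and the cgroup v1 hierarchy is mounted somewhere like /sys/fs/cgroup/memory. The function name is hypothetical.

```python
import os


def find_empty_container_cgroups(cgroup_root):
    """Return docker-*.scope cgroup directories under cgroup_root whose
    cgroup.procs file lists no PIDs (candidates for orphaned cgroups)."""
    orphans = []
    for dirpath, _dirnames, _filenames in os.walk(cgroup_root):
        name = os.path.basename(dirpath)
        # Assumed naming scheme: docker-<container id>.scope
        if not (name.startswith("docker-") and name.endswith(".scope")):
            continue
        try:
            with open(os.path.join(dirpath, "cgroup.procs")) as f:
                pids = f.read().split()
        except OSError:
            # cgroup vanished or is unreadable; skip it
            continue
        if not pids:
            orphans.append(dirpath)
    return sorted(orphans)
```

For example, find_empty_container_cgroups("/sys/fs/cgroup/memory") would list the candidates on a host matching the assumptions above. An empty cgroup.procs only shows that no process is left in the cgroup; it does not by itself show why the cgroup was not removed.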

Expected results:

Ideally, we would not encounter the "device busy" errors which leave orphaned thin devices everywhere.  But in the case the error does occur (due to a kernel bug, race in docker, or whatever), docker should _always_ clean up the container cgroup so that cadvisor will stop monitoring the container.
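
The requested behavior amounts to making cgroup cleanup unconditional. The sketch below only illustrates that ordering; it is not docker's code (docker is written in Go), and remove_container, remove_rootfs and remove_cgroup are hypothetical names standing in for the real teardown steps.

```python
def remove_container(remove_rootfs, remove_cgroup):
    """Run both teardown steps. The cgroup cleanup runs even if removing
    the root filesystem raises (for example with "Device is Busy")."""
    try:
        remove_rootfs()
    finally:
        # Always executed, so cadvisor stops seeing the container's cgroup.
        remove_cgroup()
```

The rootfs error still propagates to the caller, so the "device busy" failure stays visible in the logs while the cgroup is removed regardless.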

Additional info:
https://bugzilla.redhat.com/show_bug.cgi?id=1328913
https://bugzilla.redhat.com/show_bug.cgi?id=1357081

Comment 1 Seth Jennings 2016-08-15 17:15:35 UTC
It should be noted that the device mapper error causing the orphaned cgroup issue is only a theory and might not be right.  The real problem is the orphaned cgroup, which was confirmed here:

https://bugzilla.redhat.com/show_bug.cgi?id=1328913#c9

The reason the cgroup is not being removed is really unknown at this point.

Comment 3 Daniel Walsh 2016-10-18 13:51:21 UTC
Mrunal, do you have any ideas here?

Comment 4 Mrunal Patel 2016-10-18 17:53:06 UTC
I think that this is related to the mounts leaking into machined issue.

Comment 5 Daniel Walsh 2017-06-30 15:01:37 UTC
Should be fixed by oci-umount I believe.