Bug 1403027 - Build fails with Driver devicemapper failed to remove root filesystem: mount still active
Summary: Build fails with Driver devicemapper failed to remove root filesystem: mount still active
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Daniel Walsh
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1186913
 
Reported: 2016-12-08 22:43 UTC by Walid A.
Modified: 2020-05-14 15:27 UTC
CC: 16 users

Fixed In Version: docker-1.12.6-11.el7
Doc Type: Bug Fix
Doc Text:
Cause: The previous docker-storage-setup code tried to determine how much free space to leave when creating volumes with lvm, but lvm did not provide the size of the spare volume. Consequence: lvconvert could fail due to a lack of available space. Fix: Use lvm to create the metadata, data, and spare volumes, using the --poolmetadatasize option to control the metadata size. Result: This patch changes the behavior of MIN_DATA_SIZE. Previously, the free space in the VG was checked against MIN_DATA_SIZE before creating the data volume (and after creating the metadata volume); now the same check happens before creating the metadata volume. Since the metadata volume is very small (about 0.1% of the data size), this can be considered a rounding error and should not be a real problem.
Clone Of:
Environment:
Last Closed: 2017-11-21 21:46:11 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Walid A. 2016-12-08 22:43:29 UTC
Description of problem:
-----------------------
This is on an OCP cluster of 5 nodes: 1 etcd, 1 master, 1 infra node, and 2 application nodes, running on AWS EC2 m4.xlarge instances.

NOTE: This cluster was installed with docker 1.10.3

During multi-day reliability testing (where multiple projects are created over time, users are added/removed, builds are executed, etc.), on day 2 we started seeing intermittent build errors for several of the sample applications (dancer-mysql-example, cake-php-example, django-psql-example, etc.) at a 35% failure rate:

error: Execution of post execute step failed
warning: Failed to remove container "80a380004979dd6536e6b76fe3460d0391090d96418d022b763b3744d54a7c23": Error response from daemon: Driver devicemapper failed to remove root filesystem 80a380004979dd6536e6b76fe3460d0391090d96418d022b763b3744d54a7c23: mount still active
error: build error: building walid/dancer-mysql-example-1:02773ab0 failed when committing the image due to error: Cannot connect to the Docker daemon. Is the docker daemon running on this host?

We are not sure what is causing the "mount still active" error.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
oc v3.4.0.32+d349492
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

openshift v3.4.0.32+d349492
kubernetes v1.4.0+776c994

docker 1.10.3

How reproducible:
-----------------
Reproducible almost at will.

Steps to Reproduce:
===================
1. Install OCP 3.4.0.32 cluster with docker 1.10.3
2. Run reliability testing, then manually create a new project
3. Inside the new project, run: oc new-app dancer-mysql-example. Right now on this cluster, I can reproduce it the first time I create a new project and a new app.

Actual results:
===============
# oc get pods
NAME                           READY     STATUS    RESTARTS   AGE
dancer-mysql-example-1-build   0/1       Error     0          38m
database-1-h939j               1/1       Running   0          38m

Expected results:
=================
# oc get pods
NAME                           READY     STATUS    RESTARTS   AGE
dancer-mysql-example-1-build   0/1       Completed 0          38m
dancer-mysql-example-1-pk317   1/1       Running   0          38m
database-1-h939j               1/1       Running   0          38m


Additional info:
================
Examining /usr/lib/systemd/system/docker.service shows we are running with MountFlags=slave

On the node where the build failed:
I do not see a docker container with the same ID as the container it failed to remove "80a380004979dd6536e6b76fe3460d0391090d96418d022b763b3744d54a7c23" or "80a38000497" in the errored build log (attached).

I attempted to find the offending process using the technique described in https://bugzilla.redhat.com/show_bug.cgi?id=1391665#c7:


# find /proc/*/mounts | xargs grep 80a38000
grep: /proc/130827/mounts: No such file or directory

I also ran docker ps -a | grep <project_name> and tried the find command above on the two exited containers:
# docker ps -a | grep walid
783c97d7aabf        registry.access.redhat.com/rhscl/mysql-56-rhel7@sha256:0d32a738023a7e76e5df41a69e4c77cae80ed60676f7b5455dc70364896cc32b                            "container-entrypoint"   About an hour ago   Up About an hour                                   k8s_mysql.486ea395_database-1-h939j_walid_15a55b28-bd8d-11e6-a6b0-02b95abd7a23_168846b8
0cdc1c55c3f5        registry.ops.openshift.com/openshift3/ose-pod:v3.4.0.32                                                                                            "/pod"                   About an hour ago   Up About an hour                                   k8s_POD.f7ee6ba_database-1-h939j_walid_15a55b28-bd8d-11e6-a6b0-02b95abd7a23_356b4f22
5e8ac33d0b6b        registry.ops.openshift.com/openshift3/ose-sti-builder:v3.4.0.32                                                                                    "/usr/bin/openshift-s"   About an hour ago   Exited (1) 54 minutes ago                          k8s_sti-build.6cb594e0_dancer-mysql-example-1-build_walid_fd35c41b-bd8c-11e6-a6b0-02b95abd7a23_6a76125d
081230e27790        registry.ops.openshift.com/openshift3/ose-pod:v3.4.0.32                                                                                            "/pod"                   About an hour ago   Exited (0) 54 minutes ago                          k8s_POD.f7ee6ba_dancer-mysql-example-1-build_walid_fd35c41b-bd8c-11e6-a6b0-02b95abd7a23_4cbc6d38

# find /proc/*/mounts | xargs grep  "08123"
grep: /proc/128523/mounts: No such file or directory

# find /proc/*/mounts | xargs grep 081230e27790
grep: /proc/129424/mounts: No such file or directory

# find /proc/*/mounts | xargs grep 5e8ac33d0b6b
grep: /proc/130174/mounts: No such file or directory
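The "No such file or directory" lines above are just noise from processes that exit between the find and the grep. A slightly more robust version of the same scan (a sketch; the container-ID prefix is a placeholder from this report) suppresses that noise:

```shell
# Scan every process's mount table for a container-ID prefix; processes
# that vanish mid-scan are silently skipped instead of producing
# "No such file or directory" errors.
CONTAINER_ID=80a38000   # placeholder: prefix of the container that failed to remove
for m in /proc/[0-9]*/mounts; do
  grep -H "$CONTAINER_ID" "$m" 2>/dev/null || true
done
```

Any line printed names the PID (in the /proc path) whose mount namespace still holds the container's root filesystem.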

docker-current PID:  9434

# ls -l /proc/9434/ns/mnt
lrwxrwxrwx. 1 root root 0 Dec  8 16:43 /proc/9434/ns/mnt -> mnt:[4026532301]

# ls -l /proc/self/ns/mnt
lrwxrwxrwx. 1 root root 0 Dec  8 17:18 /proc/self/ns/mnt -> mnt:[4026531840]

# ls -l /proc/$$/ns/mnt
lrwxrwxrwx. 1 root root 0 Dec  8 17:17 /proc/91851/ns/mnt -> mnt:[4026531840]
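The ns/mnt links above can also be compared programmatically; a minimal sketch (same_mnt_ns is a helper name invented here):

```shell
# Succeeds if two PIDs share a mount namespace, by comparing the targets
# of their /proc/<pid>/ns/mnt symlinks (e.g. "mnt:[4026531840]").
same_mnt_ns() {
  [ "$(readlink /proc/"$1"/ns/mnt)" = "$(readlink /proc/"$2"/ns/mnt)" ]
}

# A shell trivially shares a namespace with itself; in the listings above,
# the docker daemon (mnt:[4026532301]) and the shell (mnt:[4026531840])
# would NOT compare equal, consistent with MountFlags=slave.
same_mnt_ns $$ $$ && echo "same mount namespace"
```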



Please email me if you need access to this testbed.  If you send me your id_rsa.pub public key, I can add it to authorized_keys file on my nodes, so you can ssh to my nodes.  Access info in next comment.


Comment 7 Ben Parees 2016-12-09 19:31:53 UTC
the root cause of this bug appears to be this error:

error: build error: building walid/dancer-mysql-example-1:02773ab0 failed when committing the image due to error: Cannot connect to the Docker daemon. Is the docker daemon running on this host?

That is the first error to occur; after it occurs, we attempt to remove the container, which results in this error:

error: Execution of post execute step failed
warning: Failed to remove container "80a380004979dd6536e6b76fe3460d0391090d96418d022b763b3744d54a7c23": Error response from daemon: Driver devicemapper failed to remove root filesystem 80a380004979dd6536e6b76fe3460d0391090d96418d022b763b3744d54a7c23: mount still active

The errors are printed in reverse order due to how our error handling logic is processed.

Note that even though we couldn't commit the container, I would still expect us to be able to remove the container, so both errors are a problem, but the main problem is the failure to commit the container.

Comment 11 Mike Fiedler 2016-12-16 02:21:21 UTC
Raised https://bugzilla.redhat.com/show_bug.cgi?id=1405272 for docker-storage-setup setting an unsupported option in /etc/sysconfig/docker-storage on RHEL 7.3.1

Comment 12 Vivek Goyal 2016-12-16 13:16:19 UTC
Mike, upstream docker-storage-setup has already been fixed to determine dynamically if underlying kernel supports deferred_deletion or not and set the option accordingly.

We just need to make sure that both the docker-1.12 and docker-1.10 builds have the latest docker-storage-setup.

Lokesh?
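For reference, whether a given daemon actually ended up with deferred deletion enabled can be read from `docker info`; a sketch (the exact "Deferred Deletion Enabled" wording is assumed from devicemapper-era docker info output, and deferred_deletion_enabled is a helper name invented here):

```shell
# Reads docker-info-style text on stdin and succeeds if deferred deletion
# is reported as enabled.
deferred_deletion_enabled() {
  grep -qi 'Deferred Deletion Enabled: *true'
}

# On a live host you would run: docker info | deferred_deletion_enabled
printf ' Deferred Deletion Enabled: true\n' | deferred_deletion_enabled && echo enabled
```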

Comment 13 Daniel Walsh 2016-12-16 13:59:52 UTC
Vivek, do you want it with the overlay patch that was merged yesterday for RHEL 7.3.2?

Comment 14 Vivek Goyal 2016-12-16 14:07:44 UTC
Dan, I am merging shishir's changes for docker root volume now.

We should not pull that one in yet. It is very new code.

I think we should pull in changes up to the following commit.

commit 516cb9c0bc14883f46ef2362a3f5abd4d6c20b1e
Author: Vivek Goyal <vgoyal>
Date:   Tue Nov 15 09:13:45 2016 -0500

    Let lvm create metadata volume automatically
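The commit above moves metadata sizing into lvm itself. As a sketch of the resulting invocation (the volume-group and pool names are placeholders, and the sizes are illustrative, not what docker-storage-setup actually computes):

```shell
# Builds the lvcreate command line that creates a thin pool while letting
# lvm carve out the metadata LV via --poolmetadatasize, instead of
# pre-computing spare/metadata space by hand.
build_thinpool_cmd() {
  vg=$1; data_extents=$2; meta_size=$3
  echo "lvcreate --type thin-pool --poolmetadatasize ${meta_size} -l ${data_extents} -n docker-pool ${vg}"
}

build_thinpool_cmd docker-vg 60%FREE 16M
# -> lvcreate --type thin-pool --poolmetadatasize 16M -l 60%FREE -n docker-pool docker-vg
```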

Comment 19 Troy Dawson 2017-02-06 19:30:03 UTC
This should be fixed in all versions of OCP 3.5. For testing purposes, use OCP v3.5.0.17 or newer and docker 1.10.3 or newer.

Comment 21 Wang Haoran 2017-02-08 04:47:25 UTC
It works fine on OCP 3.5 (openshift v3.5.0.17+c55cf2b); moving to VERIFIED.

