Bug 1417300 - Error removing intermediate container XXXX: rmdriverfs: Driver devicemapper failed to remove root filesystem XXXXX: Device is Busy
Summary: Error removing intermediate container XXXX: rmdriverfs: Driver devicemapper f...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Vivek Goyal
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-27 21:08 UTC by Luiz Carvalho
Modified: 2017-09-07 13:14 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-07 13:14:23 UTC
Target Upstream Version:


Attachments
Host info (20.87 KB, text/plain)
2017-01-27 21:10 UTC, Luiz Carvalho
systemd docker init config (911 bytes, text/plain)
2017-01-30 16:41 UTC, Luiz Carvalho

Description Luiz Carvalho 2017-01-27 21:08:55 UTC
Description of problem:
During Docker build the following error message appears between build steps:

Error removing intermediate container fbf2817d0e13: rmdriverfs: Driver devicemapper failed to remove root filesystem fbf2817d0e131811bc5bf0bb02b7d125597b9c175e8baa4f5e6fa0a5e5eaa047: Device is Busy


Version-Release number of selected component (if applicable): OpenShift v3.3.1.7, docker 1.10.3

How reproducible: Often


Steps to Reproduce:
1. Build multiple docker images simultaneously on the same OpenShift node (a rough reproducer sketch follows below)
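
A minimal reproducer sketch, assuming any non-trivial Dockerfile in ./app on a node running docker 1.10.3; the directory, tag name, and build count are illustrative only:

  # kick off several builds concurrently against the same docker daemon
  for i in 1 2 3 4; do
      docker build --no-cache -t repro-test:$i ./app &
  done
  wait
  # when the race hits, "Device is Busy" messages appear between build steps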

Actual results:
Step 20 : ADD Dockerfile-lucarval-rsyslog-7.2.lucarval-11 /root/buildinfo/Dockerfile-lucarval-rsyslog-7.2.lucarval-11
---> 21b00269e1df                                                               
Removing intermediate container bb6dd340c1ce                                    
Step 21 : LABEL "authoritative-source-url" "registry.access.redhat.com" "distribution-scope" "private" "vendor" "Red Hat, Inc." "Name" "lucarval/rsyslog" "Version" "7.2.lucarval" "Release" "11" "BZComponent" "rsyslog-docker" "build-date" "2017-01-26T14:23:51.644349" "vcs-ref" "b8120b486367ec33fbbfa408542eec7eded8b54e"
---> Running in fbf2817d0e13                                                    
---> bc1df297f175                                                               
Error removing intermediate container fbf2817d0e13: rmdriverfs: Driver devicemapper failed to remove root filesystem fbf2817d0e131811bc5bf0bb02b7d125597b9c175e8baa4f5e6fa0a5e5eaa047: Device is Busy
Step 22 : RUN rm -f '/etc/yum.repos.d/extras-rhel-7.2-docker-candidate.repo'    
---> Running in 2ee4b0fe06f0                                                    
---> adba727c6b61                                                               
Removing intermediate container 2ee4b0fe06f0                                    
Error removing intermediate container fbf2817d0e13: nosuchcontainer: No such container: fbf2817d0e131811bc5bf0bb02b7d125597b9c175e8baa4f5e6fa0a5e5eaa047
Successfully built adba727c6b61                                                 


Expected results:

No "Error removing intermediate container" messages should appear between build steps.


Additional info:

This appears to occur only when multiple builds are executed simultaneously on the same OpenShift node.
We noticed this happening in one of our clusters a couple of months ago, but it seemed to be harmless.

However, on Jan 25th the frequency of these errors increased and actually caused build failures.
In those cases, the built image could not be pushed to a docker registry, failing with the following error:
"plugin rhel-push-plugin failed with error: AuthZPlugin.AuthZReq: Error response from daemon: layer does not exist"
At that moment we cleared the docker storage (rm -rf /var/lib/docker and lvremove) and used docker-storage-setup to re-create it.
The issue came back as soon as we started running simultaneous builds.
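
For reference, the storage reset was roughly the following (a sketch; the volume group and thin-pool names are illustrative and depend on the host's docker-storage-setup configuration):

  systemctl stop docker
  rm -rf /var/lib/docker                # discard all image and container metadata/layers
  lvremove /dev/docker-vg/docker-pool   # hypothetical VG/LV names for the devicemapper thin pool
  docker-storage-setup                  # re-create the thin pool from /etc/sysconfig/docker-storage-setup
  systemctl start docker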

Comment 2 Luiz Carvalho 2017-01-27 21:10:57 UTC
Created attachment 1245274 [details]
Host info

Comment 3 Vivek Goyal 2017-01-30 16:34:58 UTC
A few questions:

- How do you reproduce it? Do you have an easy way to reproduce it?
- Are you running the docker daemon with MountFlags=slave?

Comment 4 Luiz Carvalho 2017-01-30 16:41:05 UTC
> - How do you reproduce it? Do you have an easy way to reproduce it?
On this host, I can reproduce it simply by running multiple docker builds simultaneously.

> - Are you running the docker daemon with MountFlags=slave?
Yes; the systemd docker unit config is attached.
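
For context, the setting in question looks like the following in a systemd drop-in (an illustrative excerpt, not the attached file itself; the drop-in path is an assumption):

  # /etc/systemd/system/docker.service.d/mountflags.conf  (hypothetical path)
  [Service]
  MountFlags=slave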

Comment 5 Luiz Carvalho 2017-01-30 16:41:34 UTC
Created attachment 1245925 [details]
systemd docker init config

Comment 6 Vivek Goyal 2017-01-30 16:45:11 UTC
BTW, I see the following.

Successfully built adba727c6b61

That means that despite the errors, docker continued building the layers. If that's the case, then these errors could be a red herring, and the real problem could be that some other tool actually deleted the image, which is why the push failed.

Can you please verify that these errors actually lead to a build failure and that the image was not generated?

Comment 7 smahajan@redhat.com 2017-01-30 16:47:05 UTC
Hi,

Can you provide the SSH details of the node where this error is happening? That way I can log in and take a look.

Shishir

Comment 8 Luiz Carvalho 2017-01-30 16:53:15 UTC
> Can you please verify that these errors actually lead to a build failure and that the image was not generated?

The build is succeeding for now. However, we had noticed these messages before and thought they were harmless. Eventually their frequency increased and builds started to fail. Also, in some cases the docker build was successful but pushing the image to a registry failed:
APIError: 500 Server Error: Internal Server Error ("plugin rhel-push-plugin failed with error: AuthZPlugin.AuthZReq: Error response from daemon: layer does not exist")

As previously mentioned, we're not seeing these failures anymore after clearing the storage. My concern is that the error messages we're seeing now will eventually lead to the aforementioned failures.

Comment 9 Luiz Carvalho 2017-01-30 16:57:51 UTC
> Can you provide the SSH details of the node where this error is happening?
This host is managed by sysops, and even I don't have direct access to it.
Is there some specific info that you'd like gathered?
Or is direct access the only option?

Comment 10 Vivek Goyal 2017-01-30 17:00:35 UTC
(In reply to Luiz Carvalho from comment #8)
> > Can you please verify that these errors actually lead to a build failure and that the image was not generated?
> 
> The build is succeeding for now. However, we had noticed these messages
> before and thought they were harmless. Eventually their frequency increased
> and builds started to fail. Also, in some cases the docker build was
> successful but pushing the image to a registry failed:
> APIError: 500 Server Error: Internal Server Error ("plugin rhel-push-plugin
> failed with error: AuthZPlugin.AuthZReq: Error response from daemon: layer
> does not exist")
> 
> As previously mentioned, we're not seeing these failures anymore after
> clearing the storage. My concern is that the error messages we're seeing now
> will eventually lead to the aforementioned failures.


OK, if you are not experiencing these errors after resetting the storage, then this should not be a blocker bug, so please remove the blocker status from this bug.

Also, if the image was successfully built and a layer was found missing at the time of push, then it is most likely a different issue, not related to the device being busy.

The device is most likely busy because its mount probably leaked into some other mount namespace. If that happens frequently, it should be debugged as a separate issue.

But given that the device-busy failure does not lead to a build failure, that error does not seem to be a blocker issue.
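
For whoever debugs the leak separately: a rough way to check whether the container's devicemapper mount is still held in another process's mount namespace is something like the following (the grep pattern is illustrative; matching on the container ID from the error narrows it to the failing device):

  # list processes whose mount namespace still references a devicemapper mount
  grep -l devicemapper /proc/[0-9]*/mountinfo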

Comment 11 smahajan@redhat.com 2017-01-31 20:29:34 UTC
Hi,

Can you set up a local node (VM) where you can reproduce this issue, and then provide us access to that node?

Having a machine (VM) with a reproducer would be a good starting point for me to start looking into the issue.

Shishir

Comment 13 Vivek Goyal 2017-07-05 18:41:40 UTC
Is this still an issue with the latest docker? If not, I would like to close it.

Comment 14 Vivek Goyal 2017-09-07 12:15:06 UTC
Try the latest docker (-55). Device removal failures were in part due to mount points leaking. We now ship oci-umount, which helps with leaking mount points and might fix the issue.

If this problem is reproducible with the latest docker, then I can look at it; otherwise I would like to close this bug.
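
A quick way to confirm what the node is actually running (a sketch; package names as shipped for RHEL 7, adjust to your channel):

  # verify the docker build and the oci-umount hook package on the node
  rpm -q docker oci-umount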

Comment 15 Luiz Carvalho 2017-09-07 13:14:23 UTC
+1 for closing this ticket. Thanks!

