Bug 1329213 - errors while doing parallel docker builds
Summary: errors while doing parallel docker builds
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.3
Hardware: All
OS: Linux
Severity: high
Priority: high
Target Milestone: rc
Target Release: ---
Assignee: Vivek Goyal
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-21 12:29 UTC by Jeremy Eder
Modified: 2021-12-20 02:46 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-19 21:23:06 UTC
Target Upstream Version:


Attachments

Description Jeremy Eder 2016-04-21 12:29:04 UTC
Fix might be in the kernel, but I've filed this against docker.  Set to urgent because it was flagged during POC of a high-profile internal project.

Description of problem:
We're hitting an issue in the OpenShift Build Service (OSBS) where we are trying to parallelize docker builds.

The reproducer is:
time for i in `seq 6` ; do docker build -t build.$i - < Dockerfile.$i & done

I have 6 different Dockerfiles.  They are all identical except for line #2, which differs in each file, so that all subsequent layers are forced to be created.

# cat Dockerfile.1
FROM centos
MAINTAINER jerms1
WORKDIR /tmp
RUN mkdir a
WORKDIR /tmp/a
RUN mkdir b
WORKDIR /tmp/a/b
RUN mkdir c
WORKDIR /tmp/a/b/c
RUN mkdir d
WORKDIR /tmp/a/b/c/d
RUN mkdir e
WORKDIR /tmp/a/b/c/d/e
RUN mkdir f
WORKDIR /tmp/a/b/c/d/e/f
RUN mkdir g
WORKDIR /tmp/a/b/c/d/e/f/g 
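
A minimal generator for the six Dockerfiles can be sketched as follows. The `jerms$i` MAINTAINER tags mirror the example above and are an assumption; only line 2 needs to differ between files, and the file body is abbreviated here:

```shell
#!/bin/sh
# Write Dockerfile.1 .. Dockerfile.6, identical except for line 2
# (the MAINTAINER tag), so the layer cache diverges after line 2 and
# every parallel build creates its own chain of intermediate
# layers/devices.
for i in 1 2 3 4 5 6; do
  cat > Dockerfile.$i <<EOF
FROM centos
MAINTAINER jerms$i
WORKDIR /tmp
RUN mkdir a
WORKDIR /tmp/a
RUN mkdir b
WORKDIR /tmp/a/b
EOF
done
```

Running the reproducer loop above against these files then exercises six concurrent builds whose layers cannot be shared from the cache.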

The builds seem to ultimately succeed.  However, there are errors indicating that creating/destroying the devices used for each layer during a build is racy.  Below is the error seen during the reproducer run (it occurs a few times):

Error removing intermediate container f15e19e53975: rmdriverfs: Driver devicemapper failed to remove root filesystem f15e19e53975ad561ae90f239e90ba360659c5b93c3d59375d53f02af6da6c29: remove 
/var/lib/docker/devicemapper/mnt/f15e19e53975ad561ae90f239e90ba360659c5b93c3d59375d53f02af6da6c29: device or resource busy

Then, on a different VM, I ran the same test against an overlayfs backend.  I found that the same errors occur:

Error removing intermediate container d372063e75a4: rmdriverfs: Driver overlay failed to remove root filesystem d372063e75a4b3117f3bbe73d2b21b3ac89585b9f92a2b1eaa7251835ccb3c60: remove /overlay/overlay/d372063e75a4b3117f3bbe73d2b21b3ac89585b9f92a2b1eaa7251835ccb3c60/merged: device or resource busy

Once one of those "device or resource busy" errors occurs, there will be one of these for every layer that follows:

Error removing intermediate container d372063e75a4: nosuchcontainer: no such id: d372063e75a4b3117f3bbe73d2b21b3ac89585b9f92a2b1eaa7251835ccb3c60

dmesg shows messages like the following for devicemapper.  There is nothing storage-related in the overlay system's dmesg.

[7245907.459170] XFS (dm-14): Starting recovery (logdev: internal)
[7245907.706321] device-mapper: ioctl: remove_all left 13 open device(s)
[7245907.710303] device-mapper: ioctl: remove_all left 13 open device(s)
[7245907.757289] device-mapper: ioctl: remove_all left 11 open device(s)
[7245912.139500] device-mapper: thin: Deletion of thin device 2062 failed.
[7245912.266223] device-mapper: ioctl: device doesn't appear to be in the dev hash table.
[7245912.300270] device-mapper: ioctl: remove_all left 14 open device(s)
[7245912.357272] device-mapper: ioctl: remove_all left 12 open device(s)
[7245912.440437] device-mapper: ioctl: remove_all left 11 open device(s)
[7245915.559198] device-mapper: ioctl: remove_all left 12 open device(s)
[7245930.683331] device-mapper: thin: Deletion of thin device 2090 failed.
[7245930.821067] device-mapper: ioctl: remove_all left 10 open device(s)
[7245930.980969] device-mapper: ioctl: remove_all left 10 open device(s)
[7245934.912930] device-mapper: ioctl: remove_all left 14 open device(s)
[7245934.964902] device-mapper: ioctl: remove_all left 13 open device(s)
[7245937.224855] device-mapper: ioctl: remove_all left 12 open device(s)
[7245964.246464] device-mapper: ioctl: remove_all left 13 open device(s)
[7245966.736445] device-mapper: ioctl: remove_all left 12 open device(s)
[7245966.789387] device-mapper: ioctl: remove_all left 11 open device(s)
[7245967.229413] device-mapper: ioctl: remove_all left 13 open device(s)
[7245967.307492] device-mapper: ioctl: remove_all left 12 open device(s)
[7245967.413374] device-mapper: ioctl: remove_all left 12 open device(s)
[7245967.427642] XFS (dm-6): Starting recovery (logdev: internal)
[7245968.847111] device-mapper: thin: Deletion of thin device 2142 failed.
[7245968.959391] device-mapper: ioctl: remove_all left 13 open device(s)
[7245996.423982] device-mapper: ioctl: remove_all left 13 open device(s)
[7245996.497954] device-mapper: ioctl: remove_all left 12 open device(s)
[7246002.589917] device-mapper: thin: Deletion of thin device 2187 failed.
[7246002.624822] device-mapper: ioctl: remove_all left 11 open device(s)
[7246362.033678] XFS (dm-11): Starting recovery (logdev: internal)
[7246362.099214] device-mapper: ioctl: remove_all left 13 open device(s)
[7246363.341250] device-mapper: ioctl: remove_all left 12 open device(s)
[7246387.827227] device-mapper: thin: Deletion of thin device 2280 failed.
[7246388.097297] device-mapper: ioctl: remove_all left 13 open device(s)
[7246389.965749] device-mapper: ioctl: remove_all left 14 open device(s)
[7246390.030664] device-mapper: ioctl: remove_all left 12 open device(s)
[7246415.441706] device-mapper: thin: Deletion of thin device 2320 failed.
[7246415.606304] device-mapper: ioctl: remove_all left 13 open device(s)
[7246417.995282] device-mapper: ioctl: remove_all left 13 open device(s)
[7246419.599313] device-mapper: ioctl: remove_all left 15 open device(s)
[7246421.217255] device-mapper: ioctl: remove_all left 15 open device(s)
[7246421.321213] XFS (dm-15): Starting recovery (logdev: internal)
[7246440.580960] device-mapper: ioctl: remove_all left 13 open device(s)
[7246440.664922] device-mapper: ioctl: remove_all left 12 open device(s)
[7246441.659382] XFS (dm-12): Starting recovery (logdev: internal)
[7246441.693872] device-mapper: ioctl: remove_all left 12 open device(s)
[7246442.924899] device-mapper: ioctl: remove_all left 14 open device(s)
[7246446.105837] device-mapper: ioctl: remove_all left 14 open device(s)
[7246465.787163] device-mapper: thin: Deletion of thin device 2400 failed.
[7246465.879088] device-mapper: thin: Deletion of thin device 2390 failed.
[7246466.143558] device-mapper: ioctl: remove_all left 14 open device(s)
[7246466.264591] device-mapper: ioctl: remove_all left 12 open device(s)
[7246467.272479] device-mapper: ioctl: remove_all left 12 open device(s)
[7246469.248941] device-mapper: thin: Deletion of thin device 2403 failed.
[7246469.360440] XFS (dm-14): Starting recovery (logdev: internal)
[7246469.669512] device-mapper: ioctl: remove_all left 14 open device(s)
[7246469.745539] device-mapper: ioctl: remove_all left 14 open device(s)
[7246469.825480] device-mapper: ioctl: remove_all left 13 open device(s)
[7246490.185120] device-mapper: ioctl: remove_all left 13 open device(s)
[7246493.213336] device-mapper: thin: Deletion of thin device 2430 failed.
[7246493.528083] device-mapper: ioctl: remove_all left 13 open device(s)
[7246497.030047] device-mapper: ioctl: remove_all left 13 open device(s)
[7246514.952196] device-mapper: ioctl: remove_all left 11 open device(s)


--------------

# rpm -q docker
docker-1.9.1-15.el7.x86_64

# ps aux|grep "docker daemon"                                                                                                                                   
root     28136  0.1  0.0 1308144 64056 ?       Ssl  Apr11  21:00 /usr/bin/docker daemon -D --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker_vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --add-registry registry.access.redhat.com --add-registry brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888 --insecure-registry docker-registry.usersys.redhat.com --insecure-registry registry.qe.openshift.com --insecure-registry atomic-registry.usersys.redhat.com:5000

Comment 2 Jeremy Eder 2016-04-21 12:49:15 UTC
I just confirmed this still occurs in docker-1.10:

# rpm -q docker-latest
docker-latest-1.10.3-11.el7.x86_64

Comment 3 Vivek Goyal 2016-04-27 18:01:27 UTC
Jeremy, so basically the build does not fail. You are more concerned with what these messages mean and whether they are harmful, right? At most they might be leaving some resources behind (if that's happening).

After a successful build, do you see leftover images/containers that should not be there?

Also, can you run "dmsetup table" and see which devices are left behind on the system?
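
A hedged sketch of that check follows. The `docker-` device-name prefix is an assumption based on docker's default devicemapper device naming, and the sample line below is illustrative only; real output comes from running `dmsetup table` as root on the affected host:

```shell
#!/bin/sh
# Illustrative only: filter `dmsetup table` output down to docker's
# thin devices. A canned sample line stands in for real output here.
sample='docker-253:0-1234-f15e19e53975ad561ae90f239e90ba360659c5b93c3d59375d53f02af6da6c29: 0 20971520 thin 253:2 42'
printf '%s\n' "$sample" | awk -F': ' '/^docker-/ {print $1}'
# On the affected host, run instead (as root):
#   dmsetup table | awk -F': ' '/^docker-/ {print $1}'
```

Any device names printed after all builds and containers are gone would be candidates for the leaked resources discussed here.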

I am assuming you are using deferred device removal and deferred device deletion.

I have looked at the "device-mapper: ioctl: remove_all left 14 open device(s)" message in the past. It's not an error, so don't worry about it.

The first one seems to be that we try to remove a thin device while it is still busy. I'm not sure who is using it, but deferred device removal should ensure that the device is removed as soon as its user goes away.

"device-mapper: thin: Deletion of thin device 2142 failed." seems to be new. I will look into it.

If the build succeeds and at most we have a resource leak, that does not sound like an "urgent" issue.

Comment 4 Jeremy Eder 2016-04-27 18:03:03 UTC
OSBS has a different clean-up process, but sure, I'll try to catch this in the act.  Agree on the urgency, will lower it.

Comment 5 Vivek Goyal 2016-04-27 18:04:08 UTC
Jeremy, what's the kernel version you are using?

Comment 6 Jeremy Eder 2016-04-27 18:05:58 UTC
3.10.0-327.13.1.el7.x86_64

Comment 7 Vivek Goyal 2016-04-27 18:28:23 UTC
Jeremy, can you attach the journal logs? I want to go through them and see if anything interesting is there.

Comment 8 Daniel Walsh 2016-06-03 12:35:30 UTC
Jeremy, we never went any further with this.  Is this still something we should pursue?

