Hide Forgot
Fix might be in the kernel, but I've filed this against docker. Set to urgent because it was flagged during POC of a high-profile internal project. Description of problem: We're hitting an issue in the OpenShift Build Service (OSBS) where we are trying to parallelize docker builds. The reproducer is: time for i in `seq 6` ; do docker build -t build.$i - < Dockerfile.$i & done I then have 6 different Dockerfiles. They are all the identical, except for line #2, which is different in each file, so that all subsequent layers are forced to be created. # cat Dockerfile.1 FROM centos MAINTAINER jerms1 WORKDIR /tmp RUN mkdir a WORKDIR /tmp/a RUN mkdir b WORKDIR /tmp/a/b RUN mkdir c WORKDIR /tmp/a/b/c RUN mkdir d WORKDIR /tmp/a/b/c/d RUN mkdir e WORKDIR /tmp/a/b/c/d/e RUN mkdir f WORKDIR /tmp/a/b/c/d/e/f RUN mkdir g WORKDIR /tmp/a/b/c/d/e/f/g The builds seem to ultimately succeed. However, there are errors that indicate that creating/destroying devices used for each layer during a build is racy. Below is the error seen during the reproducer run (occurs a few times): Error removing intermediate container f15e19e53975: rmdriverfs: Driver devicemapper failed to remove root filesystem f15e19e53975ad561ae90f239e90ba360659c5b93c3d59375d53f02af6da6c29: remove /var/lib/docker/devicemapper/mnt/f15e19e53975ad561ae90f239e90ba360659c5b93c3d59375d53f02af6da6c29: device or resource busy Then, on a different VM, I ran the same test against and overlayfs backend. I found the same errors occur: Error removing intermediate container d372063e75a4: rmdriverfs: Driver overlay failed to remove root filesystem d372063e75a4b3117f3bbe73d2b21b3ac89585b9f92a2b1eaa7251835ccb3c60: remove /overlay/overlay/d372063e75a4b3117f3bbe73d2b21b3ac89585b9f92a2b1eaa7251835ccb3c60/merged: device or resource busy Once one of those "device or resource busy" errors occurs, there will one of these for every layer that follows: Error removing intermediate container d372063e75a4: nosuchcontainer: no such id: d372063e75a4b3117f3bbe73d2b21b3ac89585b9f92a2b1eaa7251835ccb3c60 dmesg shows these sorts of things for devicemapper. Nothing storage-related in the overlay system's dmesg. [7245907.459170] XFS (dm-14): Starting recovery (logdev: internal) [7245907.706321] device-mapper: ioctl: remove_all left 13 open device(s) [7245907.710303] device-mapper: ioctl: remove_all left 13 open device(s) [7245907.757289] device-mapper: ioctl: remove_all left 11 open device(s) [7245912.139500] device-mapper: thin: Deletion of thin device 2062 failed. [7245912.266223] device-mapper: ioctl: device doesn't appear to be in the dev hash table. [7245912.300270] device-mapper: ioctl: remove_all left 14 open device(s) [7245912.357272] device-mapper: ioctl: remove_all left 12 open device(s) [7245912.440437] device-mapper: ioctl: remove_all left 11 open device(s) [7245915.559198] device-mapper: ioctl: remove_all left 12 open device(s) [7245930.683331] device-mapper: thin: Deletion of thin device 2090 failed. [7245930.821067] device-mapper: ioctl: remove_all left 10 open device(s) [7245930.980969] device-mapper: ioctl: remove_all left 10 open device(s) [7245934.912930] device-mapper: ioctl: remove_all left 14 open device(s) [7245934.964902] device-mapper: ioctl: remove_all left 13 open device(s) [7245937.224855] device-mapper: ioctl: remove_all left 12 open device(s) [7245964.246464] device-mapper: ioctl: remove_all left 13 open device(s) [7245966.736445] device-mapper: ioctl: remove_all left 12 open device(s) [7245966.789387] device-mapper: ioctl: remove_all left 11 open device(s) [7245967.229413] device-mapper: ioctl: remove_all left 13 open device(s) [7245967.307492] device-mapper: ioctl: remove_all left 12 open device(s) [7245967.413374] device-mapper: ioctl: remove_all left 12 open device(s) [7245967.427642] XFS (dm-6): Starting recovery (logdev: internal) [7245968.847111] device-mapper: thin: Deletion of thin device 2142 failed. [7245968.959391] device-mapper: ioctl: remove_all left 13 open device(s) [7245996.423982] device-mapper: ioctl: remove_all left 13 open device(s) [7245996.497954] device-mapper: ioctl: remove_all left 12 open device(s) [7246002.589917] device-mapper: thin: Deletion of thin device 2187 failed. [7246002.624822] device-mapper: ioctl: remove_all left 11 open device(s) [7246362.033678] XFS (dm-11): Starting recovery (logdev: internal) [7246362.099214] device-mapper: ioctl: remove_all left 13 open device(s) [7246363.341250] device-mapper: ioctl: remove_all left 12 open device(s) [7246387.827227] device-mapper: thin: Deletion of thin device 2280 failed. [7246388.097297] device-mapper: ioctl: remove_all left 13 open device(s) [7246389.965749] device-mapper: ioctl: remove_all left 14 open device(s) [7246390.030664] device-mapper: ioctl: remove_all left 12 open device(s) [7246415.441706] device-mapper: thin: Deletion of thin device 2320 failed. [7246415.606304] device-mapper: ioctl: remove_all left 13 open device(s) [7246417.995282] device-mapper: ioctl: remove_all left 13 open device(s) [7246419.599313] device-mapper: ioctl: remove_all left 15 open device(s) [7246421.217255] device-mapper: ioctl: remove_all left 15 open device(s) [7246421.321213] XFS (dm-15): Starting recovery (logdev: internal) [7246440.580960] device-mapper: ioctl: remove_all left 13 open device(s) [7246440.664922] device-mapper: ioctl: remove_all left 12 open device(s) [7246441.659382] XFS (dm-12): Starting recovery (logdev: internal) [7246441.693872] device-mapper: ioctl: remove_all left 12 open device(s) [7246442.924899] device-mapper: ioctl: remove_all left 14 open device(s) [7246446.105837] device-mapper: ioctl: remove_all left 14 open device(s) [7246465.787163] device-mapper: thin: Deletion of thin device 2400 failed. [7246465.879088] device-mapper: thin: Deletion of thin device 2390 failed. [7246466.143558] device-mapper: ioctl: remove_all left 14 open device(s) [7246466.264591] device-mapper: ioctl: remove_all left 12 open device(s) [7246467.272479] device-mapper: ioctl: remove_all left 12 open device(s) [7246469.248941] device-mapper: thin: Deletion of thin device 2403 failed. [7246469.360440] XFS (dm-14): Starting recovery (logdev: internal) [7246469.669512] device-mapper: ioctl: remove_all left 14 open device(s) [7246469.745539] device-mapper: ioctl: remove_all left 14 open device(s) [7246469.825480] device-mapper: ioctl: remove_all left 13 open device(s) [7246490.185120] device-mapper: ioctl: remove_all left 13 open device(s) [7246493.213336] device-mapper: thin: Deletion of thin device 2430 failed. [7246493.528083] device-mapper: ioctl: remove_all left 13 open device(s) [7246497.030047] device-mapper: ioctl: remove_all left 13 open device(s) [7246514.952196] device-mapper: ioctl: remove_all left 11 open device(s) -------------- # rpm -q docker docker-1.9.1-15.el7.x86_64 # ps aux|grep "docker daemon" root 28136 0.1 0.0 1308144 64056 ? Ssl Apr11 21:00 /usr/bin/docker daemon -D --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker_vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --add-registry registry.access.redhat.com --add-registry brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888 --insecure-registry docker-registry.usersys.redhat.com --insecure-registry registry.qe.openshift.com --insecure-registry atomic-registry.usersys.redhat.com:5000
I just confirmed this still occurs in docker-1.10: # rpm -q docker-latest docker-latest-1.10.3-11.el7.x86_64
Jermey, so basically build does not fail. You are more concerned that what these messages mean and are they harmful, right? At max these might be leaving some resources behind (if that's happening). After build is successful, do you see leftover images/containers which should not be there? Also can you do "dmsetup table" and see what devices are left behind on the system. I am assuming you are using deferred device removal and deferred device deletion. I have looked at "device-mapper: ioctl: remove_all left 14 open device(s)" message in the past. Its not an error. So don't worry about it. First one seems to be that we try to remove a thin device and it is still busy. Not sure who is using it. But deferred device removal should make sure as soon as user of device goes away, device will be removed. "device-mapper: thin: Deletion of thin device 2142 failed." seems to be new. I will look into it. If build succeeds and at max if we have issue of some resource leak, that does not sound like an "urgent" issue.
OSBS has a different clean-up process, but sure, I'll try to catch this in the act. Agree on the urgency, will lower it.
Jeremy, what's the kernel version you are using.
3.10.0-327.13.1.el7.x86_64
Jeremy, can you attach the journal logs. I want to go through it and see if something interesting is there.
Jeremy, we never went any further with this. Is this still something we should pursue?