Bug 1252421

Summary: Image pool becomes corrupt after parallel docker pull with common ancestor
Product: Red Hat Enterprise Linux 7
Reporter: Josep 'Pep' Turro Mauri <pep>
Component: docker
Assignee: Vivek Goyal <vgoyal>
Status: CLOSED ERRATA
QA Contact: Luwen Su <lsu>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.1
CC: dwalsh, lsm5, sdodson, sghosh, spinolacastro
Target Milestone: rc
Keywords: Extras
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: docker-1.7.1-115.el7_1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-09-15 12:51:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Josep 'Pep' Turro Mauri 2015-08-11 11:20:06 UTC
Description of problem:

We've hit a situation a couple of times where image pulls failed with this error:

FATA[0014] Error pulling image (latest) from docker.io/openshift/base-centos7, Driver devicemapper failed to create image rootfs 7849a3d3f7cf06828ddfa79fca10af2a3656359584c3751c46ee502ab445b1e2: Unknown device 7322fbe74aa5632b33a400959867c8ac4290e9c5112877a7754be70cfe5d66e9

Version-Release number of selected component (if applicable):
docker-1.6.2-14.el7.x86_64

How reproducible:
Unknown / only observed a couple of times.

Steps to Reproduce:

Unfortunately unclear, but this has been reported in the context of OpenShift v3 after launching two concurrent builds that rely on images that have the centos7 image as an ancestor.

Actual results:

FATA[0014] Error pulling image (latest) from docker.io/openshift/base-centos7, Driver devicemapper failed to create image rootfs 7849a3d3f7cf06828ddfa79fca10af2a3656359584c3751c46ee502ab445b1e2: Unknown device 7322fbe74aa5632b33a400959867c8ac4290e9c5112877a7754be70cfe5d66e9

Expected results:

Image pulls succeed

Additional info:

According to initial analysis by the consultants running this environment, these upstream PRs seem to be related to the problem:

  https://github.com/docker/docker/pull/14193
  https://github.com/docker/docker/pull/15414

Comment 2 Daniel Walsh 2015-08-21 04:19:43 UTC
This just showed up at docker.

https://github.com/docker/docker/pull/15728

Comment 3 Diego Castro 2015-08-22 13:37:06 UTC
I can confirm https://github.com/docker/docker/pull/15728 solves the issue on docker-1.7.1.

As the fix was made against the latest docker, I tried backporting it and it seems to work very well. Here's the patch:

diff --git a/graph/graph.go b/graph/graph.go
index f305622..28ab1f4 100644
--- a/graph/graph.go
+++ b/graph/graph.go
@@ -154,6 +154,10 @@ func (graph *Graph) Register(img *image.Image, layerData archive.ArchiveReader)
        // this doesn't mean Register is fully safe yet.
        graph.imageMutex.Lock(img.ID)
        defer graph.imageMutex.Unlock(img.ID)
+
+        if graph.Exists(img.ID) {
+               return nil
+       }

        defer func() {
                // If any error occurs, remove the new dir from the driver.
@@ -164,11 +168,6 @@ func (graph *Graph) Register(img *image.Image, layerData archive.ArchiveReader)
                }
        }()

-       // (This is a convenience to save time. Race conditions are taken care of by os.Rename)
-       if graph.Exists(img.ID) {
-               return fmt.Errorf("Image %s already exists", img.ID)
-       }
-
        // Ensure that the image root does not exist on the filesystem
        // when it is not registered in the graph.
        // This is common when you switch from one graph driver to another


After rebuilding, I could no longer reproduce the panic when running the following script:

for image in openshift/nodejs-010-centos7 openshift/base-centos7 openshift/origin-sti-builder openshift/ruby-20-centos7 openshift/mysql-55-centos7 openshift/python-33-centos7 mesosphere/chronos:chronos-2.3.4-1.0.81.ubuntu1404-mesos-0.22.1-1.0.ubuntu1404 mesosphere/mesos-master:0.22.1-1.0.ubuntu1404 mesosphere/mesos-slave:0.22.1-1.0.ubuntu1404 mesosphere/marathon:v0.8.2-RC4 ; do     docker pull $image & done

Comment 4 Daniel Walsh 2015-08-23 12:04:25 UTC
Please update the pull request to give positive feedback.

Comment 5 Luwen Su 2015-08-27 16:13:52 UTC
Using the steps from comment 3, I reproduced the issue with docker-1.7.1-114.el7.x86_64 (the docker daemon crashed)
and verified the fix in docker-1.7.1-115.el7.x86_64.

Comment 7 errata-xmlrpc 2015-09-15 12:51:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1782.html