Bug 1252421 - Image pool becomes corrupt after parallel docker pull with common ancestor
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.1
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Vivek Goyal
QA Contact: Luwen Su
Keywords: Extras
Depends On:
Blocks:
Reported: 2015-08-11 07:20 EDT by Josep 'Pep' Turro Mauri
Modified: 2015-09-15 08:51 EDT

See Also:
Fixed In Version: docker-1.7.1-115.el7_1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-09-15 08:51:17 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Josep 'Pep' Turro Mauri 2015-08-11 07:20:06 EDT
Description of problem:

We've hit a situation a couple of times where image pulls failed with this error:

FATA[0014] Error pulling image (latest) from docker.io/openshift/base-centos7, Driver devicemapper failed to create image rootfs 7849a3d3f7cf06828ddfa79fca10af2a3656359584c3751c46ee502ab445b1e2: Unknown device 7322fbe74aa5632b33a400959867c8ac4290e9c5112877a7754be70cfe5d66e9

Version-Release number of selected component (if applicable):
docker-1.6.2-14.el7.x86_64

How reproducible:
Unknown / only observed a couple of times.

Steps to Reproduce:

Unfortunately unclear, but this has been reported in the context of OpenShift v3 after launching two concurrent builds that rely on images that have the centos7 image as an ancestor.

Actual results:

FATA[0014] Error pulling image (latest) from docker.io/openshift/base-centos7, Driver devicemapper failed to create image rootfs 7849a3d3f7cf06828ddfa79fca10af2a3656359584c3751c46ee502ab445b1e2: Unknown device 7322fbe74aa5632b33a400959867c8ac4290e9c5112877a7754be70cfe5d66e9

Expected results:

Image pulls succeed

Additional info:

According to initial analysis by the consultants running this environment, these upstream PRs seem to be related to the problem:

  https://github.com/docker/docker/pull/14193
  https://github.com/docker/docker/pull/15414
Comment 2 Daniel Walsh 2015-08-21 00:19:43 EDT
This just showed up at docker.

https://github.com/docker/docker/pull/15728
Comment 3 Diego Castro 2015-08-22 09:37:06 EDT
I can confirm https://github.com/docker/docker/pull/15728 solves the issue on docker-1.7.1.

As the fix was made for the latest docker, I tried backporting it and it seems to work very well. Here's the patch:

diff --git a/graph/graph.go b/graph/graph.go
index f305622..28ab1f4 100644
--- a/graph/graph.go
+++ b/graph/graph.go
@@ -154,6 +154,10 @@ func (graph *Graph) Register(img *image.Image, layerData archive.ArchiveReader)
        // this doesn't mean Register is fully safe yet.
        graph.imageMutex.Lock(img.ID)
        defer graph.imageMutex.Unlock(img.ID)
+
+        if graph.Exists(img.ID) {
+               return nil
+       }

        defer func() {
                // If any error occurs, remove the new dir from the driver.
@@ -164,11 +168,6 @@ func (graph *Graph) Register(img *image.Image, layerData archive.ArchiveReader)
                }
        }()

-       // (This is a convenience to save time. Race conditions are taken care of by os.Rename)
-       if graph.Exists(img.ID) {
-               return fmt.Errorf("Image %s already exists", img.ID)
-       }
-
        // Ensure that the image root does not exist on the filesystem
        // when it is not registered in the graph.
        // This is common when you switch from one graph driver to another


After rebuilding, I couldn't reproduce the panic when running the following script:

for image in \
    openshift/nodejs-010-centos7 \
    openshift/base-centos7 \
    openshift/origin-sti-builder \
    openshift/ruby-20-centos7 \
    openshift/mysql-55-centos7 \
    openshift/python-33-centos7 \
    mesosphere/chronos:chronos-2.3.4-1.0.81.ubuntu1404-mesos-0.22.1-1.0.ubuntu1404 \
    mesosphere/mesos-master:0.22.1-1.0.ubuntu1404 \
    mesosphere/mesos-slave:0.22.1-1.0.ubuntu1404 \
    mesosphere/marathon:v0.8.2-RC4
do
    docker pull $image &
done
Comment 4 Daniel Walsh 2015-08-23 08:04:25 EDT
Please update the pull request to give positive feedback.
Comment 5 Luwen Su 2015-08-27 12:13:52 EDT
Using the steps from comment 3, I reproduced the issue in docker-1.7.1-114.el7.x86_64 (the docker daemon crashed) and verified the fix in docker-1.7.1-115.el7.x86_64.
Comment 7 errata-xmlrpc 2015-09-15 08:51:17 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1782.html
