1375390 – Failed to push image. Response from registry is: open /dev/mapper/docker... no such file

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Containers
Sub Component:
Version:	3.2.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Vivek Goyal
QA Contact:	DeShuai Ma
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	OSOPS_V3

Reported:	2016-09-13 01:04 UTC by Stefanie Forrester
Modified:	2016-09-20 14:55 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-09-20 14:55:48 UTC
Target Upstream Version:
Embargoed:

Description Stefanie Forrester 2016-09-13 01:04:19 UTC

Description of problem:

On one cluster, 25-50% of builds are failing with this build error:

build error: Failed to push image. Response from registry is: open /dev/mapper/docker-253:4-33595986-5b6aba0f60e86a8734b53c09d19763302917956ed4cbc49ded701c722e38ad5b: no such file or directory

We have a docker registry running on a single infra node. The docker service has been restarted, but the issue persists.

Version-Release number of selected component (if applicable):
oc v3.2.1.15-8-gc402626
kubernetes v1.2.0-36-g4a3f9c5

atomic-openshift-3.2.1.15-1.git.8.c402626.el7.x86_64
docker-1.9.1-40.el7.x86_64

How reproducible:
25-50% of the time so far

Steps to Reproduce:
1. Create a new project and new app of any type.
2. Watch the build logs for the error.

Actual results:
Sometimes the build fails with the error.

Expected results:
Build should succeed every time.

Additional info:

Comment 1 Vivek Goyal 2016-09-13 14:36:12 UTC

Can you paste the "docker info" output as well?

Comment 2 Vivek Goyal 2016-09-13 15:11:31 UTC

I suspect the following fix might help.

https://github.com/projectatomic/docker/pull/188

But this fix is available only in docker-1.10. Can you please try with docker-1.10 and see whether the problem still happens?

Comment 3 Vivek Goyal 2016-09-13 15:12:33 UTC

Alternatively, on docker-1.9, try disabling the deferred device removal feature and see if that works.

You will have to remove "--storage-opt dm.use_deferred_removal=true" from /etc/sysconfig/docker-storage and restart docker.
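The edit described above can be sketched in Python as a small helper that drops the option from the options line. This is only an illustration: it assumes the file holds a single line of the form DOCKER_STORAGE_OPTIONS="...", and the function name drop_storage_opt and the sample option values are hypothetical. Docker still has to be restarted afterwards.

```python
def drop_storage_opt(line, opt="dm.use_deferred_removal=true"):
    """Return `line` with `--storage-opt <opt>` removed.

    Assumes a line shaped like DOCKER_STORAGE_OPTIONS="..." (an assumption
    about /etc/sysconfig/docker-storage); any other line is returned as-is.
    """
    key, sep, value = line.rstrip("\n").partition("=")
    if not sep or key.strip() != "DOCKER_STORAGE_OPTIONS":
        return line  # comments and unrelated settings are left alone

    tokens = value.strip().strip('"').split()
    kept = []
    i = 0
    while i < len(tokens):
        # Two-token form: --storage-opt dm.use_deferred_removal=true
        if tokens[i] == "--storage-opt" and i + 1 < len(tokens) and tokens[i + 1] == opt:
            i += 2
            continue
        # One-token form: --storage-opt=dm.use_deferred_removal=true
        if tokens[i] == "--storage-opt=" + opt:
            i += 1
            continue
        kept.append(tokens[i])
        i += 1

    return key + '="' + " ".join(kept) + '"'


if __name__ == "__main__":
    # Sample line with illustrative values, not taken from the affected host.
    sample = ('DOCKER_STORAGE_OPTIONS="--storage-driver devicemapper '
              '--storage-opt dm.fs=xfs --storage-opt dm.use_deferred_removal=true"')
    print(drop_storage_opt(sample))
```

The helper only rewrites a string; writing the result back to the file and restarting the daemon are left to the operator.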

Comment 4 Sten Turpin 2016-09-13 16:06:00 UTC

Containers: 17
Images: 265
Server Version: 1.9.1
Storage Driver: devicemapper
 Pool Name: docker_vg-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 3.221 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 13.01 GB
 Data Space Total: 212.6 GB
 Data Space Available: 199.6 GB
 Metadata Space Used: 5.825 MB
 Metadata Space Total: 218.1 MB
 Metadata Space Available: 212.3 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.2 (Maipo)
CPUs: 4
Total Memory: 15.26 GiB
Name: ip-172-31-10-24.ec2.internal
ID: MO2H:GBUK:PRXK:7JAZ:S6K5:6CX3:E56B:O3NL:V5DN:H2LJ:FFS5:6JWM
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
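
To confirm whether the deferred removal and deferred deletion settings are active on a node, the "docker info" text above can be scanned for the two "Deferred ... Enabled" lines. The following Python sketch does that on captured output; the function name deferred_flags is hypothetical, and it assumes the "Name: value" layout shown in this comment.

```python
def deferred_flags(docker_info):
    """Map each 'Deferred ... Enabled' line of `docker info` text to a bool."""
    flags = {}
    for raw in docker_info.splitlines():
        name, sep, value = raw.strip().partition(": ")
        # Skip lines such as 'Deferred Deleted Device Count', which is a counter.
        if sep and name.startswith("Deferred") and name.endswith("Enabled"):
            flags[name] = value.strip() == "true"
    return flags


if __name__ == "__main__":
    captured = """Storage Driver: devicemapper
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
"""
    print(deferred_flags(captured))
```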

Comment 5 Sten Turpin 2016-09-16 16:16:28 UTC

We have our registry pods locked to two specific nodes; on each of those nodes, I re-initialized Docker storage with deferred deletion and deferred removal disabled. The exact same error (down to the UID in the DM device) persisted. The error also persisted after I re-enabled deferred removal/deletion, rebooted the hosts, and re-initialized storage again.

Comment 6 Sten Turpin 2016-09-16 16:19:25 UTC

Further info: 

The STI build we're deploying is done from a script every half-hour as an end-to-end test. The failure rate is roughly 90%.

We're doing the STI build from this repo: https://github.com/openshift/nodejs-ex

Comment 7 Vivek Goyal 2016-09-16 17:17:01 UTC

Is it possible to enable debug in the docker daemon (-D flag) and restart the daemon? Once the problem happens again, please collect the journal logs and attach them to the bug. I would like to have a look at the logs and see if I can spot something.

Comment 8 Vivek Goyal 2016-09-16 17:26:33 UTC

Does anybody know what this id "5b6aba0f60e86a8734b53c09d19763302917956ed4cbc49ded701c722e38ad5b" is? Is it an image id? If yes, could this be a race with image deletion, where some other component in the system tried deleting this image while we were trying to push it?
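
To make the id easier to look up on individual nodes, the device path in the build error can be split into its parts. The Python sketch below is an illustration only: the function name parse_thin_device is hypothetical, and reading the name as docker-<major>:<minor>-<inode>-<id> is an assumption about how the devicemapper driver names its thin devices. It does not tell whether the id belongs to an image layer or a container.

```python
import re

# Assumed layout: /dev/mapper/docker-<major>:<minor>-<inode>-<hex id>
DEVICE_RE = re.compile(r"/dev/mapper/docker-(\d+):(\d+)-(\d+)-([0-9a-f]+)")


def parse_thin_device(message):
    """Extract the device-name parts from an error message, or None."""
    match = DEVICE_RE.search(message)
    if match is None:
        return None
    major, minor, inode, device_id = match.groups()
    return {
        "major": int(major),
        "minor": int(minor),
        "inode": int(inode),
        "id": device_id,
    }


if __name__ == "__main__":
    error = ("build error: Failed to push image. Response from registry is: "
             "open /dev/mapper/docker-253:4-33595986-"
             "5b6aba0f60e86a8734b53c09d19763302917956ed4cbc49ded701c722e38ad5b: "
             "no such file or directory")
    print(parse_thin_device(error))
```

The extracted id can then be compared against the image and container ids that docker reports on each node.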

Comment 9 Vivek Goyal 2016-09-16 17:38:30 UTC

The error message also says "Response from registry is". So is this an error message from the registry? Should the registry developers have a look?

Or can somebody who knows the openshift side better break it down a little bit in terms of docker commands, so that I can begin to understand the workflow?

Comment 12 Sten Turpin 2016-09-16 21:23:41 UTC

Despite the error pretty specifically pointing at the Docker registry, there was a bad image on one of our compute nodes, which are separated from the registry. Once Vivek pointed me to the right node, I was able to wipe docker storage on that node and get STI builds back to normal.
