Bug 1277356 - Fail to push images on env with ha-registry
Status: CLOSED CURRENTRELEASE
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Michal Minar
QA Contact: Vikram Goyal
Docs Contact: Vikram Goyal
Keywords: TestBlocker
Blocks: 1267746
Reported: 2015-11-03 01:49 EST by Ma xiaoqiang
Modified: 2016-09-28 07:38 EDT
Doc Type: Bug Fix
Last Closed: 2016-09-28 07:38:37 EDT
Type: Bug


Attachments:
log of the fst registry instance for a push with docker 1.7.1-115 (33.35 KB, text/plain), 2015-11-04 10:18 EST, Michal Minar
log of the snd registry instance for a push with docker 1.7.1-115 (20.26 KB, text/plain), 2015-11-04 10:33 EST, Michal Minar
log of the fst registry instance for a push with docker 1.8.2-7 (13.24 KB, text/plain), 2015-11-04 10:48 EST, Michal Minar
log of the snd registry instance for a push with docker 1.8.2-7 (37.99 KB, text/plain), 2015-11-04 10:49 EST, Michal Minar
single replica with NFS mount and additional debug statements (87.43 KB, text/plain), 2015-11-05 09:31 EST, Michal Minar
log of 1st replica from local setup with a failed push (62.09 KB, text/plain), 2015-11-05 14:28 EST, Michal Minar
log of 2nd replica from local setup with a failed push (23.29 KB, text/plain), 2015-11-05 14:28 EST, Michal Minar
server pcap (150.23 KB, application/octet-stream), 2015-11-10 17:34 EST, Andy Goldstein
client1 pcap (4.23 KB, application/octet-stream), 2015-11-10 17:34 EST, Andy Goldstein
client2 pcap (145.08 KB, application/octet-stream), 2015-11-10 17:35 EST, Andy Goldstein

Description Ma xiaoqiang 2015-11-03 01:49:47 EST
Description of problem:
Fail to push images on env with ha-registry

Version-Release number of selected component (if applicable):
puddle[2015-11-02.1]

How reproducible:
Always


Steps to Reproduce:

1. Install env with ha-registry and shared nfs storage
#oc scale --replicas=2 rc/docker-registry-2
2. do a sti-build
#oc start-build nodejs-example
3. check the build logs
#oc build-logs nodejs-example-9


Actual results:
I1103 00:54:52.729793       1 sti.go:213] Using provided push secret for pushing 172.30.28.101:5000/xiama/nodejs-example:latest image
I1103 00:54:52.729816       1 sti.go:217] Pushing 172.30.28.101:5000/xiama/nodejs-example:latest image ...
I1103 00:54:54.030085       1 sti.go:222] Registry server Address: 
I1103 00:54:54.030115       1 sti.go:223] Registry server User Name: serviceaccount
I1103 00:54:54.030125       1 sti.go:224] Registry server Email: serviceaccount@example.org
I1103 00:54:54.030133       1 sti.go:229] Registry server Password: <<non-empty>>
F1103 00:54:54.030142       1 builder.go:59] Build error: Failed to push image. Response from registry is: digest invalid: provided digest did not match uploaded content

Fail to push the image to registry


Expected results:
Push images successfully


Additional info:
When QE set the replicas to 1, the sti-build can push the images successfully
Comment 1 Michal Minar 2015-11-03 10:44:37 EST
Have you set up the storage [1] for the registry and made it available on all nodes hosting the registry?

[1] https://docs.openshift.com/enterprise/3.0/install_config/install/docker_registry.html#storage-for-the-registry

There are reports of similar behaviour with upstream distribution:
  https://github.com/docker/distribution/issues/1013
Long story short: this may happen on less consistent distributed storage.
There's an attempt to address it:
  https://github.com/docker/distribution/pull/1141

I need to know details about your storage setup in order to debug it. But even so, it's rather an upstream (docker/distribution) issue.
Comment 4 Michal Minar 2015-11-04 09:52:56 EST
I wrongly assumed that the upstream patch applies to this case, but it relates only to Swift.

Interesting observation: the push works with the older docker-1.7.1-115.el7.x86_64.

I'll try upstream 1.8.2 Docker as well to see whether internal patches are at fault.
Comment 5 Michal Minar 2015-11-04 10:18 EST
Created attachment 1089731 [details]
log of the fst registry instance for a push with docker 1.7.1-115

log of registry instance #1 when following command is executed with docker-1.7.1-115.el7.x86_64:
docker -D push 172.30.177.244:5000/joe/hello-world 
The push refers to a repository [172.30.177.244:5000/joe/hello-world] (len: 1)
af340544ed62: Image already exists 
535020c3e8ad: Image successfully pushed 
Digest: sha256:729c7f8b8ee41e952083865694b85bc9b38830d48b98b1f92ce7cf3b658a8aba
Comment 6 Michal Minar 2015-11-04 10:33 EST
Created attachment 1089735 [details]
log of the snd registry instance for a push with docker 1.7.1-115

log of registry instance #2 when following command is executed with docker-1.7.1-115.el7.x86_64:
docker -D push 172.30.177.244:5000/joe/hello-world 
The push refers to a repository [172.30.177.244:5000/joe/hello-world] (len: 1)
af340544ed62: Image already exists 
535020c3e8ad: Image successfully pushed 
Digest: sha256:729c7f8b8ee41e952083865694b85bc9b38830d48b98b1f92ce7cf3b658a8aba
Comment 7 Michal Minar 2015-11-04 10:48 EST
Created attachment 1089737 [details]
log of the fst registry instance for a push with docker 1.8.2-7

log of registry instance #1 when following command is executed with docker-1.8.2-7.el7.x86_64:

docker push 172.30.177.244:5000/joe/hello-world-from-node2
The push refers to a repository [172.30.177.244:5000/joe/hello-world-from-node2] (len: 1)
975b84d108f1: Pushing 1.024 kB
digest invalid: provided digest did not match uploaded content
Comment 8 Michal Minar 2015-11-04 10:49 EST
Created attachment 1089738 [details]
log of the snd registry instance for a push with docker 1.8.2-7

log of registry instance #2 when following command is executed with docker-1.8.2-7.el7.x86_64:

docker push 172.30.177.244:5000/joe/hello-world-from-node2
The push refers to a repository [172.30.177.244:5000/joe/hello-world-from-node2] (len: 1)
975b84d108f1: Pushing 1.024 kB
digest invalid: provided digest did not match uploaded content
Comment 9 Michal Minar 2015-11-04 11:40:01 EST
Upstream docker 1.8.2 results in the same error.
Comment 10 Andy Goldstein 2015-11-04 13:09:30 EST
This is weird... here are some snippets of the errors when pushing with 1.8.2:

level=error msg="canonical digest does match provided digest" canonical=sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef  http.request.uri="[SNIP]&digest=sha256%3A5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef"

level=error msg="An error occured" err.code=DIGEST_INVALID err.detail="invalid digest for referenced layer: sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef, content does not match digest" err.message="provided digest did not match uploaded content"

It sure seems to me like Docker is telling the registry "the digest is sha256%3A5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef" (%3A = ":"), and the registry is computing it as "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef", which looks identical to me. I wonder if there's some URL query parsing bug somewhere?
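
To check by hand whether the two strings in the log are the same once the URL encoding is undone, a small helper like the following can be used. It is only a sketch: `decode_digest` is a hypothetical name, and it handles nothing but the `%3A` escape mentioned above.

```shell
# Decode the percent-encoded colon (%3A) in a digest taken from a request URI.
# Only %3A is handled; this is not a general URL decoder.
decode_digest() {
    printf '%s\n' "$1" | sed 's/%3[Aa]/:/g'
}
```

The decoded value can then be compared with the canonical digest from the log using a plain string test, for example `[ "$(decode_digest "$from_uri")" = "$canonical" ]`, where both variables are placeholders for values copied from the log.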
Comment 11 Andy Goldstein 2015-11-04 13:12:49 EST
If you run 1 replica, don't use NFS, with Docker 1.8.2, does the push succeed?
Comment 12 Michal Minar 2015-11-04 14:23:15 EST
I couldn't reproduce it with one replica.
Nor with 2 replicas running on the same node sharing a host directory.
Now it looks more NFS-related.
Comment 13 Michal Minar 2015-11-04 14:48:12 EST
Reproducible even with docker-distribution 2.1.0 (with OSO patches).
Comment 14 Johnny Liu 2015-11-04 22:56:56 EST
Here is a summary of our test results:
2 replicas on different nodes + NFS: FAIL
1 replica + NFS: FAIL
1 replica + host dir: PASS
2 replicas on the same node + host dir: PASS
Comment 15 Johnny Liu 2015-11-05 00:21:34 EST
This bug is now blocking testing, so I am raising its priority.
Comment 16 Michal Minar 2015-11-05 03:17:23 EST
Just verified that with 1 replica + NFS it fails, but not always. And only with larger layers (e.g. with registry.access.redhat.com/openshift3/mongodb-24-rhel7 image). I wasn't able to reproduce in local environment though.
Comment 17 Michal Minar 2015-11-05 07:53:56 EST
The "%3A" artifact in the digest isn't the issue. The error reporting has two bugs. The first is that it returns the given digest (the one coming with the request) as the canonical one [1]. The second is the formatting of that same digest in the error message, which produces the artifact.

In fact, the two digests being compared really differ in hex strings:

    level=debug msg="(*layerWriter).validateLayer: checking cannonical_digest == expected_digest (sha256:5e7d48da6780e1bf2b20f0b5d1ca35037d2ec33ced2f7d0f1cab4dc9cd4a6497 == sha256:6e0917773a3ba1fa40c8fdd50fe9d38e9fedfdd96a444637d6bf7f090b34ca71)"
Comment 18 Michal Minar 2015-11-05 09:31 EST
Created attachment 1090104 [details]
single replica with NFS mount and additional debug statements

Now it seems more like a timing issue. After adding debug statements, I can reproduce even with hostPath:

docker push 172.30.87.121:5000/jialiu/mongodb-24-rhel7
The push refers to a repository [172.30.87.121:5000/jialiu/mongodb-24-rhel7] (len: 1)
d17602c1d664: Pushing [==================================================>] 234.4 MB
digest invalid: provided digest did not match uploaded content
Comment 19 Michal Minar 2015-11-05 10:31:32 EST
Comparing two layer blobs, one from a successful push and one from an unsuccessful push. Both belong to the first layer of the image
  registry.access.redhat.com/openshift3/mongodb-24-rhel7
with a size of 234434560 bytes. They differ in 4 blocks, each 49 bytes long.
The blocks start at offsets:

 1.  15076559  0xe60ccf
 2.  76979915 0x4969ecb
 3.  83012204 0x4f2aa6c
 4. 161540983 0x9a0eb77
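
Offsets like these can be collected with `cmp -l`, which lists every differing byte. The sketch below shows one way to do it and is not necessarily how the list above was produced. `diff_offsets` is a hypothetical helper, and it prints zero-based offsets in decimal and hex (an assumption; `cmp` itself counts from 1).

```shell
# Print the offset of every byte that differs between two files.
diff_offsets() {
    # cmp -l prints one line per difference:
    #   <1-based byte number> <octal byte in file 1> <octal byte in file 2>
    cmp -l "$1" "$2" | awk '{ printf "%d 0x%x\n", $1 - 1, $1 - 1 }'
}
```

Runs of consecutive offsets in the output correspond to one differing block. Identical files produce no output.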
Comment 20 Michal Minar 2015-11-05 12:31:42 EST
I'd just like to mention that there's something wrong with the network on those test machines. I just got this when copying data from openshift-124 to openshift-114:

scp -r root@192.168.0.89:/var/tmp/docker-registry/docker/registry/v2/blobs/sha256/6e/6e0917773a3ba1fa40c8fdd50fe9d38e9fedfdd96a444637d6bf7f090b34ca71/data .
Warning: Permanently added '192.168.0.89' (ECDSA) to the list of known hosts.
data    87%  195MB  70.3MB/s   00:00 ETA
Corrupted MAC on input.
Disconnecting: Packet corrupt
lost connection
Comment 21 Michal Minar 2015-11-05 14:20:24 EST
I switched to my local environment because I believe the QE test environment is broken. Locally I could reproduce with 2 replicas with a shared nfs storage.

NOTE: The blob uploaded during a failed push does NOT differ from the correct layer blob resulting from a successful push. So the problem is only in computing the digest.
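
One way to confirm that a stored blob is intact is to hash it and compare the result with the digest in its path. The sketch below assumes the on-disk layout seen in the scp command earlier in this thread (`.../blobs/sha256/<prefix>/<digest>/data`) and GNU `sha256sum`. `verify_blob` is a hypothetical helper name.

```shell
# Check that a registry blob's content matches the digest encoded in its path.
verify_blob() {
    data=$1
    # The directory holding the data file is named after the hex digest.
    expected=$(basename "$(dirname "$data")")
    actual=$(sha256sum "$data" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK $expected"
    else
        echo "MISMATCH expected=$expected actual=$actual"
        return 1
    fi
}
```

If the blob verifies on disk after a failed push, the content was stored correctly and the digest mismatch arose while the registry was reading it back.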

I'll upload logs with extended debug messages for both replicas.
Comment 22 Michal Minar 2015-11-05 14:28 EST
Created attachment 1090317 [details]
log of 1st replica from local setup with a failed push

Push command was:
docker -D push 172.30.228.121:5000/joe/mongodb-24-rhel7                                                                                                                                                                                     
The push refers to a repository [172.30.228.121:5000/joe/mongodb-24-rhel7] (len: 1)
d17602c1d664: Pushing [==================================================>] 234.4 MB
digest invalid: provided digest did not match uploaded content
Comment 23 Michal Minar 2015-11-05 14:28 EST
Created attachment 1090318 [details]
log of 2nd replica from local setup with a failed push

Push command was:
docker -D push 172.30.228.121:5000/joe/mongodb-24-rhel7                                                                                                                                                                                     
The push refers to a repository [172.30.228.121:5000/joe/mongodb-24-rhel7] (len: 1)
d17602c1d664: Pushing [==================================================>] 234.4 MB
digest invalid: provided digest did not match uploaded content
Comment 25 Michal Minar 2015-11-06 08:14:45 EST
I came up with a fix which solves issues on my setup. I haven't tested in QE's environment:
  https://github.com/openshift/origin/pull/5749

The problem really is the NFS storage. After an upload of a large layer (250M), it takes around 40 seconds for os.Stat() to succeed on the data blob file on my VMs, and a few more seconds for the file to become visible to the other replica. So even when the layer push succeeds, a subsequent fetch may fail if it is served by the other replica.
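
The visibility delay can be measured from a shell on each node with a simple polling loop. This is a rough sketch under stated assumptions, not what the registry or the fix does: `wait_for_file` is a hypothetical helper, it polls once per second, and on NFS the result depends on the client's attribute caching.

```shell
# Poll until a path exists; print the number of seconds waited.
# Returns 1 if the path has not appeared within the limit (in seconds).
wait_for_file() {
    path=$1
    limit=$2
    waited=0
    while [ ! -e "$path" ]; do
        [ "$waited" -ge "$limit" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    echo "$waited"
}
```

Running it against the same blob path on both registry nodes right after a push gives an estimate of how far the two replicas' views of the export lag behind each other.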
Comment 26 Michal Minar 2015-11-09 07:05:29 EST
So it seems that the NFS issue won't be resolved in 3.1. Our recommendation to customers will be to use ClientIP session affinity in the registry's service configuration, which causes requests from a particular docker daemon to be handled by the same registry replica.

Johnny, Ma, could you please re-test with this setting applied? Specifically:

    oadm registry --credentials=/etc/origin/master/openshift-registry.kubeconfig --images='registry.access.redhat.com/openshift3/ose-${component}:${version}' --replicas=2
    oc get -o yaml svc docker-registry | sed 's/\(sessionAffinity:\s*\).*/\1ClientIP/' | oc replace -f -
    # any other setup needed

And if successful, could you please start using it for tests currently blocked on this one?
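
Before piping the service definition back into `oc replace`, the sed expression from the command above can be tried on a sample line. This sketch wraps it in a hypothetical `set_client_ip_affinity` filter. The sample input is made up for illustration, and `\s` relies on GNU sed.

```shell
# Rewrite the sessionAffinity field of a service definition read from stdin.
# Uses the same sed expression as the command above (GNU sed for \s).
set_client_ip_affinity() {
    sed 's/\(sessionAffinity:\s*\).*/\1ClientIP/'
}
```

For example, `printf '  sessionAffinity: None\n' | set_client_ip_affinity` prints `  sessionAffinity: ClientIP`, keeping the indentation and leaving all other lines untouched.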
Comment 27 Paul Weil 2015-11-09 09:55:39 EST
Marking as UpcomingRelease.  Michal, we should document the known issue with NFS and the possible workaround with ClientIP for the release notes while we continue to work on this.
Comment 29 Andy Goldstein 2015-11-10 10:39:43 EST
Some additional (possibly repeated) information:

- pushes fail with both Docker 1.7.1 and 1.8.2

- errors are either of the form "blob upload unknown" or "digest invalid: provided digest did not match uploaded content"

- I set up my client mounts with the 'noac' option and was unable to get a push to fail after several hundred iterations. My inter-host latency, however, was minimal, as the 3 VMs in question were all on the same laptop.
Comment 30 Andy Goldstein 2015-11-10 10:40:39 EST
Upstream issue: https://github.com/docker/distribution/issues/1176
Comment 31 Steve Dickson 2015-11-10 11:18:01 EST
From both the client side and the server, please do a network trace of the NFS traffic.

On the server side
# yum install wireshark
# tshark -w /tmp/server.pcap host <client_ip>
# bzip2 /tmp/server.pcap

On the client side 
# yum install wireshark
# tshark -w /tmp/client.pcap host <server_ip>
# bzip2 /tmp/client.pcap
Comment 32 Andy Goldstein 2015-11-10 17:34 EST
Created attachment 1092481 [details]
server pcap
Comment 33 Andy Goldstein 2015-11-10 17:34 EST
Created attachment 1092482 [details]
client1 pcap
Comment 34 Andy Goldstein 2015-11-10 17:35 EST
Created attachment 1092483 [details]
client2 pcap
Comment 35 Andy Goldstein 2015-11-10 17:37:14 EST
I've attached the packet captures from the NFS server, node1, and node2. During these captures, I attempted to push an image to the load balancer sitting in front of the 2 registry backends. I think I tried to push 10 or 12 times. Each time I got the same error:

[root@ose3-node1 haproxy]# docker push localhost:5010/test1:centos7
The push refers to a repository [localhost:5010/test1] (len: 1)
ce20c473cd8a: Pushing 1.024 kB
blob upload unknown

(Note: these captures were with haproxy and the registry:2.2.0 image)
Comment 36 Andy Goldstein 2015-11-11 10:40:48 EST
From https://github.com/docker/distribution/issues/1176#issuecomment-155558615: "Looks like NFS isn't flushing the writes from one instance to another. Read-after-write consistency is a requirement of the backend."
Comment 37 Michal Minar 2015-11-13 13:52:24 EST
I did a few experiments with client mount options, tested on 3 VMs sharing a single NFS export of a host:

   /var/shared 192.168.122.0/24(rw,sync,all_squash)

Two nodes were running the upstream docker registry (v2.2.0), balanced by haproxy with a round-robin algorithm. I ran

    docker -D push 192.168.122.101:5010/joe/mongodb-24-rhel7

100 times from the 3rd VM after each change to the NFS mount options on the nodes. The storage was wiped before each subsequent push.
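
A loop of roughly this shape can drive such a test series and report the success rate. It is a sketch only: `success_rate` is a hypothetical helper, it does not wipe the storage between runs, and the run count must be greater than zero.

```shell
# Run a command N times and print the percentage of runs that exited with 0.
# Usage: success_rate N command [args...]
success_rate() {
    runs=$1
    shift
    ok=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        if "$@" >/dev/null 2>&1; then
            ok=$((ok + 1))
        fi
        i=$((i + 1))
    done
    echo $((ok * 100 / runs))
}
```

It could be invoked as `success_rate 100 docker -D push 192.168.122.101:5010/joe/mongodb-24-rhel7`, with the storage cleanup added around each push as needed.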

The only option modified was `actimeo` which according to NFS(5) man page:

> Using  actimeo  sets  all  of acregmin, acregmax, acdirmin, and acdirmax to the same value.

It is the number of seconds before the client's cached attributes for a file or directory are invalidated.

All the other NFS options were default:

    rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.101,local_lock=none,addr=192.168.122.1

Here are my findings:

    actimeo [s]            success rate [%]
    0                      100
    3                      100
    4                      90
    5                      5
    10                     0

In other words, invalidating the client cache after 3 seconds was enough to reach a 100% success rate on a low-latency network.

I went further and found that setting just `acdirmin` and `acdirmax` to 3, while keeping the default values of `acregmin` and `acregmax`, reduced the success rate to 97%.
So these two options are the most critical. On the other hand, setting just `acregmin` and `acregmax` didn't help at all.
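
The two variants tested above differ only in the option string passed to the NFS client mount. The helper below builds that string. `nfs_mount_opts` is a hypothetical name, and the mount point in the usage note is an assumption.

```shell
# Build the attribute-cache part of an NFS mount option string.
#   all N  -> actimeo=N (sets acregmin, acregmax, acdirmin and acdirmax)
#   dir N  -> only the directory attribute cache timeouts
nfs_mount_opts() {
    case $1 in
        all) echo "actimeo=$2" ;;
        dir) echo "acdirmin=$2,acdirmax=$2" ;;
        *)   echo "usage: nfs_mount_opts all|dir SECONDS" >&2; return 1 ;;
    esac
}
```

With the export from this comment, a client mount would then look like `mount -t nfs -o "$(nfs_mount_opts all 3)" 192.168.122.1:/var/shared /mnt/shared/registry`.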

Steve, can I do anything else to help you?
Comment 38 Michal Minar 2015-11-16 10:54:54 EST
There's an upstream issue to address NFS mount option specification:
  https://github.com/kubernetes/kubernetes/issues/17226
Comment 43 Michal Minar 2015-12-07 15:23:39 EST
Erik,

I run registry instances like this:

  docker run --rm --name upstream-registry -v /mnt/shared/registry:/var/lib/registry:rw docker.io/registry:2.2.0

outside of OpenShift. With OpenShift, the easiest way is to use a hostPath volume:

  oc volume deploymentconfigs/docker-registry --add --overwrite --name=registry-storage -t hostPath -m /registry --path /mnt/shared/registry

Or you could also "remount" an existing persistent NFS volume:

  mount /var/lib/origin/openshift.local.volumes/pods/5164de5d-846e-11e5-8629-525400045043/volumes/kubernetes.io~nfs/registry-storage -o remount,noac
Comment 44 Michal Minar 2016-01-07 11:17:48 EST
Steve,

are you looking into this? Is there anything I can help with?
Comment 45 Steve Dickson 2016-01-11 18:13:54 EST
(In reply to Michal Minar from comment #44)
> Steve,
> 
> are you looking into this? Is there anything I can help with?
I'm not ignoring it.. ;-) but would you mind setting up a couple of
beaker machines or VMs where I can reproduce this problem?

That would definitely help get started.
Comment 48 Wang Haoran 2016-01-14 05:30:14 EST
1 replica + host dir: sometimes fails
Comment 49 Andy Goldstein 2016-01-14 09:16:16 EST
Could you please provide more info around your issue with hostDir? Is it a hostDir that happens to be stored on an NFS server? Or just a regular directory on the host? What error messages are you seeing, and what's in the logs? It's possible you're running into a different issue.
Comment 50 Wang Haoran 2016-01-14 21:09:52 EST
Hi:
It is just a regular directory on the host. OpenShift was started using the containerized install on Atomic Host. See the error below:
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:16.101s
[INFO] Finished at: Thu Jan 14 03:05:55 EST 2016
[INFO] Final Memory: 14M/93M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "openshift" could not be activated because it does not exist.
Copying all WAR artifacts from /home/jboss/source/target directory into /opt/webserver/webapps for later deployment...
'/home/jboss/source/target/websocket-chat.war' -> '/opt/webserver/webapps/websocket-chat.war'
I0114 03:05:55.835371       1 docker.go:481] Container wait returns with 0 and <nil>
I0114 03:05:55.865325       1 docker.go:488] Container exited
I0114 03:05:55.910049       1 docker.go:571] Invoking postExecution function
I0114 03:05:55.910129       1 sti.go:270] No .sti/environment provided (no environment file found in application sources)
I0114 03:05:56.294482       1 docker.go:606] Committing container with config: {Hostname: Domainname: User:185 Memory:0 MemorySwap:0 CPUShares:0 CPUSet: AttachStdin:false AttachStdout:false AttachStderr:false PortSpecs:[] ExposedPorts:map[] Tty:false OpenStdin:false StdinOnce:false Env:[OPENSHIFT_BUILD_SOURCE=https://github.com/jboss-openshift/openshift-quickstarts.git OPENSHIFT_BUILD_REFERENCE=1.2 OPENSHIFT_BUILD_NAME=openshift-quickstarts-1 OPENSHIFT_BUILD_NAMESPACE=ovuu3] Cmd:[/usr/local/s2i/run] DNS:[] Image: Volumes:map[] VolumeDriver: VolumesFrom: WorkingDir: MacAddress: Entrypoint:[] NetworkDisabled:false SecurityOpts:[] OnBuild:[] Mounts:[] Labels:map[Authoritative_Registry:registry.access.redhat.com BZComponent:jboss-webserver-3-webserver30-tomcat8-openshift-docker Release:7 io.openshift.build.source-context-dir:tomcat-websocket-chat io.openshift.build.source-location:https://github.com/jboss-openshift/openshift-quickstarts.git org.jboss.deployments-dir:/opt/webserver/webapps vcs-type:git Build_Host:rcm-img-docker01.build.eng.bos.redhat.com Name:jboss-webserver-3/webserver30-tomcat8-openshift io.openshift.build.image:rcm-img-docker01.build.eng.bos.redhat.com:5001/jboss-webserver-3/webserver30-tomcat8-openshift:latest io.openshift.build.commit.message:set pom versions of org.openshift quickstarts to 1.2.0.Final io.openshift.build.commit.author:David Ward <dward@redhat.com> Architecture:x86_64 io.openshift.expose-services:8080:http io.k8s.description:Platform for building and running web applications on JBoss Web Server 3.0 - Tomcat v8 io.openshift.build.commit.id:dd6ef49437a8b9aec08523e69166854cc11a0805 Version:1.1 Vendor:Red Hat, Inc. 
vcs-ref:6db374ff8ce77187745cdc0b09d62991c7820c89 architecture:x86_64 io.openshift.s2i.scripts-url:image:///usr/local/s2i io.k8s.display-name:172.30.115.226:5000/ovuu3/openshift-quickstarts:latest io.openshift.tags:builder,java,tomcat8 build-date:2015-12-10T19:36:07.840739Z io.openshift.build.commit.date:Tue Dec 15 13:22:35 2015 -0500 io.openshift.build.commit.ref:1.2]}
I0114 03:06:36.423794       1 sti.go:315] Successfully built 172.30.115.226:5000/ovuu3/openshift-quickstarts:latest
I0114 03:06:36.498740       1 cleanup.go:23] Removing temporary directory /tmp/s2i-build217028334
I0114 03:06:36.498779       1 fs.go:117] Removing directory '/tmp/s2i-build217028334'
I0114 03:06:36.524868       1 sti.go:214] Using provided push secret for pushing 172.30.115.226:5000/ovuu3/openshift-quickstarts:latest image
I0114 03:06:36.524908       1 sti.go:218] Pushing 172.30.115.226:5000/ovuu3/openshift-quickstarts:latest image ...
I0114 03:07:45.126836       1 sti.go:223] Registry server Address: 
I0114 03:07:45.126964       1 sti.go:224] Registry server User Name: serviceaccount
I0114 03:07:45.127180       1 sti.go:225] Registry server Email: serviceaccount@example.org
I0114 03:07:45.127202       1 sti.go:230] Registry server Password: <<non-empty>>
F0114 03:07:45.127349       1 builder.go:185] Error: build error: Failed to push image. Response from registry is: digest invalid: provided digest did not match uploaded content
Comment 52 Michal Minar 2016-01-15 06:40:48 EST
Wang, can you please post a version of the registry you're using?
`oc -n default status | grep registry`
And the version of docker daemon?
Comment 54 Mike Hepburn 2016-01-17 21:03:04 EST
I just deployed a new ha registry (two separate nodes running the registry) onto an OSE3.1 using fedora 23 as nfs server and was getting the exact same error:

"Response from registry is: digest invalid: provided digest did not match uploaded content"

using this registry version: registry.access.redhat.com/openshift3/ose-docker-registry:v3.1.0.4

I have had some success with changing the NFS server options to add no_wdelay:

-- /etc/exports
/mnt/docker-registry *(rw,sync,no_root_squash,no_wdelay)

STI builds now push OK.
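
After editing the exports file, the change is typically applied on the NFS server with `exportfs -ra`. To check whether an exports file already carries the option, a small test like the following may help. This is a sketch: `has_no_wdelay` is a hypothetical helper, and it does not skip commented-out lines.

```shell
# Succeed if any export in the given file (default /etc/exports)
# lists no_wdelay among its parenthesized options.
has_no_wdelay() {
    grep -Eq '\(([^)]*,)?no_wdelay(,[^)]*)?\)' "${1:-/etc/exports}"
}
```

For example, `has_no_wdelay /etc/exports && echo "no_wdelay is set"`.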
Comment 55 Wang Haoran 2016-01-18 00:07:21 EST
(In reply to Michal Minar from comment #52)
> Wang, can you please post a version of the registry you're using?
> `oc -n default status | grep registry`
> And the version of docker daemon?

[root@openshift-135 ~]# oc -n default status | grep registry
svc/docker-registry - 172.30.160.148:5000
  dc/docker-registry deploys registry.access.redhat.com/openshift3/ose-docker-registry:v3.1.1.4 
  dc/router deploys registry.access.redhat.com/openshift3/ose-haproxy-router:v3.1.1.4 
[root@openshift-135 ~]# docker version
Client:
 Version:      1.8.2-el7
 API version:  1.20
 Package Version: docker-1.8.2-10.el7.x86_64
 Go version:   go1.4.2
 Git commit:   a01dc02/1.8.2
 Built:        
 OS/Arch:      linux/amd64

Server:
 Version:      1.8.2-el7
 API version:  1.20
 Package Version: 
 Go version:   go1.4.2
 Git commit:   a01dc02/1.8.2
 Built:        
 OS/Arch:      linux/amd64
Comment 58 Johnny Liu 2016-01-21 05:13:34 EST
After adding the "no_wdelay" option on the NFS server, as mentioned in comment 54, the sti build succeeds in pushing data to the docker-registry.
Comment 61 Steve Ovens 2016-03-10 09:56:27 EST
I can confirm the "no_wdelay" option worked in my environment as well.

I have a RHEL 7 OSE environment with an HA load balancer but my NFS storage was on a CentOS 6.7 server.


[root@master00 ~]# oc -n default status | grep registry
svc/docker-registry - 172.50.225.185:5000
  dc/docker-registry deploys registry.access.redhat.com/openshift3/ose-docker-registry:v3.1.1.6 


I was receiving the following in one of my registries:

time="2016-03-10T09:00:56.671348073-05:00" level=error msg="response completed with error" err.code="BLOB_UPLOAD_INVALID" err.detail="Invalid token" err.message="blob upload invalid" go.version=go1.4.2 http.request.host="172.50.225.185:5000" http.request.id=e4066c94-950d-4306-89de-57a1ac573f72 http.request.method=PUT http.request.remoteaddr="10.5.0.1:34874"

attempting to deploy TicketMonster with EAP 1.2.
Comment 62 Michal Minar 2016-04-15 04:17:10 EDT
Origin documentation PR: https://github.com/openshift/openshift-docs/pull/1908
Comment 63 Michal Minar 2016-04-28 11:25:54 EDT
Documented in PR https://github.com/openshift/openshift-docs/pull/1935
Comment 64 Johnny Liu 2016-06-26 23:34:43 EDT
The PR looks good to QE, so moving it to verified.
