Description of problem:
After running for several hours or days, the image registry suddenly restarted on its own. Below are the critical logs from when the issue happened:

dockerd-current[79274]: fatal error: concurrent map writes
dockerd-current[79274]:
dockerd-current[79274]: goroutine 174282 [running]:
dockerd-current[79274]: runtime.throw(0x19ac0f9, 0x15)
dockerd-current[79274]:     /usr/lib/golang/src/runtime/panic.go:566 +0x95 fp=0xc425eab1a8 sp=0xc425eab188
dockerd-current[79274]: runtime.mapassign1(0x1758e60, 0xc421d01b90, 0xc425eab390, 0xc425eab380)
dockerd-current[79274]:     /usr/lib/golang/src/runtime/hashmap.go:458 +0x8ef fp=0xc425eab290 sp=0xc425eab1a8
dockerd-current[79274]: github.com/openshift/origin/pkg/dockerregistry/server.(*remoteBlobGetterService).proxyStat(0xc421b6e300, 0x7f8e69351528, 0xc425785590, 0x2530f80, 0xc4228862d0, 0xc425eab808, 0xc425fce5a5, 0x47, 0x0, 0x0, ...)
dockerd-current[79274]:     /builddir/build/BUILD/atomic-openshift-git-0.b6f55a2/_output/local/go/src/github.com/openshift/origin/pkg/dockerregistry/server/remoteblobgetter.go:181 +0xcf9 fp=0xc425eab720 sp=0xc425eab290
dockerd-current[79274]: github.com/openshift/origin/pkg/dockerregistry/server.(*remoteBlobGetterService).findCandidateRepository(0xc421b6e300, 0x7f8e69351528, 0xc425785590, 0xc421addb90, 0x1, 0x1, 0xc4252d5b60, 0xc421c35900, 0x2, 0x2, ...)
dockerd-current[79274]:     /builddir/build/BUILD/atomic-openshift-git-0.b6f55a2/_output/local/go/src/github.com/openshift/origin/pkg/dockerregistry/server/remoteblobgetter.go:228 +0x1ab fp=0xc425eab908 sp=0xc425eab720
dockerd-current[79274]: github.com/openshift/origin/pkg/dockerregistry/server.(*remoteBlobGetterService).Stat(0xc421b6e300, 0x7f8e69351528, 0xc425785590, 0xc425fce5a5, 0x47, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
dockerd-current[79274]:     /builddir/build/BUILD/atomic-openshift-git-0.b6f55a2/_output/local/go/src/github.com/openshift/origin/pkg/dockerregistry/server/remoteblobgetter.go:94 +0x3e1 fp=0xc425eabc38 sp=0xc425eab908
dockerd-current[79274]: github.com/openshift/origin/pkg/dockerregistry/server.(*pullthroughBlobStore).copyContent(0xc425377760, 0x254fd00, 0xc421b6e300, 0x7f8e69351528, 0xc425785590, 0xc425fce5a5, 0x47, 0x7f8e692b1c78, 0xc4206b98a0, 0x0, ...)
dockerd-current[79274]:     /builddir/build/BUILD/atomic-openshift-git-0.b6f55a2/_output/local/go/src/github.com/openshift/origin/pkg/dockerregistry/server/pullthroughblobstore.go:152 +0xa9 fp=0xc425eabd50 sp=0xc425eabc38
dockerd-current[79274]: github.com/openshift/origin/pkg/dockerregistry/server.(*pullthroughBlobStore).storeLocal(0xc425377760, 0x254fd00, 0xc421b6e300, 0x7f8e69351528, 0xc425785590, 0xc425fce5a5, 0x47, 0x0, 0x0)
dockerd-current[79274]:     /builddir/build/BUILD/atomic-openshift-git-0.b6f55a2/_output/local/go/src/github.com/openshift/origin/pkg/dockerregistry/server/pullthroughblobstore.go:199 +0x1d2 fp=0xc425eabe58 sp=0xc425eabd50
dockerd-current[79274]: github.com/openshift/origin/pkg/dockerregistry/server.(*pullthroughBlobStore).ServeBlob.func1(0x7f8e69351528, 0xc425785590, 0xc425377760, 0x254fd00, 0xc421b6e300, 0xc425fce5a5, 0x47)
dockerd-current[79274]:     /builddir/build/BUILD/atomic-openshift-git-0.b6f55a2/_output/local/go/src/github.com/openshift/origin/pkg/dockerregistry/server/pullthroughblobstore.go:85 +0x19f fp=0xc425eabf18 sp=0xc425eabe58

Version-Release number of selected component (if applicable):
- ose-docker-registry:v3.5.5.31

Steps to Reproduce:
1. This is a data race, so we cannot reproduce it easily, but the logs above are evidence that it occurs.

Actual results:
- The Go runtime fatal error shown above occurs.

Expected results:
- No data race occurs.

Additional info:
- Upstream has already fixed it, but apparently there is no backport for the enterprise version at this moment (28 Feb).
- Make remoteBlobGetterService thread-safe https://github.com/openshift/image-registry/commit/8cac4cd531f7770745f9d8aea5d349ab77e5c28f
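
For readers unfamiliar with this failure class: the Go runtime aborts the process with "fatal error: concurrent map writes" when it detects two goroutines assigning to the same map without synchronization, which matches the runtime.mapassign1 frame in the trace above. The following is only a minimal sketch of the general remedy (guarding the shared map with a mutex), not the code from the upstream commit. The names digestCache, Put, Get, Len and fillConcurrently are hypothetical and do not come from the registry source.

```go
package main

import (
	"fmt"
	"sync"
)

// digestCache is a hypothetical stand-in for a map shared between request
// handlers. It is NOT the registry's actual data structure.
type digestCache struct {
	mu    sync.RWMutex
	repos map[string]string // blob digest -> repository name (assumed layout)
}

func newDigestCache() *digestCache {
	return &digestCache{repos: make(map[string]string)}
}

// Put records the repository for a digest while holding the write lock,
// so concurrent callers can never write to the map at the same time.
func (c *digestCache) Put(digest, repo string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.repos[digest] = repo
}

// Get looks up a digest under the read lock.
func (c *digestCache) Get(digest string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	repo, ok := c.repos[digest]
	return repo, ok
}

// Len returns the number of cached entries.
func (c *digestCache) Len() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.repos)
}

// fillConcurrently starts `workers` goroutines that each write `perWorker`
// distinct keys, then waits for all of them to finish.
func fillConcurrently(c *digestCache, workers, perWorker int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				c.Put(fmt.Sprintf("sha256:%d-%d", w, i), "example/repo")
			}
		}(w)
	}
	wg.Wait()
}

func main() {
	c := newDigestCache()
	fillConcurrently(c, 8, 1000)
	fmt.Println("entries:", c.Len())
}
```

If the lock in Put were removed, the same program would be expected to crash intermittently with the fatal error quoted in this report. Running it with the race detector (go run -race) is a more reliable way to surface such a bug than waiting for the runtime check to fire.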
The customer would like to keep using OCP v3.5 for their cluster, so backporting the fix [1] to v3.5.x would be ideal. However, if the v3.6.x or v3.7.x registry image is compatible with OCP 3.5, they are happy to use it instead. (v3.7 does not have the fix yet, though...)

[1] Make remoteBlobGetterService thread-safe
https://github.com/openshift/image-registry/commit/8cac4cd531f7770745f9d8aea5d349ab77e5c28f
Verified on:
openshift v3.5.5.31.66
kubernetes v1.5.2+43a9be4
etcd 3.1.0
No changes were made for this bug. Changing the status back to VERIFIED, since it was mistakenly changed by scripts.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:1235