Description of problem:
After installing a 4.2 OCP cluster on Azure using nightly build 4.2.0-0.nightly-2019-08-25-233755, cluster operator image-registry has not successfully rolled out. The image-registry pods in the openshift-image-registry namespace are both in CrashLoopBackOff state.

# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          152m    Unable to apply 4.2.0-0.nightly-2019-08-25-233755: the cluster operator image-registry has not yet successfully rolled out

# oc get co image-registry
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
image-registry             False       True          False      10h

# oc get pods --all-namespaces | grep registry
openshift-image-registry   cluster-image-registry-operator-564bd5dd8f-rg7dk   2/2   Running            0    4h16m
openshift-image-registry   image-registry-5d76bcb8d8-jnb48                    0/1   CrashLoopBackOff   54   4h16m
openshift-image-registry   image-registry-89b7dbc78-9kn2k                     0/1   CrashLoopBackOff   54   4h16m
openshift-image-registry   node-ca-6kfq8                                      1/1   Running            0    4h16m
openshift-image-registry   node-ca-c2g7l                                      1/1   Running            0    4h16m
openshift-image-registry   node-ca-g5597                                      1/1   Running            0    4h16m
openshift-image-registry   node-ca-mxxcd                                      1/1   Running            0    4h16m
openshift-image-registry   node-ca-nw8g5                                      1/1   Running            0    4h16m

# oc logs -n openshift-image-registry image-registry-5d76bcb8d8-jnb48
time="2019-08-26T19:18:17.207010527Z" level=info msg="start registry" distribution_version=v2.6.0+unknown go.version=go1.11.6 openshift_version=v4.2.0-201908251340+a3a2106-dirty
time="2019-08-26T19:18:17.207664529Z" level=info msg="caching project quota objects with TTL 1m0s" go.version=go1.11.6
panic: storage: service returned error: StatusCode=400, ErrorCode=InvalidResourceName, ErrorMessage=The specifed resource name contains invalid characters. RequestId:7faf7f4c-901e-00cc-2443-5cbfc0000000 Time:2019-08-26T19:18:17.3080595Z, RequestInitiated=Mon, 26 Aug 2019 19:18:16 GMT, RequestId=7faf7f4c-901e-00cc-2443-5cbfc0000000, API Version=2016-05-31, QueryParameterName=, QueryParameterValue=

goroutine 1 [running]:
github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/handlers.NewApp(0x199c320, 0xc000042038, 0xc0004f1180, 0xc0002d6a00)
        /go/src/github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/handlers/app.go:127 +0x335a
github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware.NewApp(0x199c320, 0xc000042038, 0xc0004f1180, 0x19a28e0, 0xc000496ea0, 0x19a4900)
        /go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware/app.go:96 +0x85
github.com/openshift/image-registry/pkg/dockerregistry/server.NewApp(0x199c320, 0xc000042038, 0x1982aa0, 0xc00000e738, 0xc0004f1180, 0xc0004006e0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/app.go:138 +0x287
github.com/openshift/image-registry/pkg/cmd/dockerregistry.NewServer(0x199c320, 0xc000042038, 0xc0004f1180, 0xc0004006e0, 0x0, 0x0, 0x19bc4a0)
        /go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:187 +0x190
github.com/openshift/image-registry/pkg/cmd/dockerregistry.Execute(0x197f280, 0xc00000e1a8)
        /go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:162 +0x96a
main.main()
        /go/src/github.com/openshift/image-registry/cmd/dockerregistry/main.go:93 +0x3f6

# oc get pods -n openshift-image-registry
NAME                                               READY   STATUS             RESTARTS   AGE
cluster-image-registry-operator-564bd5dd8f-rg7dk   2/2     Running            0          4h17m
image-registry-5d76bcb8d8-jnb48                    0/1     CrashLoopBackOff   54         4h17m
image-registry-89b7dbc78-9kn2k                     0/1     CrashLoopBackOff   54         4h17m
node-ca-6kfq8                                      1/1     Running            0          4h17m
node-ca-c2g7l                                      1/1     Running            0          4h17m
node-ca-g5597                                      1/1     Running            0          4h17m
node-ca-mxxcd                                      1/1     Running            0          4h17m
node-ca-nw8g5                                      1/1     Running            0          4h17m

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-25-233755

How reproducible:
Happened once after install

Steps to Reproduce:
1. Install an Azure 4.2 cluster with 3 worker nodes in eastus, instance-type=Standard_D4s_v3, using nightly build:
   OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE: registry.svc.ci.openshift.org/ocp/release:4.2.0-0.nightly-2019-08-25-233755
2. Copy the kubeconfig to your jump host and check the cluster by running oc commands
3. oc get co, oc get clusterversion, oc get pods -n openshift-image-registry

Actual results:
image-registry-5d76bcb8d8-jnb48   0/1   CrashLoopBackOff   54   4h17m
image-registry-89b7dbc78-9kn2k    0/1   CrashLoopBackOff   54   4h17m

Expected results:
Cluster operator image-registry should be deployed successfully and Available, and the image-registry pods should be Running.

Additional info:
Links to must-gather logs and pod logs will be in the next comment.
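For context on the panic above: Azure blob container names must be 3-63 characters of lowercase letters, digits, and hyphens, must start and end with a letter or digit, and must not contain consecutive hyphens; a name that breaks these rules is rejected by the storage service with ErrorCode=InvalidResourceName. Below is a minimal standalone Go sketch of such a check (illustrative only, not the registry's actual code; the example names are hypothetical):

package main

import (
	"fmt"
	"regexp"
)

// containerNameRE encodes the documented Azure rules: the name starts and ends
// with a lowercase letter or digit, and every hyphen must be followed by a
// letter or digit (which also rules out consecutive hyphens).
var containerNameRE = regexp.MustCompile(`^[a-z0-9](?:[a-z0-9]|-[a-z0-9])*$`)

// isValidAzureContainerName reports whether name satisfies Azure's blob
// container naming rules (3-63 characters, lowercase alphanumerics and
// single hyphens only).
func isValidAzureContainerName(name string) bool {
	return len(name) >= 3 && len(name) <= 63 && containerNameRE.MatchString(name)
}

func main() {
	// Hypothetical example names; the second fails because of the "--".
	for _, name := range []string{
		"mycluster-x1y2z-image-registry-abcdefgh",
		"mycluster--x1y2z-image-registry-abcdefgh",
	} {
		fmt.Printf("%s valid=%v\n", name, isValidAzureContainerName(name))
	}
}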
I can't reproduce this issue with the 4.2.0-0.nightly-2019-08-26-202352 payload.

$ oc get pods -n openshift-image-registry
NAME                                              READY   STATUS    RESTARTS   AGE
cluster-image-registry-operator-dbf75d975-mfb5w   2/2     Running   0          66m
image-registry-58654c58cd-gsctf                   1/1     Running   0          63m
node-ca-9522m                                     1/1     Running   0          63m
node-ca-gsh84                                     1/1     Running   0          63m
node-ca-j2nnp                                     1/1     Running   0          63m
node-ca-tz627                                     1/1     Running   0          63m
node-ca-zd5r2                                     1/1     Running   0          63m

[xiuwang@dhcp-141-173 tmp]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-08-26-202352   True        False         60m     Cluster version is 4.2.0-0.nightly-2019-08-26-202352

@Walid Could you paste which version of the installer you used?
@XiuJuan I used the installer from the automated installer Jenkins job:

./openshift-install v4.2.0-201908251340-dirty
built from commit c2e6b0afd7f33ae0125d1ac96f3948919748ffc5
release image registry.svc.ci.openshift.org/ocp/release@sha256:0143dc81a5d87bba07c20b7800742b58e24a201f1a1706b0a4f5374cab22e415
Hmm, my Jenkins jobs both succeeded installation with 4.2.0-0.nightly-2019-08-25-233755 and 4.2.0-0.nightly-2019-08-26-202352 on Azure yesterday.
Right now we're at about a 50% pass rate on Azure CI, which runs every 4 hours and on demand from PRs. You may be hitting some of the same flakes CI is. https://testgrid.k8s.io/redhat-openshift-release-informing#redhat-canary-openshift-ocp-installer-e2e-azure-4.2
When REGISTRY_STORAGE_AZURE_CONTAINER=qe-xiuwang--azure-909-74z7s-image-registry-qyctpisglrltvbixdqrw is set manually on the 4.2.0-0.nightly-2019-09-08-180038 version, the image-registry pod goes into CrashLoopBackOff. It would be better to restrict sequences of dashes in the container name saved in config.imageregistry.operator.openshift.io.
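One hypothetical way such a restriction could be enforced before the value is persisted in config.imageregistry.operator.openshift.io (a rough sketch under my own naming, not the actual fix): collapse runs of dashes to a single dash, trim dashes from the ends, and cap the length at Azure's 63-character limit.

package main

import (
	"fmt"
	"regexp"
	"strings"
)

var dashRuns = regexp.MustCompile(`-{2,}`)

// sanitizeContainerName is a hypothetical helper: it lowercases the input,
// collapses any run of dashes to a single dash, truncates to 63 characters,
// and trims dashes from both ends so the result satisfies Azure's rules.
func sanitizeContainerName(name string) string {
	name = strings.ToLower(name)
	name = dashRuns.ReplaceAllString(name, "-")
	if len(name) > 63 {
		name = name[:63]
	}
	return strings.Trim(name, "-")
}

func main() {
	// The double dash that triggered the CrashLoopBackOff is collapsed.
	fmt.Println(sanitizeContainerName("qe-xiuwang--azure-909-74z7s-image-registry-qyctpisglrltvbixdqrw"))
}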
XiuJuan - the fix ensures that we don't have an invalid container name for IPI installs, or for UPI installs where the container name is not provided. If a customer provides a bad container name for a UPI install, failing with a CrashLoopBackOff - though not ideal - is not a release blocker. Please file a separate bug for this. Please verify that if a container name is not provided (UPI or IPI installs), we generate a valid container name on Azure.
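For the verification step, here is a rough sketch of how a valid container name might be generated when none is provided (hypothetical, not the operator's actual implementation): append a random lowercase suffix to "<infra name>-image-registry-", collapse any dash runs coming from the infrastructure name, and keep the result within Azure's 63-character limit.

package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"regexp"
	"strings"
)

const lowercase = "abcdefghijklmnopqrstuvwxyz"

var dashRuns = regexp.MustCompile(`-{2,}`)

// generateContainerName is a hypothetical generator: it builds
// "<infraName>-image-registry-<random suffix>", collapses any dash runs,
// and trims/truncates so the result satisfies Azure's naming rules.
func generateContainerName(infraName string) (string, error) {
	suffix := make([]byte, 28)
	for i := range suffix {
		n, err := rand.Int(rand.Reader, big.NewInt(int64(len(lowercase))))
		if err != nil {
			return "", err
		}
		suffix[i] = lowercase[n.Int64()]
	}
	name := strings.ToLower(infraName) + "-image-registry-" + string(suffix)
	name = dashRuns.ReplaceAllString(name, "-")
	if len(name) > 63 {
		name = name[:63]
	}
	return strings.Trim(name, "-"), nil
}

func main() {
	name, err := generateContainerName("mycluster-x1y2z") // hypothetical infra name
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}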
Adam, thanks, I will open a new low-severity bug to track the issue in comment #9. I installed several Azure clusters with IPI installation and also checked Azure CI, and didn't find the image-registry pods in the CrashLoopBackOff error. None of the generated container names contain a sequence of dashes, for example hongli-azupg-x2v5q-image-registry-uckvbxabnnvpkfmdopbgjxuyuici. Checked with the 4.2.0-0.nightly-2019-09-08-180038 payload.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922