Description of problem:
Image registry pods crash on clusters with GCP workload identity enabled.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-03-31-213412
4.11.0-0.nightly-2022-03-27-140854

How reproducible:
always

Steps to Reproduce:
1. Install a cluster with GCP workload identity enabled
2.
3.

Actual results:
$ oc logs -f image-registry-598f5bc8b4-gqrh8
time="2022-03-31T03:27:22.32696505Z" level=info msg="start registry" distribution_version=v2.7.1+unknown go.version=go1.17.5 openshift_version=4.11.0-202203281806.p0.g3a9755b.assembly.stream-3a9755b
time="2022-03-31T03:27:22.327497699Z" level=info msg="caching project quota objects with TTL 1m0s" go.version=go1.17.5
panic: google: read JWT from JSON credentials: 'type' field is "external_account" (expected "service_account")

goroutine 1 [running]:
github.com/docker/distribution/registry/handlers.NewApp({0x2025388, 0xc000042088}, 0xc000532700)
	/go/src/github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/handlers/app.go:127 +0x2829
github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware.NewApp({0x2025388, 0xc000042088}, 0x0, {0x2041f78, 0xc000455a70})
	/go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware/app.go:96 +0xb9
github.com/openshift/image-registry/pkg/dockerregistry/server.NewApp({0x2025388, 0xc000042088}, {0x2009830, 0xc000012218}, 0xc000532700, 0xc0002d20a0, {0x0, 0x0})
	/go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/app.go:138 +0x466
github.com/openshift/image-registry/pkg/cmd/dockerregistry.NewServer({0x2025388, 0xc000042088}, 0xc000532700, 0xc0002d20a0)
	/go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:210 +0x36a
github.com/openshift/image-registry/pkg/cmd/dockerregistry.Execute({0x1ff2720, 0xc000012028})
	/go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:164 +0x889
main.main()
	/go/src/github.com/openshift/image-registry/cmd/dockerregistry/main.go:93 +0x496

Other operators have no issues.

$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.11.0-0.nightly-2022-03-29-152521   True        False         False      42m
baremetal                                  4.11.0-0.nightly-2022-03-29-152521   True        False         False      58m
cloud-controller-manager                   4.11.0-0.nightly-2022-03-29-152521   True        False         False      60m
cloud-credential                           4.11.0-0.nightly-2022-03-29-152521   True        False         False      57m
cluster-autoscaler                         4.11.0-0.nightly-2022-03-29-152521   True        False         False      57m
config-operator                            4.11.0-0.nightly-2022-03-29-152521   True        False         False      59m
console                                    4.11.0-0.nightly-2022-03-29-152521   True        False         False      43m
csi-snapshot-controller                    4.11.0-0.nightly-2022-03-29-152521   True        False         False      58m
dns                                        4.11.0-0.nightly-2022-03-29-152521   True        False         False      57m
etcd                                       4.11.0-0.nightly-2022-03-29-152521   True        False         False      57m
image-registry                                                                  False       True          True       52m     Available: The deployment does not have available replicas...
ingress                                    4.11.0-0.nightly-2022-03-29-152521   True        False         False      50m
insights                                   4.11.0-0.nightly-2022-03-29-152521   True        False         False      52m
kube-apiserver                             4.11.0-0.nightly-2022-03-29-152521   True        False         False      55m
kube-controller-manager                    4.11.0-0.nightly-2022-03-29-152521   True        False         False      55m
kube-scheduler                             4.11.0-0.nightly-2022-03-29-152521   True        False         False      55m
kube-storage-version-migrator              4.11.0-0.nightly-2022-03-29-152521   True        False         False      59m
machine-api                                4.11.0-0.nightly-2022-03-29-152521   True        False         False      54m
machine-approver                           4.11.0-0.nightly-2022-03-29-152521   True        False         False      57m
machine-config                             4.11.0-0.nightly-2022-03-29-152521   True        False         False      56m
marketplace                                4.11.0-0.nightly-2022-03-29-152521   True        False         False      58m
monitoring                                 4.11.0-0.nightly-2022-03-29-152521   True        False         False      49m
network                                    4.11.0-0.nightly-2022-03-29-152521   True        False         False      59m
node-tuning                                4.11.0-0.nightly-2022-03-29-152521   True        False         False      51m
openshift-apiserver                        4.11.0-0.nightly-2022-03-29-152521   True        False         False      50m
openshift-controller-manager               4.11.0-0.nightly-2022-03-29-152521   True        False         False      56m
openshift-samples                          4.11.0-0.nightly-2022-03-29-152521   True        False         False      52m
operator-lifecycle-manager                 4.11.0-0.nightly-2022-03-29-152521   True        False         False      58m
operator-lifecycle-manager-catalog         4.11.0-0.nightly-2022-03-29-152521   True        False         False      58m
operator-lifecycle-manager-packageserver   4.11.0-0.nightly-2022-03-29-152521   True        False         False      52m
service-ca                                 4.11.0-0.nightly-2022-03-29-152521   True        False         False      59m
storage                                    4.11.0-0.nightly-2022-03-29-152521   True        False         False      58m

Expected results:
The image registry should be running.

Additional info:
Workaround: an admin can replace the registry credentials secret with the long-lived one.
https://coreos.slack.com/archives/C03A3A0SS4Q/p1648746001091929
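For context on the panic message: the legacy JWT credential loader used by the vendored GCS storage driver inspects the "type" field of the credentials JSON and only accepts long-lived service account keys, while workload identity (STS) clusters mount an "external_account" credentials file instead. A minimal stdlib-only sketch of that check (readJWTType and the sample JSON shapes below are illustrative, not the actual driver code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// credentialsFile mirrors the one field the legacy JWT loader cares about;
// the field name matches the GCP credentials JSON format.
type credentialsFile struct {
	Type string `json:"type"`
}

// readJWTType is a hypothetical stand-in for the check that produced the
// panic above: it rejects anything that is not a "service_account" key.
func readJWTType(raw []byte) error {
	var f credentialsFile
	if err := json.Unmarshal(raw, &f); err != nil {
		return err
	}
	if f.Type != "service_account" {
		return fmt.Errorf("google: read JWT from JSON credentials: 'type' field is %q (expected \"service_account\")", f.Type)
	}
	return nil
}

func main() {
	// A long-lived key (the workaround) passes the check...
	longLived := []byte(`{"type": "service_account", "project_id": "demo"}`)
	// ...while a workload identity credentials file does not.
	workloadIdentity := []byte(`{"type": "external_account", "audience": "//iam.googleapis.com/..."}`)

	fmt.Println(readJWTType(longLived))        // <nil>
	fmt.Println(readJWTType(workloadIdentity)) // the error from the panic
}
```

This is why swapping in a long-lived service account key works around the crash: only the credential "type" differs from the loader's point of view.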
Marking this as a TestBlocker, since it is permanently failing installation of our GCP STS profile.
@obulatov @arane Marking this blocker? for re-evaluation, since it is a TestBlocker for QE and permanently fails our GCP installs with STS. Can you please add some details on why it is blocker-? Thank you.
The target release for this BZ should be 4.12, and we should have a BZ for 4.11.z. The fix for this bug isn't trivial; it requires upstream changes. This fact, together with other higher-priority issues, makes it problematic for us to fix in 4.11.0. If someone can upstream Akhil's changes [1] and backport them to our fork, that will help us move forward. But realistically I expect 4.11.0 to be as broken as 4.10.

[1]: https://github.com/openshift/docker-distribution/pull/30
The PR has been merged into 4.11.0-0.nightly-2022-06-21-040754; moving the bug to VERIFIED manually.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069