2070791 – [GCP] Image registry crashes on cluster with GCP workload identity enabled

Summary: [GCP] Image registry crashes on cluster with GCP workload identity enabled

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Image Registry
Sub Component:
Version:	4.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.11.0
Assignee:	Akhil Rane
QA Contact:	XiuJuan Wang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2099416

Reported:	2022-04-01 01:18 UTC by XiuJuan Wang
Modified:	2022-08-10 11:03 UTC
CC List:	4 users
Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-08-10 11:03:06 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift docker-distribution pull 31	None	open	Bug 2070791: Support authentication using gcp workload identity federation	2022-06-16 20:28:28 UTC
Github	openshift image-registry pull 335	None	open	Bug 2070791: Support authentication using gcp workload identity federation	2022-06-16 20:56:24 UTC
Red Hat Product Errata	RHSA-2022:5069	None	None	None	2022-08-10 11:03:20 UTC

Description XiuJuan Wang 2022-04-01 01:18:32 UTC

Description of problem:
The image registry crashes on clusters with GCP workload identity enabled.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-03-31-213412
4.11.0-0.nightly-2022-03-27-140854

How reproducible:
always

Steps to Reproduce:
1. Install a cluster with GCP workload identity enabled.

Actual results:

$ oc logs -f image-registry-598f5bc8b4-gqrh8
time="2022-03-31T03:27:22.32696505Z" level=info msg="start registry" distribution_version=v2.7.1+unknown go.version=go1.17.5 openshift_version=4.11.0-202203281806.p0.g3a9755b.assembly.stream-3a9755b
time="2022-03-31T03:27:22.327497699Z" level=info msg="caching project quota objects with TTL 1m0s" go.version=go1.17.5
panic: google: read JWT from JSON credentials: 'type' field is "external_account" (expected "service_account")

goroutine 1 [running]:
github.com/docker/distribution/registry/handlers.NewApp({0x2025388, 0xc000042088}, 0xc000532700)
	/go/src/github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/handlers/app.go:127 +0x2829
github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware.NewApp({0x2025388, 0xc000042088}, 0x0, {0x2041f78, 0xc000455a70})
	/go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware/app.go:96 +0xb9
github.com/openshift/image-registry/pkg/dockerregistry/server.NewApp({0x2025388, 0xc000042088}, {0x2009830, 0xc000012218}, 0xc000532700, 0xc0002d20a0, {0x0, 0x0})
	/go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/app.go:138 +0x466
github.com/openshift/image-registry/pkg/cmd/dockerregistry.NewServer({0x2025388, 0xc000042088}, 0xc000532700, 0xc0002d20a0)
	/go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:210 +0x36a
github.com/openshift/image-registry/pkg/cmd/dockerregistry.Execute({0x1ff2720, 0xc000012028})
	/go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:164 +0x889
main.main()
	/go/src/github.com/openshift/image-registry/cmd/dockerregistry/main.go:93 +0x496
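
The panic message points at the `type` field of the credentials JSON: workload identity credentials are marked "external_account", while the code path that panics expects "service_account". The Go sketch below is only an illustration of that mismatch, not the registry's actual code; the function names and the trimmed sample payloads are hypothetical, and real credential files carry many more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// credentialType returns the "type" field of a Google credentials JSON document.
// Hypothetical helper for illustration; it is not part of the image registry.
func credentialType(data []byte) (string, error) {
	var f struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(data, &f); err != nil {
		return "", fmt.Errorf("parse credentials JSON: %w", err)
	}
	if f.Type == "" {
		return "", fmt.Errorf("credentials JSON has no 'type' field")
	}
	return f.Type, nil
}

// acceptedByServiceAccountOnlyLoader mimics a loader that accepts only
// long-lived service account keys, which is the behavior the panic suggests.
func acceptedByServiceAccountOnlyLoader(credType string) bool {
	return credType == "service_account"
}

func main() {
	// Hypothetical, heavily trimmed payloads; only the "type" field matters here.
	samples := []string{
		`{"type": "service_account"}`,
		`{"type": "external_account"}`,
	}
	for _, s := range samples {
		t, err := credentialType([]byte(s))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("type=%q accepted=%t\n", t, acceptedByServiceAccountOnlyLoader(t))
	}
}
```

Under these assumptions, a loader that also recognizes "external_account" would be needed for the registry to start with workload identity credentials, which is consistent with the linked pull requests.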

Other operators have no issues.
$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.11.0-0.nightly-2022-03-29-152521   True        False         False      42m
baremetal                                  4.11.0-0.nightly-2022-03-29-152521   True        False         False      58m
cloud-controller-manager                   4.11.0-0.nightly-2022-03-29-152521   True        False         False      60m
cloud-credential                           4.11.0-0.nightly-2022-03-29-152521   True        False         False      57m
cluster-autoscaler                         4.11.0-0.nightly-2022-03-29-152521   True        False         False      57m
config-operator                            4.11.0-0.nightly-2022-03-29-152521   True        False         False      59m
console                                    4.11.0-0.nightly-2022-03-29-152521   True        False         False      43m
csi-snapshot-controller                    4.11.0-0.nightly-2022-03-29-152521   True        False         False      58m
dns                                        4.11.0-0.nightly-2022-03-29-152521   True        False         False      57m
etcd                                       4.11.0-0.nightly-2022-03-29-152521   True        False         False      57m
image-registry                                                                  False       True          True       52m     Available: The deployment does not have available replicas...
ingress                                    4.11.0-0.nightly-2022-03-29-152521   True        False         False      50m
insights                                   4.11.0-0.nightly-2022-03-29-152521   True        False         False      52m
kube-apiserver                             4.11.0-0.nightly-2022-03-29-152521   True        False         False      55m
kube-controller-manager                    4.11.0-0.nightly-2022-03-29-152521   True        False         False      55m
kube-scheduler                             4.11.0-0.nightly-2022-03-29-152521   True        False         False      55m
kube-storage-version-migrator              4.11.0-0.nightly-2022-03-29-152521   True        False         False      59m
machine-api                                4.11.0-0.nightly-2022-03-29-152521   True        False         False      54m
machine-approver                           4.11.0-0.nightly-2022-03-29-152521   True        False         False      57m
machine-config                             4.11.0-0.nightly-2022-03-29-152521   True        False         False      56m
marketplace                                4.11.0-0.nightly-2022-03-29-152521   True        False         False      58m
monitoring                                 4.11.0-0.nightly-2022-03-29-152521   True        False         False      49m
network                                    4.11.0-0.nightly-2022-03-29-152521   True        False         False      59m
node-tuning                                4.11.0-0.nightly-2022-03-29-152521   True        False         False      51m
openshift-apiserver                        4.11.0-0.nightly-2022-03-29-152521   True        False         False      50m
openshift-controller-manager               4.11.0-0.nightly-2022-03-29-152521   True        False         False      56m
openshift-samples                          4.11.0-0.nightly-2022-03-29-152521   True        False         False      52m
operator-lifecycle-manager                 4.11.0-0.nightly-2022-03-29-152521   True        False         False      58m
operator-lifecycle-manager-catalog         4.11.0-0.nightly-2022-03-29-152521   True        False         False      58m
operator-lifecycle-manager-packageserver   4.11.0-0.nightly-2022-03-29-152521   True        False         False      52m
service-ca                                 4.11.0-0.nightly-2022-03-29-152521   True        False         False      59m
storage                                    4.11.0-0.nightly-2022-03-29-152521   True        False         False      58m


Expected results:
The image registry should be running.

Additional info:
Workaround:
An admin can replace the registry credentials secret with a long-lived one:
https://coreos.slack.com/archives/C03A3A0SS4Q/p1648746001091929

Comment 2 Mike Fiedler 2022-06-07 12:13:48 UTC

Marking this as a TestBlocker since it permanently fails installation of our GCP STS profile.

Comment 3 Mike Fiedler 2022-06-07 12:26:36 UTC

@obulatov @arane Marking this blocker? for re-evaluation, since it is a TestBlocker for QE and permanently fails our GCP with STS installs. Can you please add some details on why it is blocker-? Thank you.

Comment 4 Oleg Bulatov 2022-06-07 16:46:49 UTC

The target release for this BZ should be 4.12, and we should have a BZ for 4.11.z. The fix for this bug isn't trivial, it requires upstream changes. This fact and other higher priority issues make it problematic for us to fix it in 4.11.0.

If someone can upstream Akhil's changes [1] and backport them to our fork, that'll help us move forward. But realistically I expect 4.11.0 to be as broken as 4.10.

[1]: https://github.com/openshift/docker-distribution/pull/30

Comment 15 XiuJuan Wang 2022-06-23 01:38:50 UTC

The PR has merged into 4.11.0-0.nightly-2022-06-21-040754; moving the bug to VERIFIED manually.

Comment 16 errata-xmlrpc 2022-08-10 11:03:06 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
