Bug 1857478 - cluster-version-operator does not fail over to store-openshift-official-release-mirror if store-openshift-official-release is unreachable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.4
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.5.z
Assignee: Jack Ottofaro
QA Contact: Johnny Liu
URL:
Whiteboard:
Duplicates: 1906498
Depends On: 1840343
Blocks:
Reported: 2020-07-15 23:43 UTC by Dan Seals
Modified: 2021-05-12 19:47 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-13 23:43:27 UTC
Target Upstream Version:


Attachments
cluster-version-operator.log (28.53 KB, text/plain)
2020-08-05 16:31 UTC, Dan Seals


Links
GitHub openshift/cluster-version-operator pull 487 (closed): Bug 1857478: pkg/verify: Parallelize HTTP(S) signature retrieval. Last updated 2021-05-12 19:45:59 UTC.
Red Hat Product Errata RHBA-2021:1015. Last updated 2021-04-13 23:43:32 UTC.

Description Dan Seals 2020-07-15 23:43:51 UTC
Description of problem:
If store-openshift-official-release in configmap/release-verification is unreachable, the upgrade does not proceed. The CVO log never indicates that store-openshift-official-release-mirror is being used.

Steps to Reproduce:
1. Block Google in the firewall.
2. Attempt an upgrade.

Actual results:
From the cluster-version-operator log:

will check for signatures in containers/image format at https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release and from config maps in openshift-config-managed with label "release.openshift.io/verification-signatures"

I0707 19:06:23.118683       1 verify.go:404] unable to load signature: Get https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/<signature-key>/signature-1: context deadline exceeded
E0707 19:06:23.118768       1 sync_worker.go:329] unable to synchronize image (waiting 21.565712806s): The update cannot be verified: context deadline exceeded

I0707 19:12:31.000456       1 verify.go:404] unable to load signature: Get https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=<signature key>/signature-1: context deadline exceeded
E0707 19:12:31.000518       1 sync_worker.go:329] unable to synchronize image (waiting 43.131425612s): The update cannot be verified: context deadline exceeded

Comment 1 Jack Ottofaro 2020-07-30 21:20:15 UTC
Please provide the complete cluster-version-operator pod log. Thanks.

Comment 2 Jack Ottofaro 2020-07-30 21:21:06 UTC
We do not have time to fix the bug in this sprint as we are working on higher priority bugs and features.  Hence we are adding UpcomingSprint now, and we'll revisit this in the next sprint.

Comment 3 Dan Seals 2020-08-05 16:31:18 UTC
Created attachment 1710540 [details]
cluster-version-operator.log

cluster-version-operator.log, as requested.

Comment 4 W. Trevor King 2020-08-07 03:45:53 UTC
From the attached logs:

$ grep 'verif\|signature' cvo.log
I0707 19:00:37.954735       1 cvo.go:264] Verifying release authenticity: All release image digests must have GPG signatures from verifier-public-key-redhat (567E347AD0044ADE55BA8A5F199E2F91FD431D51: Red Hat, Inc. (release key 2) <security@redhat.com>, B08B659EE86AF623BC90E8DB938A80CAF21541EB: Red Hat, Inc. (beta key 2) <security@redhat.com>) - will check for signatures in containers/image format at https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release and from config maps in openshift-config-managed with label "release.openshift.io/verification-signatures"
I0707 19:00:38.067139       1 store.go:74] use cached most recent signature config maps
I0707 19:00:38.073947       1 store.go:65] remember most recent signature config maps: signatures-managed
I0707 19:00:38.073963       1 store.go:116] searching for sha256-15280aba8f1c82fe39180a617e3b4886401f6c2aef63c7962203aa10530d1db8 in signature config map signatures-managed
I0707 19:06:23.118683       1 verify.go:404] unable to load signature: Get https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=15280aba8f1c82fe39180a617e3b4886401f6c2aef63c7962203aa10530d1db8/signature-1: context deadline exceeded
E0707 19:06:23.118768       1 sync_worker.go:329] unable to synchronize image (waiting 21.565712806s): The update cannot be verified: context deadline exceeded
I0707 19:06:45.948896       1 store.go:74] use cached most recent signature config maps
I0707 19:06:45.956044       1 store.go:65] remember most recent signature config maps: signatures-managed
I0707 19:06:45.956062       1 store.go:116] searching for sha256-15280aba8f1c82fe39180a617e3b4886401f6c2aef63c7962203aa10530d1db8 in signature config map signatures-managed
I0707 19:12:31.000456       1 verify.go:404] unable to load signature: Get https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=15280aba8f1c82fe39180a617e3b4886401f6c2aef63c7962203aa10530d1db8/signature-1: context deadline exceeded
E0707 19:12:31.000518       1 sync_worker.go:329] unable to synchronize image (waiting 43.131425612s): The update cannot be verified: context deadline exceeded
I0707 19:13:15.914354       1 store.go:74] use cached most recent signature config maps
I0707 19:13:15.922623       1 store.go:65] remember most recent signature config maps: signatures-managed
I0707 19:13:15.922642       1 store.go:116] searching for sha256-15280aba8f1c82fe39180a617e3b4886401f6c2aef63c7962203aa10530d1db8 in signature config map signatures-managed

I think this is a dup of bug 1840343, which is currently fixed in 4.6 and not backported. I'm linking them to formalize that, but feel free to break the dependency chain again if folks disagree. What is not clear to me is why the attached logs here lack the ~2m timeouts and signature-2 requests seen in the 4.3 logs from bug 1840343. Still, parallel signature retrieval would certainly not hurt here; it just might not be sufficient.

Comment 5 W. Trevor King 2020-08-07 03:50:53 UTC
For now, the workaround for 4.4+ and 4.3.12+ is to manually copy a signature ConfigMap into the cluster, e.g. [1].

[1]: https://docs.openshift.com/container-platform/4.4/updating/updating-restricted-network-cluster.html#updating-restricted-network-image-configmap
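The log excerpts in comment 4 show the two naming conventions involved: HTTP stores are queried at `<store>/sha256=<hex>/signature-1`, and config maps in openshift-config-managed labelled `release.openshift.io/verification-signatures` are searched for `sha256-<hex>`. The Go helpers below derive both strings from a release digest, which may help when preparing the ConfigMap for this workaround. They rely only on those two logged formats; the full ConfigMap layout is an assumption not covered here, so follow the linked documentation for the authoritative procedure. `splitDigest`, `signatureURL`, and `configMapKeyPrefix` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// splitDigest splits a digest such as "sha256:<hex>" into algorithm and hex.
func splitDigest(digest string) (string, string, error) {
	algo, hex, ok := strings.Cut(digest, ":")
	if !ok || algo == "" || hex == "" {
		return "", "", fmt.Errorf("invalid digest %q", digest)
	}
	return algo, hex, nil
}

// signatureURL builds "<store>/<algo>=<hex>/signature-<n>", the URL shape
// that appears in the CVO "unable to load signature" log lines.
func signatureURL(store, digest string, n int) (string, error) {
	algo, hex, err := splitDigest(digest)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s/%s=%s/signature-%d", strings.TrimSuffix(store, "/"), algo, hex, n), nil
}

// configMapKeyPrefix builds "<algo>-<hex>", the name the CVO log reports
// "searching for" in signature config maps.
func configMapKeyPrefix(digest string) (string, error) {
	algo, hex, err := splitDigest(digest)
	if err != nil {
		return "", err
	}
	return algo + "-" + hex, nil
}

func main() {
	digest := "sha256:15280aba8f1c82fe39180a617e3b4886401f6c2aef63c7962203aa10530d1db8"
	store := "https://storage.googleapis.com/openshift-release/official/signatures/openshift/release"
	// Prints the same signature-1 URL that appears in the comment 4 log.
	if u, err := signatureURL(store, digest, 1); err == nil {
		fmt.Println(u)
	}
	if key, err := configMapKeyPrefix(digest); err == nil {
		fmt.Println(key)
	}
}
```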

Comment 6 Jack Ottofaro 2020-08-21 18:47:57 UTC
We do not have time to fix the bug in this sprint as we are working on higher priority bugs and features.  Hence we are adding UpcomingSprint now, and we'll revisit this in the next sprint.

Comment 7 W. Trevor King 2020-09-12 20:52:32 UTC
Backporting the series of changes that led to parallel signature retrieval in 4.6 is not trivial, and the ConfigMap workaround from comment 5 is fairly straightforward. Maybe we'll get the backport done next sprint.

Comment 8 Jack Ottofaro 2020-09-17 19:24:00 UTC
I'm working on a higher priority task. Hence I'm adding UpcomingSprint now, and we'll revisit this in the next sprint.

Comment 9 Jack Ottofaro 2020-10-23 18:51:08 UTC
I'm working on a higher priority task. Hence I'm adding UpcomingSprint now, and we'll revisit this in the next sprint.

Comment 10 Jack Ottofaro 2020-11-12 16:07:33 UTC
I'm working on a higher priority task. Hence I'm adding UpcomingSprint now, and we'll revisit this in the next sprint.

Comment 11 Jack Ottofaro 2020-12-04 20:49:05 UTC
I'm working on a higher priority task. Hence I'm adding UpcomingSprint now, and we'll revisit this in the next sprint.

Comment 14 Johnny Liu 2021-03-29 07:32:46 UTC
Verified this bug with 4.5.0-0.nightly-2021-03-28-125930: PASS.

1. Install a disconnected cluster with 4.5.0-0.nightly-2021-03-28-125930.
2. Upgrade the cluster to quay.io/openshift-release-dev/ocp-release:4.6.20-x86_64 without manually creating a signature ConfigMap.
3. Check the CVO log: the CVO now tries to load the signature files from both stores, failing over rather than stopping at the first one.

[root@preserve-jialiu-ansible ~]# oc logs cluster-version-operator-7fd4f68cb4-29d66 -n openshift-cluster-version |grep 'unable to load'|sort -u
I0329 07:19:34.708103       1 verify.go:362] unable to load signature: Get https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=ac5bbe391f9f5db07b8a710cfda1aee80f6eb3bf37a3c44a5b89763957d8d5ad/signature-1: dial tcp 54.173.18.88:443: connect: connection timed out
I0329 07:20:57.913109       1 verify.go:362] unable to load signature: Get https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=ac5bbe391f9f5db07b8a710cfda1aee80f6eb3bf37a3c44a5b89763957d8d5ad/signature-1: dial tcp 172.253.115.128:443: i/o timeout
I0329 07:20:57.913254       1 verify.go:362] unable to load signature: Get https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=ac5bbe391f9f5db07b8a710cfda1aee80f6eb3bf37a3c44a5b89763957d8d5ad/signature-2: dial tcp 54.172.163.83:443: i/o timeout
I0329 07:20:57.913768       1 verify.go:362] unable to load signature: Get https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=ac5bbe391f9f5db07b8a710cfda1aee80f6eb3bf37a3c44a5b89763957d8d5ad/signature-2: context deadline exceeded
I0329 07:20:57.913791       1 verify.go:362] unable to load signature: Get https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=ac5bbe391f9f5db07b8a710cfda1aee80f6eb3bf37a3c44a5b89763957d8d5ad/signature-3: context deadline exceeded
I0329 07:25:43.343723       1 verify.go:362] unable to load signature: Get https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=ac5bbe391f9f5db07b8a710cfda1aee80f6eb3bf37a3c44a5b89763957d8d5ad/signature-1: dial tcp 54.173.18.88:443: connect: connection timed out
I0329 07:27:07.332983       1 verify.go:362] unable to load signature: Get https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=ac5bbe391f9f5db07b8a710cfda1aee80f6eb3bf37a3c44a5b89763957d8d5ad/signature-2: context deadline exceeded
I0329 07:27:07.332984       1 verify.go:362] unable to load signature: Get https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=ac5bbe391f9f5db07b8a710cfda1aee80f6eb3bf37a3c44a5b89763957d8d5ad/signature-1: context deadline exceeded

Comment 16 errata-xmlrpc 2021-04-13 23:43:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.37 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1015

Comment 17 W. Trevor King 2021-05-12 19:47:48 UTC
*** Bug 1906498 has been marked as a duplicate of this bug. ***

