2064991 – cluster-version operator stops applying manifests when blocked by a precondition check


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cluster Version Operator
Sub Component:
Version:	4.3.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	---
Target Release:	4.10.z
Assignee:	Over the Air Updates
QA Contact:	liujia
Docs Contact:
URL:
Whiteboard:
Depends On:	1822752
Blocks:	1822922 2091806

Reported:	2022-03-17 04:00 UTC by liujia
Modified:	2022-06-06 16:25 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1822752
Environment:
Last Closed:	2022-04-08 05:04:28 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-version-operator pull 753	0	None	Merged	Bug 2064991: pkg/cvo: Separate payload load from payload apply	2022-04-03 04:22:43 UTC
Red Hat Product Errata	RHSA-2022:1162	0	None	None	None	2022-04-08 05:04:42 UTC

Comment 2 liujia 2022-03-17 06:37:34 UTC

Yeah, I agree that we should be careful with the backport to v4.10/v4.9, since the change is fairly large. I cloned the bug to raise the question here: apart from the incomplete ResourceDeletesInProgress work, we need to decide whether it is worth resolving the issue (the CVO stops syncing manifests) in 4.10/4.9, or whether to leave it as a known issue.

Comment 5 liujia 2022-04-02 06:30:29 UTC

Version : 4.10.8

1. Upgrade the cluster to an unsigned payload.
# ./oc adm upgrade --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:edb2f74d5caf03746726808655745baa7f9561f25e9dac39d226380ca0d20295
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
Updating to release image registry.ci.openshift.org/ocp/release@sha256:edb2f74d5caf03746726808655745baa7f9561f25e9dac39d226380ca0d20295

2. Check that signature verification of the image failed and that no upgrade happened.
# ./oc get clusterversion -ojson|jq .items[].status.conditions[1]
{
  "lastTransitionTime": "2022-04-02T06:11:53Z",
  "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release@sha256:edb2f74d5caf03746726808655745baa7f9561f25e9dac39d226380ca0d20295\" failure=The update cannot be verified: unable to locate a valid signature for one or more sources",
  "reason": "RetrievePayload",
  "status": "False",
  "type": "ReleaseAccepted"
}
# ./oc get clusterversion -ojson|jq .items[].status.conditions[]
...
{
  "lastTransitionTime": "2022-04-02T06:11:53Z",
  "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release@sha256:edb2f74d5caf03746726808655745baa7f9561f25e9dac39d226380ca0d20295\" failure=The update cannot be verified: unable to locate a valid signature for one or more sources",
  "reason": "RetrievePayload",
  "status": "False",
  "type": "ReleaseAccepted"
}
...
{
  "lastTransitionTime": "2022-04-02T04:16:55Z",
  "message": "Cluster version is 4.10.8",
  "status": "False",
  "type": "Progressing"
}
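
The first check in step 2 picks the condition by position (`conditions[1]`), which only works while the conditions happen to be in that order. A minimal sketch of selecting it by type instead, assuming `jq` is available as in the commands above; `condition_by_type` is a hypothetical helper name, not part of `oc`:

```shell
# Hypothetical helper: read `oc get clusterversion -ojson` output on stdin
# and print only the condition whose .type matches the first argument.
condition_by_type() {
  jq --arg t "$1" '.items[].status.conditions[] | select(.type == $t)'
}

# Usage (needs cluster access, so it is left commented out here):
#   ./oc get clusterversion -ojson | condition_by_type ReleaseAccepted
```

Selecting by type should keep the check stable if the operator adds or reorders conditions.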

3. Patch maxUnavailable of the marketplace-operator deployment to 50%.
# ./oc patch -n openshift-marketplace deployment/marketplace-operator --type=json -p '[{"op": "replace", "path": "/spec/strategy/rollingUpdate/maxUnavailable", "value": "50%"}]'
deployment.apps/marketplace-operator patched

# ./oc -n openshift-marketplace get deployment -ojson|jq .items[].spec.strategy.rollingUpdate
{
  "maxSurge": "25%",
  "maxUnavailable": "50%"
}

4. Wait several minutes, then check that the resource has been reconciled back to 25%.
# ./oc -n openshift-marketplace get deployment -ojson|jq .items[].spec.strategy.rollingUpdate
{
  "maxSurge": "25%",
  "maxUnavailable": "25%"
}
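
Steps 3 and 4 rely on waiting by hand; the wait could be sketched as a polling loop instead. This is only an illustration: the function names, attempt count, and delay are assumptions, not part of the original verification.

```shell
# Hypothetical helper: run a getter command repeatedly until it prints the
# expected value, or give up after a fixed number of attempts.
#   $1 = expected value, $2 = max attempts, $3 = delay in seconds,
#   remaining arguments = command that prints the current value.
wait_for_value() {
  local expected="$1" attempts="$2" delay="$3"
  shift 3
  local i current=""
  for ((i = 1; i <= attempts; i++)); do
    current="$("$@")" || current=""
    if [[ "$current" == "$expected" ]]; then
      echo "reconciled to $expected after $i check(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "still '$current' after $attempts check(s)" >&2
  return 1
}

# Getter for the field patched in step 3 (needs cluster access when called).
get_max_unavailable() {
  oc -n openshift-marketplace get deployment/marketplace-operator \
    -o jsonpath='{.spec.strategy.rollingUpdate.maxUnavailable}'
}

# Usage, checking every 20 seconds for up to 10 minutes:
#   wait_for_value "25%" 30 20 get_max_unavailable
```

A non-zero exit status would indicate that the cluster-version operator did not restore the manifest value within the chosen window.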

Comment 7 errata-xmlrpc 2022-04-08 05:04:28 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.10.8 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1162
