Bug 2064991

Summary: cluster-version operator stops applying manifests when blocked by a precondition check
Product: OpenShift Container Platform
Reporter: liujia <jiajliu>
Component: Cluster Version Operator
Assignee: Over the Air Updates <aos-team-ota>
Status: CLOSED ERRATA
QA Contact: liujia <jiajliu>
Severity: low
Docs Contact:
Priority: low
Version: 4.3.z
CC: anaik, aos-bugs, bleanhar, jack.ottofaro, lmohanty, mzali, pmahajan, wking, yanyang
Target Milestone: ---
Keywords: Upgrades
Target Release: 4.10.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1822752
Environment:
Last Closed: 2022-04-08 05:04:28 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1822752
Bug Blocks: 1822922, 2091806

Comment 2 liujia 2022-03-17 06:37:34 UTC
Yeah, I agree that we should be careful with the backport to v4.10/v4.9 since the change is fairly large. I cloned the bug to raise the question here: apart from the not-yet-completed ResourceDeletesInProgress, we need to decide whether it is worth resolving the issue (manifest syncing stops) in 4.10/4.9, or whether to leave it as is and treat it as a known issue.

Comment 5 liujia 2022-04-02 06:30:29 UTC
Version: 4.10.8

1. Upgrade cluster to an unsigned payload.
# ./oc adm upgrade --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:edb2f74d5caf03746726808655745baa7f9561f25e9dac39d226380ca0d20295
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
Updating to release image registry.ci.openshift.org/ocp/release@sha256:edb2f74d5caf03746726808655745baa7f9561f25e9dac39d226380ca0d20295

2. Check that the image verification failed and no upgrade happened.
# ./oc get clusterversion -ojson|jq .items[].status.conditions[1]
{
  "lastTransitionTime": "2022-04-02T06:11:53Z",
  "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release@sha256:edb2f74d5caf03746726808655745baa7f9561f25e9dac39d226380ca0d20295\" failure=The update cannot be verified: unable to locate a valid signature for one or more sources",
  "reason": "RetrievePayload",
  "status": "False",
  "type": "ReleaseAccepted"
}
# ./oc get clusterversion -ojson|jq .items[].status.conditions[]
...
{
  "lastTransitionTime": "2022-04-02T06:11:53Z",
  "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release@sha256:edb2f74d5caf03746726808655745baa7f9561f25e9dac39d226380ca0d20295\" failure=The update cannot be verified: unable to locate a valid signature for one or more sources",
  "reason": "RetrievePayload",
  "status": "False",
  "type": "ReleaseAccepted"
}
...
{
  "lastTransitionTime": "2022-04-02T04:16:55Z",
  "message": "Cluster version is 4.10.8",
  "status": "False",
  "type": "Progressing"
}
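
The first command in step 2 picks the condition by position (`conditions[1]`), which depends on the order in which the operator happens to list conditions. A minimal sketch of selecting by `type` instead, assuming the JSON from `oc get clusterversion -ojson` has already been parsed into a dict; `find_condition` and `sample` are hypothetical names, and the sample is a trimmed copy of the output above:

```python
def find_condition(cluster_version_list, condition_type):
    """Return the first condition with the given type, or None if absent."""
    for item in cluster_version_list.get("items", []):
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == condition_type:
                return cond
    return None


# Trimmed sample shaped like the `oc get clusterversion -ojson` output above.
sample = {
    "items": [
        {
            "status": {
                "conditions": [
                    {
                        "type": "ReleaseAccepted",
                        "status": "False",
                        "reason": "RetrievePayload",
                    },
                    {
                        "type": "Progressing",
                        "status": "False",
                        "message": "Cluster version is 4.10.8",
                    },
                ]
            }
        }
    ]
}

accepted = find_condition(sample, "ReleaseAccepted")
progressing = find_condition(sample, "Progressing")

# The payload was rejected and the cluster is not progressing to a new version.
print(accepted["reason"], accepted["status"])
print(progressing["message"])
```

Looking conditions up by type keeps the check valid even if the condition list is reordered between releases.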

3. Patch maxUnavailable of the marketplace-operator deployment.
# ./oc patch -n openshift-marketplace deployment/marketplace-operator --type=json -p '[{"op": "replace", "path": "/spec/strategy/rollingUpdate/maxUnavailable", "value": "50%"}]'
deployment.apps/marketplace-operator patched

# ./oc -n openshift-marketplace get deployment -ojson|jq .items[].spec.strategy.rollingUpdate
{
  "maxSurge": "25%",
  "maxUnavailable": "50%"
}
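
The patch in step 3 is a JSON Patch document with a single `replace` operation. As a rough illustration of what that operation does, the sketch below builds the same patch string and applies it to a local dict shaped like the deployment's strategy. `apply_replace` is a hypothetical helper, handles object keys only (no array indices or `~0`/`~1` escapes), and does not talk to a cluster:

```python
import json


def apply_replace(doc, path, value):
    """Apply one JSON Patch 'replace' to a nested dict, in place."""
    keys = path.split("/")[1:]  # drop the empty segment before the leading "/"
    target = doc
    for key in keys[:-1]:
        target = target[key]
    if keys[-1] not in target:
        # 'replace' requires the target location to exist already.
        raise KeyError(path)
    target[keys[-1]] = value
    return doc


patch = [
    {
        "op": "replace",
        "path": "/spec/strategy/rollingUpdate/maxUnavailable",
        "value": "50%",
    }
]
# Serialized form, suitable for passing to `oc patch --type=json -p`.
patch_json = json.dumps(patch)

# Local stand-in for the deployment before the patch.
deployment = {
    "spec": {
        "strategy": {"rollingUpdate": {"maxSurge": "25%", "maxUnavailable": "25%"}}
    }
}
for op in patch:
    apply_replace(deployment, op["path"], op["value"])

print(patch_json)
print(deployment["spec"]["strategy"]["rollingUpdate"])
```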

4. Wait several minutes, then check that the resource has been reconciled back to 25%.
# ./oc -n openshift-marketplace get deployment -ojson|jq .items[].spec.strategy.rollingUpdate
{
  "maxSurge": "25%",
  "maxUnavailable": "25%"
}
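
The "wait several minutes" part of step 4 can be turned into a bounded poll. A minimal sketch, assuming a caller-supplied `fetch` function that returns the current `maxUnavailable` value (for example by running the `oc ... | jq` command above); `wait_for_value`, the timeout, and the interval are illustrative choices, not values from this bug:

```python
import time


def wait_for_value(fetch, expected, timeout=600.0, interval=15.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until it returns `expected` or `timeout` seconds pass.

    Returns True if the expected value was seen, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if fetch() == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Demo with canned readings instead of a live cluster: the value stays
# at the patched "50%" for two polls, then is reconciled back to "25%".
readings = iter(["50%", "50%", "25%"])
reconciled = wait_for_value(lambda: next(readings), "25%",
                            sleep=lambda seconds: None)
print(reconciled)
```

Injecting `sleep` and `clock` keeps the helper testable without real delays; against a live cluster the defaults would be used.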

Comment 7 errata-xmlrpc 2022-04-08 05:04:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.10.8 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1162