Bug 2011951

Summary: [4.9] ClusterVersion Upgradeable=False MultipleReasons should include all messages
Product: OpenShift Container Platform
Component: Cluster Version Operator
Reporter: W. Trevor King <wking>
Assignee: Over the Air Updates <aos-team-ota>
QA Contact: Johnny Liu <jialiu>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Version: 4.8
Target Release: 4.9.0
CC: aos-bugs, jialiu, jokerman
Keywords: FastFix
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Clone Of: 2011896
Clones: 2011954
Last Closed: 2021-10-18 17:52:32 UTC
Bug Depends On: 2011896
Bug Blocks: 2011954

Description W. Trevor King 2021-10-07 19:21:44 UTC
+++ This bug was initially created as a clone of Bug #2011896 +++

Because:

Upgradeable=False

  Reason: MultipleReasons
  Message: Cluster cannot be upgraded between minor versions for multiple reasons: AdminAckRequired,IncompatibleOperatorsInstalled

doesn't include all the useful information needed to resolve those issues, we should pivot to the approach we already use when aggregating multiple Upgradeable=False ClusterOperators: append all the constituent messages as a bulleted list.
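
As a rough illustration of the requested message format (this is a shell sketch, not the CVO's actual Go implementation), the aggregation could look like the following. The function name aggregate_upgradeable and the tab-separated "reason<TAB>message" input format are assumptions made up for this example; the "Cluster should not be upgraded ..." wording is taken from the expected output in the verification steps of this bug.

```shell
#!/bin/sh
# Hypothetical helper, not part of oc or the CVO: reads "reason<TAB>message"
# lines on stdin and prints one aggregated Reason/Message pair.
aggregate_upgradeable() {
  awk -F '\t' '
    { reason[NR] = $1; message[NR] = $2 }
    END {
      if (NR == 0) { exit }
      if (NR == 1) {
        # A single blocker keeps its own reason and message.
        printf "Reason: %s\nMessage: %s\n", reason[1], message[1]
        exit
      }
      joined = reason[1]
      for (i = 2; i <= NR; i++) { joined = joined "," reason[i] }
      printf "Reason: MultipleReasons\n"
      printf "Message: Cluster should not be upgraded between minor versions for multiple reasons: %s\n", joined
      # Append every constituent message as a bullet so none are lost.
      for (i = 1; i <= NR; i++) { printf "* %s\n", message[i] }
    }'
}
```

Feeding it two tab-separated lines (for example, one for AdminAckRequired and one for IncompatibleOperatorsInstalled) should yield a MultipleReasons summary followed by one bullet per constituent message, which is the information the current condition drops.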

The CVO's current logic goes way back, but the need to urgently fix this begins in 4.8.14, when we grew admin-ack via bug 1999092, colliding with OLM's IncompatibleOperatorsInstalled, which a lot of 4.8 clusters were already experiencing.

--- Additional comment from W. Trevor King on 2021-10-07 17:24:30 UTC ---

Verification should look something like:

1. Install a version with the fix.
2. Put something in spec.overrides to trigger ClusterVersionOverridesSet:

     $ oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/overrides", "value": [{"kind": "Deployment", "group": "apps/v1", "name": "network-operator", "namespace": "openshift-network-operator", "unmanaged": true}]}]'

3. Create a ClusterOperator to trigger ClusterOperatorsNotUpgradeable:

     $ cat co.yaml 
     apiVersion: config.openshift.io/v1
     kind: ClusterOperator
     metadata:
       name: testing
     spec: {}
     $ oc apply -f co.yaml
     $ oc proxy &  # working around the lack of --subresource: https://github.com/kubernetes/kubernetes/pull/99556
     [1] 16920
     Starting to serve on 127.0.0.1:8001
     $ curl -k -XPATCH -H "Accept: application/json" -H "Content-Type: application/json-patch+json" 'http://127.0.0.1:8001/apis/config.openshift.io/v1/clusteroperators/testing/status' -d '[{"op": "add", "path": "/status", "value": {"conditions": [{"lastTransitionTime": "2021-08-31T01:01:01Z", "type": "Upgradeable", "status": "False", "reason": "Testing", "message": "Testing upgradeable https://example.com/a."}]}}]'
     $ fg
     oc proxy
     ^C

4. Wait a minute or so for the CVO to notice.

5. Check the 'oc adm upgrade' output.  It should include:

     Upgradeable=False

     Reason: MultipleReasons
     Message: Cluster should not be upgraded between minor versions for multiple reasons: ClusterVersionOverridesSet,Testing
     * Disabling ownership via cluster version overrides prevents upgrades. Please remove overrides before continuing.
     * Cluster operator testing should not be upgraded between minor versions: Testing upgradeable https://example.com/a.

6. Check the web-console output at:

   * The cluster settings page: ${CONSOLE}/settings/cluster
   * The ClusterVersion detail page: ${CONSOLE}/k8s/cluster/config.openshift.io~v1~ClusterVersion/version

   They should both include the full message, clearly formatted.
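
The CLI check in the steps above could be partly automated with something like the sketch below. The function name check_multiple_reasons is hypothetical, and the parsing assumes the layout shown in the expected output (a "multiple reasons: A,B" line followed by one "* " bullet per reason); it is a convenience for eyeballing results, not an official verification tool.

```shell
#!/bin/sh
# Hypothetical check: reads 'oc adm upgrade' output on stdin and succeeds only
# if the MultipleReasons message is followed by one bullet per listed reason.
check_multiple_reasons() {
  awk '
    BEGIN { expected = 0; bullets = 0 }
    /multiple reasons: / {
      # Keep only the comma-separated reason list and count its entries.
      sub(/^.*multiple reasons: /, "")
      expected = split($0, parts, ",")
      next
    }
    # Count bulleted constituent messages, indented or not.
    /^[ \t]*\* / { bullets++ }
    END { exit !(expected > 0 && bullets == expected) }
  '
}
```

Usage would be along the lines of `oc adm upgrade | check_multiple_reasons && echo PASS`. Output from a version without the fix, where the reasons are listed but no bullets follow, makes the function return a non-zero status.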

Comment 1 Scott Dodson 2021-10-08 15:58:51 UTC
This bug is not a blocker, but I've labeled the PR for merge as we'd like to get this UX improvement into 4.8 ASAP.

Comment 5 Johnny Liu 2021-10-09 09:28:09 UTC
Verified this bug with 4.9.0-0.nightly-2021-10-08-232649; it passes.

Installed a private disconnected cluster on AWS with manual CCO.

[root@preserve-jialiu-ansible ~]# oc adm upgrade
Cluster version is 4.9.0-0.nightly-2021-10-08-232649

Upgradeable=False

  Reason: MissingUpgradeableAnnotation
  Message: Cluster operator cloud-credential should not be upgraded between minor versions: Upgradeable annotation cloudcredential.openshift.io/upgradeable-to on cloudcredential.operator.openshift.io/cluster object needs updating before upgrade. See Manually Creating IAM documentation for instructions on preparing a cluster for upgrade.


[root@preserve-jialiu-ansible ~]# oc -n openshift-config-managed patch cm admin-gates --patch '{"data":{"ack-4.9-dummy":"testing"}}' --type=merge
configmap/admin-gates patched


[root@preserve-jialiu-ansible ~]# oc adm upgrade
Cluster version is 4.9.0-0.nightly-2021-10-08-232649

Upgradeable=False

  Reason: MultipleReasons
  Message: Cluster should not be upgraded between minor versions for multiple reasons: AdminAckRequired,MissingUpgradeableAnnotation
* testing
* Cluster operator cloud-credential should not be upgraded between minor versions: Upgradeable annotation cloudcredential.openshift.io/upgradeable-to on cloudcredential.operator.openshift.io/cluster object needs updating before upgrade. See Manually Creating IAM documentation for instructions on preparing a cluster for upgrade.


All of the Upgradeable=False reason messages are now listed, one per line.

Comment 7 errata-xmlrpc 2021-10-18 17:52:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759