Bug 2011896

Summary: [4.10] ClusterVersion Upgradeable=False MultipleReasons should include all messages
Product: OpenShift Container Platform Reporter: W. Trevor King <wking>
Component: Cluster Version OperatorAssignee: W. Trevor King <wking>
Status: CLOSED ERRATA QA Contact: Johnny Liu <jialiu>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.8CC: aos-bugs
Target Milestone: ---Keywords: FastFix
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2011951 (view as bug list) Environment:
Last Closed: 2022-03-10 16:17:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2011951    

Description W. Trevor King 2021-10-07 16:27:32 UTC
Because:

Upgradeable=False

  Reason: MultipleReasons
  Message: Cluster cannot be upgraded between minor versions for multiple reasons: AdminAckRequired,IncompatibleOperatorsInstalled

doesn't include all the useful information needed to resolve those issues.  We should pivot to using the same approach we use today when aggregating multiple Upgradeable=False ClusterOperators, and use a bulleted list to append all the constituent messages.

The CVO's current logic goes way back, but the need to urgently fix this begins in 4.8.14, when we grew admin-ack via bug 1999092, colliding with OLM's IncompatibleOperatorsInstalled, which a lot of 4.8 clusters were already experiencing.

Comment 1 W. Trevor King 2021-10-07 17:24:30 UTC
Verification should look something like:

1. Install a version with the fix.
2. Put something in spec.overrides to trigger ClusterVersionOverridesSet:

     $ oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/overrides", "value": [{"kind": "Deployment", "group": "apps/v1", "name": "network-operator", "namespace": "openshift-network-operator", "unmanaged": true}]}]'

3. Create a ClusterOperator to trigger ClusterOperatorsNotUpgradeable:

     $ cat co.yaml 
     apiVersion: config.openshift.io/v1
     kind: ClusterOperator
     metadata:
       name: testing
     spec: {}
     $ oc apply -f co.yaml
     $ oc proxy &  # working around the lack of --subresource: https://github.com/kubernetes/kubernetes/pull/99556
     [1] 16920
     Starting to serve on 127.0.0.1:8001
     $ curl -k -XPATCH -H "Accept: application/json" -H "Content-Type: application/json-patch+json" 'http://127.0.0.1:8001/apis/config.openshift.io/v1/clusteroperators/testing/status' -d '[{"op": "add", "path": "/status", "value": {"conditions": [{"lastTransitionTime": "2021-08-31T01:01:01Z", "type": "Upgradeable", "status": "False", "reason": "Testing", "message": "Testing upgradeable https://example.com/a."}]}}]'
     $ fg
     oc proxy
     ^C

3. Wait a minute or so for the CVO to notice.

4. Check the 'oc adm upgrade' output.  It should include:

     Upgradeable=False

     Reason: MultipleReasons
     Message: Cluster should not be upgraded between minor versions for multiple reasons: ClusterVersionOverridesSet,Testing
     * Disabling ownership via cluster version overrides prevents upgrades. Please remove overrides before continuing.
     * Cluster operator testing should not be upgraded between minor versions: Testing upgradeable https://example.com/a.

5. Check the web-console output at:

   * The cluster settings page: ${CONSOLE}/settings/cluster
   * The ClusterVersion detail page ${CONSOLE}/k8s/cluster/config.openshift.io~v1~ClusterVersion/version

   They should both include the full message, clearly formatted.

Comment 2 W. Trevor King 2021-10-08 05:00:54 UTC
(In reply to W. Trevor King from comment #1)
> 4. Check the 'oc adm upgrade' output.  It should include:
> 
>      Upgradeable=False
> 
>      Reason: MultipleReasons
>      Message: Cluster should not be upgraded between minor versions for multiple reasons: ClusterVersionOverridesSet,Testing
>      * Disabling ownership via cluster version overrides prevents upgrades. Please remove overrides before continuing.
>      * Cluster operator testing should not be upgraded between minor versions: Testing upgradeable https://example.com/a.

Indenting will be a bit off until [1] lands, but that didn't seem important enough to tie up in this bug.  So the important part for this bug is just that all that text is there, and not the amount of whitespace in front of each line.

[1]: https://github.com/openshift/oc/pull/952

Comment 6 Johnny Liu 2021-10-09 06:36:50 UTC
Verified this bug with 4.10.0-0.nightly-2021-10-08-215502, and PASS.


1. Install a private disconnected cluster on aws with manuall cco.

2. After installation, check `oc adm upgrdae` output
[root@preserve-jialiu-ansible ~]# oc adm upgrade
Cluster version is 4.10.0-0.nightly-2021-10-08-215502

Upgradeable=False

  Reason: MissingUpgradeableAnnotation
  Message: Cluster operator cloud-credential should not be upgraded between minor versions: Upgradeable annotation cloudcredential.openshift.io/upgradeable-to on cloudcredential.operator.openshift.io/cluster object needs updating before upgrade. See Manually Creating IAM documentation for instructions on preparing a cluster for upgrade.

3. add a custom admin-gate to create another Upgradeable=False message
[root@preserve-jialiu-ansible ~]# oc -n openshift-config-managed patch cm admin-gates --patch '{"data":{"ack-4.10-dummy":"testing"}}' --type=merge
configmap/admin-gates patched
[root@preserve-jialiu-ansible ~]# oc adm upgrade
Cluster version is 4.10.0-0.nightly-2021-10-08-215502

Upgradeable=False

  Reason: MultipleReasons
  Message: Cluster should not be upgraded between minor versions for multiple reasons: AdminAckRequired,MissingUpgradeableAnnotation
* testing
* Cluster operator cloud-credential should not be upgraded between minor versions: Upgradeable annotation cloudcredential.openshift.io/upgradeable-to on cloudcredential.operator.openshift.io/cluster object needs updating before upgrade. See Manually Creating IAM documentation for instructions on preparing a cluster for upgrade.

All the Upgradeable=False reason message is listed in multiple lines.

Also checked the cluster settings page: ${CONSOLE}/settings/cluster, the message is also listed there clearly.

Comment 10 errata-xmlrpc 2022-03-10 16:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056