Bug 1952230

Summary: 'oc adm upgrade' only lists one client-side guard failure
Product: OpenShift Container Platform
Component: oc
Version: 4.2.0
Target Release: 4.10.z
Hardware: Unspecified
OS: Unspecified
Status: CLOSED WONTFIX
Severity: low
Priority: low
Reporter: W. Trevor King <wking>
Assignee: W. Trevor King <wking>
QA Contact: Yang Yang <yanyang>
CC: aos-bugs, mfojtik, yanyang
Whiteboard: LifecycleReset
Type: Bug
Last Closed: 2023-01-12 13:14:17 UTC
Bug Depends On: 1992680

Description W. Trevor King 2021-04-21 19:53:27 UTC
Since it landed, at least by 4.2, 'oc adm upgrade' has had client-side guards [1].  Those guards are still checked sequentially, so the caller will only hear about the first guard that fails.  With modern 'oc', the user is then prompted to set --allow-upgrade-with-warnings to waive the warnings.  We should adjust checkForUpgrade to return []error, so the user can see all the guards they're waiving before they set the override (and also see all the guards they waived if they do set the override).

[1]: https://github.com/openshift/oc/commit/cd30f2f864c34bb65e52bf83e929655ebada55dd#diff-65b6199099c673e23589269abae7fddf0d4d849ee63fe6a6d77d458dfe766f59R296-R307

Comment 3 Michal Fojtik 2021-11-05 09:25:15 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 5 Michal Fojtik 2021-11-05 22:25:16 UTC
The LifecycleStale keyword was removed because the bug moved to QE.
The bug assignee was notified.

Comment 6 Yang Yang 2021-11-10 08:15:08 UTC
Reproduced it with Client Version: 4.9.0-0.nightly-2021-09-21-215600:

1. Install a 4.8 cluster
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.19    True        False         20h     Cluster version is 4.8.19

2. Make co degraded 

cat <<EOF >oauth.yaml
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: oidcidp
    mappingMethod: claim
    type: OpenID
    openID:
      clientID: test
      clientSecret:
        name: test
      claims:
        preferredUsername:
        - preferred_username
        name:
        - name
        email:
        - email
      issuer: https://www.idp-issuer.example.com
EOF

# ./oc apply -f oauth.yaml 
oauth.config.openshift.io/cluster configured

Then authentication goes degraded and machine-config goes unavailable:

# oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.19    True        False         True       4h17m
baremetal                                  4.8.19    True        False         False      23h
cloud-credential                           4.8.19    True        False         False      23h
cluster-autoscaler                         4.8.19    True        False         False      23h
config-operator                            4.8.19    True        False         False      23h
console                                    4.8.19    True        False         False      22h
csi-snapshot-controller                    4.8.19    True        False         False      23h
dns                                        4.8.19    True        False         False      23h
etcd                                       4.8.19    True        False         False      23h
image-registry                             4.8.19    True        False         False      22h
ingress                                    4.8.19    True        False         False      22h
insights                                   4.8.19    True        False         False      22h
kube-apiserver                             4.8.19    True        False         False      23h
kube-controller-manager                    4.8.19    True        False         False      23h
kube-scheduler                             4.8.19    True        False         False      23h
kube-storage-version-migrator              4.8.19    True        False         False      23h
machine-api                                4.8.19    True        False         False      22h
machine-approver                           4.8.19    True        False         False      23h
machine-config                             4.8.19    False       False         True       35s
marketplace                                4.8.19    True        False         False      23h
monitoring                                 4.8.19    True        False         False      22h
network                                    4.8.19    True        False         False      23h
node-tuning                                4.8.19    True        False         False      23h
openshift-apiserver                        4.8.19    True        False         False      22h
openshift-controller-manager               4.8.19    True        False         False      3h56m
openshift-samples                          4.8.19    True        False         False      22h
operator-lifecycle-manager                 4.8.19    True        False         False      23h
operator-lifecycle-manager-catalog         4.8.19    True        False         False      23h
operator-lifecycle-manager-packageserver   4.8.19    True        False         False      22h
service-ca                                 4.8.19    True        False         False      23h
storage                                    4.8.19    True        False         False      23h

3. Patch to update the channel and desired version
# oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/channel", "value": "candidate-4.9"}, {"op": "add", "path": "/spec/desiredUpdate", "value": {"version": "4.9.6"}}]'
clusterversion.config.openshift.io/version patched

# oc adm upgrade 
Error while reconciling 4.8.19: the cluster operator authentication is degraded

Upgradeable=False

  Reason: AdminAckRequired
  Message: Kubernetes 1.22 and therefore OpenShift 4.9 remove several APIs which require admin consideration. Please see
the knowledge article https://access.redhat.com/articles/6329921 for details and instructions.


Upstream is unset, so the cluster will use an appropriate default.
Channel: candidate-4.9
warning: Cannot display available updates:
  Reason: VersionNotFound
  Message: Unable to retrieve available updates: currently reconciling cluster version 4.8.19 not found in the "fast-4.9" channel

So, 'oc adm upgrade' only reports the degraded cluster operator and doesn't mention the invalid-version issue.

# oc get clusterversion -oyaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2021-11-09T08:56:50Z"
    generation: 5
    name: version
    resourceVersion: "516382"
    uid: 51dd6fbb-966e-41f1-bd15-05cbde4cd5ad
  spec:
    channel: candidate-4.9
    clusterID: 9331eba0-85a8-4a94-af81-739f89c70c97
    desiredUpdate:
      version: 4.9.6
  status:
    availableUpdates: null
    conditions:
    - lastTransitionTime: "2021-11-09T09:22:55Z"
      message: Done applying 4.8.19
      status: "True"
      type: Available
    - lastTransitionTime: "2021-11-10T07:46:16Z"
      message: Cluster operator authentication is degraded
      reason: ClusterOperatorDegraded
      status: "True"
      type: Failing
    - lastTransitionTime: "2021-11-09T09:22:55Z"
      message: 'Error while reconciling 4.8.19: the cluster operator authentication
        is degraded'
      reason: ClusterOperatorDegraded
      status: "False"
      type: Progressing
    - lastTransitionTime: "2021-11-10T06:41:18Z"
      message: 'Unable to retrieve available updates: currently reconciling cluster
        version 4.8.19 not found in the "fast-4.9" channel'
      reason: VersionNotFound
      status: "False"
      type: RetrievedUpdates
    - lastTransitionTime: "2021-11-09T08:57:20Z"
      message: |
        Kubernetes 1.22 and therefore OpenShift 4.9 remove several APIs which require admin consideration. Please see
        the knowledge article https://access.redhat.com/articles/6329921 for details and instructions.
      reason: AdminAckRequired
      status: "False"
      type: Upgradeable
    - lastTransitionTime: "2021-11-10T07:43:47Z"
      message: 'The cluster version is invalid: spec.desiredUpdate.version: Invalid
        value: "4.9.6": when image is empty the update must be a previous version
        or an available update'
      reason: InvalidClusterVersion
      status: "True"
      type: Invalid
    desired:
      image: quay.io/openshift-release-dev/ocp-release@sha256:ac19c975be8b8a449dedcdd7520e970b1cc827e24042b8976bc0495da32c6b59
      url: https://access.redhat.com/errata/RHBA-2021:4109
      version: 4.8.19
    history:
    - completionTime: "2021-11-09T09:22:55Z"
      image: quay.io/openshift-release-dev/ocp-release@sha256:ac19c975be8b8a449dedcdd7520e970b1cc827e24042b8976bc0495da32c6b59
      startedTime: "2021-11-09T08:56:50Z"
      state: Completed
      verified: false
      version: 4.8.19
    observedGeneration: 5
    versionHash: oJVcBisP_Ao=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Comment 8 Yang Yang 2021-11-12 03:44:06 UTC
Verifying it with:
# ./oc version
Client Version: 4.10.0-0.nightly-2021-11-09-181140
Server Version: 4.10.0-0.nightly-2021-11-09-181140
Kubernetes Version: v1.22.1+1b2affc


1. Install a 4.10 cluster
# ./oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-11-09-181140   True        False         6m20s   Cluster version is 4.10.0-0.nightly-2021-11-09-181140

2. Make co degraded 

cat <<EOF >oauth.yaml
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: oidcidp
    mappingMethod: claim
    type: OpenID
    openID:
      clientID: test
      clientSecret:
        name: test
      claims:
        preferredUsername:
        - preferred_username
        name:
        - name
        email:
        - email
      issuer: https://www.idp-issuer.example.com
EOF

# ./oc apply -f oauth.yaml 
oauth.config.openshift.io/cluster configured

# ./oc get co | grep auth
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-0.nightly-2021-11-09-181140   True        False         True       46m     OAuthServerConfigObservationDegraded: failed to apply IDP oidcidp config: dial tcp: lookup www.idp-issuer.example.com on 172.30.0.10:53: no such host

Patch to an invalid desired version:
# ./oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/channel", "value": "nightly-4.10"}, {"op": "add", "path": "/spec/desiredUpdate", "value": {"version": "4.10.1"}}]'
clusterversion.config.openshift.io/version patched

# ./oc get clusterversion/version -ojson | jq -r .status.conditions
[
  {
    "lastTransitionTime": "2021-11-12T02:27:44Z",
    "message": "Done applying 4.10.0-0.nightly-2021-11-09-181140",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2021-11-12T03:16:59Z",
    "message": "Cluster operator authentication is degraded",
    "reason": "ClusterOperatorDegraded",
    "status": "True",
    "type": "Failing"
  },
  {
    "lastTransitionTime": "2021-11-12T02:27:44Z",
    "message": "Error while reconciling 4.10.0-0.nightly-2021-11-09-181140: the cluster operator authentication is degraded",
    "reason": "ClusterOperatorDegraded",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2021-11-12T02:37:40Z",
    "status": "True",
    "type": "RetrievedUpdates"
  },
  {
    "lastTransitionTime": "2021-11-12T03:21:35Z",
    "message": "The cluster version is invalid: spec.desiredUpdate.version: Invalid value: \"4.10.1\": when image is empty the update must be a previous version or an available update",
    "reason": "InvalidClusterVersion",
    "status": "True",
    "type": "Invalid"
  }
]

We have Invalid version and co degraded conditions.

# ./oc adm upgrade 
Error while reconciling 4.10.0-0.nightly-2021-11-09-181140: the cluster operator authentication is degraded

Upstream: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph
Channel: nightly-4.10
Available Updates:

VERSION                            IMAGE
4.10.0-0.nightly-2021-11-10-212548 registry.ci.openshift.org/ocp/release@sha256:b15acfa35c303c15148e1032774c91df0b38ea2b3efee4d8c408777d64467c70
4.10.0-0.nightly-2021-11-11-072405 registry.ci.openshift.org/ocp/release@sha256:4a4004a27b74f1f9a229755d9cb77701823ddbba4377bf090a1bfa4579e80d37
4.10.0-0.nightly-2021-11-11-170956 registry.ci.openshift.org/ocp/release@sha256:3b5eeefd3ba57ae2ffe81b34516ab7330fe966067a5ca467fb40d9476905b400
4.10.0-0.nightly-2021-11-12-023027 registry.ci.openshift.org/ocp/release@sha256:7635f6abdcff00ea285d8f85a4cacafed564bd9c9ecbe783cdc3afbc746f1b89

'oc adm upgrade' only reports the degraded cluster operator and doesn't mention the invalid-version issue.

Trevor, it seems oc still does not report all of the conditions. Could you please take a look? Thanks!

Comment 9 Yang Yang 2021-11-16 06:26:55 UTC
Attempted with Client Version: 4.10.0-0.nightly-2021-11-14-184249 and got the same results as in comment 8. Re-opening it.

Comment 10 W. Trevor King 2021-11-16 06:55:48 UTC
checkForUpgrade (the function I adjusted in oc#812) is only called when --to, --to-image, or --to-latest is used.  Simplest way to get there is probably to wait until I get the PR for bug 1992680 landed, and then set up the ClusterVersion object so it is Failing=True and Progressing=True at the same time.