Bug 1924947 - Degraded machine config during upgrade: pool master has not progressed to latest configuration
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-02-04 00:04 UTC by Paige Rubendall
Modified: 2021-07-21 20:42 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-21 20:42:18 UTC
Target Upstream Version:
Embargoed:


Description Paige Rubendall 2021-02-04 00:04:18 UTC
Description of problem:
After a chain of upgrades 4.2.36 -> 4.3.40 -> 4.4.33 -> 4.5.28, the machine-config operator is in a Degraded state.


Version-Release number of selected component (if applicable):
Cloud: AWS (IPI)

# oc get machines -n openshift-machine-api
NAME                                         PHASE     TYPE        REGION      ZONE         AGE
qe-pr-aws421-42pdg-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   8h
qe-pr-aws421-42pdg-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   8h
qe-pr-aws421-42pdg-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   8h
qe-pr-aws421-42pdg-worker-us-east-2a-h2fft   Running   m4.large    us-east-2   us-east-2a   8h
qe-pr-aws421-42pdg-worker-us-east-2b-7r2fj   Running   m4.large    us-east-2   us-east-2b   8h
qe-pr-aws421-42pdg-worker-us-east-2c-wvdvz   Running   m4.large    us-east-2   us-east-2c   8h

How reproducible: Unsure


Steps to Reproduce:
1. Install an AWS cluster using quay.io/openshift-release-dev/ocp-release:4.2.36-x86_64 with 3 worker nodes 
2. Upgrade the cluster one minor version at a time:

oc adm upgrade --to-image quay.io/openshift-release-dev/ocp-release:4.3.40-x86_64 --force --allow-explicit-upgrade --> PASSED 

oc adm upgrade --to-image quay.io/openshift-release-dev/ocp-release:4.4.33-x86_64 --force --allow-explicit-upgrade --> PASSED

oc adm upgrade --to-image quay.io/openshift-release-dev/ocp-release:4.5.28-x86_64 --force --allow-explicit-upgrade --> FAILED
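The hop-by-hop sequence above can be scripted so that every run uses the same images and flags. This is a minimal sketch, assuming Python 3; `build_upgrade_commands` is a hypothetical helper that only builds the `oc adm upgrade` argument lists from the steps above. It does not execute them and does not wait for a hop to finish.

```python
from typing import List

# Release image repository used in the reproduction steps.
RELEASE_REPO = "quay.io/openshift-release-dev/ocp-release"


def build_upgrade_commands(versions: List[str], arch: str = "x86_64") -> List[List[str]]:
    """Return one `oc adm upgrade` argv per hop (hypothetical helper).

    The first entry of `versions` is the installed release, so it gets no command.
    """
    commands = []
    for target in versions[1:]:
        image = f"{RELEASE_REPO}:{target}-{arch}"
        commands.append([
            "oc", "adm", "upgrade",
            "--to-image", image,
            "--force",
            "--allow-explicit-upgrade",
        ])
    return commands


if __name__ == "__main__":
    chain = ["4.2.36", "4.3.40", "4.4.33", "4.5.28"]
    for argv in build_upgrade_commands(chain):
        print(" ".join(argv))
```

As far as we know, `oc adm upgrade` returns once the update request is accepted, so running each argv (for example with `subprocess.run(argv, check=True)`) only starts a hop; a real script would also have to wait until `oc get clusterversion` reports the new version before starting the next one.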

Actual results:
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.33    True        True          4h10m   Working towards 4.5.28: 29% complete
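The STATUS column above carries both the target release and the progress percentage. A minimal sketch for extracting them, assuming the status text keeps the `Working towards <version>: <n>% complete` wording shown in this report; `parse_progress` is a hypothetical helper:

```python
import re
from typing import Optional, Tuple

# Matches status text such as "Working towards 4.5.28: 29% complete".
PROGRESS_RE = re.compile(r"Working towards ([^\s:]+): (\d+)% complete")


def parse_progress(status: str) -> Optional[Tuple[str, int]]:
    """Return (target_version, percent_complete), or None if no progress text is found."""
    match = PROGRESS_RE.search(status)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    line = "version   4.4.33    True        True          4h10m   Working towards 4.5.28: 29% complete"
    print(parse_progress(line))  # ('4.5.28', 29)
```

A percentage that stops moving across repeated polls suggests the upgrade is blocked on a cluster operator rather than still progressing, which is where the rest of this report looks next.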


# oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.28    True        False         False      8h
cloud-credential                           4.5.28    True        False         False      8h
cluster-autoscaler                         4.5.28    True        False         False      8h
config-operator                            4.5.28    True        False         False      3h50m
console                                    4.5.28    True        False         False      3h20m
csi-snapshot-controller                    4.5.28    True        False         False      3h42m
dns                                        4.5.28    True        False         False      8h
etcd                                       4.5.28    True        False         False      5h7m
image-registry                             4.5.28    True        False         False      3h19m
ingress                                    4.5.28    True        False         False      4h22m
insights                                   4.5.28    True        False         False      8h
kube-apiserver                             4.5.28    True        False         False      5h5m
kube-controller-manager                    4.5.28    True        False         False      5h3m
kube-scheduler                             4.5.28    True        False         False      5h4m
kube-storage-version-migrator              4.5.28    True        False         False      3h22m
machine-api                                4.5.28    True        False         False      8h
machine-approver                           4.5.28    True        False         False      3h45m
machine-config                             4.4.33    False       True          True       3h16m
marketplace                                4.5.28    True        False         False      3h44m
monitoring                                 4.5.28    True        False         False      3h43m
network                                    4.5.28    True        False         False      8h
node-tuning                                4.5.28    True        False         False      3h45m
openshift-apiserver                        4.5.28    True        False         True       3h24m
openshift-controller-manager               4.5.28    True        False         False      3h45m
openshift-samples                          4.5.28    True        False         False      3h45m
operator-lifecycle-manager                 4.5.28    True        False         False      8h
operator-lifecycle-manager-catalog         4.5.28    True        False         False      8h
operator-lifecycle-manager-packageserver   4.5.28    True        False         False      3h20m
service-ca                                 4.5.28    True        False         False      8h
service-catalog-apiserver                  4.4.33    True        False         False      8h
service-catalog-controller-manager         4.4.33    True        False         False      8h
storage                                    4.5.28    True        False         False      3h45m
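With 30+ rows, the operators that block the upgrade are easy to miss in the table above. A minimal sketch that flags every operator behind the target version, not Available, or Degraded, assuming the whitespace-aligned `oc get co` layout shown here with every column populated; `parse_co_table` and `needs_attention` are hypothetical helpers, and `SAMPLE` is a subset of the rows above:

```python
from typing import Dict, List

TARGET = "4.5.28"

SAMPLE = """\
NAME                        VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config              4.4.33    False       True          True       3h16m
openshift-apiserver         4.5.28    True        False         True       3h24m
service-catalog-apiserver   4.4.33    True        False         False      8h
storage                     4.5.28    True        False         False      3h45m
"""


def parse_co_table(text: str) -> List[Dict[str, str]]:
    """Split whitespace-aligned `oc get co` output into one dict per operator.

    Assumes every column is populated; an empty VERSION cell would shift fields.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def needs_attention(rows: List[Dict[str, str]], target: str) -> List[str]:
    """Names of operators that are behind `target`, not Available, or Degraded."""
    return [
        row["NAME"]
        for row in rows
        if row["VERSION"] != target
        or row["AVAILABLE"] != "True"
        or row["DEGRADED"] == "True"
    ]


if __name__ == "__main__":
    print(needs_attention(parse_co_table(SAMPLE), TARGET))
```

On the full table this flags machine-config and openshift-apiserver, plus the two service-catalog operators that still report 4.4.33. For anything beyond a quick look, `oc get co -o json` gives the same data in structured form.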

Expected results:
The cluster upgrades to 4.5.28 with no degraded cluster operators.

Additional info:


# oc describe co machine-config 
Name:         machine-config
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2021-02-03T15:14:52Z
  Generation:          1
  Resource Version:    211644
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/machine-config
  UID:                 907cec28-6632-11eb-bb77-02af3b724eee
Spec:
Status:
  Conditions:
    Last Transition Time:  2021-02-03T20:36:22Z
    Message:               Cluster not available for 4.5.28
    Status:                False
    Type:                  Available
    Last Transition Time:  2021-02-03T20:22:40Z
    Message:               Working towards 4.5.28
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2021-02-03T20:36:22Z
    Message:               Unable to apply 4.5.28: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-8717be8dde8d46fe6b908649d1d93068 expected 68dff1c13317ca2756c490c520d029dc67994224 has c96f5b0bfa95eabf4e4fe64068b14eef965f5e22, retrying
    Reason:                RequiredPoolsFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-02-03T15:15:58Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
 ....
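The Degraded message above packs several identifiers into one line: the pool (`master`), the rendered MachineConfig, and two controller commit hashes. A minimal sketch that pulls them apart, assuming the message keeps exactly the wording shown in this report; the regex group names are our own labels. Our reading, not confirmed by this report, is that "expected" is the version hash of the running (4.5.28) machine-config controller and "has" is the value recorded in the rendered MachineConfig's `machineconfiguration.openshift.io/generated-by-controller-version` annotation, meaning the master pool was still pointing at a rendered config produced by an older controller.

```python
import re
from typing import Dict, Optional

# Degraded message copied from the machine-config ClusterOperator above.
MESSAGE = (
    "Unable to apply 4.5.28: timed out waiting for the condition during "
    "syncRequiredMachineConfigPools: pool master has not progressed to latest "
    "configuration: controller version mismatch for "
    "rendered-master-8717be8dde8d46fe6b908649d1d93068 expected "
    "68dff1c13317ca2756c490c520d029dc67994224 has "
    "c96f5b0bfa95eabf4e4fe64068b14eef965f5e22, retrying"
)

# Group names are our own labels, not MCO field names.
MISMATCH_RE = re.compile(
    r"pool (?P<pool>\S+) has not progressed to latest configuration: "
    r"controller version mismatch for (?P<rendered>\S+) "
    r"expected (?P<expected>[0-9a-f]+) has (?P<actual>[0-9a-f]+)"
)


def parse_mismatch(message: str) -> Optional[Dict[str, str]]:
    """Extract pool, rendered config and both controller hashes, or None."""
    match = MISMATCH_RE.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for key, value in parse_mismatch(MESSAGE).items():
        print(f"{key}: {value}")
```

With the rendered name in hand, `oc get machineconfig rendered-master-8717be8dde8d46fe6b908649d1d93068 -o yaml` shows that config's annotations, and `oc get machineconfigpool master` shows which rendered config the pool currently targets.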

# oc describe co openshift-apiserver
Name:         openshift-apiserver
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2021-02-03T15:15:12Z
  Generation:          1
  Resource Version:    156958
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/openshift-apiserver
  UID:                 9c6fb66c-6632-11eb-bb77-02af3b724eee
Spec:
Status:
  Conditions:
    Last Transition Time:  2021-02-03T20:47:47Z
    Message:               APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver
    Reason:                APIServerDeployment_UnavailablePod
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-02-03T20:02:32Z
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2021-02-03T20:27:51Z
    Reason:                AsExpected
    Status:                True
    Type:                  Available
    Last Transition Time:  2021-02-03T15:15:12Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
 ....
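In `oc describe co` output, each condition is a run of flat `Key: value` lines that ends with `Type:`, as in both describes in this report. A minimal sketch that groups them and returns the active Degraded messages, assuming that layout; `parse_conditions` and `degraded_messages` are hypothetical helpers, and `SAMPLE` is trimmed from the openshift-apiserver output above. `oc get co openshift-apiserver -o json` exposes the same data as `status.conditions` and is the more robust source when scripting.

```python
from typing import Dict, List

CONDITION_KEYS = {"Last Transition Time", "Message", "Reason", "Status", "Type"}

SAMPLE = """\
Status:
  Conditions:
    Last Transition Time:  2021-02-03T20:47:47Z
    Message:               APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver
    Reason:                APIServerDeployment_UnavailablePod
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-02-03T20:02:32Z
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
  Extension:               <nil>
"""


def parse_conditions(describe_text: str) -> List[Dict[str, str]]:
    """Collect the `Key: value` lines under `Conditions:`; each condition ends at `Type:`."""
    conditions: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    in_conditions = False
    for raw in describe_text.splitlines():
        line = raw.strip()
        if line == "Conditions:":
            in_conditions = True
            continue
        if not in_conditions or ":" not in line:
            continue
        # Split on the first colon only, so timestamps and messages keep theirs.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key not in CONDITION_KEYS:
            break  # e.g. "Extension:" marks the end of the list
        current[key] = value
        if key == "Type":
            conditions.append(current)
            current = {}
    return conditions


def degraded_messages(conditions: List[Dict[str, str]]) -> List[str]:
    """Messages of conditions with Type=Degraded and Status=True."""
    return [
        c.get("Message", "")
        for c in conditions
        if c.get("Type") == "Degraded" and c.get("Status") == "True"
    ]


if __name__ == "__main__":
    for message in degraded_messages(parse_conditions(SAMPLE)):
        print(message)
```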

Comment 2 John Kyros 2021-07-21 20:42:18 UTC
I'm unable to reproduce this. I went through the supplied upgrade sequence with two AWS IPI clusters and both of them upgraded flawlessly.

Given the age of this bug and that there is no additional information that can be gathered at this time, I'm going to close it.  

If you do manage to reproduce this, I'd very much appreciate it if you would take a must-gather and re-open this bug. Thank you.

