Bug 2000937
| Summary: | clusteroperator/machine-config condition/Degraded status/True: pool master has not progressed to latest configuration | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jan Chaloupka <jchaloup> |
| Component: | Machine Config Operator | Assignee: | Kirsten Garrison <kgarriso> |
| Machine Config Operator sub component: | Machine Config Operator | QA Contact: | Rio Liu <rioliu> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | aos-bugs, kewang, mkrejci, wking, xtian |
| Version: | 4.9 | Keywords: | Upgrades |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | jobs=periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade=all |
| Last Closed: | 2022-05-18 19:47:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Jan Chaloupka
2021-09-03 10:33:35 UTC
From the same job:
```
1 unexpected clusteroperator state transitions during e2e test run
Sep 03 09:00:00.489 - 289s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for 4.9.0-0.ci-2021-09-03-073535
```
From the machine-config-operator logs (via Loki):
```
I0903 05:40:33.641450 1 event.go:282] Event(v1.ObjectReference{Kind:"", Namespace:"", Name:"machine-config", UID:"20308572-e715-4af9-9e52-8aa6019ca4ae", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'OperatorNotAvailable' Cluster not available for 4.9.0-0.ci-2021-09-03-041553
```
From vendor/github.com/openshift/api/config/v1/types_cluster_operator.go:
```
// Available indicates that the operand (eg: openshift-apiserver for the
// openshift-apiserver-operator), is functional and available in the cluster.
// Available=False means at least part of the component is non-functional,
// and that the condition requires immediate administrator intervention.
OperatorAvailable ClusterStatusConditionType = "Available"
```
In particular:
- Available=False means at least part of the component is non-functional, and that the condition requires immediate administrator intervention.

The component is clearly still functional, so it is incorrect to set condition/Available to False.
From https://github.com/openshift/machine-config-operator/blob/62d11bb969db4b43770383344826c58049df3803/pkg/operator/status.go#L109-L111:
```
available := configv1.ConditionTrue
if degraded {
available = configv1.ConditionFalse
message = fmt.Sprintf("Cluster not available for %s", optrVersion)
...
```
In other words, whenever the operator reports Degraded=True, it also reports Available=False.
Fixing whatever drives condition/Degraded to True would therefore also keep condition/Available from going False.
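The coupling can be modelled in isolation. The Go sketch below is a simplified, standalone stand-in for the status.go excerpt above: it uses local types instead of `configv1`, and the empty message in the non-degraded case is an assumption, since the excerpt elides it. It also adds a hypothetical decoupled variant (`availableDecoupled` is an illustrative name) that derives Available from whether the operand is functional. It only illustrates the argument in this report; it is not the MCO's actual code or fix.

```go
package main

import "fmt"

// ConditionStatus is a local stand-in for configv1.ConditionStatus so the
// sketch compiles without the openshift/api module.
type ConditionStatus string

const (
	ConditionTrue  ConditionStatus = "True"
	ConditionFalse ConditionStatus = "False"
)

// availableCoupled models the status.go excerpt: any Degraded=True forces
// Available=False. The empty message when not degraded is an assumption.
func availableCoupled(degraded bool, optrVersion string) (ConditionStatus, string) {
	available := ConditionTrue
	message := ""
	if degraded {
		available = ConditionFalse
		message = fmt.Sprintf("Cluster not available for %s", optrVersion)
	}
	return available, message
}

// availableDecoupled is HYPOTHETICAL: Available depends only on whether the
// operand is functional, so a pool that is slow to roll out (Degraded=True)
// does not by itself flip Available to False.
func availableDecoupled(operandFunctional bool, optrVersion string) (ConditionStatus, string) {
	if !operandFunctional {
		return ConditionFalse, fmt.Sprintf("Cluster not available for %s", optrVersion)
	}
	return ConditionTrue, ""
}

func main() {
	v := "4.8.0-0.nightly-2021-09-08-225533"
	// Degraded because the master pool has not progressed: the coupled logic
	// reports Available=False even though the operand still works.
	s, msg := availableCoupled(true, v)
	fmt.Println(s, msg)
	// Same situation with the hypothetical decoupled logic.
	s, _ = availableDecoupled(true, v)
	fmt.Println(s)
}
```

Keeping the two conditions independent matches the API comment quoted above, which reserves Available=False for a non-functional component that needs immediate administrator intervention.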
Reproduced the bug while upgrading an IPI-installed AWS cluster (FIPS enabled, OVN network) from 4.7.29-x86_64 to 4.8.0-0.nightly-2021-09-08-225533:
```
$ oc get node
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
09-09 20:24:36.168 ip-10-0-136-50.us-east-2.compute.internal Ready,SchedulingDisabled master 4h17m v1.20.0+9689d22 10.0.136.50 <none> Red Hat Enterprise Linux CoreOS 47.84.202109010857-0 (Ootpa) 4.18.0-305.12.1.el8_4.x86_64 cri-o://1.20.4-14.rhaos4.7.gitbce257b.el8
09-09 20:24:36.168 ip-10-0-150-141.us-east-2.compute.internal Ready worker 4h2m v1.21.1+9807387 10.0.150.141 <none> Red Hat Enterprise Linux CoreOS 48.84.202109080059-0 (Ootpa) 4.18.0-305.17.1.el8_4.x86_64 cri-o://1.21.2-15.rhaos4.8.gitcdc4f56.el8
09-09 20:24:36.168 ip-10-0-162-91.us-east-2.compute.internal Ready master 4h13m v1.20.0+9689d22 10.0.162.91 <none> Red Hat Enterprise Linux CoreOS 47.84.202109010857-0 (Ootpa) 4.18.0-305.12.1.el8_4.x86_64 cri-o://1.20.4-14.rhaos4.7.gitbce257b.el8
09-09 20:24:36.168 ip-10-0-174-70.us-east-2.compute.internal Ready worker 4h2m v1.21.1+9807387 10.0.174.70 <none> Red Hat Enterprise Linux CoreOS 48.84.202109080059-0 (Ootpa) 4.18.0-305.17.1.el8_4.x86_64 cri-o://1.21.2-15.rhaos4.8.gitcdc4f56.el8
09-09 20:24:36.168 ip-10-0-196-207.us-east-2.compute.internal Ready worker 4h3m v1.21.1+9807387 10.0.196.207 <none> Red Hat Enterprise Linux CoreOS 48.84.202109080059-0 (Ootpa) 4.18.0-305.17.1.el8_4.x86_64 cri-o://1.21.2-15.rhaos4.8.gitcdc4f56.el8
09-09 20:24:36.169 ip-10-0-213-142.us-east-2.compute.internal Ready master 4h13m v1.21.1+9807387 10.0.213.142 <none> Red Hat Enterprise Linux CoreOS 48.84.202109080059-0 (Ootpa) 4.18.0-305.17.1.el8_4.x86_64 cri-o://1.21.2-15.rhaos4.8.gitcdc4f56.el8
```
```
$ oc get co/machine-config
...
Status:
  Conditions:
    Last Transition Time:  2021-09-09T10:13:46Z
    Message:               Working towards 4.8.0-0.nightly-2021-09-08-225533
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2021-09-09T10:52:31Z
    Message:               One or more machine config pools are updating, please see `oc get mcp` for further details
    Reason:                PoolUpdating
    Status:                False
    Type:                  Upgradeable
    Last Transition Time:  2021-09-09T11:05:16Z
    Message:               Unable to apply 4.8.0-0.nightly-2021-09-08-225533: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-b2d7c4921f4e843f7549d8157c8bab94 expected a537783ea4a0cd3b4fe2a02626ab27887307ea51 has d6d26e1f4e1fc0ed49e4c443bf02bdc376e756b3: 1 (ready 1) out of 3 nodes are updating to latest configuration rendered-master-538c4f8e12cf9de1b57bed08f594719f, retrying
    Reason:                RequiredPoolsFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-09-09T10:23:47Z
    Message:               Cluster not available for 4.8.0-0.nightly-2021-09-08-225533
    Status:                False
    Type:                  Available
  Extension:
    Master:  1 (ready 1) out of 3 nodes are updating to latest configuration rendered-master-538c4f8e12cf9de1b57bed08f594719f
    Worker:  all 3 nodes are at latest configuration rendered-worker-c7c5061a6524b097d13fc06601495272
  Related Objects:
    ...
  Versions:
    Name:     operator
    Version:  4.7.29
    ...
```
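When triaging similar failures across CI runs, the two facts buried in the Degraded message (the controller version mismatch and the pool's node counts) can be pulled out with a small parser. A minimal Go sketch follows; the regular expressions are inferred only from the messages quoted in this report, not from a documented MCO format, and the type and function names (`poolProgress`, `parsePoolProgress`, `versionMismatch`, `parseVersionMismatch`) are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// poolProgress holds the numbers from a message such as
// "1 (ready 1) out of 3 nodes are updating to latest configuration rendered-master-...".
type poolProgress struct {
	Count  int    // leading count in the message; exact MCO semantics not confirmed here
	Ready  int    // the "(ready N)" value
	Total  int    // "out of N nodes"
	Target string // rendered config name the pool is moving to
}

// versionMismatch holds the parts of "controller version mismatch for X expected A has B".
type versionMismatch struct {
	Config, Expected, Has string
}

// Formats inferred from this report only; [\w-]+ stops before the trailing ", retrying".
var (
	progressRE = regexp.MustCompile(`(\d+) \(ready (\d+)\) out of (\d+) nodes are updating to latest configuration ([\w-]+)`)
	mismatchRE = regexp.MustCompile(`controller version mismatch for (\S+) expected (\w+) has (\w+)`)
)

func parsePoolProgress(msg string) (poolProgress, bool) {
	m := progressRE.FindStringSubmatch(msg)
	if m == nil {
		return poolProgress{}, false
	}
	// The capture groups contain only digits, so Atoi errors are ignored.
	count, _ := strconv.Atoi(m[1])
	ready, _ := strconv.Atoi(m[2])
	total, _ := strconv.Atoi(m[3])
	return poolProgress{Count: count, Ready: ready, Total: total, Target: m[4]}, true
}

func parseVersionMismatch(msg string) (versionMismatch, bool) {
	m := mismatchRE.FindStringSubmatch(msg)
	if m == nil {
		return versionMismatch{}, false
	}
	return versionMismatch{Config: m[1], Expected: m[2], Has: m[3]}, true
}

func main() {
	// Degraded message copied from the condition output above.
	msg := "Unable to apply 4.8.0-0.nightly-2021-09-08-225533: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-b2d7c4921f4e843f7549d8157c8bab94 expected a537783ea4a0cd3b4fe2a02626ab27887307ea51 has d6d26e1f4e1fc0ed49e4c443bf02bdc376e756b3: 1 (ready 1) out of 3 nodes are updating to latest configuration rendered-master-538c4f8e12cf9de1b57bed08f594719f, retrying"
	if vm, ok := parseVersionMismatch(msg); ok {
		fmt.Printf("config %s: expected controller %s, has %s\n", vm.Config, vm.Expected, vm.Has)
	}
	if p, ok := parsePoolProgress(msg); ok {
		fmt.Printf("pool: %d (ready %d) of %d nodes, target %s\n", p.Count, p.Ready, p.Total, p.Target)
	}
}
```

The Worker extension line ("all 3 nodes are at latest configuration ...") does not match `progressRE`, so `parsePoolProgress` reports `ok == false` for a fully updated pool.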
Not reproduced every time; not a blocker.

*** This bug has been marked as a duplicate of bug 1955300 ***