Bug 2000937
Summary: | clusteroperator/machine-config condition/Degraded status/True: pool master has not progressed to latest configuration | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Jan Chaloupka <jchaloup> |
Component: | Machine Config Operator | Assignee: | Kirsten Garrison <kgarriso> |
Machine Config Operator sub component: | Machine Config Operator | QA Contact: | Rio Liu <rioliu> |
Status: | CLOSED DUPLICATE | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | aos-bugs, kewang, mkrejci, wking, xtian |
Version: | 4.9 | Keywords: | Upgrades |
Clone Of: | | Environment: | jobs=periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade=all |
Last Closed: | 2022-05-18 19:47:43 UTC | Type: | Bug |
Description
Jan Chaloupka
2021-09-03 10:33:35 UTC
From the same job:

```
1 unexpected clusteroperator state transitions during e2e test run

Sep 03 09:00:00.489 - 289s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for 4.9.0-0.ci-2021-09-03-073535
```

From the machine-config-operator (through Loki):

```
I0903 05:40:33.641450 1 event.go:282] Event(v1.ObjectReference{Kind:"", Namespace:"", Name:"machine-config", UID:"20308572-e715-4af9-9e52-8aa6019ca4ae", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'OperatorNotAvailable' Cluster not available for 4.9.0-0.ci-2021-09-03-041553
```

From vendor/github.com/openshift/api/config/v1/types_cluster_operator.go:

```
// Available indicates that the operand (eg: openshift-apiserver for the
// openshift-apiserver-operator), is functional and available in the cluster.
// Available=False means at least part of the component is non-functional,
// and that the condition requires immediate administrator intervention.
OperatorAvailable ClusterStatusConditionType = "Available"
```

Mainly:
- Available=False means at least part of the component is non-functional, and that the condition requires immediate administrator intervention.

The component is obviously still functional, so it is incorrect to change condition/Available to False.

From https://github.com/openshift/machine-config-operator/blob/62d11bb969db4b43770383344826c58049df3803/pkg/operator/status.go#L109-L111:

```
available := configv1.ConditionTrue
if degraded {
	available = configv1.ConditionFalse
	message = fmt.Sprintf("Cluster not available for %s", optrVersion)
...
```

Whenever the operator goes Degraded=True, it also goes Available=False. Fixing the cause of condition/Degraded going True therefore also avoids condition/Available going False.
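The coupling described above can be sketched in isolation. This is a minimal, self-contained illustration, not the operator's code: `ConditionStatus`, `availableCoupled`, and `availableDecoupled` are hypothetical names, and the decoupled variant is only an assumption about what "Available follows the operand, not Degraded" might look like. It is not a fix proposed in this bug.

```go
package main

import "fmt"

// ConditionStatus is a stand-in for the True/False condition values;
// it is a hypothetical local type, not the configv1 type.
type ConditionStatus string

const (
	ConditionTrue  ConditionStatus = "True"
	ConditionFalse ConditionStatus = "False"
)

// availableCoupled mirrors the quoted status.go snippet:
// Available is derived directly from the degraded flag.
func availableCoupled(degraded bool, version string) (ConditionStatus, string) {
	if degraded {
		return ConditionFalse, fmt.Sprintf("Cluster not available for %s", version)
	}
	return ConditionTrue, ""
}

// availableDecoupled is a hypothetical alternative in which Available
// depends only on whether the operand is functional.
func availableDecoupled(operandFunctional bool, version string) (ConditionStatus, string) {
	if !operandFunctional {
		return ConditionFalse, fmt.Sprintf("Cluster not available for %s", version)
	}
	return ConditionTrue, ""
}

func main() {
	const version = "4.9.0-0.ci-2021-09-03-073535"

	// Scenario from this report: the master pool has not finished rolling
	// out (Degraded=True) while the operand itself still works.
	degraded, operandFunctional := true, true

	coupled, msg := availableCoupled(degraded, version)
	decoupled, _ := availableDecoupled(operandFunctional, version)

	fmt.Printf("coupled:   Available=%s (%s)\n", coupled, msg)
	fmt.Printf("decoupled: Available=%s\n", decoupled)
}
```

With the coupled form, any transient Degraded=True during an upgrade is reported as Available=False, which is the transition the e2e job flags as unexpected.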
Reproduced the bug while upgrading an IPI-installed cluster on AWS (FIPS on, OVN network) from original build 4.7.29-x86_64 to target build 4.8.0-0.nightly-2021-09-08-225533.

```
$ oc get node
NAME  STATUS  ROLES  AGE  VERSION  INTERNAL-IP  EXTERNAL-IP  OS-IMAGE  KERNEL-VERSION  CONTAINER-RUNTIME
09-09 20:24:36.168  ip-10-0-136-50.us-east-2.compute.internal  Ready,SchedulingDisabled  master  4h17m  v1.20.0+9689d22  10.0.136.50  <none>  Red Hat Enterprise Linux CoreOS 47.84.202109010857-0 (Ootpa)  4.18.0-305.12.1.el8_4.x86_64  cri-o://1.20.4-14.rhaos4.7.gitbce257b.el8
09-09 20:24:36.168  ip-10-0-150-141.us-east-2.compute.internal  Ready  worker  4h2m  v1.21.1+9807387  10.0.150.141  <none>  Red Hat Enterprise Linux CoreOS 48.84.202109080059-0 (Ootpa)  4.18.0-305.17.1.el8_4.x86_64  cri-o://1.21.2-15.rhaos4.8.gitcdc4f56.el8
09-09 20:24:36.168  ip-10-0-162-91.us-east-2.compute.internal  Ready  master  4h13m  v1.20.0+9689d22  10.0.162.91  <none>  Red Hat Enterprise Linux CoreOS 47.84.202109010857-0 (Ootpa)  4.18.0-305.12.1.el8_4.x86_64  cri-o://1.20.4-14.rhaos4.7.gitbce257b.el8
09-09 20:24:36.168  ip-10-0-174-70.us-east-2.compute.internal  Ready  worker  4h2m  v1.21.1+9807387  10.0.174.70  <none>  Red Hat Enterprise Linux CoreOS 48.84.202109080059-0 (Ootpa)  4.18.0-305.17.1.el8_4.x86_64  cri-o://1.21.2-15.rhaos4.8.gitcdc4f56.el8
09-09 20:24:36.168  ip-10-0-196-207.us-east-2.compute.internal  Ready  worker  4h3m  v1.21.1+9807387  10.0.196.207  <none>  Red Hat Enterprise Linux CoreOS 48.84.202109080059-0 (Ootpa)  4.18.0-305.17.1.el8_4.x86_64  cri-o://1.21.2-15.rhaos4.8.gitcdc4f56.el8
09-09 20:24:36.169  ip-10-0-213-142.us-east-2.compute.internal  Ready  master  4h13m  v1.21.1+9807387  10.0.213.142  <none>  Red Hat Enterprise Linux CoreOS 48.84.202109080059-0 (Ootpa)  4.18.0-305.17.1.el8_4.x86_64  cri-o://1.21.2-15.rhaos4.8.gitcdc4f56.el8

$ oc get co/machine-config
...
Status:
  Conditions:
    Last Transition Time:  2021-09-09T10:13:46Z
    Message:               Working towards 4.8.0-0.nightly-2021-09-08-225533
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2021-09-09T10:52:31Z
    Message:               One or more machine config pools are updating, please see `oc get mcp` for further details
    Reason:                PoolUpdating
    Status:                False
    Type:                  Upgradeable
    Last Transition Time:  2021-09-09T11:05:16Z
    Message:               Unable to apply 4.8.0-0.nightly-2021-09-08-225533: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-b2d7c4921f4e843f7549d8157c8bab94 expected a537783ea4a0cd3b4fe2a02626ab27887307ea51 has d6d26e1f4e1fc0ed49e4c443bf02bdc376e756b3: 1 (ready 1) out of 3 nodes are updating to latest configuration rendered-master-538c4f8e12cf9de1b57bed08f594719f, retrying
    Reason:                RequiredPoolsFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-09-09T10:23:47Z
    Message:               Cluster not available for 4.8.0-0.nightly-2021-09-08-225533
    Status:                False
    Type:                  Available
  Extension:
    Master:  1 (ready 1) out of 3 nodes are updating to latest configuration rendered-master-538c4f8e12cf9de1b57bed08f594719f
    Worker:  all 3 nodes are at latest configuration rendered-worker-c7c5061a6524b097d13fc06601495272
  Related Objects:
  ...
  Versions:
    Name:     operator
    Version:  4.7.29
...
```

Not reproduced every time; not a blocker.

*** This bug has been marked as a duplicate of bug 1955300 ***