Bug 2000268

Summary: Mark cluster un-upgradeable if vCenter, ESXi, or VM hardware versions are unsupported
Product: OpenShift Container Platform
Reporter: Hemant Kumar <hekumar>
Component: Storage
Assignee: Hemant Kumar <hekumar>
Storage sub component: Kubernetes
QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: aos-bugs, jsafrane
Version: 4.9
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-03-10 16:06:37 UTC
Type: Bug

Description Hemant Kumar 2021-09-01 17:34:16 UTC
In 4.9 we are deprecating support for VM hardware version 13 and for vSphere versions older than 6.7u3. If any VM, ESXi host, or vCenter is on one of these older versions, we should mark the cluster as un-upgradeable.
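The rule above can be sketched as a pure function over the versions reported by vCenter. This is a minimal illustration, not the vsphere-problem-detector code: the function names are hypothetical, and so are two assumptions: that anything at vmx-13 or older is flagged, and that the third field of an ESXi version string such as `6.7.2` is the update level (the form used in the log message in comment 31). The vCenter message wording is invented; only the VM and host messages mirror the logs in this bug.

```python
# Hypothetical sketch of the upgradeability rule; not the vsphere-problem-detector code.
# Assumptions:
#   * VM hardware versions look like "vmx-13"; vmx-13 and older are flagged.
#   * ESXi/vCenter versions look like "6.7.2", where the third field is the
#     update level (6.7.2 == 6.7 Update 2), as in the log message in comment 31.

MAX_DEPRECATED_HW_VERSION = 13   # assumption: vmx-13 and older are deprecated
MIN_VSPHERE_VERSION = (6, 7, 3)  # vSphere 6.7 Update 3


def parse_hw_version(hw: str) -> int:
    """Convert "vmx-13" to 13."""
    prefix = "vmx-"
    if not hw.startswith(prefix):
        raise ValueError(f"unexpected hardware version: {hw!r}")
    return int(hw[len(prefix):])


def parse_vsphere_version(version: str) -> tuple:
    """Convert "6.7.2" to (6, 7, 2); missing fields default to 0."""
    parts = [int(p) for p in version.split(".")][:3]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def check_upgradeable(vm_hw_versions, esxi_hosts, vcenter_version):
    """Return (upgradeable, message) for the given inventory.

    vm_hw_versions: iterable of hardware version strings, one per VM.
    esxi_hosts: dict mapping host name -> ESXi version string.
    vcenter_version: vCenter version string.
    """
    # Collect deprecated hardware versions, oldest first (numeric, not lexical, order).
    old_hw = sorted((hw for hw in vm_hw_versions
                     if parse_hw_version(hw) <= MAX_DEPRECATED_HW_VERSION),
                    key=parse_hw_version)
    if old_hw:
        return False, ("Marking cluster un-upgradeable because one or more VMs "
                       f"are on hardware version {old_hw[0]}")
    # Tuple comparison: (6, 7, 2) < (6, 7, 3) is True.
    for host, version in sorted(esxi_hosts.items()):
        if parse_vsphere_version(version) < MIN_VSPHERE_VERSION:
            return False, (f"Marking cluster un-upgradeable because host {host} "
                           f"is on esxi version {version}")
    if parse_vsphere_version(vcenter_version) < MIN_VSPHERE_VERSION:
        # Hypothetical message; no vCenter example appears in this bug.
        return False, ("Marking cluster un-upgradeable because vCenter "
                       f"is on version {vcenter_version}")
    return True, "All is well"
```

For example, `check_upgradeable(["vmx-15"], {"host-9": "6.7.2"}, "7.0.2")` reports host-9 as the blocker. Returning only the first failing reason keeps the condition message short, like the single-reason messages in the logs; reporting every failure at once would be an equally valid design.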

Comment 30 Wei Duan 2021-09-13 11:24:38 UTC
I verified the HW version check on a freshly installed cluster running 4.10.0-0.nightly-2021-09-10-083647:

$ oc get clusterversion version -o yaml
  - lastTransitionTime: "2021-09-13T11:23:55Z"
    message: 'Cluster operator storage should not be upgraded between minor versions:
      VSphereProblemDetectorControllerUpgradeable: Marking cluster un-upgradeable
      because one or more VMs are on hardware version vmx-13'
    reason: VSphereProblemDetectorController_VSphereOlderVersionDetected
    status: "False"

This is expected.
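To check the same condition without reading the full YAML, one option is to take `oc get clusterversion version -o json` (or `oc get clusteroperator storage -o json`) and pick the entry in `status.conditions` whose `type` is `Upgradeable`. A minimal sketch; the helper names `find_condition` and `describe_upgradeable` are hypothetical:

```python
import json


def find_condition(obj: dict, cond_type: str):
    """Return the status condition whose "type" equals cond_type, or None.

    Works for any object with status.conditions, e.g. the output of
    `oc get clusterversion version -o json` or
    `oc get clusteroperator storage -o json`.
    """
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def describe_upgradeable(raw_json: str) -> str:
    """Summarize the Upgradeable condition from raw `oc ... -o json` output."""
    cond = find_condition(json.loads(raw_json), "Upgradeable")
    if cond is None:
        return "no Upgradeable condition reported"
    return f"Upgradeable={cond['status']} ({cond.get('reason')}): {cond.get('message')}"
```

For example, save the output with `oc get clusterversion version -o json > cv.json` and call `describe_upgradeable(open("cv.json").read())`.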

Comment 31 Wei Duan 2021-09-13 11:31:17 UTC
I verified the vSphere version check on an upgraded cluster running 4.10.0-0.nightly-2021-09-10-083647, but it failed:

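In the clusterversion status I did not see the storage operator marking the cluster un-upgradeable; the only Upgradeable=False condition comes from the machine config pool (MCP) issue: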
{
  "lastTransitionTime": "2021-09-13T07:27:54Z",
  "message": "Cluster operator machine-config should not be upgraded between minor versions: One or more machine config pools are degraded, please see `oc get mcp` for further details and resolve before upgrading",
  "reason": "DegradedPool",
  "status": "False",
  "type": "Upgradeable"
}


In the cluster-storage-operator logs, the operator briefly reports "Marking cluster un-upgradeable because host host-9 is on esxi version 6.7.2", then quickly switches back to Upgradeable=True. This repeats in a cycle, and the cluster spends most of the time in the Upgradeable=True state:
I0913 11:07:47.505292       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-storage-operator", Name:"cluster-storage-operator", UID:"bbdde327-c005-4ca9-9354-2d0c41554c2b", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/storage changed: Upgradeable changed from True to False ("VSphereProblemDetectorControllerUpgradeable: Marking cluster un-upgradeable because host host-9 is on esxi version 6.7.2")
I0913 11:08:14.012975       1 controller.go:174] Existing StorageClass thin found, reconciling
I0913 11:08:14.013617       1 status_controller.go:211] clusteroperator/storage diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-07T03:50:17Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-09-13T07:49:15Z","message":"All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-09-13T07:49:54Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-13T11:08:14Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
I0913 11:08:14.091682       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-storage-operator", Name:"cluster-storage-operator", UID:"bbdde327-c005-4ca9-9354-2d0c41554c2b", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/storage changed: Upgradeable changed from False to True ("All is well")
I0913 11:09:28.356804       1 controller.go:174] Existing StorageClass thin found, reconciling
I0913 11:19:28.357094       1 controller.go:174] Existing StorageClass thin found, reconciling
I0913 11:19:28.372729       1 controller.go:174] Existing StorageClass thin found, reconciling
I0913 11:19:28.375001       1 controller.go:174] Existing StorageClass thin found, reconciling
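The symptom above (Upgradeable set to False, then straight back to True) can be spotted mechanically in the operator log. A minimal sketch, assuming the klog line format shown above and that the event text keeps the wording `Upgradeable changed from X to Y`; the function names are hypothetical:

```python
import re

# Matches klog-formatted operator events such as
#   I0913 11:07:47.505292  1 event.go:282] ... Upgradeable changed from True to False (...)
# Assumption: the event text keeps the "Upgradeable changed from X to Y" wording seen above.
_TRANSITION = re.compile(
    r"^I(\d{4} \d{2}:\d{2}:\d{2}\.\d+)\s.*?"
    r"Upgradeable changed from (True|False) to (True|False)"
)


def upgradeable_transitions(lines):
    """Return (timestamp, old, new) for every Upgradeable transition event."""
    result = []
    for line in lines:
        m = _TRANSITION.match(line)
        if m:
            result.append(m.groups())
    return result


def reverted_to_upgradeable(transitions):
    """True if Upgradeable went back to True after having been set to False."""
    seen_false = False
    for _timestamp, _old, new in transitions:
        if new == "False":
            seen_false = True
        elif seen_false:
            return True
    return False
```

The input can be the output of `oc logs -n openshift-cluster-storage-operator deployment/cluster-storage-operator`, split into lines. While VMs or hosts are still on deprecated versions, a revert to True is exactly the behavior reported here.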

Comment 32 Hemant Kumar 2021-09-13 16:41:13 UTC
Moving back to ASSIGNED: this failed ON_QA and the fix still has a bug.

Comment 33 Hemant Kumar 2021-09-16 19:01:39 UTC
This should be fixed by https://github.com/openshift/vsphere-problem-detector/pull/50

Comment 35 Wei Duan 2021-09-17 08:45:57 UTC
Verified: passed on 4.10.0-0.nightly-2021-09-16-220009.

I0917 07:23:42.950667       1 status_controller.go:211] clusteroperator/storage diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-17T05:54:15Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-09-17T07:23:40Z","message":"All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-09-17T07:23:42Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-17T06:58:55Z","message":"VSphereProblemDetectorControllerUpgradeable: Marking cluster un-upgradeable because one or more VMs are on hardware version vmx-13","reason":"VSphereProblemDetectorController_VSphereOlderVersionDetected","status":"False","type":"Upgradeable"}]}}
I0917 07:23:42.965558       1 status_controller.go:211] clusteroperator/storage diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-17T05:54:15Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-09-17T07:23:40Z","message":"All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-09-17T07:23:42Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-17T06:58:55Z","message":"VSphereProblemDetectorControllerUpgradeable: Marking cluster un-upgradeable because one or more VMs are on hardware version vmx-13","reason":"VSphereProblemDetectorController_VSphereOlderVersionDetected","status":"False","type":"Upgradeable"}]}}
I0917 07:23:42.966097       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-storage-operator", Name:"cluster-storage-operator", UID:"38dd962b-4807-42ec-b200-4b9319860edb", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/storage changed: Available changed from False to True ("All is well")
I0917 07:23:42.972647       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-storage-operator", Name:"cluster-storage-operator", UID:"38dd962b-4807-42ec-b200-4b9319860edb", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/storage changed: Available changed from False to True ("All is well")
I0917 07:29:39.924537       1 controller.go:174] Existing StorageClass thin found, reconciling
I0917 07:39:39.924927       1 controller.go:174] Existing StorageClass thin found, reconciling
I0917 07:39:40.062817       1 controller.go:174] Existing StorageClass thin found, reconciling
I0917 07:42:49.379783       1 controller.go:174] Existing StorageClass thin found, reconciling
I0917 07:49:39.925318       1 controller.go:174] Existing StorageClass thin found, reconciling
I0917 07:59:39.925785       1 controller.go:174] Existing StorageClass thin found, reconciling
I0917 07:59:40.065875       1 controller.go:174] Existing StorageClass thin found, reconciling
I0917 08:02:49.378837       1 controller.go:174] Existing StorageClass thin found, reconciling

Comment 38 errata-xmlrpc 2022-03-10 16:06:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056