Description of problem:
The vsphere-problem-detector feature is causing upgrades that worked previously to stall, forcing users to update configuration solely to get around the problem detector. Depending on the policies around configuration updates, this can be a major hindrance for a user who needs the upgrade to complete and wants to keep the current vSphere settings, since they have worked in the past.

Version-Release number of selected component (if applicable):
4.7

How reproducible:
Consistently

Steps to Reproduce:
1. Attempt to upgrade a cluster to 4.7 with invalid vSphere credentials

Actual results:
The upgrade hangs because the storage operator is degraded due to the vsphere-problem-detector indicating a config problem.

Expected results:
Opt out of or bypass the vsphere-problem-detector if the user doesn't want to make a config change, since the setup is working and upgrades like this succeeded for the user prior to 4.7.
*** Bug 1955260 has been marked as a duplicate of this bug. ***
Additional info:
This also takes place if there is network segmentation blocking access back to the datastore host:port. Upgrades were able to complete by switching the operator between unmanaged and managed at several points of the upgrade; however, after completing the upgrade, the operator continues to show as degraded.
I found an issue: the message on the Available condition is sometimes cleared.
Verified with 4.8.0-0.nightly-2021-05-18-033553.

After changing to an invalid password with:
$ oc -n kube-system edit secret vsphere-creds

Then check that the storage clusteroperator is AVAILABLE and not DEGRADED:
$ oc get co storage
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
storage   4.8.0-0.nightly-2021-05-18-033553   True        False         False      92m

Message from the clusteroperator:
$ oc get clusteroperator storage -o jsonpath='{.status.conditions[?(@.type=="Available")].message}'
VSphereProblemDetectorControllerAvailable: failed to connect to vcenter.sddc-44-236-21-251.vmwarevmc.com: ServerFaultCode: Cannot complete login due to an incorrect user name or password.

Check the vsphere_sync_errors metric and the alert raised:
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "vsphere_sync_errors",
          "container": "vsphere-problem-detector-operator",
          "endpoint": "vsphere-metrics",
          "instance": "10.128.0.44:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "pod": "vsphere-problem-detector-operator-958d9f68c-w74tb",
          "service": "vsphere-problem-detector-metrics"
        },
        "value": [
          1621335304.464,
          "1"
        ]
      }
    ]
  }
}

"alerts": [
  {
    "labels": {
      "alertname": "VSphereOpenshiftConnectionFailure",
      "container": "vsphere-problem-detector-operator",
      "endpoint": "vsphere-metrics",
      "instance": "10.128.0.44:8444",
      "job": "vsphere-problem-detector-metrics",
      "namespace": "openshift-cluster-storage-operator",
      "pod": "vsphere-problem-detector-operator-958d9f68c-w74tb",
      "service": "vsphere-problem-detector-metrics",
      "severity": "warning"
    },
    "annotations": {
      "description": "vsphere-problem-detector cannot access vCenter. As consequence, other OCP components,\nsuch as storage or machine API, may not be able to access vCenter too and provide\ntheir services. Detailed error message can be found in Available condition of\nClusterOperator \"storage\", either in console\n(Administration -> Cluster settings -> Cluster operators tab -> storage) or on\ncommand line: oc get clusteroperator storage -o jsonpath='{.status.conditions[?(@.type==\"Available\")].message}'\n",
      "summary": "vsphere-problem-detector is unable to connect to vSphere vCenter."
    },
    "state": "firing",
    "activeAt": "2021-05-18T10:08:52.396347327Z",
    "value": "1e+00"
  },

Marked as VERIFIED.
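
The manual checks in the verification above could also be scripted. The following is only a minimal sketch, not part of the fix or of any OpenShift tooling: it assumes the ClusterOperator has been saved as JSON (for example from `oc get clusteroperator storage -o json`) and that the metric query response has the shape shown in the comment. The function names (`condition_status`, `sync_error_count`, `fix_verified`) and the trimmed sample data are hypothetical and exist only for illustration.

```python
def condition_status(clusteroperator: dict, cond_type: str) -> str:
    """Return the status of the named condition, or "Unknown" if it is absent."""
    for cond in clusteroperator.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status", "Unknown")
    return "Unknown"


def sync_error_count(query_response: dict) -> float:
    """Sum the vsphere_sync_errors samples in an instant-query response."""
    total = 0.0
    for sample in query_response.get("data", {}).get("result", []):
        if sample.get("metric", {}).get("__name__") == "vsphere_sync_errors":
            # Each sample value is [unix_timestamp, "value-as-string"].
            total += float(sample["value"][1])
    return total


def fix_verified(clusteroperator: dict, query_response: dict) -> bool:
    """Expected state with bad credentials after the fix: the operator stays
    Available and not Degraded, while the sync error metric is non-zero."""
    return (
        condition_status(clusteroperator, "Available") == "True"
        and condition_status(clusteroperator, "Degraded") == "False"
        and sync_error_count(query_response) > 0
    )


# Hand-built sample data, trimmed from the output quoted in this comment
# (assumption: only the fields the sketch reads are kept).
sample_co = {
    "status": {
        "conditions": [
            {"type": "Available", "status": "True"},
            {"type": "Progressing", "status": "False"},
            {"type": "Degraded", "status": "False"},
        ]
    }
}
sample_query = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "vsphere_sync_errors"},
                "value": [1621335304.464, "1"],
            }
        ],
    },
}

if __name__ == "__main__":
    print("fix verified:", fix_verified(sample_co, sample_query))
```

The sketch deliberately treats a non-zero `vsphere_sync_errors` as part of the expected result: with invalid credentials the problem should still be reported through the metric and alert, just not by degrading the operator.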
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438