Bug 1982300
Summary: | vsphere-problem-detector not showing wrong credentials event/alert on OCP Console | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Elior Erez <eerez> |
Component: | Storage | Assignee: | Jan Safranek <jsafrane> |
Storage sub component: | Operators | QA Contact: | Wei Duan <wduan> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | unspecified | CC: | aos-bugs, jsafrane, rfreiman |
Version: | 4.7 | ||
Target Milestone: | --- | ||
Target Release: | 4.9.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-10-18 17:39:53 UTC | Type: | Bug |
Description
Elior Erez
2021-07-14 16:22:04 UTC
Can you please attach must-gather? It should have all the logs we need.

Looking at the alert definition in https://github.com/openshift/cluster-storage-operator/blob/release-4.7/assets/vsphere_problem_detector/12_prometheusrules.yaml#L37, an alert fires when vsphere-problem-detector can't connect to vCenter for ~75 minutes. The reason it can't connect does not really matter (wrong IP, firewall on the way, wrong credentials, wrong protocol, wrong TLS config...).

The detailed error message can be found with `oc get clusteroperator storage -o yaml`; look for the `Available` condition in status. That message is set immediately after a connection error, so there is no need to wait for 75 minutes.

After waiting ~75 minutes I was able to see the alert created by vsphere-problem-detector. Why not validate the connection to vCenter right after installation? A user who entered wrong credentials will find out only 75 minutes after installation.

We should probably reduce the alert time to something like ~10 minutes.

Verified passed on 4.9.0-0.nightly-2021-08-07-175228. The alert became "firing" 15 minutes after the "activeAt" time.

    {
      "labels": {
        "alertname": "VSphereOpenshiftConnectionFailure",
        "container": "vsphere-problem-detector-operator",
        "endpoint": "vsphere-metrics",
        "instance": "10.129.0.43:8444",
        "job": "vsphere-problem-detector-metrics",
        "namespace": "openshift-cluster-storage-operator",
        "pod": "vsphere-problem-detector-operator-f446f6f7d-bhv98",
        "reason": "InvalidCredentials",
        "service": "vsphere-problem-detector-metrics",
        "severity": "warning"
      },
      "annotations": {
        "description": "vsphere-problem-detector cannot access vCenter. As consequence, other OCP components,\nsuch as storage or machine API, may not be able to access vCenter too and provide\ntheir services. Detailed error message can be found in Available condition of\nClusterOperator \"storage\", either in console\n(Administration -> Cluster settings -> Cluster operators tab -> storage) or on\ncommand line: oc get clusteroperator storage -o jsonpath='{.status.conditions[?(@.type==\"Available\")].message}'\n",
        "summary": "vsphere-problem-detector is unable to connect to vSphere vCenter."
      },
      "state": "firing",
      "activeAt": "2021-08-11T08:45:22.396347327Z",
      "value": "1e+00"
    },
    {
      "labels": {
        "alertname": "VSphereOpenshiftConnectionFailure",
        "container": "vsphere-problem-detector-operator",
        "endpoint": "vsphere-metrics",
        "instance": "10.129.0.43:8444",
        "job": "vsphere-problem-detector-metrics",
        "namespace": "openshift-cluster-storage-operator",
        "pod": "vsphere-problem-detector-operator-f446f6f7d-bhv98",
        "reason": "SyncError",
        "service": "vsphere-problem-detector-metrics",
        "severity": "warning"
      },
      "annotations": {
        "description": "vsphere-problem-detector cannot access vCenter. As consequence, other OCP components,\nsuch as storage or machine API, may not be able to access vCenter too and provide\ntheir services. Detailed error message can be found in Available condition of\nClusterOperator \"storage\", either in console\n(Administration -> Cluster settings -> Cluster operators tab -> storage) or on\ncommand line: oc get clusteroperator storage -o jsonpath='{.status.conditions[?(@.type==\"Available\")].message}'\n",
        "summary": "vsphere-problem-detector is unable to connect to vSphere vCenter."
      },
      "state": "firing",
      "activeAt": "2021-08-11T08:45:22.396347327Z",
      "value": "1e+00"
    },

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:3759
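
For anyone re-checking the verification step by hand, the following is a minimal Python sketch, not part of the fix itself. It works on a trimmed copy of the two alert objects quoted in the verification comment, lists the `reason` labels of firing `VSphereOpenshiftConnectionFailure` alerts, and estimates when a pending alert should turn firing. The 15-minute delay is an assumption taken from the observation in that comment, not read from the rule file. The helper names (`parse_active_at`, `firing_reasons`, `available_message`) are hypothetical, and `available_message` assumes its input has the shape of `oc get clusteroperator storage -o json` output.

```python
import json
from datetime import datetime, timedelta, timezone

# Trimmed copy of the two alert objects from the verification comment.
ALERTS_JSON = """
[
  {"labels": {"alertname": "VSphereOpenshiftConnectionFailure",
              "reason": "InvalidCredentials", "severity": "warning"},
   "state": "firing", "activeAt": "2021-08-11T08:45:22.396347327Z"},
  {"labels": {"alertname": "VSphereOpenshiftConnectionFailure",
              "reason": "SyncError", "severity": "warning"},
   "state": "firing", "activeAt": "2021-08-11T08:45:22.396347327Z"}
]
"""

# Assumption: delay between "activeAt" and "firing", as observed in the
# verification comment (15 minutes). Not read from the rule file.
ASSUMED_FOR = timedelta(minutes=15)


def parse_active_at(stamp: str) -> datetime:
    """Parse a UTC timestamp ending in 'Z', truncating nanoseconds to microseconds."""
    base, _, frac = stamp.rstrip("Z").partition(".")
    micros = int(frac[:6].ljust(6, "0")) if frac else 0
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def firing_reasons(alerts, name="VSphereOpenshiftConnectionFailure"):
    """Return the sorted 'reason' labels of firing alerts with the given name."""
    return sorted(
        a["labels"].get("reason", "")
        for a in alerts
        if a["labels"].get("alertname") == name and a.get("state") == "firing"
    )


def available_message(cluster_operator: dict) -> str:
    """Return the message of the Available condition, or '' if it is absent."""
    for cond in cluster_operator.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            return cond.get("message", "")
    return ""


alerts = json.loads(ALERTS_JSON)
print(firing_reasons(alerts))
print(parse_active_at(alerts[0]["activeAt"]) + ASSUMED_FOR)
```

`available_message` mirrors the jsonpath query quoted in the alert description; since that message is set immediately after a connection error, it can be checked without waiting for the alert.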