Bug 1982300

Summary: vsphere-problem-detector not showing wrong credentials event/alert on OCP Console
Product: OpenShift Container Platform
Reporter: Elior Erez <eerez>
Component: Storage
Assignee: Jan Safranek <jsafrane>
Storage sub component: Operators
QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: unspecified
CC: aos-bugs, jsafrane, rfreiman
Version: 4.7
Target Milestone: ---
Target Release: 4.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-10-18 17:39:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Elior Erez 2021-07-14 16:22:04 UTC
Description of problem:
When creating a cluster (platform == VSphere), I purposely entered wrong vSphere credentials (and a wrong vCenter address) and no event/alert appears in the console.


Version-Release number of selected component (if applicable):

How reproducible:

Consistently


Steps to Reproduce:
1. Install cluster with wrong credentials 

======== Install config ======== 
...
platform:
  vsphere:
    vCenter: vcenterplaceholder
    username: usernameplaceholder
    password: passwordplaceholder
    datacenter: datacenterplaceholder
    defaultDatastore: defaultdatastoreplaceholder
    network: networkplaceholder
    cluster: clusterplaceholder
    apiVIP: 10.19.115.210
    ingressVIP: 10.19.115.212
...
======== ======== ======== ======


Actual results:

- After installation, no relevant alerts or events exist on the Console.
- On PVC creation, an event is created: "Failed to provision volume with StorageClass "thin": Post "https://placeholder:443/sdk": dial tcp: lookup placeholder on 10.19.115.106:53: no such host".
After updating the vCenter address (in vsphere-creds and in cloud-provider-config), the following event appears: "Failed to provision volume with StorageClass "thin": ServerFaultCode: Cannot complete login due to an incorrect user name or password."


Expected results:

Alert on Console home page 


Additional info:
Related bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1959546

Comment 1 Jan Safranek 2021-07-16 13:45:14 UTC
Can you please attach must-gather? It should have all the logs we need.

Looking at the alert definition in https://github.com/openshift/cluster-storage-operator/blob/release-4.7/assets/vsphere_problem_detector/12_prometheusrules.yaml#L37, an alert fires when the vsphere-problem-detector can't connect to vCenter for ~75 minutes. The reason it can't connect does not really matter (wrong IP, firewall on the way, wrong credentials, wrong protocol, wrong TLS config...).

The detailed error message can be found in the output of `oc get clusteroperator storage -o yaml`; look for the `Available` condition in the status. That message is set immediately after a connection error, so there is no need to wait 75 minutes.
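
A minimal sketch of reading that message programmatically, assuming the ClusterOperator has been dumped as JSON (for example with `oc get clusteroperator storage -o json`). The helper name `available_message` and the sample document are illustrative assumptions, not output from a real cluster.

```python
import json

def available_message(clusteroperator_json):
    """Return the message of the Available condition, or None if it is absent."""
    obj = json.loads(clusteroperator_json)
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            return cond.get("message")
    return None

# Hypothetical, trimmed-down ClusterOperator document for illustration only.
sample = json.dumps({
    "kind": "ClusterOperator",
    "metadata": {"name": "storage"},
    "status": {"conditions": [
        {"type": "Degraded", "message": ""},
        {"type": "Available", "message": "example: failed to connect to vCenter"},
    ]},
})

print(available_message(sample))
```

The same lookup can be done on the command line with the jsonpath filter `{.status.conditions[?(@.type=="Available")].message}`.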

Comment 2 Elior Erez 2021-07-25 06:30:04 UTC
After waiting ~75 minutes, I was able to see the alert created by the vsphere-problem-detector.
I'm wondering why the connection to vCenter is not validated right after installation. If a customer enters wrong credentials, they will only find out 75 minutes after installation.

Comment 3 Jan Safranek 2021-07-27 14:46:47 UTC
We should probably reduce the alert time to something like ~10 minutes.
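
A rough sketch of why a shorter hold time surfaces the problem sooner, assuming the delay is governed by a Prometheus-style hold period (the `for` field of an alerting rule), during which an alert stays pending before it fires. The function name `alert_state` and the timestamps are hypothetical; the actual rule in cluster-storage-operator is not reproduced here.

```python
from datetime import datetime, timedelta, timezone

def alert_state(active_at, now, hold):
    """Return 'pending' until the condition has held for `hold`, then 'firing'."""
    return "firing" if now - active_at >= hold else "pending"

# Illustrative timestamps: the condition became active, then we look 12 minutes later.
active_at = datetime(2021, 7, 27, 12, 0, tzinfo=timezone.utc)
check_at = active_at + timedelta(minutes=12)

print(alert_state(active_at, check_at, timedelta(minutes=75)))  # pending
print(alert_state(active_at, check_at, timedelta(minutes=10)))  # firing
```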

Comment 6 Wei Duan 2021-08-11 09:06:31 UTC
Verified passed on 4.9.0-0.nightly-2021-08-07-175228

The alert became "firing" 15 minutes after its "activeAt" time.
      {
        "labels": {
          "alertname": "VSphereOpenshiftConnectionFailure",
          "container": "vsphere-problem-detector-operator",
          "endpoint": "vsphere-metrics",
          "instance": "10.129.0.43:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "pod": "vsphere-problem-detector-operator-f446f6f7d-bhv98",
          "reason": "InvalidCredentials",
          "service": "vsphere-problem-detector-metrics",
          "severity": "warning"
        },
        "annotations": {
          "description": "vsphere-problem-detector cannot access vCenter. As consequence, other OCP components,\nsuch as storage or machine API, may not be able to access vCenter too and provide\ntheir services. Detailed error message can be found in Available condition of\nClusterOperator \"storage\", either in console\n(Administration -> Cluster settings -> Cluster operators tab -> storage) or on\ncommand line: oc get clusteroperator storage -o jsonpath='{.status.conditions[?(@.type==\"Available\")].message}'\n",
          "summary": "vsphere-problem-detector is unable to connect to vSphere vCenter."
        },
        "state": "firing",
        "activeAt": "2021-08-11T08:45:22.396347327Z",
        "value": "1e+00"
      },
      {
        "labels": {
          "alertname": "VSphereOpenshiftConnectionFailure",
          "container": "vsphere-problem-detector-operator",
          "endpoint": "vsphere-metrics",
          "instance": "10.129.0.43:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "pod": "vsphere-problem-detector-operator-f446f6f7d-bhv98",
          "reason": "SyncError",
          "service": "vsphere-problem-detector-metrics",
          "severity": "warning"
        },
        "annotations": {
          "description": "vsphere-problem-detector cannot access vCenter. As consequence, other OCP components,\nsuch as storage or machine API, may not be able to access vCenter too and provide\ntheir services. Detailed error message can be found in Available condition of\nClusterOperator \"storage\", either in console\n(Administration -> Cluster settings -> Cluster operators tab -> storage) or on\ncommand line: oc get clusteroperator storage -o jsonpath='{.status.conditions[?(@.type==\"Available\")].message}'\n",
          "summary": "vsphere-problem-detector is unable to connect to vSphere vCenter."
        },
        "state": "firing",
        "activeAt": "2021-08-11T08:45:22.396347327Z",
        "value": "1e+00"
      },
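
A small sketch of checking such output automatically, assuming the alert entries have been parsed into a list of dictionaries. The helper name `firing_reasons` is hypothetical, and the sample list is a trimmed copy of the two entries above with unrelated labels omitted.

```python
def firing_reasons(alerts, alertname="VSphereOpenshiftConnectionFailure"):
    """Collect the `reason` label of every firing alert with the given name."""
    return sorted(
        a.get("labels", {}).get("reason", "")
        for a in alerts
        if a.get("state") == "firing"
        and a.get("labels", {}).get("alertname") == alertname
    )

# Trimmed copies of the two alert entries shown above.
alerts = [
    {"labels": {"alertname": "VSphereOpenshiftConnectionFailure",
                "reason": "InvalidCredentials", "severity": "warning"},
     "state": "firing"},
    {"labels": {"alertname": "VSphereOpenshiftConnectionFailure",
                "reason": "SyncError", "severity": "warning"},
     "state": "firing"},
]

print(firing_reasons(alerts))  # ['InvalidCredentials', 'SyncError']
```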

Comment 9 errata-xmlrpc 2021-10-18 17:39:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759