1982300 – vsphere-problem-detector not showing wrong credentials event/alert on OCP Console

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Jan Safranek
QA Contact:	Wei Duan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-07-14 16:22 UTC by Elior Erez
Modified:	2021-10-18 17:40 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-10-18 17:39:53 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-storage-operator pull 195	0	None	None	None	2021-08-02 15:35:06 UTC
Red Hat Product Errata	RHSA-2021:3759	0	None	None	None	2021-10-18 17:40:06 UTC

Description Elior Erez 2021-07-14 16:22:04 UTC

Description of problem:
When creating a cluster (platform == VSphere), I deliberately entered wrong vSphere credentials (and a wrong vCenter address), and no event or alert appeared in the console.


Version-Release number of selected component (if applicable):

How reproducible:

Consistently


Steps to Reproduce:
1. Install cluster with wrong credentials 

======== Install config ======== 
...
platform:
  vsphere:
    vCenter: vcenterplaceholder
    username: usernameplaceholder
    password: passwordplaceholder
    datacenter: datacenterplaceholder
    defaultDatastore: defaultdatastoreplaceholder
    network: networkplaceholder
    cluster: clusterplaceholder
    apiVIP: 10.19.115.210
    ingressVIP: 10.19.115.212
...
======== ======== ======== ======


Actual results:

- After installation, no relevant alerts/events exist on the Console.
- On PVC creation, an event is created: "Failed to provision volume with StorageClass "thin": Post "https://placeholder:443/sdk": dial tcp: lookup placeholder on 10.19.115.106:53: no such host".
After updating the vCenter address (in vsphere-creds and in cloud-provider-config), the following event appears: "Failed to provision volume with StorageClass "thin": ServerFaultCode: Cannot complete login due to an incorrect user name or password."


Expected results:

Alert on Console home page 


Additional info:
Related bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1959546

Comment 1 Jan Safranek 2021-07-16 13:45:14 UTC

Can you please attach must-gather? It should have all the logs we need.

Looking at the alert definition in https://github.com/openshift/cluster-storage-operator/blob/release-4.7/assets/vsphere_problem_detector/12_prometheusrules.yaml#L37, an alert is fired when the vsphere-problem-detector can't connect to vCenter for ~75 minutes. The reason why it can't connect does not really matter (wrong IP, firewall on the way, wrong credentials, wrong protocol, wrong TLS config...)

A detailed error message can be found in the output of `oc get clusteroperator storage -o yaml`; look for the `Available` condition in status. That message is set immediately after a connection error, so there is no need to wait 75 minutes.
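
For scripted checks, the same lookup can be done on the JSON form of the object. The following is only a minimal sketch: the `SAMPLE` object and its message text are made up for illustration (not taken from a real cluster), and `available_message` is a hypothetical helper name.

```python
import json
from typing import Optional

# Hypothetical, trimmed-down ClusterOperator object. The message text is
# illustrative only and was not copied from a real cluster.
SAMPLE = """
{
  "kind": "ClusterOperator",
  "metadata": {"name": "storage"},
  "status": {
    "conditions": [
      {"type": "Degraded", "status": "False", "message": ""},
      {"type": "Available", "status": "True",
       "message": "example: failed to connect to vCenter"}
    ]
  }
}
"""


def available_message(cluster_operator: dict) -> Optional[str]:
    """Return the message of the Available condition, or None if absent."""
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == "Available":
            return cond.get("message")
    return None


if __name__ == "__main__":
    print(available_message(json.loads(SAMPLE)))
```

On a live cluster, the input would come from `oc get clusteroperator storage -o json` instead of the sample string.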

Comment 2 Elior Erez 2021-07-25 06:30:04 UTC

After waiting ~75 minutes, I was able to see the alert created by the vsphere-problem-detector.
Why not validate the connection to vCenter right after installation? A client who entered wrong credentials will learn of it only 75 minutes after installation.

Comment 3 Jan Safranek 2021-07-27 14:46:47 UTC

We should probably reduce the alert time to something like ~10 minutes.

Comment 6 Wei Duan 2021-08-11 09:06:31 UTC

Verified passed on 4.9.0-0.nightly-2021-08-07-175228

Alert became "firing" 15 minutes after the "activeAt" time.
      {
        "labels": {
          "alertname": "VSphereOpenshiftConnectionFailure",
          "container": "vsphere-problem-detector-operator",
          "endpoint": "vsphere-metrics",
          "instance": "10.129.0.43:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "pod": "vsphere-problem-detector-operator-f446f6f7d-bhv98",
          "reason": "InvalidCredentials",
          "service": "vsphere-problem-detector-metrics",
          "severity": "warning"
        },
        "annotations": {
          "description": "vsphere-problem-detector cannot access vCenter. As consequence, other OCP components,\nsuch as storage or machine API, may not be able to access vCenter too and provide\ntheir services. Detailed error message can be found in Available condition of\nClusterOperator \"storage\", either in console\n(Administration -> Cluster settings -> Cluster operators tab -> storage) or on\ncommand line: oc get clusteroperator storage -o jsonpath='{.status.conditions[?(@.type==\"Available\")].message}'\n",
          "summary": "vsphere-problem-detector is unable to connect to vSphere vCenter."
        },
        "state": "firing",
        "activeAt": "2021-08-11T08:45:22.396347327Z",
        "value": "1e+00"
      },
      {
        "labels": {
          "alertname": "VSphereOpenshiftConnectionFailure",
          "container": "vsphere-problem-detector-operator",
          "endpoint": "vsphere-metrics",
          "instance": "10.129.0.43:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "pod": "vsphere-problem-detector-operator-f446f6f7d-bhv98",
          "reason": "SyncError",
          "service": "vsphere-problem-detector-metrics",
          "severity": "warning"
        },
        "annotations": {
          "description": "vsphere-problem-detector cannot access vCenter. As consequence, other OCP components,\nsuch as storage or machine API, may not be able to access vCenter too and provide\ntheir services. Detailed error message can be found in Available condition of\nClusterOperator \"storage\", either in console\n(Administration -> Cluster settings -> Cluster operators tab -> storage) or on\ncommand line: oc get clusteroperator storage -o jsonpath='{.status.conditions[?(@.type==\"Available\")].message}'\n",
          "summary": "vsphere-problem-detector is unable to connect to vSphere vCenter."
        },
        "state": "firing",
        "activeAt": "2021-08-11T08:45:22.396347327Z",
        "value": "1e+00"
      },
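
A rough sketch of how the output above could be checked in a script. It copies only a subset of the fields shown above; the 15-minute delay is an assumption based on the observation in this comment, not a value read from the alert rule, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Subset of the fields from the two alerts shown above.
ALERTS = [
    {
        "labels": {"alertname": "VSphereOpenshiftConnectionFailure",
                   "reason": "InvalidCredentials"},
        "state": "firing",
        "activeAt": "2021-08-11T08:45:22.396347327Z",
    },
    {
        "labels": {"alertname": "VSphereOpenshiftConnectionFailure",
                   "reason": "SyncError"},
        "state": "firing",
        "activeAt": "2021-08-11T08:45:22.396347327Z",
    },
]

# Assumption: delay between "activeAt" and "firing", as observed in this comment.
ASSUMED_DELAY = timedelta(minutes=15)


def parse_active_at(ts: str) -> datetime:
    """Parse a UTC timestamp with nanoseconds, truncating to microseconds."""
    base, _, frac = ts.rstrip("Z").partition(".")
    micros = int(frac[:6].ljust(6, "0")) if frac else 0
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def firing_reasons(alerts: list, alertname: str) -> list:
    """Return the sorted 'reason' labels of firing alerts with the given name."""
    return sorted(
        a["labels"]["reason"]
        for a in alerts
        if a["state"] == "firing" and a["labels"]["alertname"] == alertname
    )


def expected_firing_time(alert: dict, delay: timedelta = ASSUMED_DELAY) -> datetime:
    """Return activeAt plus the assumed delay."""
    return parse_active_at(alert["activeAt"]) + delay


if __name__ == "__main__":
    print(firing_reasons(ALERTS, "VSphereOpenshiftConnectionFailure"))
    print(expected_firing_time(ALERTS[0]).isoformat())
```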

Comment 9 errata-xmlrpc 2021-10-18 17:39:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
