Description of problem (please be detailed as possible and provide log snippets):
KMSServerConnectionAlert is raised correctly, but the alert is not cleared when the connection is restored.

Version of all relevant components (if applicable):
OCS 4.13.0-179
OCP 4.13.0-0.nightly-2023-05-02-134729

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Can this issue reproduce from the UI?
yes

Steps to Reproduce:
1. Install an OCS cluster with cluster-wide encryption enabled, using Vault KMS.
2. Edit the ocs-kms-connection-details config map and set VAULT_ADDR to an incorrect address.
3. Observe the alert in the OCP console.
4. Edit the ocs-kms-connection-details config map again and set VAULT_ADDR back to the correct address.
5. Observe the alert in the OCP console.

Actual results:
The first edit of the config map triggers the alert, but the alert stays active and is not cleared when the address is set back to the correct value.

Expected results:
The alert should be cleared once the configuration is correct again.

Additional info:
Testing covered only the alert; whether the cluster actually restores the connection was not verified. The severity of this bug should be raised if the cluster cannot restore its connection after downtime of the Vault KMS server.
RCA follows.

Alert: KMSServerConnectionAlert
Depends on query: ocs_storagecluster_kms_connection_status{job="ocs-metrics-exporter"} == 1
Metric used here: ocs_storagecluster_kms_connection_status

The KMS connection statuses are:
0: Connected
1: Not Connected
2: KMS not enabled

The connection status is determined (in the code) by checking the StorageCluster object's `Status.KMSServerConnection.KMSServerConnectionError` string field. This error field is set when the KMS is unreachable, but nowhere in the code is it unset or reset when the connection is (re-)established. Once populated, the field keeps its value, so the metric stays at 1 and the alert never clears.
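The behaviour described in the RCA can be illustrated with a minimal Go sketch. The types and helper functions below (`StorageClusterStatus`, `KMSEnabled`, `connectionMetric`, `recordProbeBuggy`, `recordProbeFixed`, `errUnreachable`) are hypothetical stand-ins written for this comment, not the actual ocs-operator code, and the "fixed" variant only shows the general idea of clearing the error field on success; it may not match what the PR actually changes.

```go
package main

import (
	"errors"
	"fmt"
)

// KMSServerConnectionStatus is a simplified stand-in for the status block
// named in the RCA; only the error field is modelled here.
type KMSServerConnectionStatus struct {
	KMSServerConnectionError string
}

// StorageClusterStatus is a hypothetical, stripped-down status object.
type StorageClusterStatus struct {
	KMSEnabled          bool // assumption: how "KMS not enabled" is detected
	KMSServerConnection KMSServerConnectionStatus
}

// Metric values as listed in the RCA.
const (
	kmsConnected    = 0
	kmsNotConnected = 1
	kmsNotEnabled   = 2
)

// errUnreachable is an illustrative probe failure.
var errUnreachable = errors.New("vault unreachable")

// connectionMetric derives the exported value from the status object.
func connectionMetric(s StorageClusterStatus) int {
	if !s.KMSEnabled {
		return kmsNotEnabled
	}
	if s.KMSServerConnection.KMSServerConnectionError != "" {
		return kmsNotConnected
	}
	return kmsConnected
}

// recordProbeBuggy writes the error field on failure and never clears it.
func recordProbeBuggy(s *StorageClusterStatus, probeErr error) {
	if probeErr != nil {
		s.KMSServerConnection.KMSServerConnectionError = probeErr.Error()
	}
}

// recordProbeFixed also resets the error field when the probe succeeds.
func recordProbeFixed(s *StorageClusterStatus, probeErr error) {
	if probeErr != nil {
		s.KMSServerConnection.KMSServerConnectionError = probeErr.Error()
		return
	}
	s.KMSServerConnection.KMSServerConnectionError = ""
}

func main() {
	buggy := StorageClusterStatus{KMSEnabled: true}
	recordProbeBuggy(&buggy, errUnreachable) // VAULT_ADDR set to a wrong address
	recordProbeBuggy(&buggy, nil)            // VAULT_ADDR restored
	fmt.Println("buggy metric after recovery:", connectionMetric(buggy))

	fixed := StorageClusterStatus{KMSEnabled: true}
	recordProbeFixed(&fixed, errUnreachable)
	recordProbeFixed(&fixed, nil)
	fmt.Println("fixed metric after recovery:", connectionMetric(fixed))
}
```

In the buggy variant the metric stays at 1 after recovery, so the alert query `== 1` keeps matching; with the reset in place it returns to 0.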
Submitting a PR: https://github.com/red-hat-storage/ocs-operator/pull/2108
Please follow up on reviews.
Updated the PR
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832