Bug 2192852 - KMSServerConnectionAlert is not cleared when connection to kms is restored
Summary: KMSServerConnectionAlert is not cleared when connection to kms is restored
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph-monitoring
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.14.0
Assignee: arun kumar mohan
QA Contact: Parag Kamble
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-05-03 10:34 UTC by Filip Balák
Modified: 2023-11-08 18:50 UTC
CC List: 5 users

Fixed In Version: 4.14.0-111
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 18:50:22 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | red-hat-storage ocs-operator pull 2108 | 0 | None | open | Fix KMSServerConnectionAlert stays even when KMS connection is re-established | 2023-07-13 08:41:33 UTC
Github | red-hat-storage ocs-operator pull 2144 | 0 | None | open | Bug 2192852:[release-4.14] Fix KMSServerConnectionAlert stays even when KMS connection is re-established | 2023-08-16 15:35:32 UTC

Internal Links: 1944687

Description Filip Balák 2023-05-03 10:34:26 UTC
Description of problem (please be as detailed as possible and provide log snippets):
KMSServerConnectionAlert is correctly raised, but it is not cleared when the connection is restored.

Version of all relevant components (if applicable):
OCS 4.13.0-179
OCP 4.13.0-0.nightly-2023-05-02-134729

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Can this issue reproduce from the UI?
yes

Steps to Reproduce:
1. Install an OCS cluster with cluster-wide encryption enabled, using Vault as the KMS.
2. Edit the ocs-kms-connection-details config map and set VAULT_ADDR to an incorrect address.
3. Observe the alert in the OCP console.
4. Edit the ocs-kms-connection-details config map and set VAULT_ADDR back to the correct address.
5. Observe the alert in the OCP console.
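Steps 2 and 4 can also be done from the command line instead of the console. The sketch below is a minimal illustration, not part of the original report: it builds a JSON merge patch for the VAULT_ADDR key and prints matching `oc patch` commands. The two Vault addresses are made-up placeholders, and the `openshift-storage` namespace is an assumption about where the config map lives in a default ODF install.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vaultAddrPatch builds a JSON merge patch that sets the VAULT_ADDR key in
// the data section of a ConfigMap such as ocs-kms-connection-details.
func vaultAddrPatch(addr string) (string, error) {
	patch := map[string]map[string]string{
		"data": {"VAULT_ADDR": addr},
	}
	b, err := json.Marshal(patch)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Hypothetical addresses: an unreachable one for step 2 and a
	// placeholder for the original, correct one for step 4.
	broken, _ := vaultAddrPatch("https://vault.invalid:8200")
	restored, _ := vaultAddrPatch("https://vault.example.com:8200")

	// Assumes ODF is installed in the openshift-storage namespace.
	for _, p := range []string{broken, restored} {
		fmt.Printf("oc patch configmap ocs-kms-connection-details -n openshift-storage --type merge -p '%s'\n", p)
	}
}
```

A merge patch only touches the VAULT_ADDR key, so the other connection details in the config map stay as they were between the two edits.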

Actual results:
The first edit of the config map triggers the alert, but the alert persists and is not cleared when the address is set back to the correct value.

Expected results:
The alert should be cleared when the configuration is correct again.

Additional info:
During testing, it was not verified whether the cluster actually restores the connection (only the alert was checked). The severity of this bug should be raised if the cluster cannot restore its connection after a Vault KMS server outage.

Comment 3 arun kumar mohan 2023-07-13 07:27:38 UTC
Root cause analysis (RCA):

Alert: KMSServerConnectionAlert
Depends on query: ocs_storagecluster_kms_connection_status{job="ocs-metrics-exporter"} == 1
Metric used here: ocs_storagecluster_kms_connection_status
The KMS connection status values are:
0: Connected
1: Not Connected
2: KMS not enabled
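The mapping between metric values and the alert condition can be sketched as follows. This is an illustrative model only: the type, constant, and function names are hypothetical, and only the numeric values (0, 1, 2) and the `== 1` comparison come from the RCA above. It shows that the alert fires for "Not Connected" and for no other value.

```go
package main

import "fmt"

// KMSConnectionStatus is a hypothetical type mirroring the values of the
// ocs_storagecluster_kms_connection_status metric listed in the RCA.
type KMSConnectionStatus int

const (
	KMSConnected    KMSConnectionStatus = 0
	KMSNotConnected KMSConnectionStatus = 1
	KMSNotEnabled   KMSConnectionStatus = 2
)

func (s KMSConnectionStatus) String() string {
	switch s {
	case KMSConnected:
		return "Connected"
	case KMSNotConnected:
		return "Not Connected"
	case KMSNotEnabled:
		return "KMS not enabled"
	default:
		return fmt.Sprintf("unknown(%d)", int(s))
	}
}

// alertFires models the alert expression
// ocs_storagecluster_kms_connection_status{job="ocs-metrics-exporter"} == 1
// evaluated against a single sample of the metric.
func alertFires(s KMSConnectionStatus) bool {
	return s == KMSNotConnected
}

func main() {
	for _, s := range []KMSConnectionStatus{KMSConnected, KMSNotConnected, KMSNotEnabled} {
		fmt.Printf("%d %-15s alert=%v\n", int(s), s, alertFires(s))
	}
}
```

Because the expression compares for equality with 1, a cluster without KMS (value 2) never raises the alert; the alert clears only once the exporter reports 0 again.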

The connection status is determined in the code by checking the StorageCluster object's `Status.KMSServerConnection.KMSServerConnectionError` string field, which is set when the KMS is unreachable.
However, nothing in the code unsets or resets this field when the connection is (re-)established. Once set, the field stays populated, the metric keeps reporting 1 (Not Connected), and the alert keeps firing.

Comment 4 arun kumar mohan 2023-07-13 08:41:34 UTC
Submitting a PR: https://github.com/red-hat-storage/ocs-operator/pull/2108

Comment 6 Mudit Agarwal 2023-08-09 16:03:28 UTC
Please follow up on reviews.

Comment 9 arun kumar mohan 2023-08-16 12:24:17 UTC
Updated the PR

Comment 15 errata-xmlrpc 2023-11-08 18:50:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832

