Description of problem (please be as detailed as possible and provide log snippets):
KMSServerConnectionAlert is raised correctly for Vault KMS, but if the cluster uses Thales enterprise key management (KMIP) and the connection in ocs-kms-connection-details is invalid, the alert is not raised.

Version of all relevant components (if applicable):
ODF 4.15.0-158
OCP 4.15

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?
yes - the config can be changed in the UI

Steps to Reproduce:
1. Install a cluster with Thales enterprise key management.
2. Edit ocs-kms-connection-details with an invalid value, e.g.:
oc -n openshift-storage patch ConfigMap ocs-kms-connection-details -p '{"data": {"KMIP_ENDPOINT": {some_invalid_endpoint}}}' --type merge
3. Wait a few minutes and check whether the alert is raised.

Actual results:
The alert is not raised.

Expected results:
The alert KMSServerConnectionAlert is raised and the user is notified that the connection to the KMS is unavailable.

Additional info:
Test run with the failed test case - https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/35425/
The configuration in ocs-kms-connection-details is slightly different for KMIP and uses different parameters (e.g. VAULT_ADDR is missing, but there is KMIP_ENDPOINT).
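For reference, the value of KMIP_ENDPOINT in the patch from step 2 has to be a valid JSON string, since ConfigMap data values are strings. A minimal sketch for building such a merge-patch body follows; the helper name `build_kmip_endpoint_patch` and the endpoint `invalid.example.com:5696` are hypothetical placeholders, not values from the affected cluster.

```python
import json


def build_kmip_endpoint_patch(endpoint: str) -> str:
    """Return a JSON merge-patch body that sets KMIP_ENDPOINT in a ConfigMap.

    Hypothetical helper for illustration; json.dumps takes care of quoting
    and escaping so that the result is always valid JSON.
    """
    return json.dumps({"data": {"KMIP_ENDPOINT": endpoint}})


if __name__ == "__main__":
    # Placeholder endpoint (assumption); substitute any unreachable address.
    print(build_kmip_endpoint_patch("invalid.example.com:5696"))
```

The printed string can be passed as the `-p` argument of the `oc ... patch ConfigMap ocs-kms-connection-details --type merge` command from step 2.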
Hi Filip,

Can you please provide the value of the following metric from the setup that has the issue:
`ocs_storagecluster_kms_connection_status`

Some explanation:
According to the alert query, "KMSServerConnectionAlert" is triggered only under the following condition:
`ocs_storagecluster_kms_connection_status{job="ocs-metrics-exporter"} == 1`

From the implementation (ocs-operator/metrics/internal/collectors/storage-cluster.go#33), the metric is documented as:
`KMS Connection Status; 0: Connected, 1: Not Connected, 2: KMS not enabled`

So we should check which value the KMS status metric reports while the configuration is invalid.
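To make the relationship between the metric value and the alert explicit, here is a minimal sketch in Python. The status meanings and the `== 1` condition are taken from this comment; the names `KMS_STATUS_MEANING`, `alert_should_fire` and `describe_status` are hypothetical and are not part of ocs-operator.

```python
# Meaning of each value, as documented for the metric
# ocs_storagecluster_kms_connection_status.
KMS_STATUS_MEANING = {
    0: "Connected",
    1: "Not Connected",
    2: "KMS not enabled",
}


def alert_should_fire(status: int) -> bool:
    """Mirror the alert condition: the alert fires only when the value is 1."""
    return status == 1


def describe_status(status: int) -> str:
    """Return a readable summary for a sampled metric value."""
    meaning = KMS_STATUS_MEANING.get(status, "unknown value")
    state = "fires" if alert_should_fire(status) else "does not fire"
    return f"{status} ({meaning}): KMSServerConnectionAlert {state}"


if __name__ == "__main__":
    for value in (0, 1, 2):
        print(describe_status(value))
```

If the exporter reports 0 or 2 while the KMS is actually unreachable, the alert cannot fire, so the metric value is the first thing to check.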
Filip has shared the needed info. Thanks, Filip.

The value of ocs_storagecluster_kms_connection_status stays at 0 on a cluster with an invalid KMS configuration (see comment#1). According to the above comment, the value 0 means that the KMS is connected, so the alert condition (== 1) is never met.

I need some more time to check this. Meanwhile, I am reducing the severity, as this happens only on a misconfigured cluster. We can move this out of 4.16.
Are there any blockers to provide devel ack for this bz? If not, please provide the devel ack.
Are we blocked on anything that prevents providing devel ack on this bz?
Please update the RDT flag/text appropriately.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:8676