Please add doc text
Created attachment 1796115 [details] rgw alert test. We ran the OCS CI test "tests/manage/monitoring/prometheus/test_rgw.py::test_rgw_unavailable" on IBM Z. The alert itself appears to be working, but the test is failing due to a label mismatch in the alert. The log is attached for your reference.
(In reply to Abdul Kandathil (IBM) from comment #14)
> Created attachment 1796115 [details]
> rgw alert test
>
> we ran ocs ci test
> "tests/manage/monitoring/prometheus/test_rgw.py::test_rgw_unavailable" on
> IBM Z.
> Alert itself looks to be working but test is failing due to label mismatch
> in the alert.

Can you please elaborate on what is expected here? Also, if the alert is working as expected, shouldn't the test be fixed in this case?
@asachan, I am not able to see what info I need to provide. Looks like I don't have permission to view many comments.
Hi Anmol, By fixing this BZ, is there any chance the alert was changed too? If so, could you please confirm that this was intentional, and we will "fix" the test to align with the new alert.
Hi Abdul, Are you talking about the mismatch in the alert message? I can't see from the logs exactly why the test is failing; can you please elaborate? The target label is the alert name, which has not changed. Expected message in CI: "Cluster Object Store is in unhealthy state for more than 15s. Please check Ceph cluster health or RGW connection." Message we are getting while running the test: "Cluster Object Store is in unhealthy state for more than 15s. Please check Ceph cluster health." Is this the issue, or am I missing something here?
Hi Mudit, Yes, it looks like your observation is right. The difference is in the alert message, as you noticed. The test expects the message: 'Cluster Object Store is in unhealthy state for more than 15s. Please check Ceph cluster health or RGW connection.' And the alert message generated is: 'Cluster Object Store is in unhealthy state. Please check Ceph cluster health.'
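For illustration only, the mismatch can be reproduced with a minimal sketch of an exact-match check on the alert message. This is not the ocs-ci test code: the `message_matches` helper, the `annotations`/`message` keys, and the alert dictionary are assumptions made for the example. Only the two message strings are taken from the comment above.

```python
# Hypothetical sketch; not the actual ocs-ci implementation.

# Message the test expects, as quoted in the comment above.
EXPECTED = (
    "Cluster Object Store is in unhealthy state for more than 15s. "
    "Please check Ceph cluster health or RGW connection."
)

# Message the alert generated, as quoted in the comment above.
ACTUAL = (
    "Cluster Object Store is in unhealthy state. "
    "Please check Ceph cluster health."
)


def message_matches(alert: dict, expected: str) -> bool:
    """Return True only if the alert's message annotation equals `expected` exactly.

    Assumes the message lives under alert["annotations"]["message"].
    """
    return alert.get("annotations", {}).get("message") == expected


# Assumed alert shape, reduced to the one field the check reads.
alert = {"annotations": {"message": ACTUAL}}

if not message_matches(alert, EXPECTED):
    print("Alert was raised, but its message differs from the expected text.")
```

An exact comparison like this fails on any wording change, even when the alert fires correctly, which would be consistent with the behaviour reported here.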
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.7.2 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2632
From test runs it seems that this bug was never fixed: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/3140
After further investigation, I see that the alert is correctly raised in the last two 4.7 runs: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/3175/ https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/3229/ --> Putting back to CLOSED