Bug 1948378

Summary: Alert 'ClusterObjectStoreState' is not triggered when RGW interface is unavailable
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Sravika <sbalusu>
Component: ceph-monitoring
Assignee: Anmol Sachan <asachan>
Status: CLOSED ERRATA
QA Contact: Filip Balák <fbalak>
Severity: medium
Priority: unspecified
Version: 4.7
CC: fbalak, jthottan, muagarwa, nthomas, ocs-bugs, olakra
Target Release: OCS 4.8.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
.`ClusterObjectStoreState` alert is now generated when the RADOS Object Gateway (RGW) is unavailable or unhealthy
Previously, the `ClusterObjectStoreState` alert was not generated when the RADOS Object Gateway (RGW) was unavailable or unhealthy. With a fix in the OpenShift Container Storage operator, the `ClusterObjectStoreState` alert is now generated in this situation.
Clones: 1962161
Last Closed: 2021-08-03 18:15:56 UTC
Type: Bug
Bug Blocks: 1962161    
Attachments:
  ocs-ci testcase log (Flags: none)
  Must Gather Logs (Flags: none)

Description Sravika 2021-04-12 06:24:20 UTC
Created attachment 1771272
ocs-ci testcase log

Description of problem (please be as detailed as possible and provide log snippets):

During the ocs-ci tier4a tests, the following test fails because the "ClusterObjectStoreState" alert is not generated when the RGW interface is unavailable:

tests/manage/monitoring/prometheus/test_rgw.py::test_rgw_unavailable

Version of all relevant components (if applicable):

OCP - 4.7.3
OCS - 4.7.0-344.ci
ceph version 14.2.11-143.el8cp (ab503edb1421ce443f12917d9a75d5b56334dfea) nautilus (stable)
OCS-CI - commit 0d371476e5949ecc118ab3fad142889ef4ccb860

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?

The tier4a test run fails.

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install OCS 4.7.0-344.ci 
2. Scale down the deployment rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a to zero replicas:

oc -n openshift-storage scale --replicas=0 deployment/rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a

3. Check for the "ClusterObjectStoreState" alert, which should be generated when the RGW interface is unavailable.
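
Step 3 can be scripted. The sketch below is a hypothetical helper (not part of ocs-ci or `oc`) that reads a Prometheus HTTP API `/api/v1/alerts` response on stdin and succeeds if an alert whose `alertname` label matches the argument is present, whether pending or firing. It assumes the compact JSON that the Prometheus API returns. The `oc exec ... curl` pipeline in the usage comment is also an assumption: it presumes the `prometheus-k8s-0` pod in `openshift-monitoring` and that `curl` is available in its `prometheus` container.

```shell
#!/bin/sh
# Hypothetical helper, not part of ocs-ci: succeeds (exit 0) if the Prometheus
# /api/v1/alerts JSON read on stdin contains an alert whose alertname label
# equals $1. Pending and firing alerts both match.
# Assumes compact JSON with no spaces after colons, as the Prometheus HTTP API emits.
has_alert() {
  # -q: report only through the exit status; -F: match a fixed string, not a regex
  grep -qF "\"alertname\":\"$1\""
}

# Example usage (assumes a logged-in `oc` session; pod and container names
# and the presence of curl in the container are assumptions):
#   oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
#     curl -s http://localhost:9090/api/v1/alerts \
#     | has_alert ClusterObjectStoreState && echo "ClusterObjectStoreState present"
```

Matching on a fixed string keeps the check dependency-free; a JSON-aware tool would be more robust if the response format differs.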


Actual results:

No alert is generated.

Expected results:

The "ClusterObjectStoreState" alert should be generated when the RGW interface is unavailable.

Additional info:

Comment 2 Sravika 2021-04-12 06:56:27 UTC
Created attachment 1771275
Must Gather Logs

Comment 3 Sravika 2021-04-26 13:19:46 UTC
Additional information: the Object Gateway is not supported on IBM Z in OCS 4.7; however, this alert was generated in OCS 4.6.2 when the RGW interface was unavailable.

Comment 4 Anmol Sachan 2021-05-04 15:07:59 UTC
*** Bug 1953615 has been marked as a duplicate of this bug. ***

Comment 6 Mudit Agarwal 2021-05-19 11:54:04 UTC
Sure, please create a clone of the BZ

Comment 9 Mudit Agarwal 2021-06-01 10:44:14 UTC
Not backported to 4.8 yet.

Comment 12 Filip Balák 2021-06-11 13:34:54 UTC
Alert is correctly triggered in version ocs-operator.v4.8.0-409.ci --> VERIFIED

Comment 13 Olive Lakra 2021-07-09 04:28:38 UTC
@Mudit - Please review the revised doc text and share feedback

Comment 14 Mudit Agarwal 2021-07-12 06:16:34 UTC
LGTM, thanks

Comment 16 errata-xmlrpc 2021-08-03 18:15:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3003