Created attachment 1841421 [details]
screencast

Description of problem (please be as detailed as possible and provide log snippets):
Currently the ODF cluster health is just a reflection of Ceph health and does not reflect NooBaa health. Ideally it should report a status that takes both Ceph and NooBaa into account.

Version of all relevant components (if applicable):
OCP: 4.9.0
OCS: quay.io/rhceph-dev/ocs-registry:4.9.0-233.ci
This is applicable to all versions.

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
ODF health reports Green even though NooBaa/Object is unhealthy; see the attached screencast.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproduced? Yes

Can this issue be reproduced from the UI? Yes

If this is a regression, please provide more details to justify this:
No
The fix for this will be part of ODF 4.10.0 (stretch goal). This bug needs major changes to how the ODF dashboard works: changing the query alone is not enough, we also need changes on the UXD side. Multiple PRs will be sent to fix this issue.
Planned UX changes: show muted text under the status saying which subsystem (NooBaa/Ceph) is down. When everything is OK, this muted text will not be shown. The health of both subsystems will be aggregated via extension points in the UI (no changes to the standard metrics). The UI requires extensive changes; we are trying to land this by the feature-freeze date.
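To illustrate the planned aggregation, here is a minimal sketch of a worst-of health roll-up with muted text naming the degraded subsystem. The type names, health-state values, and message wording below are assumptions for illustration only, not the actual ODF console extension-point code:

```ts
// Hypothetical sketch: aggregate Ceph and NooBaa health into one dashboard status.
type HealthState = 'OK' | 'Warning' | 'Error';

interface SubsystemHealth {
  name: 'Ceph' | 'NooBaa';
  state: HealthState;
}

const severity: Record<HealthState, number> = { OK: 0, Warning: 1, Error: 2 };

// Overall status = worst subsystem state; muted text lists unhealthy subsystems.
function aggregateHealth(subsystems: SubsystemHealth[]): {
  state: HealthState;
  mutedText?: string;
} {
  const worst = subsystems.reduce((acc, s) =>
    severity[s.state] > severity[acc.state] ? s : acc
  );
  const unhealthy = subsystems
    .filter((s) => s.state !== 'OK')
    .map((s) => s.name);
  return {
    state: worst.state,
    // Only shown when something is degraded, matching the planned UX.
    mutedText: unhealthy.length ? `${unhealthy.join(', ')} degraded` : undefined,
  };
}

// Example: Ceph healthy, NooBaa in error -> overall Error, muted text "NooBaa degraded".
console.log(
  aggregateHealth([
    { name: 'Ceph', state: 'OK' },
    { name: 'NooBaa', state: 'Error' },
  ])
);
```

With a worst-of rule like this, a NooBaa error surfaces at the top-level status even when Ceph is healthy, which is the behavior this bug asks for.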
Some comments:
1. I agree that it is a bit weird when the list shows the storage system in an error state because of an issue on the MCG side, but when you drill down into the system you land on the block and file overview and everything looks fine there.
2. I agree with Bipul's suggestion to add descriptive text explaining what is wrong. Maybe in 4.11 we can make the status clickable, add clearer text about the affected subsystem, and point the user to the right overview.
Bipul, Any update on the progress? -Bipin
The fix is now available.
Tested with the following builds:
OCP: 4.10.0-0.nightly-2022-03-19-230512
ODF: 4.10.0-198

The following steps were taken:
(a) Successfully deployed the ODF cluster and brought down one worker node.

NAME              STATUS     ROLES    AGE   VERSION
compute-0         NotReady   worker   24h   v1.23.3+e419edf
compute-1         Ready      worker   24h   v1.23.3+e419edf
compute-2         Ready      worker   24h   v1.23.3+e419edf
control-plane-0   Ready      master   25h   v1.23.3+e419edf
control-plane-1   Ready      master   25h   v1.23.3+e419edf
control-plane-2   Ready      master   25h   v1.23.3+e419edf

The alerts were present on the Data Foundation details page; screenshots are in comment #13 and comment #14.

Do we need to validate the fix with any other scenarios, or can we move it to verified based on this test scenario?

Thanks and Regards,
Mugdha
Can you test for MCG as well? You could bring NooBaa into an error state by creating a backing store and messing it up.
Tested the step mentioned in comment #17 with the following builds:
(a) OCP: 4.10.0-0.nightly-2022-03-27-074444
(b) ODF: 4.10.0-210

The following steps were performed:
(a) Deleted the target bucket of the default backing store.

**Observations**
(a) Alert generated: "A NooBaa bucket first.bucket is in error state for more than 5m"
Alert Name: NooBaaBucketErrorState

Screenshots are available at "https://docs.google.com/document/d/1fHUupVhplWKjNr1BUuErcRg22wMYnwrTeihM0jzrUm4/edit?usp=sharing".

Since the alerts are triggering for MCG as well, I believe the bug is good to be verified.

Thanks and Regards,
Mugdha Soni
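The verification above depends on firing NooBaa alerts (such as NooBaaBucketErrorState) being surfaced as degraded object-service health in the dashboard. As a rough illustration of that mapping only — the Alert shape, severity values, and thresholds below are assumptions, not the console's real types or logic:

```ts
// Illustrative only: derive an object-service health state from firing alerts.
interface Alert {
  name: string;          // e.g. "NooBaaBucketErrorState"
  severity: 'warning' | 'critical';
  state: 'firing' | 'pending';
}

function objectServiceHealth(alerts: Alert[]): 'OK' | 'Warning' | 'Error' {
  const firingNooBaa = alerts.filter(
    (a) => a.state === 'firing' && a.name.startsWith('NooBaa')
  );
  if (firingNooBaa.some((a) => a.severity === 'critical')) return 'Error';
  if (firingNooBaa.length > 0) return 'Warning';
  return 'OK';
}

// Example input modeled on the alert observed in this test (severity assumed).
console.log(
  objectServiceHealth([
    { name: 'NooBaaBucketErrorState', severity: 'warning', state: 'firing' },
  ])
); // -> "Warning"
```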
Based on comment #15 and comment #18, moving the bug to the verified state.

Thanks and Regards,
Mugdha Soni
Please add doc text.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1372