Bug 2022693
| Field | Value |
|---|---|
| Summary | [RFE] ODF health should reflect the health of Ceph + NooBaa |
| Product | [Red Hat Storage] Red Hat OpenShift Data Foundation |
| Reporter | Bipin Kunal <bkunal> |
| Component | management-console |
| Assignee | Bipul Adhikari <badhikar> |
| Status | CLOSED ERRATA |
| QA Contact | Mugdha Soni <musoni> |
| Severity | high |
| Docs Contact | |
| Priority | unspecified |
| Version | 4.9 |
| CC | afrahman, badhikar, etamir, jefbrown, madam, mmuench, muagarwa, nthomas, ocs-bugs, odf-bz-bot, rcyriac, rperiyas, shilpsha, ygalanti |
| Target Milestone | --- |
| Keywords | AutomationBackLog, FutureFeature |
| Target Release | ODF 4.10.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | 4.10.0-156 |
| Doc Type | Enhancement |
| Doc Text | View the Block and File or Object Service subcomponents on the ODF Dashboard: With this update, you can view information about the ODF subcomponents, Block and File or Object Service, on the ODF Dashboard whenever either of them is down. |
| Story Points | --- |
| Clone Of | |
| Environment | |
| Last Closed | 2022-04-13 18:50:37 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 2056571 |
| Attachments | screencast (attachment 1841421) |
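The enhancement surfaces which ODF subcomponent is unhealthy alongside the overall status. Purely as an illustration of that idea, not the actual console code, here is a minimal TypeScript sketch of a worst-of aggregation of Ceph (Block and File) and NooBaa (Object Service) health, plus the muted label the dashboard could show for the degraded subsystem; all type and function names below are hypothetical.

```typescript
// Illustrative sketch only; names are hypothetical and not taken from the ODF console source.
type SubsystemHealth = "OK" | "Warning" | "Error" | "Unknown";

interface OdfHealth {
  overall: SubsystemHealth;
  // Muted text shown under the status, naming the degraded subsystems; empty when everything is OK.
  degradedSubsystems: string[];
}

// Relative severity used for the worst-of comparison.
const severity: Record<SubsystemHealth, number> = {
  OK: 0,
  Unknown: 1,
  Warning: 2,
  Error: 3,
};

// Aggregate Ceph (Block and File) and NooBaa (Object Service) health into one dashboard status.
function aggregateOdfHealth(ceph: SubsystemHealth, noobaa: SubsystemHealth): OdfHealth {
  const overall = severity[ceph] >= severity[noobaa] ? ceph : noobaa;
  const degradedSubsystems: string[] = [];
  if (ceph !== "OK") degradedSubsystems.push("Block and File (Ceph)");
  if (noobaa !== "OK") degradedSubsystems.push("Object Service (NooBaa)");
  return { overall, degradedSubsystems };
}

// Example: Ceph healthy, NooBaa in error state. The overall status becomes Error
// and the muted text points the user at the Object Service.
console.log(aggregateOdfHealth("OK", "Error"));
```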
The fix for this will be part of ODF 4.10.0 (stretch goal). This bug needs major changes to how the ODF dashboard works; just changing the query is not enough. We also need changes on the UXD side, so multiple PRs will be sent to fix this issue. Planned UX changes: show muted text under the status saying which subsystem (NooBaa/Ceph) is down; when everything is okay, this muted text is not shown. We will take both subsystems' health aggregated via extension points in the UI (no changes to the standard metrics). The UI requires extensive changes; we are trying to achieve this by the FF date.

Some comments:
1. I agree that it is a bit weird when the list shows the storage system in an error state because of an issue on the MCG side, yet when you drill down into the system you see the Block and File overview and everything is OK there.
2. I agree with Bipul's suggestion to add descriptive text explaining what is wrong; maybe in 4.11 we can make the status clickable and add clearer text about the subsystem, pointing the user to the right overview.

Bipul, any update on the progress? -Bipin

The fix is now available. Tested with the following builds:
OCP : 4.10.0-0.nightly-2022-03-19-230512
ODF : 4.10.0-198
The following steps were taken:
(a) Successfully deployed the ODF cluster and brought down one worker node:
NAME STATUS ROLES AGE VERSION
compute-0 NotReady worker 24h v1.23.3+e419edf
compute-1 Ready worker 24h v1.23.3+e419edf
compute-2 Ready worker 24h v1.23.3+e419edf
control-plane-0 Ready master 25h v1.23.3+e419edf
control-plane-1 Ready master 25h v1.23.3+e419edf
control-plane-2 Ready master 25h v1.23.3+e419edf
The alerts were present on the Data Foundation details page; screenshots for the same are attached in comment #13 and comment #14.
Do we need to validate the fix with other scenarios, or can we move it to verified based on this test scenario?
Thanks and Regards
Mugdha
Can you test for MCG as well? You could bring NooBaa into an error state by creating a backing store and messing it up.

Tested the step mentioned in comment #17 with the following builds:
(a) OCP : 4.10.0-0.nightly-2022-03-27-074444
(b) ODF : 4.10.0-210

The following steps were performed:
(a) Deleted the target bucket of the default backing store.

**Observations**
(a) Alert generated: "A NooBaa bucket first.bucket is in error state for more than 5m"
Alert Name: NooBaaBucketErrorState
Screenshots are available at "https://docs.google.com/document/d/1fHUupVhplWKjNr1BUuErcRg22wMYnwrTeihM0jzrUm4/edit?usp=sharing".

Since the alerts are triggering for MCG as well, I believe the bug is good to be verified.
Thanks and Regards
Mugdha Soni

Based on comment #15 and comment #18, moving the bug to the verified state.
Thanks and Regards
Mugdha Soni

Please add doc text.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:1372
Created attachment 1841421 [details]
screencast

Description of problem (please be detailed as possible and provide log snippets):
Right now the ODF cluster health is just a reflection of the Ceph health and does not reflect the NooBaa health. Ideally it should reflect the status of both Ceph and NooBaa.

Version of all relevant components (if applicable):
OCP-4.9.0
OCS-quay.io/rhceph-dev/ocs-registry:4.9.0-233.ci
But this is applicable to all versions.

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
ODF health says Green even though NooBaa/Object is unhealthy; screencast attached.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No
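The comment above that "just changing the query is not enough" implies the dashboard already derives subsystem health from Prometheus queries. The sketch below is an assumption-heavy illustration of how both subsystem metrics could be fetched and reduced with a worst-of rule: the /api/v1/query endpoint is the standard Prometheus HTTP API, but the metric names, value encodings, and the Prometheus URL are guesses for the example, not the queries the console actually uses.

```typescript
// Illustrative only: the metric names below are assumed for the example and may not
// match what the ODF/OCS operators actually export.
const PROMETHEUS = "https://prometheus.example.local"; // hypothetical endpoint

async function queryScalar(query: string): Promise<number> {
  const res = await fetch(
    `${PROMETHEUS}/api/v1/query?query=${encodeURIComponent(query)}`
  );
  const body = await res.json();
  // Standard Prometheus instant-query response: result[0].value = [timestamp, "value"].
  const value = body?.data?.result?.[0]?.value?.[1];
  return value !== undefined ? Number(value) : NaN;
}

async function main(): Promise<void> {
  // Hypothetical metric names and encodings for the two subsystems (0=OK, higher=worse).
  const cephHealth = await queryScalar("ceph_health_status");
  const noobaaHealth = await queryScalar("noobaa_health_status");
  // Worst-of aggregation: the overall ODF health is the unhealthier of the two.
  const overall = Math.max(cephHealth, noobaaHealth);
  console.log({ cephHealth, noobaaHealth, overall });
}

main().catch((err) => console.error(err));
```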