Bug 2108022

Summary: [ODF] RFE ODF/Ceph topology in OpenShift Console
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: jpeyrard
Component: ceph-monitoring
Assignee: Nishanth Thomas <nthomas>
Status: CLOSED NEXTRELEASE
QA Contact: Harish NV Rao <hnallurv>
Severity: low
Priority: unspecified
Version: 4.8
CC: etamir, muagarwa, nthomas, ocs-bugs, odf-bz-bot
Keywords: FutureFeature
Hardware: All
OS: All
Doc Type: If docs needed, set a value
Last Closed: 2022-08-30 17:20:53 UTC
Type: Bug

Description jpeyrard 2022-07-18 10:06:59 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

We would like to see the Ceph cluster topology in the OpenShift console when ODF is installed. Specifically, we would like a new tab in the Storage section of the OCP Console with the following information:

* Node names in the Ceph cluster segregated per zone (AZ1, AZ2, ...)
* Which devices serve as OSDs on each node:
  ** In Bare Metal/Virtual environments:
       - Device logical location: /dev/sdb, /dev/sdc, ...
       - ID or UUID: /dev/disk/by-id/XXX or /dev/disk/by-uuid/XXX
       - If the Local Storage Operator (LSO) is used, the PVs consumed from it: (local-pv-XXX)
  ** In cloud environments:
       - Device logical location: /dev/sdb, /dev/sdc, ...
       - Device cloud UUID

"Bare Metal/Virtual environments" means an OCP/ODF installation (UPI or IPI) in a controlled environment, that is, one where the data centre is known.

"Cloud environments" means Amazon, Azure, or any other cloud provider. In this case the environment is not controlled by the customer, so the drives are less well known and, in the end, less useful to know about, because the hardware is not operated by the customer.
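To make the requested view concrete, here is a minimal Python sketch of the information listed above, grouped as zone -> node -> OSD device. It is only an illustration of the data model, not how the console would implement it. The class name, field names, and sample values (worker-0, AZ1, and so on) are all hypothetical, and the XXX placeholders are kept as in the request.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class OsdDevice:
    """One OSD and the device backing it (all fields are illustrative)."""
    osd_id: int
    node: str
    zone: str
    dev_path: str                   # logical location, e.g. /dev/sdb
    stable_id: str                  # /dev/disk/by-id/... or a cloud volume ID
    local_pv: Optional[str] = None  # set only when LSO provides the device


def build_topology(devices):
    """Group OSD devices as zone -> node -> list of devices."""
    topology = defaultdict(lambda: defaultdict(list))
    for dev in devices:
        topology[dev.zone][dev.node].append(dev)
    return topology


def render(topology):
    """Return the topology as indented text lines, sorted for stable output."""
    lines = []
    for zone in sorted(topology):
        lines.append(zone)
        for node in sorted(topology[zone]):
            lines.append(f"  {node}")
            for dev in sorted(topology[zone][node], key=lambda d: d.osd_id):
                pv = f" ({dev.local_pv})" if dev.local_pv else ""
                lines.append(
                    f"    osd.{dev.osd_id}: {dev.dev_path} [{dev.stable_id}]{pv}"
                )
    return lines


# Hypothetical sample data, not taken from a real cluster.
SAMPLE = [
    OsdDevice(0, "worker-0", "AZ1", "/dev/sdb", "/dev/disk/by-id/XXX", "local-pv-XXX"),
    OsdDevice(1, "worker-1", "AZ2", "/dev/sdb", "/dev/disk/by-id/XXX"),
    OsdDevice(2, "worker-0", "AZ1", "/dev/sdc", "/dev/disk/by-id/XXX"),
]

if __name__ == "__main__":
    print("\n".join(render(build_topology(SAMPLE))))
```

In a cloud environment the same structure would apply, with stable_id holding the cloud UUID of the device and local_pv left unset.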



Version of all relevant components (if applicable):

OCP/ODF 4.8

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


This is especially important in Bare Metal deployments, as it would let us get information directly from the ODF/Ceph cluster about which devices are being used as OSDs in the Ceph cluster.

If other ODF components such as Ceph monitors, managers, mds, rgw, ... or MCG services are also represented in the architecture diagram it would be helpful as well, but the most important thing is knowing what devices in what hosts are being used as OSDs in the ODF/Ceph cluster.


Is there any workaround available to the best of your knowledge?

The only workaround today is to use the CLI and the toolbox to gather enough information to be sure which drive needs to be removed. The process is quite painful, and it is not easy to get a clear picture of where each component (software or hardware) is located.


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

Here is an example of what might happen.
Take a 3-node OCP/ODF cluster with 4 drives used as OSDs on each node.
A drive starts failing and triggers messages in dmesg, and the filesystem slows down.
sosreport picks up from dmesg that something is going wrong.
But the OCP/ODF console does not give a clear view of what is happening.
Next, the dmesg output has to be mapped manually to the pod location to find the block device involved.

Only then can the operation begin, and it too requires manual steps.
We would like a better overview of what is going on from the web console, in particular the drive location, the OSD number, and which services run on which server.
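The manual mapping step in this scenario can be sketched in Python as follows. This is an illustration under assumptions, not a supported procedure: it assumes the OSD metadata has already been collected from the toolbox as JSON (for example from `ceph osd metadata`), and that each entry carries `id`, `hostname`, and a comma-separated `devices` field. Field contents can differ by Ceph version and by how ODF provisions the OSDs, so check them against the real output. The sample data and the function name `find_osds` are hypothetical.

```python
import json

# Hypothetical excerpt shaped like JSON OSD metadata from the toolbox.
SAMPLE_METADATA = json.loads("""
[
  {"id": 0, "hostname": "worker-0", "devices": "sdb"},
  {"id": 1, "hostname": "worker-0", "devices": "sdc"},
  {"id": 2, "hostname": "worker-1", "devices": "sdb"}
]
""")


def find_osds(metadata, hostname, device):
    """Return the IDs of OSDs on `hostname` whose device list contains `device`.

    `device` may be given as "sdc" or "/dev/sdc", as seen in dmesg output.
    """
    wanted = device.split("/")[-1]  # "/dev/sdc" -> "sdc"
    matches = []
    for osd in metadata:
        devices = [d for d in osd.get("devices", "").split(",") if d]
        if osd.get("hostname") == hostname and wanted in devices:
            matches.append(osd["id"])
    return matches


if __name__ == "__main__":
    # The failing drive reported by dmesg on worker-0 is /dev/sdc.
    print(find_osds(SAMPLE_METADATA, "worker-0", "/dev/sdc"))
```

The lookup is keyed on both host and device because the same device name (here sdb) appears on more than one node.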

Is this issue reproducible?

It's an RFE.

Can this issue be reproduced from the UI?

It's an RFE.

If this is a regression, please provide more details to justify this:

No regression, but using Ceph directly, without OCP/ODF, makes it clearer from the beginning what to expect.