Description of problem
======================

When a ceph cluster changes its state, RHSC 2.0 makes this clear by changing the icon shown next to the cluster name (e.g. on the Cluster list page or the Cluster Dashboard page). So far so good. The problem is that the details behind this state (why did the cluster enter this state in the first place?) are not directly available anywhere in RHSC 2.0. This leaves the user with no option but to either use the ceph command line tools directly on the machines or check the logs manually.

Version-Release
===============

On the RHSC 2.0 machine:

rhscon-core-selinux-0.0.28-1.el7scon.noarch
rhscon-core-0.0.28-1.el7scon.x86_64
rhscon-ui-0.0.42-1.el7scon.noarch
rhscon-ceph-0.0.27-1.el7scon.x86_64

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHSC 2.0 following the documentation.
2. Accept a few nodes for the ceph cluster.
3. Create a new ceph cluster named 'alpha'.

Note: now we are going to push the cluster into the HEALTH_WARN state. To do that, we will create a new RADOS block device along with a new ceph pool, configured with a quota so stringent that we hit it outright.

4. Start the "Add Storage" wizard and select cluster alpha and the "RBD storage" type.
5. On the "Add Block Storage" form, set:
   * device name: full_dev,
   * 1 device to create,
   * any target size without overcommitting (1 GB is enough to reproduce this issue).
6. On the same "Add Block Storage" form, select the "Create new pool" option, and set:
   * pool name: full_pool,
   * default values everywhere else (type, replicas, storage profile ...),
   * but enable quota and set the max number of objects to 1 (an absurdly low number).
7. Wait for the task to finish so that the RBD with the new pool is created.
8.
Go to the Cluster list page.

Actual results
==============

The cluster is in warning state, as reported by RHSC 2.0:

* on the Cluster list page, there is a warning icon with the description "Warning" (moreover, one alert is shown in the alerts column), see screenshot 1
* on the Cluster Dashboard, there is a warning icon next to the cluster name

But when I would like to check *what went wrong*, I'm unable to do so. When I check the details of the alert, I don't see much detail; the page only states:

* Cluster health changed
* Health of cluster 'alpha' degraded from HEALTH_OK to HEALTH_WARN

No more details are available, see screenshot 2. Checking further, there are no other alerts on:

* the Pools tab of the cluster (screenshot 3)
* the RBDs tab of the cluster (screenshot 4)
* anywhere else on the cluster dashboard

So to sum it up: so far, the admin has no idea what exactly went wrong and why. On the other hand, when I use the ceph command line tools to check the status, I see the issue immediately:

~~~
# ceph -c /etc/ceph/alpha.conf -s
    cluster c402a6ae-960c-4ccf-9543-47f731038a33
     health HEALTH_WARN
            pool 'full_pool' is full
     monmap e3: 3 mons at {dhcp-126-79=10.34.126.79:6789/0,dhcp-126-80=10.34.126.80:6789/0,dhcp-126-81=10.34.126.81:6789/0}
            election epoch 10, quorum 0,1,2 dhcp-126-79,dhcp-126-80,dhcp-126-81
     osdmap e76: 4 osds: 4 up, 4 in
            flags sortbitwise
      pgmap v968: 384 pgs, 3 pools, 572 bytes data, 18 objects
            152 MB used, 40763 MB / 40915 MB avail
                 384 active+clean
~~~

Expected results
================

The cluster is in warning state, as reported by RHSC 2.0. It's possible to quickly figure out the reason behind the cluster state (maybe in the details of the alert and/or somewhere on the cluster dashboard).

See comments from the design team under this bugzilla for a reference.
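For convenience, UI steps 4-7 of the reproducer can also be driven from the ceph/rbd CLI. This is a hedged sketch, not taken from the original report: it assumes admin credentials and the cluster config at /etc/ceph/alpha.conf, and uses pg counts (128) picked only for illustration; the pool and image names match the report.

```shell
# Sketch of a CLI equivalent of the reproducer's UI steps (assumption:
# /etc/ceph/alpha.conf and an admin keyring are in place).
if command -v ceph >/dev/null 2>&1; then
  # Create the pool, then cap it at a single object so it fills immediately.
  ceph -c /etc/ceph/alpha.conf osd pool create full_pool 128 128
  ceph -c /etc/ceph/alpha.conf osd pool set-quota full_pool max_objects 1
  # Creating the RBD image writes more than one object, tripping the quota.
  rbd -c /etc/ceph/alpha.conf create full_pool/full_dev --size 1024
  # Cluster should now report: HEALTH_WARN pool 'full_pool' is full
  ceph -c /etc/ceph/alpha.conf health
  ran=yes
else
  echo "ceph CLI not available here; commands shown for reference only"
  ran=no
fi
```

The quota of max_objects 1 mirrors step 6 of the UI reproducer; any write activity on the pool then pushes the cluster into HEALTH_WARN.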
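As a reference for the expected fix: the per-issue explanation RHSC fails to show is already available from the monitors in machine-readable form via `ceph health detail -f json`. A minimal sketch of consuming it follows; the embedded JSON is a hand-written sample mimicking the jewel-era schema for this cluster's state (the exact schema is an assumption and varies by ceph release), not output captured from the reporter's cluster.

```python
import json

# Hand-written sample mimicking `ceph -c /etc/ceph/alpha.conf health detail -f json`
# on a jewel-era cluster; the key names are an assumption based on that release.
raw = """
{
  "overall_status": "HEALTH_WARN",
  "summary": [
    {"severity": "HEALTH_WARN", "summary": "pool 'full_pool' is full"}
  ],
  "detail": []
}
"""

health = json.loads(raw)
print(health["overall_status"])
for item in health["summary"]:
    # This per-issue line is exactly the detail the RHSC alert page could show.
    print("%s: %s" % (item["severity"], item["summary"]))
```

Surfacing the `summary` entries in the alert details (next to "Health of cluster 'alpha' degraded from HEALTH_OK to HEALTH_WARN") would answer the "what went wrong" question without dropping to the CLI.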
Created attachment 1174985 [details] screenshot 1: cluster state on the cluster list
Created attachment 1174986 [details] screenshot 2: alert details
Created attachment 1174987 [details] screenshot 3: pool list of given cluster
Created attachment 1175001 [details] screenshot 4: RBD list of given cluster
*** Bug 1370991 has been marked as a duplicate of this bug. ***