Description of problem
======================

When a ceph cluster changes its state, RHSC 2.0 makes this clear by changing the icon shown next to the cluster name (e.g. on the Cluster list page or the Cluster Dashboard page). So far so good. The problem is that the details behind this state (why did the cluster enter this state in the first place?) are not directly available anywhere in RHSC 2.0. This leaves the user with no option but to either use the ceph command line tools directly on the machines or check the logs manually.

Version-Release
===============

On the RHSC 2.0 machine:

rhscon-core-selinux-0.0.28-1.el7scon.noarch
rhscon-core-0.0.28-1.el7scon.x86_64
rhscon-ui-0.0.42-1.el7scon.noarch
rhscon-ceph-0.0.27-1.el7scon.x86_64

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHSC 2.0 following the documentation.
2. Accept a few nodes for the ceph cluster.
3. Create a new ceph cluster named 'alpha'.

Note: now we are going to push the cluster into the HEALTH_WARN state. To do that, we will create a new RADOS block device along with a new ceph pool, configured with a quota so stringent that we hit it outright.

4. Start the "Add Storage" wizard and select cluster alpha and the "RBD storage" type.
5. On the "Add Block Storage" form, set:
   * device name: full_dev,
   * 1 device to create,
   * any target size without overcommitting (1 GB is enough to reproduce this issue).
6. On the same "Add Block Storage" form, select the "Create new pool" option, and set:
   * pool name: full_pool,
   * default values everywhere else (type, replicas, storage profile ...),
   * but enable quota and set the max number of objects to 1 (an absurdly low number).
7. Wait for the task to finish so that the RBD with the new pool is created.
8.
Go to the Cluster list page.

Actual results
==============

The cluster is in warning state, as reported by RHSC 2.0:

* on the Cluster list page, there is a warning icon with the description "Warning" (moreover, one alert is shown in the alerts column), see screenshot 1
* on the Cluster Dashboard, there is a warning icon next to the cluster name

But when I would like to check *what went wrong*, I'm unable to do so. When I check the details of the alert, I don't see much detail; the page only states:

* Cluster health changed
* Health of cluster 'alpha' degraded from HEALTH_OK to HEALTH_WARN

No more details are available, see screenshot 2. Checking further, there are no other alerts on:

* the Pools tab of the cluster (screenshot 3)
* the RBDs tab of the cluster (screenshot 4)
* anywhere else on the cluster dashboard

So to sum it up: so far, the admin has no idea what exactly went wrong and why. On the other hand, when I use the ceph command line tools to check the status, I see the issue immediately:

~~~
# ceph -c /etc/ceph/alpha.conf -s
    cluster c402a6ae-960c-4ccf-9543-47f731038a33
     health HEALTH_WARN
            pool 'full_pool' is full
     monmap e3: 3 mons at {dhcp-126-79=10.34.126.79:6789/0,dhcp-126-80=10.34.126.80:6789/0,dhcp-126-81=10.34.126.81:6789/0}
            election epoch 10, quorum 0,1,2 dhcp-126-79,dhcp-126-80,dhcp-126-81
     osdmap e76: 4 osds: 4 up, 4 in
            flags sortbitwise
      pgmap v968: 384 pgs, 3 pools, 572 bytes data, 18 objects
            152 MB used, 40763 MB / 40915 MB avail
                 384 active+clean
~~~

Expected results
================

The cluster is in warning state, as reported by RHSC 2.0. It's possible to quickly figure out the reason behind the cluster state (maybe in the details of the alert and/or somewhere on the cluster dashboard).

See comments from the design team under this bugzilla for a reference.
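For convenience, UI steps 4-7 of the reproducer can also be driven from the ceph/rbd CLI. This is a hedged sketch, not taken from the original report: it assumes admin credentials and the cluster config at /etc/ceph/alpha.conf, and uses pg counts (128) picked only for illustration; the pool and image names match the report.

```shell
# Sketch of a CLI equivalent of the reproducer's UI steps (assumption:
# /etc/ceph/alpha.conf and an admin keyring are in place).
if command -v ceph >/dev/null 2>&1; then
  # Create the pool, then cap it at a single object so it fills immediately.
  ceph -c /etc/ceph/alpha.conf osd pool create full_pool 128 128
  ceph -c /etc/ceph/alpha.conf osd pool set-quota full_pool max_objects 1
  # Creating the RBD image writes more than one object, tripping the quota.
  rbd -c /etc/ceph/alpha.conf create full_pool/full_dev --size 1024
  # Cluster should now report: HEALTH_WARN pool 'full_pool' is full
  ceph -c /etc/ceph/alpha.conf health
  ran=yes
else
  echo "ceph CLI not available here; commands shown for reference only"
  ran=no
fi
```

The quota of max_objects 1 mirrors step 6 of the UI reproducer; any write activity on the pool then pushes the cluster into HEALTH_WARN.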
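As a reference for the expected fix: the per-issue explanation RHSC fails to show is already available from the monitors in machine-readable form via `ceph health detail -f json`. A minimal sketch of consuming it follows; the embedded JSON is a hand-written sample mimicking the jewel-era schema for this cluster's state (the exact schema is an assumption and varies by ceph release), not output captured from the reporter's cluster.

```python
import json

# Hand-written sample mimicking `ceph -c /etc/ceph/alpha.conf health detail -f json`
# on a jewel-era cluster; the key names are an assumption based on that release.
raw = """
{
  "overall_status": "HEALTH_WARN",
  "summary": [
    {"severity": "HEALTH_WARN", "summary": "pool 'full_pool' is full"}
  ],
  "detail": []
}
"""

health = json.loads(raw)
print(health["overall_status"])
for item in health["summary"]:
    # This per-issue line is exactly the detail the RHSC alert page could show.
    print("%s: %s" % (item["severity"], item["summary"]))
```

Surfacing the `summary` entries in the alert details (next to "Health of cluster 'alpha' degraded from HEALTH_OK to HEALTH_WARN") would answer the "what went wrong" question without dropping to the CLI.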
Created attachment 1174985 [details] screenshot 1: cluster state on the cluster list
Created attachment 1174986 [details] screenshot 2: alert details
Created attachment 1174987 [details] screenshot 3: pool list of given cluster
Created attachment 1175001 [details] screenshot 4: RBD list of given cluster
*** Bug 1370991 has been marked as a duplicate of this bug. ***