Created attachment 1476638
screenshot 1: Description of Brick Status panel without status code "3"

Description of problem
======================

The description of the "Brick Status" panel in the At-a-Glance section of the Volume dashboard states:

> The Brick Status panel displays the status code of each brick for a given
> volume.
>
> 0 = Started
> 8 = Stopped

However, I noticed that the panel also reports status code 3, which the description does not cover.

Version-Release number of selected component
============================================

tendrl-monitoring-integration-1.6.3-10.el7rhgs.noarch

Steps to Reproduce
==================

To see the description of the Brick Status panel, you just need to install WA and import any cluster with at least one volume.

I don't have a 100% reproducer for getting the table to actually report status code 3, though. The following is merely a description of what I was doing when I noticed it:

1. Install RHGS WA using tendrl-ansible
2. Make sure you have only 2 GiB of RAM on storage nodes
3. Import Trusted storage pool with at least one volume, profiling enabled
4. Run test_setup.wiki_tarball.yml based workload from dedicated client machine:
5. Let it run for a few days

Actual results
==============

There is no description of brick status code 3. See screenshot #1.

Expected results
================

A description of brick status code 3 is provided.

Additional info
===============

Code inspection of all possible status codes is necessary to check whether we are missing other status codes which could be reported here.
It's also possible that status code 3 should not be reported here at all and it's a bug, but it's not clear which resolution is correct without further analysis.
Please provide the get-state output from a node which has an affected brick. It would also be helpful to have the brick details from etcd.
I no longer have the machines to immediately provide the details here.
Atin, could you provide us with a reference to the complete list of brick states? This BZ shows that there is at least one more state that the WA dashboard is not aware of, and I would like to make sure the fix for this BZ incorporates all possible states a gluster brick could be in.
Since we are not sure about the full list of brick states, this BZ blocks BZ 1613526.
GlusterD maintains 4 different brick states, which are described below:

    typedef enum gf_brick_status {
        GF_BRICK_STOPPED,
        GF_BRICK_STARTED,
        GF_BRICK_STOPPING,
        GF_BRICK_STARTING
    } gf_brick_status_t;

Note that GF_BRICK_STOPPING and GF_BRICK_STARTING are intermediate stages of the STARTED->STOPPED transition and vice versa, and they are not reflected in the output. So technically, for a user there are only STARTED and STOPPED. However, I'm not sure what the status code in the bug description represents. I don't think brickinfo->status can have any value greater than 1 in the get-state output.
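For reference, here is a minimal sketch of the numeric values this enum implies, assuming it has no explicit initializers (as quoted), so that C numbers the enumerators from 0 upwards. The Python class and the is_intermediate helper are illustrative only and are not part of GlusterD or WA.

```python
from enum import IntEnum


class GfBrickStatus(IntEnum):
    """Python mirror of the quoted C enum (implicit values 0..3)."""
    GF_BRICK_STOPPED = 0
    GF_BRICK_STARTED = 1
    GF_BRICK_STOPPING = 2
    GF_BRICK_STARTING = 3


def is_intermediate(status):
    """Hypothetical helper: True for the transitional states which,
    per the comment above, are not reflected in the output."""
    return status in (GfBrickStatus.GF_BRICK_STOPPING,
                      GfBrickStatus.GF_BRICK_STARTING)


for status in GfBrickStatus:
    print(int(status), status.name, is_intermediate(status))
```

Under this numbering, 0 means STOPPED, whereas the panel description uses 0 for Started and 8 for Stopped, so the dashboard codes appear to be a separate encoding rather than raw brickinfo->status values.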
(In reply to gowtham from comment #3)
> Please provide get-state output from a node which has affected brick. and
> also helpful if we have brick detail from etcd.

I noticed this again and figured out where the problem is: the value displayed is some sort of average of the 0 and 8 values over the selected time range. You can get any value between 0 and 8 as well, if the circumstances are right.

So when a brick has been started for some time and then you stop it and wait again, you can get 4 as the status if you select a time range which contains both states, but you will get 8 (Stopped, per the panel description) if you select only the recent time range in which the brick was stopped.

The description should be updated to include a polished explanation of and reasoning for this behavior.

I'm not creating a new ENG BZ for this, as the correct solution here is to address BZ 1505769 and use a different plugin for this panel, which will allow it to display a human-readable status based on a given enumeration (up, down in this case).
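The effect can be illustrated with a minimal sketch, assuming the panel takes a plain arithmetic mean of equally spaced samples (the exact aggregation was not verified; "some sort of average" is all that was observed). The codes 0 and 8 come from the panel description, while the function name and the sample windows are made up for illustration.

```python
STARTED = 0  # panel code for a started brick, per the panel description
STOPPED = 8  # panel code for a stopped brick, per the panel description


def displayed_status(samples):
    """Arithmetic mean of the status samples in the selected time range.

    Assumption: the panel averages the samples; this is not taken from
    the dashboard definition.
    """
    if not samples:
        raise ValueError("no samples in the selected time range")
    return sum(samples) / len(samples)


# Brick started for half of the window, then stopped for the other half.
print(displayed_status([STARTED] * 4 + [STOPPED] * 4))  # 4.0

# Five started samples and three stopped ones yield the undocumented "3".
print(displayed_status([STARTED] * 5 + [STOPPED] * 3))  # 3.0

# A window covering only the period after the brick was stopped.
print(displayed_status([STOPPED] * 4))  # 8.0
```

With only two real states, any displayed value strictly between 0 and 8 would then just mean that the selected time range spans a state change.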