Description of problem:
The issue occurs after importing a Gluster cluster into Tendrl. Within an hour or two of the import, the bricks start crashing one after another.
The crash was investigated and traced to the get-state command (the crash itself was fixed in bz https://bugzilla.redhat.com/show_bug.cgi?id=1572075). During the RCA it was found that when get-state calls server_priv_to_dict, the client values in the xprt are not filled in (they are NULL).
Why the value is NULL still needs to be root-caused and fixed; this bug tracks that.
Version-Release number of selected component (if applicable):
How reproducible:
Not reproducible manually; it happens only after importing the cluster into Tendrl. Once imported, it occurs at least once every two hours on the cluster.
Steps to Reproduce:
1. Install and configure Gluster Cluster
In my case: 6 storage nodes and at least 7 spare disks for bricks per node.
2. Create one or more volumes.
In my case: volume_alpha_distrep_6x2, and later also
volume_beta_arbiter_2_plus_1x2 and volume_gama_disperse_4_plus_2x2.
3. Install and configure RHGS WA (aka Tendrl).
4. Import Gluster cluster into RHGS WA.
5. Watch the status of volumes and bricks for a while.
Things appear to be working fine, though something may have broken and been missed during observation; if so, the Tendrl WA could end up showing wrong values.
get-state is called many times by Tendrl. Most of the time it works fine; only once in a while does this issue occur (the crash mentioned above was a result of it). It looks like a race condition.
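Tendrl's polling behaviour can be approximated with a stress loop like the one below. Treat the whole script as an illustrative sketch, not a verified reproducer; the real `gluster get-state` invocation is shown only in the comment:

```shell
#!/bin/sh
# Illustrative stress loop (assumption: the race is widened simply by
# calling get-state frequently, the way Tendrl's periodic sync does).
# stress_get_state runs the given command N times and reports completion.
stress_get_state() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 || return 1
        i=$((i + 1))
    done
    echo "completed $n runs"
}

# On a real gluster node one would run something like:
#   stress_get_state 1000 gluster get-state glusterd odir /tmp file state
# Here we exercise the loop itself with a no-op command:
stress_get_state 3 true
```

Running this against an actual glusterd would at best make the window for the race more likely to be hit; as noted above, without the crash there is no easy external signal that the NULL path was taken.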
Hari - is this still seen? Can you check with WA QE and see if we can replicate it? If not, ideally this bug needs to be closed.
It was consistently reproducible back when I worked on it, but we weren't able to determine when it goes through this path (the value is filled in correctly most of the time; only occasionally is it NULL).
We only knew the value was NULL because of the crash; without the crash it is not easy for QE to detect when this happens.
They would have to attach GDB on that code path, which is hit often, and check on every hit whether the value is NULL. That is impractical given how frequently the path is crossed.
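For reference, the kind of GDB automation this would require might look like the command file below. It is illustrative only: the breakpoint location is the function named in this bug, but the exact expressions for walking the transport list depend on the glusterd sources and build in use.

```
# illustrative gdb command file for an attached glusterd
break server_priv_to_dict
commands
  silent
  # on every hit, walk the server's transport list here and check each
  # entry's client pointer for NULL before continuing
  continue
end
```

Even automated this way, every hit pauses the daemon briefly, which is why doing it on a path crossed this often is impractical in a QE environment.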
As QE haven't seen any irregularities in the values that WA displays, I'm fine with closing this bug.
It would only need to be revisited if the WA output were found to be faulty, which hasn't happened.
If you feel it's fine to close, I can close it; let me know your thoughts.
Based on the justification available at comment 5, closing this bug.