Bug 1576726

Summary: Get-state calling server_priv_to_dict with the client's value in xprt as NULL
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: hari gowtham <hgowtham>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED WORKSFORME
QA Contact: Bala Konda Reddy M <bmekala>
Severity: low
Docs Contact:
Priority: unspecified
Version: rhgs-3.4
CC: amukherj, hgowtham, rhs-bugs, sankarshan, sheggodu, storage-qa-internal, vbellur
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-02 05:52:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description hari gowtham 2018-05-10 09:14:30 UTC
Description of problem:
The issue happens when we import the Gluster cluster into Tendrl. After importing the cluster, the bricks would crash one after the other within an hour or two.
The crash was investigated and found to be caused by the get-state command (the crash itself was fixed via https://bugzilla.redhat.com/show_bug.cgi?id=1572075). While doing the RCA it was found that when get-state calls server_priv_to_dict, the client value in xprt is not filled in (it is NULL).

The reason why the value is NULL still has to be root-caused and fixed.
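
For illustration only, here is a minimal sketch of the kind of NULL guard involved. The types and the dump function are simplified stand-ins, not the actual glusterfs structures or the real server_priv_to_dict:

#include <stdio.h>
#include <stddef.h>

/* Simplified stand-ins for the glusterfs transport and client objects;
 * the real types live elsewhere in the glusterfs sources. */
typedef struct client {
        const char *client_uid;
} client_t;

typedef struct transport {
        client_t *xl_private;      /* client bound to this connection */
        struct transport *next;    /* next entry in the xprt list     */
} rpc_transport_t;

/* Walk the transport list and dump per-client info, roughly the way
 * server_priv_to_dict walks the server's xprt list.  The point of the
 * sketch: xl_private can still be NULL for a connection that is only
 * partially set up, so it must be checked before dereferencing. */
static void
dump_clients(rpc_transport_t *xprt_list)
{
        for (rpc_transport_t *xprt = xprt_list; xprt; xprt = xprt->next) {
                if (!xprt->xl_private) {
                        /* Connection present but client not bound yet;
                         * skipping avoids the NULL dereference that
                         * caused the brick crash (bz 1572075). */
                        continue;
                }
                printf("client: %s\n", xprt->xl_private->client_uid);
        }
}

int
main(void)
{
        client_t c1 = { "host1:49152" };
        rpc_transport_t half_open = { NULL, NULL };   /* xl_private not set yet */
        rpc_transport_t established = { &c1, &half_open };

        dump_clients(&established);
        return 0;
}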

Version-Release number of selected component (if applicable):
3.12.2

How reproducible:
Not reproducible manually; it happens only after importing the cluster into Tendrl. Once imported, it happens at least once every two hours on the cluster.

Steps to Reproduce:
1. Install and configure Gluster Cluster
  In my case, 6 storage nodes and at least 7 spare disks for bricks per node.
2. Create one or more volumes.
  In my case: volume_alpha_distrep_6x2[1], and possibly also
  volume_beta_arbiter_2_plus_1x2[2] and volume_gama_disperse_4_plus_2x2[3].
3. Install and configure RHGS WA (aka Tendrl).
4. Import Gluster cluster into RHGS WA.
5. Watch the status of volumes and bricks for a while.

Actual results:
Things appear to work fine. (Something might have broken and been missed in observation; in that case the Tendrl WA could end up showing wrong values.)

Expected results:
Things are working fine.

Additional info:
get-state is called a number of times by Tendrl. Most of the time it works fine; once in a while this issue happens (the crash mentioned above was a result of this). It looks more like a race condition.
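
To make the race-condition suspicion concrete, here is a small self-contained simulation of one plausible ordering: the connection object becomes visible on the transport list before its client pointer is bound, so a concurrent get-state-style walk can observe NULL. The thread functions, type names, and timing below are illustrative assumptions, not the actual glusterd code:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical miniature of the suspected ordering problem. */
typedef struct conn {
        _Atomic(const char *) client;   /* stands in for xprt->xl_private */
        struct conn *next;
} conn_t;

static _Atomic(conn_t *) conn_list = NULL;

static void *
connect_thread(void *arg)
{
        static conn_t c;
        (void)arg;

        /* Step 1: publish the connection on the list ... */
        c.next = atomic_load(&conn_list);
        atomic_store(&conn_list, &c);

        /* ... Step 2: bind the client a little later.  The gap between
         * the two steps is the window in which a walker sees NULL. */
        usleep(1000);
        atomic_store(&c.client, "host1:49152");
        return NULL;
}

static void *
get_state_thread(void *arg)
{
        (void)arg;
        for (int i = 0; i < 200; i++) {
                for (conn_t *c = atomic_load(&conn_list); c; c = c->next) {
                        const char *cl = atomic_load(&c->client);
                        if (!cl)
                                printf("observed connection with NULL client\n");
                }
                usleep(10);
        }
        return NULL;
}

int
main(void)
{
        pthread_t t1, t2;
        pthread_create(&t1, NULL, get_state_thread, NULL);
        pthread_create(&t2, NULL, connect_thread, NULL);
        pthread_join(t2, NULL);
        pthread_join(t1, NULL);
        return 0;
}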

Comment 4 Atin Mukherjee 2018-10-31 12:02:58 UTC
Hari - is this still seen? Can you check with WA QE and see if we can replicate it? If not, ideally this bug needs to be closed.

Comment 5 hari gowtham 2018-10-31 12:32:53 UTC
It was consistently reproducible back when I worked on it, but we weren't able to find out when it goes through this path (the value is filled in correctly most of the time, and only a few times is it NULL).

We got to know it was NULL because of the crash; without the crash, it's not easy for QA to find out whether this happens.

They would have to use GDB on that path, which is hit often, and for every hit check whether the value is NULL, which sounds impossible given how often the path is crossed.
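
One lighter-weight alternative to GDB would be counter-based instrumentation on that path: count every pass and log only the rare NULL case, so the frequency of the bad state can be measured without stopping the daemon. The sketch below uses illustrative names and plain fprintf rather than the glusterfs logging macros:

#include <stdio.h>

/* In a real multithreaded daemon these counters would need to be
 * atomic or lock-protected; kept plain here for brevity. */
static unsigned long total_hits;
static unsigned long null_hits;

static void
check_client(const void *xl_private)
{
        total_hits++;
        if (!xl_private) {
                null_hits++;
                fprintf(stderr,
                        "NULL client on xprt: %lu of %lu passes\n",
                        null_hits, total_hits);
        }
}

int
main(void)
{
        /* Example: two passes, one with the client bound, one without. */
        const char *client = "host1:49152";
        check_client(client);
        check_client(NULL);
        return 0;
}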

As they haven't seen any irregularities in the values that WA displays, I'm fine closing this bug.

It needs to be rechecked if we ever find the WA output faulty, which hasn't happened.

If you feel it's fine to close, I can close it. Do let me know your thoughts.

Comment 6 Atin Mukherjee 2018-11-02 05:52:21 UTC
Based on the justification available at comment 5, closing this bug.