Bug 1576726 - Get-state calling server_priv_to_dict with the client's value in xprt as NULL
Summary: Get-state calling server_priv_to_dict with the client's value in xprt as NULL
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Atin Mukherjee
QA Contact: Bala Konda Reddy M
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-10 09:14 UTC by hari gowtham
Modified: 2018-11-02 05:52 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-02 05:52:21 UTC
Embargoed:



Description hari gowtham 2018-05-10 09:14:30 UTC
Description of problem:
The issue occurs when we import the Gluster cluster into Tendrl. Within an hour or two of the import, the bricks crashed one after the other.
We investigated the crash and found that the get-state command triggers it (the crash itself was fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1572075). During the RCA we found that when get-state calls server_priv_to_dict, the client value in the xprt is not filled in.

The reason it was NULL still needs to be root-caused and fixed.
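
For illustration, the failure mode described above (walking a list of transports and dereferencing a client pointer that is occasionally NULL) can be sketched as below. This is only a minimal sketch: `struct client`, `struct xprt`, their fields, and `dump_clients` are hypothetical stand-ins invented for this example, not the actual glusterd or server xlator code, and the guard shown is a defensive check rather than a fix for the underlying cause.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real structures; names are assumptions. */
struct client {
    const char *client_uid;
};

struct xprt {
    struct client *client;   /* may be NULL if not filled in yet */
    const char *peer;
    struct xprt *next;
};

/* Walk the transport list and print only entries whose client is set.
 * Returns the number of entries skipped because the client was NULL. */
static int dump_clients(const struct xprt *head, FILE *out)
{
    int skipped = 0;

    for (const struct xprt *x = head; x != NULL; x = x->next) {
        if (x->client == NULL) {
            /* Without this check, the dereference below would crash. */
            skipped++;
            continue;
        }
        fprintf(out, "%s %s\n", x->peer, x->client->client_uid);
    }
    return skipped;
}
```

A non-zero return value from such a helper would indicate that at least one transport was seen without its client set, which is the condition this bug asks to be root-caused.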

Version-Release number of selected component (if applicable):
3.12.2

How reproducible:
Not reproducible manually. It happens only after importing the cluster into Tendrl. Once imported, it happens at least once every two hours on the cluster.

Steps to Reproduce:
1. Install and configure Gluster Cluster
  In my case 6 storage nodes and at least 7 spare disks for bricks per node.
2. Create one or more volumes.
  In my case: volume_alpha_distrep_6x2[1] and prospectively
  volume_beta_arbiter_2_plus_1x2[2] and volume_gama_disperse_4_plus_2x2[3].
3. Install and configure RHGS WA (aka Tendrl).
4. Import Gluster cluster into RHGS WA.
5. Watch the status of volumes and bricks for a while.

Actual results:
Things appear to work fine. (Something may have broken without being observed; if so, Tendrl WA might end up showing wrong values.)

Expected results:
Things are working fine.

Additional info:
Tendrl calls get-state many times. Most of the time it works fine, but once in a while this issue occurs (the crash mentioned above was a result of it). It looks like a race condition.

Comment 4 Atin Mukherjee 2018-10-31 12:02:58 UTC
Hari - is this still seen? Can you check with WA QE and see if we can replicate it? If not, ideally this bug needs to be closed.

Comment 5 hari gowtham 2018-10-31 12:32:53 UTC
It was consistently reproducible back when I worked on it, but we weren't able to find out when the code goes through this path (the value is filled in correctly most of the time and is NULL only a few times).

We only learned that it was NULL because of the crash. Without the crash, it's not easy for QA to tell whether this happens.

They would have to use GDB to break on that path, which is hit often, and check for NULL on every hit. That sounds impractical given how frequently the path is crossed.
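
As an alternative to stopping in GDB on every hit, one could count the NULL occurrences in code instead. The following is a rough sketch of that idea using C11 atomics; `calls_total`, `calls_null_client`, and `note_client` are hypothetical names made up for this example and do not exist in gluster.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical counters; not part of any real gluster code. */
static atomic_ulong calls_total;
static atomic_ulong calls_null_client;

/* Record one pass through the hot path. 'client' is whatever pointer
 * is being checked. Returns 1 if it was NULL, 0 otherwise. */
static int note_client(const void *client)
{
    atomic_fetch_add(&calls_total, 1);
    if (client == NULL) {
        atomic_fetch_add(&calls_null_client, 1);
        return 1;
    }
    return 0;
}
```

Comparing the two counters after a few hours would show whether the NULL case still occurs, and how often, without interrupting the process on each call.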

As they haven't seen any irregularities in the values that WA displays, I'm fine closing this bug.

It needs to be revisited only if we find the output of WA to be faulty, which hasn't happened.

If you feel it's fine to close, I can close it. Do let me know your thoughts.

Comment 6 Atin Mukherjee 2018-11-02 05:52:21 UTC
Based on the justification available at comment 5, closing this bug.

