Bug 1202237

Summary: glusterd crashed on one of the RH Gluster Nodes
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: SATHEESARAN <sasundar>
Component: glusterd    Assignee: Atin Mukherjee <amukherj>
Status: CLOSED ERRATA QA Contact: SATHEESARAN <sasundar>
Severity: urgent Docs Contact:
Priority: high    
Version: rhgs-3.0    CC: amukherj, anekkunt, annair, bmohanra, cbuissar, lkoranda, mmalhotr, mzink, nlevinki, olim, rcyriac, vagarwal, vbellur
Target Milestone: ---   
Target Release: RHGS 3.1.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: glusterd
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, in a multi-node cluster, if 'gluster volume status' and 'gluster volume rebalance status' were executed concurrently from two different nodes, the glusterd daemon could crash. With this fix, this issue is resolved.
Story Points: ---
Clone Of:
: 1202745 1235524    Environment:
Last Closed: 2015-07-29 04:39:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1230525    
Bug Blocks: 1202745, 1202842    
Attachments:
Description                          Flags
coredump from one of the cluster     none
glusterd log files                   none
test script                          none
run.sh                               none

Description SATHEESARAN 2015-03-16 07:58:01 UTC
Created attachment 1002150 [details]
coredump from one of the cluster

Description of problem:
-----------------------
2 RH Gluster Storage Nodes are managed using RHEVM.
A distributed-replicate volume was created and used for hosting VM images (virt-store). App VMs were created and had been running for the last 4 days.

I observed that the App VMs were paused due to loss of server-side quorum, which was caused by a glusterd crash on one of the RH Gluster Storage Nodes.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.6.0.51-1.el6rhs

How reproducible:
-----------------
Happened once during testing

Steps to Reproduce:
--------------------
1. Manage 2 RH Gluster Storage Nodes using RHEVM

2. Create a distributed-replicate volume and start it (a CLI sketch follows these steps).

3. Optimize the volume for virt-store
    # gluster volume set <vol-name> group virt
    # gluster volume set <vol-name> storage.owner-uid 36
    # gluster volume set <vol-name> storage.owner-gid 36

4. Use this volume as the DataStore for RHEVM

5. Create a few App VMs, install an OS on them, and run them for 2 or 3 days
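
For reference, step 2 can be done with CLI commands along these lines; the node names and brick paths below are placeholders, not taken from the original setup:
    # gluster volume create <vol-name> replica 2 node1:/rhgs/brick1/<vol-name> node2:/rhgs/brick1/<vol-name> node1:/rhgs/brick2/<vol-name> node2:/rhgs/brick2/<vol-name>
    # gluster volume start <vol-name>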

Actual results:
----------------
glusterd crashed on one of the RH Gluster Storage Nodes.

Expected results:
-----------------
'glusterd' should not crash.

Comment 1 SATHEESARAN 2015-03-16 08:02:14 UTC
Created attachment 1002151 [details]
glusterd log files

Comment 5 Atin Mukherjee 2015-03-17 07:08:20 UTC
Analysis goes like this:

Trans 1 - 'gluster volume rebalance <volname> status' originated on N1 at T1
Trans 2 - 'gluster volume status <volname>' originated on N2 at T2
T1 and T2 are only milliseconds apart.

On N2, Trans 2 kicks in the syncop framework, while for Trans 1 the op state machine (op-sm) is invoked since N2 is the receiver. op-sm and syncop currently use the same global opinfo structure to maintain the state of a transaction, so the opinfo of Trans 2 got overwritten by Trans 1, leaving the state machine in an incorrect state. Because of this incorrect opinfo the crash was observed.

Since this is a race condition, it occurs only rarely.
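
Concretely, the trigger is nothing more than the two commands below issued from two different nodes within a few milliseconds of each other (the volume name is a placeholder):

    On N1:  # gluster volume rebalance <volname> status
    On N2:  # gluster volume status <volname>

On N2, the locally issued 'volume status' takes the syncop path while the remotely originated 'rebalance status' transaction drives op-sm, so both end up touching the same global opinfo.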

Comment 7 Atin Mukherjee 2015-03-24 06:24:30 UTC
Upstream patch http://review.gluster.org/#/c/9908/ is merged now

Comment 8 Bipin Kunal 2015-05-05 12:19:43 UTC
*** Bug 1209161 has been marked as a duplicate of this bug. ***

Comment 15 SATHEESARAN 2015-07-08 05:49:34 UTC
I have tested this bug with the RHGS 3.1 nightly build (glusterfs-3.7.1-7.el6rhs)
with the following test:

1. Added 6 RHGS nodes to a Gluster cluster in RHEVM
2. Created 2 distributed-replicate and 6 distributed volumes and started them all
3. Added a few files from a FUSE mount (15 files of 10G)
4. Added brick(s) to a volume and triggered a rebalance operation from RHEVM (see the CLI sketch after these steps)
5. RHEVM periodically polls 'gluster volume status' and 'gluster volume rebalance status'
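
For reference, what RHEVM drives in steps 4 and 5 corresponds roughly to the following CLI commands; the volume name, host name, and brick path are placeholders:

    Step 4 (expand a volume and trigger rebalance):
    # gluster volume add-brick <vol-name> <new-host>:/rhgs/brick2/<vol-name>
    # gluster volume rebalance <vol-name> start

    Step 5 (the two commands polled periodically, i.e. the concurrent pattern from comment 5):
    # gluster volume status
    # gluster volume rebalance <vol-name> status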

After some time, I witnessed a glusterd crash on one of the nodes, and after 12 hours I witnessed two more glusterd instances crashing.

Here is the backtrace:

(gdb) bt
#0  0x00007fb7dace5400 in ?? ()
#1  0x00007fb7dbc2bd85 in __gf_free (free_ptr=0x7fb7b85e7640) at mem-pool.c:316
#2  0x00007fb7dbbef215 in data_destroy (data=0x7fb7b85f386c) at dict.c:235
#3  0x00007fb7dbbef4be in dict_get_str (this=<value optimized out>, key=<value optimized out>, str=0x7fb7cccf90f0) at dict.c:2213
#4  0x00007fb7d06228e8 in glusterd_volume_rebalance_use_rsp_dict (aggr=<value optimized out>, rsp_dict=0x7fb7c01a078c) at glusterd-utils.c:8000
#5  0x00007fb7d063a4c5 in __glusterd_commit_op_cbk (req=<value optimized out>, iov=0x7fb7de17c1ac, count=<value optimized out>, myframe=0x7fb7d95e7384) at glusterd-rpc-ops.c:1413
#6  0x00007fb7d0637660 in glusterd_big_locked_cbk (req=0x7fb7de17c16c, iov=0x7fb7de17c1ac, count=1, myframe=0x7fb7d95e7384, fn=0x7fb7d0639d80 <__glusterd_commit_op_cbk>)
    at glusterd-rpc-ops.c:215
#7  0x00007fb7db9c4445 in rpc_clnt_handle_reply (clnt=0x7fb7de13e900, pollin=0x7fb7c0175800) at rpc-clnt.c:766
#8  0x00007fb7db9c58f2 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x7fb7de13e930, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:894
#9  0x00007fb7db9c0ad8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:543
#10 0x00007fb7cec84255 in socket_event_poll_in (this=0x7fb7de17f010) at socket.c:2290
#11 0x00007fb7cec85e4d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7fb7de17f010, poll_in=1, poll_out=0, poll_err=0) at socket.c:2403
#12 0x00007fb7dbc59970 in event_dispatch_epoll_handler (data=0x7fb7de0d13b0) at event-epoll.c:575
#13 event_dispatch_epoll_worker (data=0x7fb7de0d13b0) at event-epoll.c:678
#14 0x00007fb7dace0a51 in ?? ()
#15 0x00007fb7cccfa700 in ?? ()
#16 0x0000000000000000 in ?? ()

Comment 16 SATHEESARAN 2015-07-08 05:50:52 UTC
Since glusterd crashed again while 'gluster volume status' and 'gluster volume rebalance status' were being fetched in parallel, I am marking this bug as FailedQA.

Comment 17 SATHEESARAN 2015-07-08 05:54:38 UTC
1. This issue has already been root-caused and a patch/fix is available upstream.
2. When RHGS nodes are managed using RHEV/RHGS-C, there is a high chance of 'rebalance status' and 'volume status' running concurrently.
3. Even though this race is rare, the more volumes there are and the more rebalance operations run, the higher the chances of hitting this bug.

Based on the above reasons, I propose taking this fix/patch into RHGS 3.1.

Comment 20 Anand Nekkunti 2015-07-08 13:27:02 UTC
Run the scripts below to hit this crash:

My setup:

Volume Name: VOL1
Type: Distribute
Volume ID: 4c158adc-ebc8-429f-a1fd-f2560b0cc715
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: host:/tmp/BRICK1
Brick2: host3:/tmp/BRICK1
Brick3: host4:/tmp/BRICK1
Options Reconfigured:
performance.readdir-ahead: on
 
Volume Name: VOL2
Type: Distribute
Volume ID: 8db3bd64-328d-42ae-b1a7-52cce220eacd
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: host:/tmp/BRICK2
Brick2: host3:/tmp/BRICK2
Brick3: host4:/tmp/BRICK2
Options Reconfigured:
performance.readdir-ahead: on
 
Volume Name: VOL3
Type: Distribute
Volume ID: 97982012-efdf-41e2-8805-66fb75638ae4
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: host:/tmp/BRICK3
Brick2: host3:/tmp/BRICK3
Brick3: host4:/tmp/BRICK3
Options Reconfigured:
performance.readdir-ahead: on
 
Volume Name: VOL4
Type: Distribute
Volume ID: 67651ed2-36aa-49b0-9a01-9665b845f394
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: host:/tmp/BRICK4
Brick2: host3:/tmp/BRICK4
Brick3: host4:/tmp/BRICK4
Options Reconfigured:
performance.readdir-ahead: on


Node 1        : run run1.sh
Nodes 2 and 3 : run run.sh

Most of the time I hit the crash within ~20 minutes.
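
The attached run1.sh/run.sh are not reproduced here; the sketch below is a hypothetical stress loop of the same shape (the volume names match the setup above, everything else is assumed). Running it simultaneously on all three nodes keeps the conflicting 'volume status' and 'rebalance status' transactions overlapping:

#!/bin/bash
# Hypothetical sketch, not the attached run.sh/run1.sh.
# Keep firing the two transactions that race against each other; running the
# same loop on the other nodes makes them overlap across nodes.
while true; do
    for vol in VOL1 VOL2 VOL3 VOL4; do
        gluster volume status "$vol"           > /dev/null 2>&1
        gluster volume rebalance "$vol" status > /dev/null 2>&1
    done
done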

Comment 21 Anand Nekkunti 2015-07-08 13:29:54 UTC
Created attachment 1049875 [details]
test script

Comment 22 Anand Nekkunti 2015-07-08 13:32:26 UTC
Created attachment 1049878 [details]
run.sh

Comment 29 SATHEESARAN 2015-07-14 03:07:03 UTC
Tested with the RHGS 3.1 nightly build (glusterfs-3.7.1-9.el6rhs):

1. Managed 6 nodes in the cluster using RHEVM 3.5.4
2. Created 12 distributed volumes and started them
3. Added more bricks to the volume and initiated rebalance from RHEVM
4. After a full day, no crashes were seen.

Marking this bug as VERIFIED.

Comment 30 Bhavana 2015-07-26 14:00:21 UTC
minor updates to the doc text.

Comment 31 Atin Mukherjee 2015-07-27 04:31:58 UTC
Doc text looks good to me.

Comment 33 errata-xmlrpc 2015-07-29 04:39:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html