Created attachment 1002150 [details]
coredump from one of the cluster nodes

Description of problem:
-----------------------
2 RH Gluster Storage nodes are managed using RHEVM. A distributed-replicate volume was created and used for hosting VM images (virt-store). App VMs were created and had been running for the last 4 days.

I observed that the App VMs were paused due to lack of server-side quorum, which was caused by a glusterd crash on one of the RH Gluster Storage nodes.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.6.0.51-1.el6rhs

How reproducible:
-----------------
Happened once in the testing

Steps to Reproduce:
--------------------
1. Manage 2 RH Gluster Storage nodes
2. Create a distributed-replicate volume and start it.
3. Optimize the volume for virt-store
   # gluster volume set <vol-name> group virt
   # gluster volume set <vol-name> storage.owner-uid 36
   # gluster volume set <vol-name> storage.owner-gid 36
4. Use this volume as the DataStore for RHEVM
5. Create a few App VMs, install the OS, and run them for 2 or 3 days

Actual results:
----------------
glusterd crashed on one of the RH Gluster Storage nodes.

Expected results:
-----------------
'glusterd' should not crash.
Created attachment 1002151 [details] glusterd log files
Analysis goes like this:

Trans1 - 'gluster v rebalance <volname> status', originated on N1 at T1
Trans2 - 'gluster v <volname> status', originated on N2 at T2

T1 & T2 have a minimal time gap, in milliseconds.

On N2, Trans2 kicks in through the syncop framework, while for Trans1 the op-sm is invoked, since N2 is the receiver. op-sm & syncop currently use a global opinfo structure to maintain the state of a transaction. In this case the opinfo of Trans2 got overwritten by that of Trans1, which drove the state machine into an incorrect state. Because of this incorrect opinfo the crash was observed.

Since this is a race condition, it could happen rarely.
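To make the race concrete, here is a minimal standalone sketch (hypothetical code, not glusterd source) of two transactions sharing a single global opinfo slot. Each thread writes its own state into the global, yields (standing in for the op-sm/syncop work done in between), and then reads the state back; with two threads racing, a transaction routinely resumes with the other transaction's state, which is the corruption described above.

/* race.c -- build with: gcc -pthread race.c
 * Illustrative only; the struct and names are invented, not glusterd's. */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

struct opinfo {
    int txn_owner;   /* which transaction last wrote the state */
    int state;       /* hypothetical state-machine stage       */
};

static struct opinfo global_opinfo;   /* one slot shared by all transactions */

static void *run_txn(void *arg)
{
    int id = *(int *)arg;

    for (int i = 0; i < 100000; i++) {
        /* "begin" the transaction: overwrite the shared opinfo */
        global_opinfo.txn_owner = id;
        global_opinfo.state     = id * 10;

        /* simulate op-sm / syncop doing work in between */
        sched_yield();

        /* "resume" the transaction: the opinfo may now belong
         * to the other transaction */
        if (global_opinfo.txn_owner != id)
            fprintf(stderr,
                    "txn %d resumed with txn %d's opinfo (state=%d)\n",
                    id, global_opinfo.txn_owner, global_opinfo.state);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int id1 = 1, id2 = 2;

    pthread_create(&t1, NULL, run_txn, &id1);
    pthread_create(&t2, NULL, run_txn, &id2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

The natural fix direction is to give each transaction its own opinfo instead of one global slot, which is presumably what the upstream patch referenced in the next comment addresses.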
Upstream patch http://review.gluster.org/#/c/9908/ has now been merged.
*** Bug 1209161 has been marked as a duplicate of this bug. ***
I have tested this bug with the RHGS 3.1 nightly build (glusterfs-3.7.1-7.el6rhs) with the following test:

1. Added 6 RHGS nodes to the gluster cluster in RHEVM
2. Created 2 distributed-replicate & 6 distributed volumes and started all of them
3. Added a few files from the fuse mount (15 files of 10G)
4. Added brick(s) to a volume and triggered the rebalance operation from RHEVM
5. RHEVM polls 'gluster volume status' & also 'gluster volume rebalance status' periodically

After some time, I witnessed a glusterd crash on one of the nodes, and after 12 hours I witnessed 2 more glusterd crashes.

Here is the backtrace:

(gdb) bt
#0  0x00007fb7dace5400 in ?? ()
#1  0x00007fb7dbc2bd85 in __gf_free (free_ptr=0x7fb7b85e7640) at mem-pool.c:316
#2  0x00007fb7dbbef215 in data_destroy (data=0x7fb7b85f386c) at dict.c:235
#3  0x00007fb7dbbef4be in dict_get_str (this=<value optimized out>, key=<value optimized out>, str=0x7fb7cccf90f0) at dict.c:2213
#4  0x00007fb7d06228e8 in glusterd_volume_rebalance_use_rsp_dict (aggr=<value optimized out>, rsp_dict=0x7fb7c01a078c) at glusterd-utils.c:8000
#5  0x00007fb7d063a4c5 in __glusterd_commit_op_cbk (req=<value optimized out>, iov=0x7fb7de17c1ac, count=<value optimized out>, myframe=0x7fb7d95e7384) at glusterd-rpc-ops.c:1413
#6  0x00007fb7d0637660 in glusterd_big_locked_cbk (req=0x7fb7de17c16c, iov=0x7fb7de17c1ac, count=1, myframe=0x7fb7d95e7384, fn=0x7fb7d0639d80 <__glusterd_commit_op_cbk>) at glusterd-rpc-ops.c:215
#7  0x00007fb7db9c4445 in rpc_clnt_handle_reply (clnt=0x7fb7de13e900, pollin=0x7fb7c0175800) at rpc-clnt.c:766
#8  0x00007fb7db9c58f2 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x7fb7de13e930, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:894
#9  0x00007fb7db9c0ad8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:543
#10 0x00007fb7cec84255 in socket_event_poll_in (this=0x7fb7de17f010) at socket.c:2290
#11 0x00007fb7cec85e4d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7fb7de17f010, poll_in=1, poll_out=0, poll_err=0) at socket.c:2403
#12 0x00007fb7dbc59970 in event_dispatch_epoll_handler (data=0x7fb7de0d13b0) at event-epoll.c:575
#13 event_dispatch_epoll_worker (data=0x7fb7de0d13b0) at event-epoll.c:678
#14 0x00007fb7dace0a51 in ?? ()
#15 0x00007fb7cccfa700 in ?? ()
#16 0x0000000000000000 in ?? ()
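As for what the trace suggests: frames #1-#3 show dict data being destroyed (data_destroy -> __gf_free) while dict_get_str() is still using it, consistent with two threads sharing one response dict without each holding its own reference. The sketch below (hypothetical names, not GlusterFS source) shows the ref-counting discipline that closes such a window: a reader takes a reference before use, so a concurrent "destroy" only drops a count, and the memory is freed exactly once, after the last user is done.

/* refcount.c -- build with: gcc -pthread refcount.c
 * Illustrative only; not the actual dict_t/data_t implementation. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct data {
    char           *str;
    int             refcount;
    pthread_mutex_t lock;
};

static struct data *data_ref(struct data *d)
{
    pthread_mutex_lock(&d->lock);
    d->refcount++;
    pthread_mutex_unlock(&d->lock);
    return d;
}

static void data_unref(struct data *d)
{
    int destroy;

    pthread_mutex_lock(&d->lock);
    destroy = (--d->refcount == 0);
    pthread_mutex_unlock(&d->lock);

    if (destroy) {            /* last reference gone: safe to free */
        free(d->str);
        free(d);
    }
}

int main(void)
{
    struct data *d = calloc(1, sizeof(*d));
    d->str = strdup("rebalance-status");
    d->refcount = 1;
    pthread_mutex_init(&d->lock, NULL);

    /* reader: take a reference *before* using the value... */
    struct data *mine = data_ref(d);

    /* ...so a concurrent "destroy" only drops a count */
    data_unref(d);

    printf("still valid: %s\n", mine->str);  /* no use-after-free */
    data_unref(mine);                        /* now it is freed    */
    return 0;
}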
Since glusterd crashed again while 'gluster volume status' & 'gluster volume rebalance status' were being run in parallel, marking this bug as FailedQA.
1. This issue was already RCA'ed and a patch/fix is available upstream.
2. When RHGS nodes are managed using RHEV/RHGS-C, there is a high chance of 'rebalance status' and 'volume status' running concurrently.
3. Even though this is a rare race, the more volumes there are and the more rebalance operations happen, the higher the chances of hitting this bug.

Based on the above reasons, I propose to take this fix/patch for RHGS 3.1.
Run the below scripts to hit this crash.

My setup:

Volume Name: VOL1
Type: Distribute
Volume ID: 4c158adc-ebc8-429f-a1fd-f2560b0cc715
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: host:/tmp/BRICK1
Brick2: host3:/tmp/BRICK1
Brick3: host4:/tmp/BRICK1
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: VOL2
Type: Distribute
Volume ID: 8db3bd64-328d-42ae-b1a7-52cce220eacd
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: host:/tmp/BRICK2
Brick2: host3:/tmp/BRICK2
Brick3: host4:/tmp/BRICK2
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: VOL3
Type: Distribute
Volume ID: 97982012-efdf-41e2-8805-66fb75638ae4
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: host:/tmp/BRICK3
Brick2: host3:/tmp/BRICK3
Brick3: host4:/tmp/BRICK3
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: VOL4
Type: Distribute
Volume ID: 67651ed2-36aa-49b0-9a01-9665b845f394
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: host:/tmp/BRICK4
Brick2: host3:/tmp/BRICK4
Brick3: host4:/tmp/BRICK4
Options Reconfigured:
performance.readdir-ahead: on

Node 1: run run1.sh
Nodes 2 and 3: run run.sh

Most of the time I hit the crash within ~20 min.
Created attachment 1049875 [details] test script
Created attachment 1049878 [details] run.sh
Tested with the RHGS 3.1 nightly build (glusterfs-3.7.1-9.el6rhs):

1. Managed 6 nodes in the cluster using RHEVM 3.5.4
2. Created 12 distributed volumes and started them
3. Added more bricks to the volumes and initiated rebalance from RHEVM
4. After a full day, there were no crashes seen.

Marking this bug as VERIFIED.
minor updates to the doc text.
Doc text looks good to me.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html