Description of problem:
On a brick-mux enabled setup with a base 2x3 volume, a brick crash was seen while two volumes were being created and deleted in a loop.

Version-Release number of selected component (if applicable):
3.12.2-8

How reproducible:
1/1

Steps to Reproduce:
1. On a three-node cluster, enable brick mux and leave bricks-per-process at the default.
2. Create a 2x3 volume and start it (a command sketch for steps 1-2 follows the backtraces below).
3. Run a script that creates two volumes (pikachu_1 and pikachu_2), then starts, stops and deletes them in a continuous loop.

Actual results:
A brick crash was seen on one of the nodes and a core was generated.

Expected results:
No crash should be seen.

Additional info:

[root@dhcp37-107 ~]# gluster vol info

Volume Name: deadpool
Type: Distributed-Replicate
Volume ID: 9a2be3bc-139c-4037-9ebe-8204614b5d65
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: dhcp37-107.lab.eng.blr.redhat.com:/bricks/brick0/deadpool_1
Brick2: dhcp37-102.lab.eng.blr.redhat.com:/bricks/brick0/deadpool_1
Brick3: dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick0/deadpool_1
Brick4: dhcp37-107.lab.eng.blr.redhat.com:/bricks/brick1/deadpool_1
Brick5: dhcp37-102.lab.eng.blr.redhat.com:/bricks/brick1/deadpool_1
Brick6: dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick1/deadpool_1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: enable
cluster.max-bricks-per-process: 0

Volume Name: pikachu_1
Type: Distributed-Replicate
Volume ID: 83fa7d64-b1b6-40be-8d38-cd22faec821f
Status: Stopped
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: dhcp37-107.lab.eng.blr.redhat.com:/bricks/brick0/testvol_1
Brick2: dhcp37-102.lab.eng.blr.redhat.com:/bricks/brick0/testvol_1
Brick3: dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick0/testvol_1
Brick4: dhcp37-107.lab.eng.blr.redhat.com:/bricks/brick1/testvol_1
Brick5: dhcp37-102.lab.eng.blr.redhat.com:/bricks/brick1/testvol_1
Brick6: dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick1/testvol_1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: enable
cluster.max-bricks-per-process: 0

Volume Name: pikachu_2
Type: Distributed-Replicate
Volume ID: c3bbc872-757c-4c50-b86b-c686be8ee6f6
Status: Stopped
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: dhcp37-107.lab.eng.blr.redhat.com:/bricks/brick0/testvol_2
Brick2: dhcp37-102.lab.eng.blr.redhat.com:/bricks/brick0/testvol_2
Brick3: dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick0/testvol_2
Brick4: dhcp37-107.lab.eng.blr.redhat.com:/bricks/brick1/testvol_2
Brick5: dhcp37-102.lab.eng.blr.redhat.com:/bricks/brick1/testvol_2
Brick6: dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick1/testvol_2
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: enable
cluster.max-bricks-per-process: 0

Here is the bt of the core file:

bt

#0 0x00007f656d2b4de7 in __inode_get_xl_index (xlator=0x7f6558029ff0, inode=0x7f64e00024a0) at inode.c:455
#1 __inode_unref (inode=inode@entry=0x7f64e00024a0) at inode.c:489
#2 0x00007f656d2b5641 in inode_unref (inode=0x7f64e00024a0) at inode.c:559
#3 0x00007f656d2cb533 in fd_destroy (bound=_gf_true, fd=0x7f6504005dd0) at fd.c:532
#4 fd_unref (fd=0x7f6504005dd0) at fd.c:569
#5 0x00007f655c4b00d9 in free_state (state=0x7f6504008580) at server-helpers.c:185
#6 0x00007f655c4ab5fa in server_submit_reply (frame=frame@entry=0x7f6504002370, req=0x7f65580afd30,
    arg=arg@entry=0x7f650effc910, payload=payload@entry=0x0, payloadcount=payloadcount@entry=0, iobref=0x7f65040015e0, iobref@entry=0x0,
    xdrproc=0x7f656ce4f6b0 <xdr_gfs3_opendir_rsp>) at server.c:212
#7 0x00007f655c4bfd54 in server_opendir_cbk (frame=frame@entry=0x7f6504002370, cookie=<optimized out>, this=0x7f6558029ff0, op_ret=op_ret@entry=0, op_errno=op_errno@entry=0, fd=fd@entry=0x7f6504005dd0, xdata=xdata@entry=0x0) at server-rpc-fops.c:710
#8 0x00007f655c91f111 in io_stats_opendir_cbk (frame=0x7f6504009650, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, fd=0x7f6504005dd0, xdata=0x0) at io-stats.c:2315
#9 0x00007f655cd6019d in index_opendir (frame=frame@entry=0x7f6504004920, this=this@entry=0x7f652c09b920, loc=loc@entry=0x7f6504008598, fd=fd@entry=0x7f6504005dd0, xdata=xdata@entry=0x0) at index.c:2113
#10 0x00007f656d3262bb in default_opendir (frame=0x7f6504004920, this=<optimized out>, loc=0x7f6504008598, fd=0x7f6504005dd0, xdata=0x0) at defaults.c:2956
#11 0x00007f655c90e1bb in io_stats_opendir (frame=frame@entry=0x7f6504009650, this=this@entry=0x7f652c09e190, loc=loc@entry=0x7f6504008598, fd=fd@entry=0x7f6504005dd0, xdata=xdata@entry=0x0) at io-stats.c:3311
#12 0x00007f656d3262bb in default_opendir (frame=0x7f6504009650, this=<optimized out>, loc=0x7f6504008598, fd=0x7f6504005dd0, xdata=0x0) at defaults.c:2956
#13 0x00007f655c4c7ff2 in server_opendir_resume (frame=0x7f6504002370, bound_xl=0x7f652c09f7a0) at server-rpc-fops.c:2672
#14 0x00007f655c4aec99 in server_resolve_done (frame=0x7f6504002370) at server-resolve.c:587
#15 0x00007f655c4aed3d in server_resolve_all (frame=frame@entry=0x7f6504002370) at server-resolve.c:622
#16 0x00007f655c4af755 in server_resolve (frame=0x7f6504002370) at server-resolve.c:571
#17 0x00007f655c4aed7e in server_resolve_all (frame=frame@entry=0x7f6504002370) at server-resolve.c:618
#18 0x00007f655c4af4eb in server_resolve_inode (frame=frame@entry=0x7f6504002370) at server-resolve.c:425
#19 0x00007f655c4af780 in server_resolve (frame=0x7f6504002370) at server-resolve.c:559
#20 0x00007f655c4aed5e in server_resolve_all (frame=frame@entry=0x7f6504002370) at server-resolve.c:611
#21 0x00007f655c4af814 in resolve_and_resume (frame=frame@entry=0x7f6504002370, fn=fn@entry=0x7f655c4c7e00 <server_opendir_resume>) at server-resolve.c:642
#22 0x00007f655c4c97c1 in server3_3_opendir (req=<optimized out>) at server-rpc-fops.c:4938
#23 0x00007f656d06666e in rpcsvc_request_handler (arg=0x7f655803f8f0) at rpcsvc.c:1909
#24 0x00007f656c103dd5 in start_thread () from /lib64/libpthread.so.0
#25 0x00007f656b9ccb3d in clone () from /lib64/libc.so.6

bt full

(gdb) bt full
#0 0x00007f79cb82bde7 in __inode_get_xl_index (xlator=0x7f79b8029ff0, inode=0x7f795c002370) at inode.c:455
        set_idx = -1
#1 __inode_unref (inode=inode@entry=0x7f795c002370) at inode.c:489
        index = 0
        this = 0x7f79b8029ff0
        __FUNCTION__ = "__inode_unref"
#2 0x00007f79cb82c641 in inode_unref (inode=0x7f795c002370) at inode.c:559
        table = 0x7f79b80b3890
#3 0x00007f79cb842533 in fd_destroy (bound=_gf_true, fd=0x7f793c002930) at fd.c:532
        xl = <optimized out>
        i = <optimized out>
        old_THIS = <optimized out>
#4 fd_unref (fd=0x7f793c002930) at fd.c:569
        refcount = <optimized out>
        bound = _gf_true
        __FUNCTION__ = "fd_unref"
#5 0x00007f79b68d30d9 in free_state (state=0x7f793c0013d0) at server-helpers.c:185
No locals.
#6 0x00007f79b68ce5fa in server_submit_reply (frame=frame@entry=0x7f793c0025b0, req=0x7f79680018e0, arg=arg@entry=0x7f79427fb910, payload=payload@entry=0x0, payloadcount=payloadcount@entry=0, iobref=0x7f793c005c70, iobref@entry=0x0, xdrproc=0x7f79cb3c66b0 <xdr_gfs3_opendir_rsp>) at server.c:212
        iob = <optimized out>
        ret = -1
        rsp = {iov_base = 0x7f79cbd00d00, iov_len = 20}
        state = 0x7f793c0013d0
        new_iobref = 1 '\001'
        client = 0x7f79381448b0
        lk_heal = _gf_false
        __FUNCTION__ = "server_submit_reply"
#7 0x00007f79b68e2d54 in server_opendir_cbk (frame=frame@entry=0x7f793c0025b0, cookie=<optimized out>, this=0x7f79b8029ff0, op_ret=op_ret@entry=0, op_errno=op_errno@entry=0, fd=fd@entry=0x7f793c002930, xdata=xdata@entry=0x0) at server-rpc-fops.c:710
        state = <optimized out>
        req = <optimized out>
        rsp = {op_ret = 0, op_errno = 0, fd = 0, xdata = {xdata_len = 0, xdata_val = 0x0}}
        __FUNCTION__ = "server_opendir_cbk"
#8 0x00007f79b6d42111 in io_stats_opendir_cbk (frame=0x7f793c0012a0, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, fd=0x7f793c002930, xdata=0x0) at io-stats.c:2315
        fn = 0x7f79b68e2c80 <server_opendir_cbk>
        _parent = 0x7f793c0025b0
        old_THIS = 0x7f79b8026cc0
        iosstat = 0x0
        ret = <optimized out>
        __FUNCTION__ = "io_stats_opendir_cbk"
#9 0x00007f79b718319d in index_opendir (frame=frame@entry=0x7f793c002280, this=this@entry=0x7f79b80235e0, loc=loc@entry=0x7f793c0013e8, fd=fd@entry=0x7f793c002930, xdata=xdata@entry=0x0) at index.c:2113
        fn = 0x7f79b6d41f20 <io_stats_opendir_cbk>
        _parent = 0x7f793c0012a0
        old_THIS = 0x7f79b80235e0
        __FUNCTION__ = "index_opendir"
#10 0x00007f79cb89d2bb in default_opendir (frame=0x7f793c002280, this=<optimized out>, loc=0x7f793c0013e8, fd=0x7f793c002930, xdata=0x0) at defaults.c:2956
        old_THIS = 0x7f79b80251e0
        next_xl = 0x7f79b80235e0
        next_xl_fn = 0x7f79b7183040 <index_opendir>
        __FUNCTION__ = "default_opendir"
#11 0x00007f79b6d311bb in io_stats_opendir (frame=frame@entry=0x7f793c0012a0, this=this@entry=0x7f79b8026cc0, loc=loc@entry=0x7f793c0013e8, fd=fd@entry=0x7f793c002930, xdata=xdata@entry=0x0) at io-stats.c:3311
        _new = 0x7f793c002280
        old_THIS = 0x7f79b8026cc0
        tmp_cbk = 0x7f79b6d41f20 <io_stats_opendir_cbk>
        __FUNCTION__ = "io_stats_opendir"
#12 0x00007f79cb89d2bb in default_opendir (frame=0x7f793c0012a0, this=<optimized out>, loc=0x7f793c0013e8, fd=0x7f793c002930, xdata=0x0) at defaults.c:2956
        old_THIS = 0x7f79b80289e0
        next_xl = 0x7f79b8026cc0
        next_xl_fn = 0x7f79b6d30fb0 <io_stats_opendir>
        __FUNCTION__ = "default_opendir"
#13 0x00007f79b68eaff2 in server_opendir_resume (frame=0x7f793c0025b0, bound_xl=0x7f79b80289e0) at server-rpc-fops.c:2672
        _new = 0x7f793c0012a0
        old_THIS = 0x7f79b8029ff0
        tmp_cbk = 0x7f79b68e2c80 <server_opendir_cbk>
        state = 0x7f793c0013d0
        __FUNCTION__ = "server_opendir_resume"
#14 0x00007f79b68d1c99 in server_resolve_done (frame=0x7f793c0025b0) at server-resolve.c:587
        state = 0x7f793c0013d0
#15 0x00007f79b68d1d3d in server_resolve_all (frame=frame@entry=0x7f793c0025b0) at server-resolve.c:622
        state = <optimized out>
        this = <optimized out>
        __FUNCTION__ = "server_resolve_all"
#16 0x00007f79b68d2755 in server_resolve (frame=0x7f793c0025b0) at server-resolve.c:571
        state = 0x7f793c0013d0
        resolve = 0x7f793c0014f0
        __FUNCTION__ = "server_resolve"
#17 0x00007f79b68d1d7e in server_resolve_all (frame=frame@entry=0x7f793c0025b0) at server-resolve.c:618
        state = <optimized out>
        this = <optimized out>
        __FUNCTION__ = "server_resolve_all"
#18 0x00007f79b68d24eb in server_resolve_inode (frame=frame@entry=0x7f793c0025b0) at server-resolve.c:425
        state = <optimized out>
        ret = <optimized out>
        loc = 0x7f793c0013e8
#19 0x00007f79b68d2780 in server_resolve (frame=0x7f793c0025b0) at server-resolve.c:559
        state = 0x7f793c0013d0
        resolve = 0x7f793c001468
        __FUNCTION__ = "server_resolve"
#20 0x00007f79b68d1d5e in server_resolve_all (frame=frame@entry=0x7f793c0025b0) at server-resolve.c:611
        state = <optimized out>
        this = <optimized out>
        __FUNCTION__ = "server_resolve_all"
#21 0x00007f79b68d2814 in resolve_and_resume (frame=frame@entry=0x7f793c0025b0, fn=fn@entry=0x7f79b68eae00 <server_opendir_resume>) at server-resolve.c:642
        state = <optimized out>
#22 0x00007f79b68ec7c1 in server3_3_opendir (req=<optimized out>) at server-rpc-fops.c:4938
        state = 0x7f793c0013d0
        frame = 0x7f793c0025b0
        args = {gfid = "\017\257\263\226\250\306El\243\215r\b\251\034\331\377", xdata = {xdata_len = 0, xdata_val = 0x0}}
        ret = 0
        op_errno = 0
        __FUNCTION__ = "server3_3_opendir"
#23 0x00007f79cb5dd66e in rpcsvc_request_handler (arg=0x7f79b803f9b0) at rpcsvc.c:1909
        program = 0x7f79b803f9b0
        req = 0x7f79680018e0
        actor = <optimized out>
        done = _gf_false
        ret = <optimized out>
        __FUNCTION__ = "rpcsvc_request_handler"
#24 0x00007f79ca67add5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#25 0x00007f79c9f43b3d in clone () from /lib64/libc.so.6
No symbol table info available.
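For reference, the backtraces above were taken from the generated core with gdb, roughly as follows (the core filename under /tmp/cores is illustrative and will differ per crash):

# The multiplexed brick runs as glusterfsd; load the binary together with the core.
gdb /usr/sbin/glusterfsd /tmp/cores/core.<pid>
(gdb) bt
(gdb) bt full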
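The brick-mux setting and base volume referenced in steps 1-2 correspond to commands along these lines; this is a sketch assembled from the volume info above, not a verbatim capture of the original run:

# Enable brick multiplexing cluster-wide; cluster.max-bricks-per-process is
# left at its default of 0 (no per-process limit), as shown in the vol info.
gluster volume set all cluster.brick-multiplex enable

# Create and start the 2x3 base volume on the three nodes.
gluster volume create deadpool replica 3 \
    dhcp37-107.lab.eng.blr.redhat.com:/bricks/brick0/deadpool_1 \
    dhcp37-102.lab.eng.blr.redhat.com:/bricks/brick0/deadpool_1 \
    dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick0/deadpool_1 \
    dhcp37-107.lab.eng.blr.redhat.com:/bricks/brick1/deadpool_1 \
    dhcp37-102.lab.eng.blr.redhat.com:/bricks/brick1/deadpool_1 \
    dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick1/deadpool_1
gluster volume start deadpool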
Build: 3.12.2-10

Followed the steps mentioned in the description. On a brick-mux setup with a base replicate (2x3) volume, the script below was used to verify the bug:

host1=hostname1
host2=hostname2
host3=hostname3
count=1

while true
do
    for i in {1..2}
    do
        gluster vol create pikachu_$i replica 3 \
            $host1:/bricks/brick0/testvol_$i $host2:/bricks/brick0/testvol_$i $host3:/bricks/brick0/testvol_$i \
            $host1:/bricks/brick1/testvol_$i $host2:/bricks/brick1/testvol_$i $host3:/bricks/brick1/testvol_$i
        gluster vol start pikachu_$i
    done

    sleep 3

    for i in {1..2}
    do
        gluster vol stop pikachu_$i --mode=script
        sleep 3
        gluster vol delete pikachu_$i --mode=script
        python delete_dirs.py    # deletes the brick directories
    done

    count=$((count+1))

    if ls /tmp/cores/core* 1> /dev/null 2>&1; then exit; fi
done

No brick crash was seen, hence marking the bug as verified.
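As a supplementary check while the loop runs (not part of the original verification steps), the multiplexed brick process and the absence of cores can be confirmed on each node, for example:

# With brick-mux enabled and max-bricks-per-process at 0, all bricks on a
# node should be served by a single glusterfsd process.
pgrep -x glusterfsd

# Brick PIDs and online status for the base volume.
gluster volume status deadpool

# The loop above already exits as soon as a core appears here.
ls /tmp/cores/core* 2>/dev/null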
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607