Description of problem:
=======================
While trying to remove-brick with replica count 2 from the existing volume (replica 2), glusterd crashes with the following bt:

#0 0x00007fcdd03e681c in subvol_matcher_update (req=0x25989cc) at glusterd-brick-ops.c:662
#1 __glusterd_handle_remove_brick (req=0x25989cc) at glusterd-brick-ops.c:985
#2 0x00007fcdd03542bf in glusterd_big_locked_handler (req=0x25989cc, actor_fn=0x7fcdd03e5f90 <__glusterd_handle_remove_brick>) at glusterd-handler.c:83
#3 0x0000003b0d8655b2 in synctask_wrap (old_task=<value optimized out>) at syncop.c:375
#4 0x0000003b028438f0 in ?? () from /lib64/libc.so.6
#5 0x0000000000000000 in ?? ()
(gdb)

Logs suggest:
=============
[2015-06-10 14:18:01.134630] I [glusterd-handler.c:1404:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-06-10 14:18:01.137158] I [glusterd-handler.c:1404:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-06-10 14:18:28.239515] I [glusterd-brick-ops.c:779:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2015-06-10 14:18:28.239593] I [glusterd-brick-ops.c:849:__glusterd_handle_remove_brick] 0-management: request to change replica-count to 2
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-06-10 14:18:28
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.1
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3b0d824b66]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3b0d84359f]
/lib64/libc.so.6[0x3b028326a0]
/usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(__glusterd_handle_remove_brick+0x88c)[0x7fcdd03e681c]
/usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7fcdd03542bf]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x3b0d8655b2]
/lib64/libc.so.6[0x3b028438f0]
---------
Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.1-1.el6rhs.x86_64

How reproducible:
==================
Always

Steps to Reproduce:
===================
1. Create a 2x2 volume.
2. Remove 2 bricks, one from each subvolume, and specify replica count 2.

Actual results:
===============
Glusterd crash

Expected results:
=================
Removing bricks with replica count 2 from a volume whose replica count is already 2 is a failure case; the command should print usage or fail gracefully.
I have tried to reproduce the issue. It's reproducible only with the following case:
1. Created a 2X3 distributed-replicate volume.
2. Shrank it to a 2X2 distributed-replicate volume.
3. Shrank it from 2X2 to a 2X1 distribute volume.

Here are a few more observations:
1. No crash is observed when creating a 2X2 volume and shrinking it to 2X1.
2. No crash is observed when creating a 2X3 volume and shrinking it to 2X2.
3. No crash is observed when trying to remove each brick from all replica sets; a proper error message is thrown.
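For illustration only, here is a minimal C sketch of the checks a "remove-brick replica N" request could run before indexing any per-subvolume state, so that bad requests are rejected with an error instead of crashing. It is not glusterd's code and does not claim to identify the root cause of the crash in subvol_matcher_update. The names layout_t, validate_remove_brick and the RB_* status codes are hypothetical, and the sketch assumes bricks are numbered replica set by replica set (brick i belongs to set i / replica_count).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout model (not glusterd's volinfo): bricks are numbered
 * replica set by replica set, so brick i is in set i / replica_count. */
typedef struct {
    int dist_count;    /* number of replica sets (subvolumes) */
    int replica_count; /* bricks per replica set */
} layout_t;

typedef enum {
    RB_OK = 0,
    RB_ERR_BAD_LAYOUT,          /* layout itself is inconsistent */
    RB_ERR_REPLICA_NOT_SMALLER, /* new replica count must be in [1, old) */
    RB_ERR_WRONG_BRICK_COUNT,   /* need (old - new) bricks per replica set */
    RB_ERR_BRICK_OUT_OF_RANGE,  /* index is not a brick of this volume */
    RB_ERR_DUPLICATE_BRICK,     /* same brick listed twice */
    RB_ERR_UNEVEN_SUBVOLS       /* a set loses too many or too few bricks */
} rb_status_t;

/* Validate a hypothetical "remove-brick replica <new_replica>" request
 * against the current layout before any per-set bookkeeping is indexed. */
rb_status_t validate_remove_brick(const layout_t *vol, int new_replica,
                                  const int *brick_idx, size_t n)
{
    assert(vol != NULL); /* a NULL layout is a programming error */

    if (vol->dist_count < 1 || vol->replica_count < 1)
        return RB_ERR_BAD_LAYOUT;

    /* Going from replica 2 to replica 2 is not a shrink: reject it. */
    if (new_replica < 1 || new_replica >= vol->replica_count)
        return RB_ERR_REPLICA_NOT_SMALLER;

    int per_set = vol->replica_count - new_replica;
    int total = vol->dist_count * vol->replica_count;

    if ((size_t)vol->dist_count * (size_t)per_set != n)
        return RB_ERR_WRONG_BRICK_COUNT;

    for (size_t i = 0; i < n; i++) {
        /* Bounds-check every index before using it to pick a set. */
        if (brick_idx[i] < 0 || brick_idx[i] >= total)
            return RB_ERR_BRICK_OUT_OF_RANGE;
        for (size_t j = 0; j < i; j++)
            if (brick_idx[j] == brick_idx[i])
                return RB_ERR_DUPLICATE_BRICK;
    }

    /* Every replica set must lose exactly per_set bricks. */
    for (int s = 0; s < vol->dist_count; s++) {
        int hits = 0;
        for (size_t i = 0; i < n; i++)
            if (brick_idx[i] / vol->replica_count == s)
                hits++;
        if (hits != per_set)
            return RB_ERR_UNEVEN_SUBVOLS;
    }
    return RB_OK;
}
```

With a 2x2 layout and a requested replica count of 2, this sketch returns RB_ERR_REPLICA_NOT_SMALLER, which matches the expected result in the report. Each step of the 2X3 to 2X2 to 2X1 sequence, removing one brick from each replica set, returns RB_OK.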
Upstream patch http://review.gluster.org/#/c/11165 is in review
Marking this bug as BLOCKER, as this is required for RHGS 3.1 (Everglades).
Verified with RHGS 3.1 Nightly build - glusterfs-3.7.1-6.el6rhs with the steps mentioned in comment 4. No issues were found, so marking this bug as VERIFIED.
Hi Gaurav,

The doc text has been updated. Please review it and share your technical review comments. If it looks OK, please sign off on it.

Regards,
Bhavana
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html