+++ This bug was initially created as a clone of Bug #1230101 +++

Description of problem:
=======================

While trying to remove-brick with replica count 2 from the existing volume(replica 2), glusterd crashes with following bt:

#0  0x00007fcdd03e681c in subvol_matcher_update (req=0x25989cc) at glusterd-brick-ops.c:662
#1  __glusterd_handle_remove_brick (req=0x25989cc) at glusterd-brick-ops.c:985
#2  0x00007fcdd03542bf in glusterd_big_locked_handler (req=0x25989cc, actor_fn=0x7fcdd03e5f90 <__glusterd_handle_remove_brick>) at glusterd-handler.c:83
#3  0x0000003b0d8655b2 in synctask_wrap (old_task=<value optimized out>) at syncop.c:375
#4  0x0000003b028438f0 in ?? () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()
(gdb)

Logs suggest:
=============

[2015-06-10 14:18:01.134630] I [glusterd-handler.c:1404:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-06-10 14:18:01.137158] I [glusterd-handler.c:1404:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-06-10 14:18:28.239515] I [glusterd-brick-ops.c:779:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2015-06-10 14:18:28.239593] I [glusterd-brick-ops.c:849:__glusterd_handle_remove_brick] 0-management: request to change replica-count to 2
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-06-10 14:18:28
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.1
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3b0d824b66]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3b0d84359f]
/lib64/libc.so.6[0x3b028326a0]
/usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(__glusterd_handle_remove_brick+0x88c)[0x7fcdd03e681c]
/usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7fcdd03542bf]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x3b0d8655b2]
/lib64/libc.so.6[0x3b028438f0]
---------
(END)

How reproducible:
==================

Always

Steps to Reproduce:
===================

1. Create 2X3 distributed-replicate volume
2. Start the volume
3. Shrink it to 2X2 distributed-replicate volume by explicitly mentioning replica 2 in 'remove-brick force'
4. Shrink the volume again to 2X1 distribute volume by explicitly mentioning replica 1 in 'remove-brick force'

Actual results:
===============

Glusterd crash

Expected results:
=================

Removing brick with replica count 2 from replica count 2 is a failure case, it should print usage or fail gracefully.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-06-10 05:05:33 EDT ---

This bug is automatically being proposed for Red Hat Gluster Storage 3.1.0 by setting the release flag 'rhgs-3.1.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Rahul Hinduja on 2015-06-10 05:07:22 EDT ---

[root@georep1 scripts]# gluster volume info

Volume Name: master
Type: Distributed-Replicate
Volume ID: 7156c64c-a44b-40a4-98db-247a06d1f41e
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.46.96:/rhs/brick1/b1
Brick2: 10.70.46.97:/rhs/brick1/b1
Brick3: 10.70.46.96:/rhs/brick2/b2
Brick4: 10.70.46.97:/rhs/brick2/b2
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
[root@georep1 scripts]# gluster volume remove-brick master replica 2 10.70.46.97:/rhs/brick1/b1 10.70.46.97:/rhs/brick2/b2 start
Connection failed. Please check if gluster daemon is operational.
[root@georep1 scripts]#

--- Additional comment from Rahul Hinduja on 2015-06-10 05:15:20 EDT ---

sosreport @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1230101/

Additional Info: this volume is part of geo-rep master cluster.

--- Additional comment from SATHEESARAN on 2015-06-10 05:44:58 EDT ---

I have tried to reproduce the issue. It is reproducible only with the following case:

1. Create a 2X3 distributed-replicate volume
2. Shrink it to a 2X2 distributed-replicate volume
3. Shrink it from 2X2 to a 2X1 distribute volume

Here are a few more observations:

1. No crash is observed when creating a 2X2 volume and shrinking it to 2X1
2. No crash is observed when creating a 2X3 volume and shrinking it to 2X2
3. No crash is observed when trying to remove each brick from all replica sets, and a proper error message is thrown
REVIEW: http://review.gluster.org/11165 (glusterd: subvol_count value for replicate volume should be calculate correctly) posted (#1) for review on master by Gaurav Kumar Garg (ggarg)
REVIEW: http://review.gluster.org/11165 (glusterd: subvol_count value for replicate volume should be calculate correctly) posted (#2) for review on master by Gaurav Kumar Garg (ggarg)
REVIEW: http://review.gluster.org/11165 (glusterd: subvol_count value for replicate volume should be calculate correctly) posted (#4) for review on master by Gaurav Kumar Garg (ggarg)
COMMIT: http://review.gluster.org/11165 committed in master by Krishnan Parthasarathi (kparthas)
------
commit 3fb18451311c34aeced1054472b6f81fc13dd679
Author: Gaurav Kumar Garg <ggarg>
Date:   Wed Jun 10 15:11:39 2015 +0530

    glusterd: subvol_count value for replicate volume should be calculate correctly

    glusterd was crashing while trying to remove bricks from replica set
    after shrinking nx3 replica to nx2 replica to nx1 replica.

    This is because volinfo->subvol_count is calculating value from old
    replica count value.

    Change-Id: I1084a71e29c9cfa1cd85bdb4e82b943b1dc44372
    BUG: 1230121
    Signed-off-by: Gaurav Kumar Garg <ggarg>
    Reviewed-on: http://review.gluster.org/11165
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: Ravishankar N <ravishankar>
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
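
For readers following the root cause in the commit message, here is a minimal sketch of the arithmetic involved. It is not the actual patch and does not use the real glusterd data structures: `struct vol_sketch`, `compute_subvol_count` and `shrink_replica` are hypothetical names made up for illustration. The assumption is that the number of replica sets is the brick count divided by the replica count, so after shrinking 2X3 to 2X2 the value has to be derived from the new replica count (4 / 2 = 2). Dividing by the old count (4 / 3) gives 1 in integer arithmetic, which would leave only one subvolume slot for a volume that really has two.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the volume metadata.
 * This is NOT the real glusterd volinfo structure. */
struct vol_sketch {
    int brick_count;   /* total bricks in the volume */
    int replica_count; /* bricks per replica set */
    int subvol_count;  /* number of replica sets */
};

/* Derive the number of replica sets from the brick and replica counts.
 * Returns -1 when the layout is inconsistent (for example 4 bricks
 * with a replica count of 3). */
int compute_subvol_count(int brick_count, int replica_count)
{
    if (brick_count <= 0 || replica_count <= 0)
        return -1;
    if (brick_count % replica_count != 0)
        return -1;
    return brick_count / replica_count;
}

/* Reduce the replica count by removing one or more bricks from every
 * replica set. The replica count is updated first and subvol_count is
 * then recomputed from the NEW value. Asking for the same or a higher
 * replica count is rejected instead of being processed. */
int shrink_replica(struct vol_sketch *vol, int new_replica_count)
{
    int removed;

    if (new_replica_count <= 0 || new_replica_count >= vol->replica_count)
        return -1; /* e.g. "replica 2" on a replica 2 volume */

    removed = vol->subvol_count * (vol->replica_count - new_replica_count);
    vol->brick_count -= removed;
    vol->replica_count = new_replica_count;
    vol->subvol_count = compute_subvol_count(vol->brick_count,
                                             vol->replica_count);
    return vol->subvol_count > 0 ? 0 : -1;
}
```

Starting from `{ 6, 3, 2 }` (a 2X3 volume), calling `shrink_replica(&vol, 2)` and then `shrink_replica(&vol, 1)` keeps `subvol_count` at 2 throughout, while a repeated request for the current replica count returns -1, which corresponds to the "fail gracefully" behaviour asked for in the expected results.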
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ was fixed in a GlusterFS release and closed, so this mainline BZ is being closed as well.
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report. glusterfs-3.8.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/ [2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user