Bug 1230101 - [glusterd] glusterd crashed while trying to remove bricks - one selected from each replica set - after shrinking nX3 to nX2 to nX1
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 3.1
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.0
Assigned To: Gaurav Kumar Garg
QA Contact: SATHEESARAN
Whiteboard: glusterd
Keywords: Patch, Triaged
Depends On:
Blocks: 1202842 1230121 1231646
Reported: 2015-06-10 05:05 EDT by Rahul Hinduja
Modified: 2016-06-05 19:38 EDT
CC List: 10 users

See Also:
Fixed In Version: glusterfs-3.7.1-4
Doc Type: Bug Fix
Doc Text:
Previously, glusterd crashed when performing a remove-brick operation on a replicate volume after shrinking the volume from replica nx3 to nx2 and from nx2 to nx1. This was due to an issue with the subvolume count (replica set) calculation. With this fix, glusterd no longer crashes after shrinking a replicate volume from replica nx3 to nx2 and from nx2 to nx1.
Story Points: ---
Clone Of:
Clones: 1230121
Environment:
Last Closed: 2015-07-29 01:00:32 EDT
Type: Bug


External Trackers:
Tracker ID: Red Hat Product Errata RHSA-2015:1495
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Gluster Storage 3.1 update
Last Updated: 2015-07-29 04:26:26 EDT

Description Rahul Hinduja 2015-06-10 05:05:32 EDT
Description of problem:
=======================

While trying to remove bricks with replica count 2 from an existing replica 2 volume, glusterd crashes with the following backtrace:

#0  0x00007fcdd03e681c in subvol_matcher_update (req=0x25989cc) at glusterd-brick-ops.c:662
#1  __glusterd_handle_remove_brick (req=0x25989cc) at glusterd-brick-ops.c:985
#2  0x00007fcdd03542bf in glusterd_big_locked_handler (req=0x25989cc, actor_fn=0x7fcdd03e5f90 <__glusterd_handle_remove_brick>) at glusterd-handler.c:83
#3  0x0000003b0d8655b2 in synctask_wrap (old_task=<value optimized out>) at syncop.c:375
#4  0x0000003b028438f0 in ?? () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()
(gdb) 

Logs suggest:
=============

[2015-06-10 14:18:01.134630] I [glusterd-handler.c:1404:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-06-10 14:18:01.137158] I [glusterd-handler.c:1404:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-06-10 14:18:28.239515] I [glusterd-brick-ops.c:779:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2015-06-10 14:18:28.239593] I [glusterd-brick-ops.c:849:__glusterd_handle_remove_brick] 0-management: request to change replica-count to 2
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-06-10 14:18:28
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.1
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3b0d824b66]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3b0d84359f]
/lib64/libc.so.6[0x3b028326a0]
/usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(__glusterd_handle_remove_brick+0x88c)[0x7fcdd03e681c]
/usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7fcdd03542bf]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x3b0d8655b2]
/lib64/libc.so.6[0x3b028438f0]
---------



Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.1-1.el6rhs.x86_64


How reproducible:
==================

Always


Steps to Reproduce:
===================
1. Create a 2x2 distributed-replicate volume
2. Remove 2 bricks, one from each subvolume, passing replica count 2 (see the command sketch below)
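A minimal CLI sketch of these steps; the volume name, hostnames, and brick paths are hypothetical placeholders:

# 1. Create a 2x2 distributed-replicate volume (4 bricks, replica 2)
gluster volume create testvol replica 2 \
    node1:/bricks/b1 node2:/bricks/b2 node1:/bricks/b3 node2:/bricks/b4
gluster volume start testvol

# 2. Remove one brick from each subvolume while passing replica 2 again
#    (an invalid request, since the replica count does not change;
#    this is the call that crashed glusterd instead of failing)
gluster volume remove-brick testvol replica 2 node2:/bricks/b2 node2:/bricks/b4 force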


Actual results:
===============

Glusterd crash


Expected results:
=================

Removing bricks with replica count 2 from a volume that already has replica count 2 is an invalid operation; glusterd should print usage information or fail gracefully instead of crashing.
Comment 4 SATHEESARAN 2015-06-10 05:44:58 EDT
I have tried to reproduce the issue.

It's reproducible only with the following sequence:

1. Create a 2X3 distributed-replicate volume
2. Shrink it to a 2X2 distributed-replicate volume
3. Shrink it from 2X2 to a 2X1 distribute volume (see the command sketch below)
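A minimal CLI sketch of the above sequence; hostnames and brick paths are hypothetical placeholders:

# 1. Create a 2X3 distributed-replicate volume (6 bricks, replica 3)
gluster volume create testvol replica 3 \
    node1:/bricks/r1 node2:/bricks/r1 node3:/bricks/r1 \
    node1:/bricks/r2 node2:/bricks/r2 node3:/bricks/r2
gluster volume start testvol

# 2. Shrink 2X3 -> 2X2: remove one brick per replica set, new replica count 2
gluster volume remove-brick testvol replica 2 node3:/bricks/r1 node3:/bricks/r2 force

# 3. Shrink 2X2 -> 2X1: remove one more brick per set, new replica count 1
#    (glusterd crashed at this step)
gluster volume remove-brick testvol replica 1 node2:/bricks/r1 node2:/bricks/r2 force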

Here are a few more observations:
1. There is no crash when creating a 2X2 volume and shrinking it to 2X1
2. There is no crash when creating a 2X3 volume and shrinking it to 2X2
3. There is no crash when trying to remove each brick from all replica sets, and a proper error message is shown
Comment 5 Atin Mukherjee 2015-06-11 03:00:34 EDT
Upstream patch http://review.gluster.org/#/c/11165 is under review.
Comment 7 SATHEESARAN 2015-06-12 02:35:47 EDT
Marking this bug as BLOCKER, as this is required for RHGS 3.1 (Everglades).
Comment 13 SATHEESARAN 2015-06-29 13:37:45 EDT
Verified with the RHGS 3.1 nightly build - glusterfs-3.7.1-6.el6rhs - using the steps mentioned in comment 4.

There were no issues, so marking this bug as VERIFIED.
Comment 14 Bhavana 2015-07-15 02:16:51 EDT
Hi Gaurav,

The doc text is updated. Please review it and share your technical review comments. If it looks OK, please sign off on it.

Regards,
Bhavana
Comment 15 errata-xmlrpc 2015-07-29 01:00:32 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html
