Bug 1047747
| Field | Value |
| --- | --- |
| Summary | glusterd crashed after initiating 'remove brick start' |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | glusterfs |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Version | 2.1 |
| Target Release | RHGS 2.1.2 |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | glusterfs-3.4.0.54rhs |
| Doc Type | Bug Fix |
| Reporter | SATHEESARAN <sasundar> |
| Assignee | Ravishankar N <ravishankar> |
| QA Contact | senaik |
| CC | grajaiya, vagarwal, vbellur |
| Keywords | ZStream |
| Clones | 1047955 (view as bug list) |
| Bug Blocks | 1047955 |
| Last Closed | 2014-02-25 08:13:43 UTC |
| Type | Bug |
Description (SATHEESARAN, 2014-01-02 06:23:29 UTC)
Error snippet from the glusterd log file (/var/log/glusterd/etc-glusterfs-glusterd.vol.log) on 10.70.37.187, where glusterd crashed:

```
[2014-01-02 06:57:02.090532] I [rpc-clnt.c:977:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-01-02 06:57:02.090606] I [socket.c:3505:socket_init] 0-management: SSL support is NOT enabled
[2014-01-02 06:57:02.090630] I [socket.c:3520:socket_init] 0-management: using system polling thread
[2014-01-02 06:57:03.092282] E [glusterd-utils.c:4006:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/c0dfc1a7171d0c097f48b95e254f0809.socket error: No such file or directory
[2014-01-02 06:57:03.100297] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2014-01-02 06:57:03.100376] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2014-01-02 06:57:03.100461] I [rpc-clnt.c:977:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-01-02 06:57:03.100528] I [socket.c:3505:socket_init] 0-management: SSL support is NOT enabled
[2014-01-02 06:57:03.100544] I [socket.c:3520:socket_init] 0-management: using system polling thread
[2014-01-02 06:57:03.100545] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2014-01-02 06:57:03.100790] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2014-01-02 06:57:03.100805] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2014-01-02 06:57:03.101018] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2014-01-02 06:57:17.383116] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:57:17.384307] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:57:22.555336] I [glusterd-handler.c:916:__glusterd_handle_cli_deprobe] 0-glusterd: Received CLI deprobe req
[2014-01-02 06:57:25.938661] I [glusterd-handler.c:1018:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2014-01-02 06:57:26.228262] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:57:26.229347] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:58:22.128594] I [glusterd-handler.c:1018:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2014-01-02 06:58:22.454366] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:58:22.455447] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:59:06.511463] I [glusterd-handler.c:1018:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2014-01-02 06:59:06.836698] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:59:06.837477] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:59:28.571665] I [glusterd-handler.c:1018:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2014-01-02 06:59:28.855855] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:59:28.856770] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:59:34.040168] I [glusterd-handler.c:1018:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2014-01-02 06:59:34.329708] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:59:34.330747] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:59:39.534002] I [glusterd-handler.c:1018:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2014-01-02 06:59:39.850115] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:59:39.850858] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-01-02 06:59:52.557460] I [glusterd-brick-ops.c:663:__glusterd_handle_remove_brick] 0-management: Received rem brick req
pending frames:
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-01-02 06:59:52
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.52rhs
/lib64/libc.so.6(+0x32960)[0x7fe1494a8960]
/usr/lib64/glusterfs/3.4.0.52rhs/xlator/mgmt/glusterd.so(__glusterd_handle_remove_brick+0x78a)[0x7fe145c8230a]
/usr/lib64/glusterfs/3.4.0.52rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7fe145c1278f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x7fe14a4410c2]
/lib64/libc.so.6(+0x43bb0)[0x7fe1494b9bb0]
---------
```

Created attachment 844372 [details]
core dump
Core dump from the RHSS node (10.70.37.187) where glusterd crashed
Created attachment 844374 [details]
gluster log file
glusterd log file from the RHSS node where glusterd crashed
Per triage on 1/2, removing from the list for Corbett: a remove-brick operation as such does not cause glusterd to crash, but following the steps described in comment 0 leads to a glusterd crash. So, removing the blocker flag for this bug.

The crash occurs in the following scenario:
1. Create a dist-rep volume on a trusted storage pool.
2. Add a new node to the pool by peer-probing it.
3. Perform a remove-brick operation from this new node.
4. This causes the glusterd on the new node to crash.

Downstream patch: https://code.engineering.redhat.com/gerrit/17984

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1047747#c7, adding this back to the list for u2.

Tested with glusterfs-3.4.0.55rhs-1 using the following steps:
1. Created a trusted storage pool with 2 RHSS nodes.
2. Created a 2x2 distributed-replicate volume.
3. Started the volume.
4. FUSE-mounted the volume and started writing files to the mount, i.e.
   mount.glusterfs <RHSS Node>:<vol-name> <mount-point>
   for i in {1..100}; do dd if=/dev/urandom of=<mount>/file$i bs=1024k count=100; done
5. Added a pair of bricks to make the volume a 3x2 distributed-replicate volume.
6. Started rebalance, i.e. gluster volume rebalance <vol-name> start
7. After rebalance completed successfully, tried to peer-probe a new node, i.e. gluster peer probe <RHSS-Node>
8. Immediately after the peer probe returned success, tried to remove a pair of bricks from the newly probed node, i.e. gluster volume remove-brick <vol-name> <brick1> <brick2> start

remove-brick completed successfully and committing the removed bricks also succeeded. No glusterd crash was seen.

Regarding the steps to reproduce provided in comment 0: I was totally unaware that iptables rules blocking all incoming glusterd traffic were in place, and that simulated the steps described by Ravi in comment 7. Performing the steps mentioned in comment 0 for verification of this bug.

Tested the following with glusterfs-3.4.0.55rhs-1. Performed the steps as follows:
1. Created a 4-node trusted storage pool.
2. Created 2 distributed-replicate volumes, one 3x2 and the other 2x2.
3. Blocked all glusterd traffic from all other nodes, i.e. iptables -I INPUT 1 -p tcp --dport 24007 -j DROP
4. Removed/deleted one of the volumes after stopping it.
5. Flushed the iptables rules on the RHSS node.
6. Started remove-brick, which included a brick from the node where the iptables rules had just been flushed.
7. remove-brick was successful and no glusterd crashes were found.

In addition, performed the test steps in comment 10 and tested the same scenario with the following operations on the newly probed peer:
a. remove-brick
b. remove-brick start
c. remove-brick commit
d. rebalance
e. add-brick
There was no glusterd crash.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html
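The crash scenario from comment 7 boils down to two CLI commands issued back to back against a freshly probed node. A minimal sketch follows; the volume name, node address, and brick paths are placeholders (none of these values come from the bug report), and the script deliberately does nothing but print the commands on a host without the gluster CLI:

```shell
#!/bin/sh
# Placeholders -- substitute real values from your own trusted storage pool.
VOL=distrep                      # assumed volume name
NEWNODE=192.0.2.10               # assumed address of the node being probed
BRICK1="$NEWNODE:/rhs/brick1/b1" # assumed brick paths on the new node
BRICK2="$NEWNODE:/rhs/brick2/b2"

if ! command -v gluster >/dev/null 2>&1; then
    # No gluster CLI on this host; just show what would be run.
    echo "would run: gluster peer probe $NEWNODE"
    echo "would run: gluster volume remove-brick $VOL $BRICK1 $BRICK2 start"
else
    # 1. Probe the new node into the trusted storage pool.
    gluster peer probe "$NEWNODE"
    # 2. Immediately start remove-brick involving the newly probed node,
    #    before its volume configuration has synced; on glusterfs 3.4.0.52rhs
    #    this crashed glusterd with SIGSEGV (fixed in 3.4.0.54rhs).
    gluster volume remove-brick "$VOL" "$BRICK1" "$BRICK2" start
fi
```

With the fixed build (glusterfs-3.4.0.54rhs and later), the verification comments above record the same sequence completing cleanly: remove-brick start and commit both succeed and glusterd stays up.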
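The iptables-based verification scenario can be sketched the same way. Port 24007 (glusterd's management port) and the iptables rule are taken from the comment above; the volume and brick names are placeholders, and the privileged branch only runs when both root and the gluster CLI are available:

```shell
#!/bin/sh
# Sketch of the verification steps: block glusterd traffic, stop and delete
# one volume, flush the rule, then run remove-brick. Placeholder names.
VOL=testvol                          # assumed name of the volume to delete
RULE="-p tcp --dport 24007 -j DROP"  # rule quoted in the comment above

if command -v gluster >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    iptables -I INPUT 1 $RULE              # block incoming glusterd traffic
    gluster --mode=script volume stop "$VOL"
    gluster --mode=script volume delete "$VOL"
    iptables -D INPUT $RULE                # remove the rule (the comment
                                           # flushed all rules instead)
    # remove-brick including a brick from this node (placeholder paths):
    gluster volume remove-brick othervol "$(hostname):/rhs/brick1/b1" start
else
    echo "needs root and the gluster CLI; the blocking rule would be:"
    echo "iptables -I INPUT 1 $RULE"
fi
```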