Bug 990125
Summary: | glusterd crash while stopping the volume | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | senaik |
Component: | glusterfs | Assignee: | Kaushal <kaushal> |
Status: | CLOSED ERRATA | QA Contact: | senaik |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | | |
Version: | 2.1 | CC: | amarts, rhs-bugs, sasundar, surs, vbellur |
Target Milestone: | --- | Keywords: | TestBlocker |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | glusterfs-3.4.0.27rhs-1 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2013-09-23 22:29:51 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
senaik
2013-07-30 12:49:56 UTC
arequal checksum on the mount point failed with error:

[root@localhost dir2]# /opt/qa/tools/arequal-checksum /mnt/vol13
ftw (/mnt/vol13) returned -1 (No such file or directory), terminating

sosreports: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/990125/

From the logs this seems to be a crash caused by the volume stop command. There appears to be a race in the cleanup of the rpc transport glusterd uses to connect with the brick, leading to a double free and the crash.

proposed fix @ http://review.gluster.org/5512

Downstream fix at https://code.engineering.redhat.com/gerrit/11341

Version :
========
Found this crash while trying to verify this bug. Followed the same steps as mentioned in steps to reproduce (Fuse and NFS mount).

[root@junior glusterfs]# service glusterd status
glusterd dead but pid file exists

---------------Part of the log---------------------
[2013-08-16 09:48:41.311540] I [glusterd-utils.c:3560:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2013-08-16 09:48:41.311728] I [glusterd-utils.c:3565:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2013-08-16 09:48:41.311908] I [glusterd-utils.c:3570:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2013-08-16 09:48:41.312088] I [glusterd-utils.c:3575:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2013-08-16 09:48:41.312268] I [glusterd-utils.c:3580:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2013-08-16 09:48:41.312503] I [glusterd-utils.c:3585:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2013-08-16 09:48:42.319659] E [glusterd-utils.c:3526:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/0d536d1ec2d14cfee8af0da42b3a6df3.socket error: No such file or directory
[2013-08-16 09:48:42.319945] E [glusterd-hooks.c:291:glusterd_hooks_run_hooks] 0-management: Failed to open dir /var/lib/glusterd/hooks/1/stop/post, due to No such file or directory
pending frames:
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-08-16 09:48:42configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.20rhs
/lib64/libc.so.6[0x397d232920]
/usr/lib64/glusterfs/3.4.0.20rhs/xlator/mgmt/glusterd.so(__glusterd_brick_rpc_notify+0x92)[0x7fb288d9b222]
/usr/lib64/glusterfs/3.4.0.20rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fb288d8e450]
/usr/lib64/libglusterfs.so.0(gf_timer_proc+0xd0)[0x7fb28c827180]
/lib64/libpthread.so.0[0x397da07851]
/lib64/libc.so.6(clone+0x6d)[0x397d2e890d]
---------------------------------------------------------------------

Missed specifying the version in comment 7: 3.4.0.20rhs-2.el6rhs.x86_64

Version : 3.4.0.20rhs-2.el6rhs.x86_64
========
Faced glusterd crash again while stopping the volume.

Steps :
------
1) Created a 2x2 distributed replicate volume
2) Create some files on the mount point
   for i in {100..1000} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done
3) While file creation is in progress, bring down one brick in the replica pair
4) After file creation is completed, bring the brick back online
   gluster v start <vol_name> force
5) Execute heal command
   gluster v heal Vol3 full
   gluster v heal Vol3 info
6) On the mount point, deleted all files
7) Stop the volume
   gluster v stop Vol3
   Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
   Connection failed. Please check if gluster daemon is operational.

service glusterd status
glusterd dead but pid file exists

---------------------Part of Log --------------------------
[2013-08-20 10:16:14.645926] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2013-08-20 10:16:14.645943] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2013-08-20 10:16:14.646027] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-08-20 10:16:14.646062] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=1 total=2
[2013-08-20 10:16:14.646075] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=1 total=2
[2013-08-20 10:16:14.646188] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-08-20 10:16:14configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.20rhs
/lib64/libc.so.6[0x3b0ea32920]
/usr/lib64/glusterfs/3.4.0.20rhs/xlator/mgmt/glusterd.so(__glusterd_brick_rpc_notify+0x92)[0x7f3861bd0222]
/usr/lib64/glusterfs/3.4.0.20rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f3861bc3450]
/usr/lib64/libglusterfs.so.0(gf_timer_proc+0xd0)[0x7f386565c180]
/lib64/libpthread.so.0[0x3b0f207851]
/lib64/libc.so.6(clone+0x6d)[0x3b0eae890d]
--------------------------------------------------------------------

sosreports for comment 10: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/990125/20_Aug_990125/

Version : glusterfs-3.4.0.22rhs-1
========
Repeated the steps as mentioned in 'Steps to Reproduce' and Comment 10, did not face glusterd crash.
Marking the bug as Verified.

Followed the same steps as mentioned in the bug and Comment 10, after which I stopped the volume and deleted it, and glusterd crashed, which did not occur last time.
Moving the bug back to 'Assigned'.

------------Part of log------------------
[2013-08-26 08:35:42.537781] E [glusterd-utils.c:1335:glusterd_brick_unlink_socket_file] 0-management: Failed to remove /var/run/e010aa1a569ea85f32dd59cc65072e7f.socket error: No such file or directory
[2013-08-26 08:35:43.683117] E [glusterd-utils.c:3526:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/69da6e6bc4924ccbcf633c59b5fe3d25.socket error: Permission denied
[2013-08-26 08:35:43.683467] I [glusterd-utils.c:3560:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2013-08-26 08:35:43.683702] I [glusterd-utils.c:3565:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2013-08-26 08:35:43.683890] I [glusterd-utils.c:3570:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2013-08-26 08:35:43.684087] I [glusterd-utils.c:3575:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2013-08-26 08:35:43.684288] I [glusterd-utils.c:3580:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2013-08-26 08:35:43.684493] I [glusterd-utils.c:3585:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2013-08-26 08:35:44.684800] E [glusterd-utils.c:3526:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/0d536d1ec2d14cfee8af0da42b3a6df3.socket error: No such file or directory
[2013-08-26 08:35:44.684998] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /rhs/brick1/a5 on port 49272
pending frames:
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-08-26 08:35:44configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.22rhs
/lib64/libc.so.6[0x397d232920]
/usr/lib64/glusterfs/3.4.0.22rhs/xlator/mgmt/glusterd.so(__glusterd_brick_rpc_notify+0x92)[0x7fb5a17d42f2]
/usr/lib64/glusterfs/3.4.0.22rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fb5a17c7520]
/usr/lib64/libglusterfs.so.0(gf_timer_proc+0xd0)[0x7fb5a5261120]
/lib64/libpthread.so.0[0x397da07851]
/lib64/libc.so.6(clone+0x6d)[0x397d2e890d]
-----------------------------------------------------------------------

sos reports at: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/990125/990125_26_Aug/

Version :
============
gluster --version
glusterfs 3.4.0.30rhs built on Aug 30 2013 08:15:37

Repeated the steps as mentioned in 'Steps to Reproduce' and Comment 10, did not face glusterd crash.
Marking the bug as 'Verified'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
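
When re-verifying on another build, the crash banner seen in each log excerpt of this report (a "signal received:" line followed directly by a "time of crash:" line) is a quick indicator that glusterd went down. The following is a minimal sketch that lists those timestamps from a log file. The function name crash_times and the use of awk are illustrative assumptions and not part of any gluster tooling, and the log file path has to be supplied by the caller because it depends on the installation.

```shell
#!/bin/sh
# Sketch only: list crash timestamps recorded in a glusterd log.
# Assumption: the banner format ("signal received: N" followed directly by
# "time of crash: DATE TIME...") matches the excerpts quoted in this report.

crash_times() {
    # $1 is the path of the log file to scan (supplied by the caller).
    awk '
        # Remember that the previous line was a signal banner.
        /^signal received:/ { armed = 1; next }
        # Print DATE and the first 8 characters of the TIME field, because
        # the log may run "configuration details:" straight onto the time.
        armed && /^time of crash:/ { print $4, substr($5, 1, 8) }
        # Any other line (or a handled crash line) disarms the match.
        { armed = 0 }
    ' "$1"
}

# When the script is run with a log file argument, scan that file.
if [ "$#" -gt 0 ]; then
    crash_times "$1"
fi
```

Saved as a script, it would be run with the glusterd log file as its only argument; empty output means no crash banner was found in that file.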