Bug 1246432

Summary: ./tests/basic/volume-snapshot.t spurious fail causing glusterd crash.
Product: [Community] GlusterFS Reporter: Anand Nekkunti <anekkunt>
Component: tests Assignee: Anand Nekkunti <anekkunt>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: mainline CC: amukherj, bugs, gluster-bugs, nsathyan
Target Milestone: --- Keywords: Reopened, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.8rc2 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1247917 Environment:
Last Closed: 2016-06-16 13:26:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1247917    

Description Anand Nekkunti 2015-07-24 10:17:16 UTC
(gdb) bt 
#0  0x00007f4078ca4e2c in vfprintf () from ./lib64/libc.so.6
#1  0x00007f4078ccc752 in vsnprintf () from ./lib64/libc.so.6
#2  0x00007f4078cac223 in snprintf () from ./lib64/libc.so.6
#3  0x00007f406f582e19 in glusterd_volume_stop_glusterfs (volinfo=0x1e93d90, brickinfo=0x1e9fdc0, del_brick=_gf_false) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-utils.c:1754
#4  0x00007f406f58fda4 in glusterd_brick_stop (volinfo=0x1e93d90, brickinfo=0x1e9fdc0, del_brick=_gf_false) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-utils.c:5458
#5  0x00007f406f61a84c in glusterd_snap_volume_remove (rsp_dict=0x7f405800100c, snap_vol=0x1e93d90, remove_lvm=_gf_false, force=_gf_false)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-snapshot.c:2897
#6  0x00007f406f61adf7 in glusterd_snap_remove (rsp_dict=0x7f405800100c, snap=0x1e8bab0, remove_lvm=_gf_false, force=_gf_false)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-snapshot.c:3005
#7  0x00007f406f646ea0 in glusterd_compare_and_update_snap (peer_data=0x7f405800176c, snap_count=2, peername=0x7f40580015e0 "127.1.1.3", peerid=0x7f4058001650 "8\370\365\253\313\vN7\226\067\246\020\212\211'W\340\025")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-snapshot-utils.c:1849
#8  0x00007f406f647167 in glusterd_compare_friend_snapshots (peer_data=0x7f405800176c, peername=0x7f40580015e0 "127.1.1.3", peerid=0x7f4058001650 "8\370\365\253\313\vN7\226\067\246\020\212\211'W\340\025")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-snapshot-utils.c:1904
#9  0x00007f406f5689f3 in glusterd_ac_handle_friend_add_req (event=0x7f4058001640, ctx=0x7f40580016d0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-sm.c:831
#10 0x00007f406f569290 in glusterd_friend_sm () at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-sm.c:1253
#11 0x00007f406f55ee14 in __glusterd_handle_incoming_friend_req (req=0x7f405800511c) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-handler.c:2541
#12 0x00007f406f5576ea in glusterd_big_locked_handler (req=0x7f405800511c, actor_fn=0x7f406f55ec78 <__glusterd_handle_incoming_friend_req>)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-handler.c:79
#13 0x00007f406f55ee4a in glusterd_handle_incoming_friend_req (req=0x7f405800511c) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-handler.c:2551
#14 0x00007f4079ebb06d in rpcsvc_handle_rpc_call (svc=0x1e1e430, trans=0x7f4058004570, msg=0x7f4058001140) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpcsvc.c:699
#15 0x00007f4079ebb3e0 in rpcsvc_notify (trans=0x7f4058004570, mydata=0x1e1e430, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f4058001140) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpcsvc.c:793
#16 0x00007f4079ec0aeb in rpc_transport_notify (this=0x7f4058004570, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f4058001140) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-transport.c:538
#17 0x00007f406dbe587b in socket_event_poll_in (this=0x7f4058004570) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2285
#18 0x00007f406dbe5dd1 in socket_event_handler (fd=16, idx=7, data=0x7f4058004570, poll_in=1, poll_out=0, poll_err=0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2398
#19 0x00007f407a1749f0 in event_dispatch_epoll_handler (event_pool=0x1e04c90, event=0x7f4063ffee70) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:570
#20 0x00007f407a174dde in event_dispatch_epoll_worker (data=0x1eb4610) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:673
#21 0x00007f40793db9d1 in start_thread () from ./lib64/libpthread.so.0
#22 0x00007f4078d458fd in clone () from ./lib64/libc.so.6
(gdb) t a a bt

Thread 9 (LWP 2819):
#0  0x00007f40793e23f5 in __lll_unlock_wake () from ./lib64/libpthread.so.0
#1  0x00007f40793de877 in _L_unlock_657 () from ./lib64/libpthread.so.0
#2  0x00007f40793de7df in pthread_mutex_unlock () from ./lib64/libpthread.so.0
#3  0x00007f407a1541c0 in synclock_unlock (lock=0x7f407a4637d8) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:1069
#4  0x00007f406f5576ff in glusterd_big_locked_handler (req=0x7f405c00093c, actor_fn=0x7f406f55ec78 <__glusterd_handle_incoming_friend_req>)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-handler.c:80
#5  0x00007f406f55ee4a in glusterd_handle_incoming_friend_req (req=0x7f405c00093c) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-handler.c:2551
#6  0x00007f4079ebb06d in rpcsvc_handle_rpc_call (svc=0x1e1e430, trans=0x7f4058006b70, msg=0x7f405c005fb0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpcsvc.c:699
#7  0x00007f4079ebb3e0 in rpcsvc_notify (trans=0x7f4058006b70, mydata=0x1e1e430, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f405c005fb0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpcsvc.c:793
#8  0x00007f4079ec0aeb in rpc_transport_notify (this=0x7f4058006b70, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f405c005fb0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-transport.c:538
#9  0x00007f406dbe587b in socket_event_poll_in (this=0x7f4058006b70) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2285
#10 0x00007f406dbe5dd1 in socket_event_handler (fd=20, idx=4, data=0x7f4058006b70, poll_in=1, poll_out=0, poll_err=0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2398
#11 0x00007f407a1749f0 in event_dispatch_epoll_handler (event_pool=0x1e04c90, event=0x7f406b25ae70) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:570
#12 0x00007f407a174dde in event_dispatch_epoll_worker (data=0x1e0f7f0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:673
#13 0x00007f40793db9d1 in start_thread () from ./lib64/libpthread.so.0
#14 0x00007f4078d458fd in clone () from ./lib64/libc.so.6

Thread 8 (LWP 2818):
#0  0x00007f4078d45ef3 in epoll_wait () from ./lib64/libc.so.6
#1  0x00007f407a174dac in event_dispatch_epoll_worker (data=0x1eb3e50) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:663
#2  0x00007f40793db9d1 in start_thread () from ./lib64/libpthread.so.0
#3  0x00007f4078d458fd in clone () from ./lib64/libc.so.6

Thread 7 (LWP 2686):
#0  0x00007f40793df98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0
#1  0x00007f407a1532af in syncenv_task (proc=0x1e0bb20) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:603
#2  0x00007f407a153556 in syncenv_processor (thdata=0x1e0bb20) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:695
#3  0x00007f40793db9d1 in start_thread () from ./lib64/libpthread.so.0
#4  0x00007f4078d458fd in clone () from ./lib64/libc.so.6

Thread 6 (LWP 2684):
#0  0x00007f40793e34b5 in sigwait () from ./lib64/libpthread.so.0
#1  0x0000000000409705 in glusterfs_sigwaiter (arg=0x7ffedf2effa0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd.c:1984
#2  0x00007f40793db9d1 in start_thread () from ./lib64/libpthread.so.0
#3  0x00007f4078d458fd in clone () from ./lib64/libc.so.6

Thread 5 (LWP 2683):
#0  0x00007f40793e2f3d in nanosleep () from ./lib64/libpthread.so.0
#1  0x00007f407a122b80 in gf_timer_proc (ctx=0x1de6010) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/timer.c:200
#2  0x00007f40793db9d1 in start_thread () from ./lib64/libpthread.so.0
#3  0x00007f4078d458fd in clone () from ./lib64/libc.so.6

Thread 4 (LWP 2815):
#0  0x00007f40793df5bc in pthread_cond_wait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0
#1  0x00007f406f60c3ed in hooks_worker (args=0x1e13cb0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-hooks.c:529
#2  0x00007f40793db9d1 in start_thread () from ./lib64/libpthread.so.0
#3  0x00007f4078d458fd in clone () from ./lib64/libc.so.6

Thread 3 (LWP 2685):
#0  0x00007f40793df98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0
#1  0x00007f407a1532af in syncenv_task (proc=0x1e0b760) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:603
#2  0x00007f407a153556 in syncenv_processor (thdata=0x1e0b760) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:695
#3  0x00007f40793db9d1 in start_thread () from ./lib64/libpthread.so.0
#4  0x00007f4078d458fd in clone () from ./lib64/libc.so.6

Thread 2 (LWP 2682):
#0  0x00007f40793dc22d in pthread_join () from ./lib64/libpthread.so.0
#1  0x00007f407a175006 in event_dispatch_epoll (event_pool=0x1e04c90) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:757
#2  0x00007f407a13da06 in event_dispatch (event_pool=0x1e04c90) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event.c:123
#3  0x000000000040a272 in main (argc=9, argv=0x7ffedf2f1208) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd.c:2328

Thread 1 (LWP 2816):
#0  0x00007f4078ca4e2c in vfprintf () from ./lib64/libc.so.6
#1  0x00007f4078ccc752 in vsnprintf () from ./lib64/libc.so.6
#2  0x00007f4078cac223 in snprintf () from ./lib64/libc.so.6
#3  0x00007f406f582e19 in glusterd_volume_stop_glusterfs (volinfo=0x1e93d90, brickinfo=0x1e9fdc0, del_brick=_gf_false) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-utils.c:1754
#4  0x00007f406f58fda4 in glusterd_brick_stop (volinfo=0x1e93d90, brickinfo=0x1e9fdc0, del_brick=_gf_false) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-utils.c:5458
#5  0x00007f406f61a84c in glusterd_snap_volume_remove (rsp_dict=0x7f405800100c, snap_vol=0x1e93d90, remove_lvm=_gf_false, force=_gf_false)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-snapshot.c:2897
#6  0x00007f406f61adf7 in glusterd_snap_remove (rsp_dict=0x7f405800100c, snap=0x1e8bab0, remove_lvm=_gf_false, force=_gf_false)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-snapshot.c:3005
#7  0x00007f406f646ea0 in glusterd_compare_and_update_snap (peer_data=0x7f405800176c, snap_count=2, peername=0x7f40580015e0 "127.1.1.3", peerid=0x7f4058001650 "8\370\365\253\313\vN7\226\067\246\020\212\211'W\340\025")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-snapshot-utils.c:1849
#8  0x00007f406f647167 in glusterd_compare_friend_snapshots (peer_data=0x7f405800176c, peername=0x7f40580015e0 "127.1.1.3", peerid=0x7f4058001650 "8\370\365\253\313\vN7\226\067\246\020\212\211'W\340\025")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-snapshot-utils.c:1904
#9  0x00007f406f5689f3 in glusterd_ac_handle_friend_add_req (event=0x7f4058001640, ctx=0x7f40580016d0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-sm.c:831
#10 0x00007f406f569290 in glusterd_friend_sm () at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-sm.c:1253
#11 0x00007f406f55ee14 in __glusterd_handle_incoming_friend_req (req=0x7f405800511c) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-handler.c:2541
#12 0x00007f406f5576ea in glusterd_big_locked_handler (req=0x7f405800511c, actor_fn=0x7f406f55ec78 <__glusterd_handle_incoming_friend_req>)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-handler.c:79
#13 0x00007f406f55ee4a in glusterd_handle_incoming_friend_req (req=0x7f405800511c) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/mgmt/glusterd/src/glusterd-handler.c:2551
#14 0x00007f4079ebb06d in rpcsvc_handle_rpc_call (svc=0x1e1e430, trans=0x7f4058004570, msg=0x7f4058001140) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpcsvc.c:699
#15 0x00007f4079ebb3e0 in rpcsvc_notify (trans=0x7f4058004570, mydata=0x1e1e430, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f4058001140) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpcsvc.c:793
#16 0x00007f4079ec0aeb in rpc_transport_notify (this=0x7f4058004570, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f4058001140) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-transport.c:538
#17 0x00007f406dbe587b in socket_event_poll_in (this=0x7f4058004570) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2285
#18 0x00007f406dbe5dd1 in socket_event_handler (fd=16, idx=7, data=0x7f4058004570, poll_in=1, poll_out=0, poll_err=0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2398
#19 0x00007f407a1749f0 in event_dispatch_epoll_handler (event_pool=0x1e04c90, event=0x7f4063ffee70) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:570
#20 0x00007f407a174dde in event_dispatch_epoll_worker (data=0x1eb4610) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:673
#21 0x00007f40793db9d1 in start_thread () from ./lib64/libpthread.so.0
#22 0x00007f4078d458fd in clone () from ./lib64/libc.so.6

Comment 1 Anand Avati 2015-07-24 10:55:23 UTC
REVIEW: http://review.gluster.org/11757 (glusterd: glusterd crash due to race between handshake and snapshot remove threads) posted (#2) for review on master by Anand Nekkunti (anekkunt)

Comment 2 Anand Avati 2015-07-28 13:26:48 UTC
COMMIT: http://review.gluster.org/11757 committed in master by Raghavendra Talur (rtalur) 
------
commit 51f48bc9a41a5e2004d9051ff90517b01626b08f
Author: anand <anekkunt>
Date:   Fri Jul 24 15:48:50 2015 +0530

    glusterd: glusterd crash due to race between handshake and snapshot remove threads
    
    Issue: glusterd was crashing due to a race between the handshake thread and
    the snapshot remove thread.
    RCA: The snapshot thread refers to volinfo while volinfo is being modified during the handshake;
    glusterd was crashing because of this inconsistent volinfo data.
    
    Note: Sending commands without checking cluster status may lead to a crash.
    
    Fix: Wait for the handshake to complete (cluster ready) before proceeding with commands.
    
    Change-Id: Iefd986664bd9dd225f0abf8f85476d6afd206914
    BUG: 1246432
    Signed-off-by: anand <anekkunt>
    Reviewed-on: http://review.gluster.org/11757
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Atin Mukherjee <amukherj>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra Talur <rtalur>
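
The "wait for cluster ready before proceeding" idea in the Fix line can be illustrated with a minimal polling sketch. This is not the patch from review 11757; the helper names (`wait_until`, `run_when_cluster_ready`) and the `connected_peers` callback are hypothetical and stand in for whatever readiness check the test harness actually uses.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(predicate: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.05) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the predicate became true in time, False otherwise.
    The predicate is always evaluated at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def run_when_cluster_ready(connected_peers: Callable[[], int],
                           expected_peers: int,
                           command: Callable[[], T],
                           timeout: float = 5.0,
                           interval: float = 0.05) -> T:
    """Run `command` only after the expected number of peers is connected.

    `connected_peers` is a hypothetical callback that reports how many
    peers have completed the handshake. If the cluster is not ready
    within `timeout`, the command is never issued.
    """
    ready = wait_until(lambda: connected_peers() >= expected_peers,
                       timeout, interval)
    if not ready:
        raise TimeoutError("cluster not ready: handshake still in progress")
    return command()
```

In a test, the snapshot command would be passed as `command`, so it is issued only once the peer count has reached the expected value; on timeout the test fails cleanly instead of racing with the handshake.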

Comment 3 Nagaprasad Sathyanarayana 2015-10-25 14:49:42 UTC
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ has been fixed in a GlusterFS release and closed, so this mainline BZ is being closed as well.

Comment 4 Niels de Vos 2016-06-16 13:26:51 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user