Bug 1005553

Summary: Dist-geo-rep: glusterd crashed when started with -LDEBUG and ran geo-rep status detail
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: M S Vishwanath Bhat <vbhat>
Component: geo-replication
Assignee: Avra Sengupta <asengupt>
Status: CLOSED ERRATA
QA Contact: M S Vishwanath Bhat <vbhat>
Severity: high
Docs Contact:
Priority: high
Version: 2.1
CC: aavati, amarts, csaba, grajaiya, kparthas, mzywusko, rhs-bugs, shaines, vagarwal
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.4.0.34rhs
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1006249
Environment:
Last Closed: 2013-11-27 15:37:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1006249

Description M S Vishwanath Bhat 2013-09-08 10:23:23 UTC
Description of problem:
I was running geo-rep with code coverage enabled, using gluster compiled from source, so I started glusterd with -LDEBUG. After starting the gluster volume and the geo-rep session, I ran gluster v geo status detail, and glusterd crashed.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.32rhs-1.el6rhs.x86_64

How reproducible:
I haven't tried to reproduce it again, but it happened on 2 nodes.

Steps to Reproduce:
1. Compile and install gluster from source with code coverage compiler flags, and start glusterd with -LDEBUG.
2. Create two volumes and start a geo-rep session between them.
3. After the session has started, run geo-rep status detail.

Actual results:
glusterd crashed with the following backtrace.

(gdb) bt
#0  0x00000039f48328a5 in raise () from /lib64/libc.so.6
#1  0x00000039f4834085 in abort () from /lib64/libc.so.6
#2  0x00000039f482ba1e in __assert_fail_base () from /lib64/libc.so.6
#3  0x00000039f482bae0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f808ca0e67d in glusterd_mountbroker_check (slave_ip=0x18480e8, op_errstr=0x0) at glusterd-geo-rep.c:1745
#5  0x00007f808ca1829f in glusterd_get_slave_info (slave=0x7f808800eac6 "euclid", slave_ip=0x184a168, slave_vol=0x184a160, op_errstr=0x0) at glusterd-geo-rep.c:3674
#6  0x00007f808ca0a985 in _get_status_mst_slv (this=0x7f808ec95394, key=0x7f8088019270 "slave1", value=0x7f808eab3594, data=0x184a240) at glusterd-geo-rep.c:953
#7  0x00007f80904a8210 in dict_foreach (dict=0x7f808ec95394, fn=0x7f808ca0a746 <_get_status_mst_slv>, data=0x184a240) at dict.c:1123
#8  0x00007f808ca1412a in glusterd_get_gsync_status_mst (volinfo=0x7f807c0014d0, rsp_dict=0x7f808ec95880, node=0x184a2f0 "ramanujan.blr.redhat.com") at glusterd-geo-rep.c:2927
#9  0x00007f808ca14273 in glusterd_get_gsync_status_all (rsp_dict=0x7f808ec95880, node=0x184a2f0 "ramanujan.blr.redhat.com") at glusterd-geo-rep.c:2946
#10 0x00007f808ca14468 in glusterd_get_gsync_status (dict=0x7f808ec95ce0, op_errstr=0x184bdc8, rsp_dict=0x7f808ec95880) at glusterd-geo-rep.c:2977
#11 0x00007f808ca17052 in glusterd_op_gsync_set (dict=0x7f808ec95ce0, op_errstr=0x184bdc8, rsp_dict=0x7f808ec95880) at glusterd-geo-rep.c:3456
#12 0x00007f808c9a609d in glusterd_op_commit_perform (op=GD_OP_GSYNC_SET, dict=0x7f808ec95ce0, op_errstr=0x184bdc8, rsp_dict=0x7f808ec95880) at glusterd-op-sm.c:3920
#13 0x00007f808ca3fb74 in gd_commit_op_phase (peers=0x12b9dd0, op=GD_OP_GSYNC_SET, op_ctx=0x7f808ec95d6c, req_dict=0x7f808ec95ce0, op_errstr=0x184bdc8, npeers=0) at glusterd-syncop.c:958
#14 0x00007f808ca40f52 in gd_sync_task_begin (op_ctx=0x7f808ec95d6c, req=0x7f808c8e404c) at glusterd-syncop.c:1230
#15 0x00007f808ca4112d in glusterd_op_begin_synctask (req=0x7f808c8e404c, op=GD_OP_GSYNC_SET, dict=0x7f808ec95d6c) at glusterd-syncop.c:1264
#16 0x00007f808ca07cc3 in __glusterd_handle_gsync_set (req=0x7f808c8e404c) at glusterd-geo-rep.c:318
#17 0x00007f808c982a06 in glusterd_big_locked_handler (req=0x7f808c8e404c, actor_fn=0x7f808ca07594 <__glusterd_handle_gsync_set>) at glusterd-handler.c:77
#18 0x00007f808ca07e46 in glusterd_handle_gsync_set (req=0x7f808c8e404c) at glusterd-geo-rep.c:346
#19 0x00007f8090513808 in synctask_wrap (old_task=0x12bbff0) at syncop.c:131
#20 0x00000039f4843b70 in ?? () from /lib64/libc.so.6
#21 0x0000000000000000 in ?? ()



(gdb) f 4
#4  0x00007f808ca0e67d in glusterd_mountbroker_check (slave_ip=0x18480e8, op_errstr=0x0) at glusterd-geo-rep.c:1745
1745            GF_ASSERT (op_errstr);
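
The backtrace shows op_errstr=0x0 being handed from frame #5 to frame #4, where GF_ASSERT (op_errstr) fires and, judging by frames #0 to #3, ends in abort(). Below is a minimal sketch of one way to avoid this class of crash. It is not the patch shipped for this bug, and check_slave_sketch and status_path_sketch are hypothetical stand-ins, not glusterd functions. It assumes the callee can simply return an error on a NULL out-parameter, and that a caller with no error string of its own can pass a local one.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a check that reports failures through
 * *op_errstr. Returns 0 on success, -1 on failure. */
static int
check_slave_sketch (const char *slave_ip, char **op_errstr)
{
        /* Guard instead of asserting: a NULL out-parameter is a caller
         * bug, but it should not take the whole daemon down. */
        if (op_errstr == NULL)
                return -1;

        if (slave_ip == NULL || slave_ip[0] == '\0') {
                const char *msg = "slave address is empty";

                /* Caller owns and frees the error string. */
                *op_errstr = malloc (strlen (msg) + 1);
                if (*op_errstr != NULL)
                        strcpy (*op_errstr, msg);
                return -1;
        }

        return 0;
}

/* Hypothetical caller modelled on the status path: it has no error
 * string of its own, so it supplies a local one rather than NULL. */
static int
status_path_sketch (const char *slave_ip)
{
        char *errstr = NULL;
        int   ret    = check_slave_sketch (slave_ip, &errstr);

        if (ret != 0 && errstr != NULL)
                fprintf (stderr, "status: %s\n", errstr);

        free (errstr); /* free (NULL) is a no-op */
        return ret;
}
```

With this shape, a call such as status_path_sketch ("euclid") never reaches the callee with a NULL op_errstr, and even a direct check_slave_sketch ("euclid", NULL) returns -1 instead of aborting.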




Expected results:
glusterd should not crash.

Additional info:


Part of the glusterd log before the crash:


[2013-09-08 03:45:19.046211] I [glusterd-geo-rep.c:283:__glusterd_handle_gsync_set] 0-management: slave not found, whilehandling geo-replication options
[2013-09-08 03:45:19.046237] D [glusterd-utils.c:157:glusterd_lock] 0-management: Cluster lock held by d6694e36-99b7-49bc-8bee-687203be714d
[2013-09-08 03:45:19.046255] W [glusterd-geo-rep.c:1404:glusterd_op_gsync_args_get] 0-: master not found
[2013-09-08 03:45:19.046264] D [glusterd-geo-rep.c:1430:glusterd_op_gsync_args_get] 0-: Returning -2
[2013-09-08 03:45:19.046283] D [glusterd-geo-rep.c:1387:glusterd_verify_gsync_status_opts] 0-: Returning 0
[2013-09-08 03:45:19.046293] D [glusterd-geo-rep.c:2248:glusterd_op_stage_gsync_set] 0-: Returning 0
[2013-09-08 03:45:19.046300] D [glusterd-op-sm.c:3856:glusterd_op_stage_validate] 0-management: OP = 15. Returning 0
[2013-09-08 03:45:19.046310] D [glusterd-op-sm.c:4937:glusterd_op_bricks_select] 0-management: Returning 0
[2013-09-08 03:45:19.046317] D [glusterd-syncop.c:1164:gd_brick_op_phase] 0-management: Sent op req to 0 bricks
pending frames:
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2013-09-08 03:45:19configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.32rhs
glusterd(glusterfsd_print_trace+0x31)[0x40aac8]
/lib64/libc.so.6[0x39f4832920]
/lib64/libc.so.6(gsignal+0x35)[0x39f48328a5]
/lib64/libc.so.6(abort+0x175)[0x39f4834085]
/lib64/libc.so.6[0x39f482ba1e]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x39f482bae0]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(glusterd_mountbroker_check+0x11c)[0x7f808ca0e67d]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(+0xd029f)[0x7f808ca1829f]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(+0xc2985)[0x7f808ca0a985]
/usr/local/lib/libglusterfs.so.0(dict_foreach+0xe3)[0x7f80904a8210]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(+0xcc12a)[0x7f808ca1412a]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(+0xcc273)[0x7f808ca14273]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(+0xcc468)[0x7f808ca14468]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(glusterd_op_gsync_set+0x342)[0x7f808ca17052]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(glusterd_op_commit_perform+0x3a1)[0x7f808c9a609d]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(gd_commit_op_phase+0x138)[0x7f808ca3fb74]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x4e0)[0x7f808ca40f52]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0xe5)[0x7f808ca4112d]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(__glusterd_handle_gsync_set+0x72f)[0x7f808ca07cc3]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x84)[0x7f808c982a06]
/usr/local/lib/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(glusterd_handle_gsync_set+0x34)[0x7f808ca07e46]
/usr/local/lib/libglusterfs.so.0(synctask_wrap+0x5c)[0x7f8090513808]
/lib64/libc.so.6[0x39f4843b70]

Comment 3 Gowrishankar Rajaiyan 2013-10-08 08:42:32 UTC
Please set the Fixed In Version field.

Comment 4 M S Vishwanath Bhat 2013-10-22 17:11:13 UTC
This time around, there were no crashes.

[root@spitfire ]# gluster v geo master falcon::slave status
NODE                       MASTER    SLAVE            HEALTH    UPTIME         
---------------------------------------------------------------------------
spitfire.blr.redhat.com    master    falcon::slave    Stable    00:09:04       
typhoon.blr.redhat.com     master    falcon::slave    Stable    00:09:00       
mustang.blr.redhat.com     master    falcon::slave    Stable    00:09:00       
harrier.blr.redhat.com     master    falcon::slave    Stable    00:09:00       


[root@spitfire ]# gluster v geo master falcon::slave status detail
 
                                        MASTER: master  SLAVE: falcon::slave
 
NODE                         HEALTH    UPTIME      FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING   
--------------------------------------------------------------------------------------------------------------------
spitfire.blr.redhat.com      Stable    00:09:10    1116           0                0Bytes           0                 
mustang.blr.redhat.com       Stable    00:09:06    0              0                0Bytes           0                 
harrier.blr.redhat.com       Stable    00:09:07    1081           0                0Bytes           0                 
typhoon.blr.redhat.com       Stable    00:09:07    0              0                0Bytes           0                 


Moving the bug to Verified.

Comment 6 errata-xmlrpc 2013-11-27 15:37:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html