Bug 1075842

Summary: glusterd segfault with large number of hosts
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Harshavardhana <fharshav>
Component: glusterd
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED EOL
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.1
CC: cww, jvance, nlevinki, vbellur
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: glusterd
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Cloned to: 1284380 (view as bug list)
Environment:
Last Closed: 2015-12-03 17:11:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1284380    
Attachments:
  Coredump (flags: none)
  sosreport from the glusterd crash (flags: none)

Description Harshavardhana 2014-03-13 01:34:50 UTC
Description of problem:

[2014-03-13 01:17:41.262366] E [glusterd-syncop.c:161:gd_syncop_submit_request] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_sync_task_begin+0xdf) [0x7f10ad4d9d4f] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_unlock_op_phase+0x24e) [0x7f10ad4d9b9e] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_syncop_mgmt_unlock+0xa3) [0x7f10ad4d7d33]))) 0-: Assertion failed: rpc
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-03-13 01:17:41
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.44rhs
/lib64/libc.so.6[0x3961032960]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_syncop_submit_request+0xd2)[0x7f10ad4d73a2]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_syncop_mgmt_unlock+0xa3)[0x7f10ad4d7d33]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_unlock_op_phase+0x24e)[0x7f10ad4d9b9e]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_sync_task_begin+0xdf)[0x7f10ad4d9d4f]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b)[0x7f10ad4da06b]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(__glusterd_handle_cli_start_volume+0x1b6)[0x7f10ad4ce176]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7f10ad465fff]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x3961c49ad2]
/lib64/libc.so.6[0x3961043bb0]
---------
Core was generated by `/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f10ad4d73a2 in gd_syncop_submit_request () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
Missing separate debuginfos, use: debuginfo-install glusterfs-server-3.4.0.44rhs-1.el6rhs.x86_64
(gdb)  bt
#0  0x00007f10ad4d73a2 in gd_syncop_submit_request () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#1  0x00007f10ad4d7d33 in gd_syncop_mgmt_unlock () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#2  0x00007f10ad4d9b9e in gd_unlock_op_phase () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#3  0x00007f10ad4d9d4f in gd_sync_task_begin () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#4  0x00007f10ad4da06b in glusterd_op_begin_synctask () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#5  0x00007f10ad4ce176 in __glusterd_handle_cli_start_volume () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#6  0x00007f10ad465fff in glusterd_big_locked_handler () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#7  0x0000003961c49ad2 in synctask_wrap () from /usr/lib64/libglusterfs.so.0
#8  0x0000003961043bb0 in ?? () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()
(gdb) 

After the segfault, glusterd fails to start:

[2014-03-13 01:22:35.691970] E [glusterd-store.c:2548:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2014-03-13 01:22:35.691996] E [xlator.c:423:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2014-03-13 01:22:35.692006] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2014-03-13 01:22:35.692013] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed

[2014-03-13 01:23:51.040766] D [glusterd-utils.c:5166:glusterd_friend_find_by_hostname] 0-management: Unable to find friend: dhcp47-4.lab.bos.redhat.com
[2014-03-13 01:23:51.041399] D [common-utils.c:2812:gf_is_local_addr] 0-management: 10.16.47.4 
[2014-03-13 01:23:51.045983] D [common-utils.c:2812:gf_is_local_addr] 0-management: 10.16.47.4 
[2014-03-13 01:23:51.046028] D [common-utils.c:2812:gf_is_local_addr] 0-management: 10.16.47.4 
[2014-03-13 01:23:51.046061] D [common-utils.c:2825:gf_is_local_addr] 0-management: dhcp47-4.lab.bos.redhat.com is not local
[2014-03-13 01:23:51.046069] D [glusterd-utils.c:5201:glusterd_hostname_to_uuid] 0-management: returning -1
[2014-03-13 01:23:51.046075] D [glusterd-utils.c:615:glusterd_resolve_brick] 0-management: Returning -1
[2014-03-13 01:23:51.046080] E [glusterd-store.c:2548:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2014-03-13 01:23:51.046086] D [glusterd-store.c:2555:glusterd_resolve_all_bricks] 0-: Returning with -1
[2014-03-13 01:23:51.046091] D [glusterd-store.c:2588:glusterd_restore] 0-: Returning -1
[2014-03-13 01:23:51.046100] E [xlator.c:423:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2014-03-13 01:23:51.046109] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2014-03-13 01:23:51.046115] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
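The debug log shows the failure chain on restart: the brick host dhcp47-4.lab.bos.redhat.com cannot be matched to a local address or a known peer, so glusterd_resolve_all_bricks fails, which aborts the 'management' xlator init and keeps glusterd down. As a rough illustration of the hostname-resolution step this depends on, here is a self-contained sketch; `brick_host_resolves` is a hypothetical helper, not a glusterd function.

```c
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sketch of the resolution check the restore path relies on: if a
 * brick's hostname no longer resolves (or matches no peer), restore
 * fails and glusterd refuses to start.  Returns 0 when the name
 * resolves, -1 otherwise. */
static int
brick_host_resolves(const char *hostname)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;   /* accept IPv4 or IPv6 */

    if (getaddrinfo(hostname, NULL, &hints, &res) != 0)
        return -1;

    freeaddrinfo(res);
    return 0;
}

int main(void)
{
    /* "localhost" should always resolve; a reserved .invalid name
     * (RFC 2606) should not. */
    assert(brick_host_resolves("localhost") == 0);
    assert(brick_host_resolves("no-such-host.invalid") == -1);
    puts("ok");
    return 0;
}
```

Checking that every brick host in the volume still resolves (and is still a known peer) is a reasonable first diagnostic when glusterd fails to restore its volumes at startup.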

Version-Release number of selected component (if applicable):
3.4.0.44rhs

How reproducible:
Often

Topology for volume HadoopVol:

Distribute set
 |     
 +---- Replica set 0
 |      |     
 |      +---- Brick 0: dhcp71-70.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp71-65.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 1
 |      |     
 |      +---- Brick 0: dhcp71-69.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp71-66.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 2
 |      |     
 |      +---- Brick 0: dhcp71-62.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp71-64.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 3
 |      |     
 |      +---- Brick 0: dhcp71-63.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp71-59.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 4
 |      |     
 |      +---- Brick 0: dhcp71-60.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp71-61.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 5
 |      |     
 |      +---- Brick 0: dhcp47-11.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp47-8.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 6
 |      |     
 |      +---- Brick 0: dhcp47-9.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp47-10.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 7
 |      |     
 |      +---- Brick 0: dhcp47-7.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp47-6.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 8
 |      |     
 |      +---- Brick 0: dhcp47-5.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp47-4.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 9
        |     
        +---- Brick 0: dhcp47-3.lab.bos.redhat.com:/mnt/brick1/HadoopVol
        |     
        +---- Brick 1: dhcp47-240.lab.bos.redhat.com:/mnt/brick1/HadoopVol

Comment 2 Harshavardhana 2014-03-13 01:38:00 UTC
Created attachment 873777 [details]
Coredump

Comment 3 Harshavardhana 2014-03-13 01:38:52 UTC
Created attachment 873779 [details]
sosreport from the glusterd crash

Comment 6 Vivek Agarwal 2015-12-03 17:11:36 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.