Bug 1075842 - glusterd segfault with large number of hosts
Summary: glusterd segfault with large number of hosts
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard: glusterd
Depends On:
Blocks: 1284380
 
Reported: 2014-03-13 01:34 UTC by Harshavardhana
Modified: 2015-12-03 17:11 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1284380 (view as bug list)
Environment:
Last Closed: 2015-12-03 17:11:36 UTC
Embargoed:


Attachments (Terms of Use)
Coredump (576.16 KB, application/x-gzip)
2014-03-13 01:38 UTC, Harshavardhana
sosreport from the glusterd crash (5.63 MB, application/x-xz)
2014-03-13 01:38 UTC, Harshavardhana

Description Harshavardhana 2014-03-13 01:34:50 UTC
Description of problem:

[2014-03-13 01:17:41.262366] E [glusterd-syncop.c:161:gd_syncop_submit_request] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_sync_task_begin+0xdf) [0x7f10ad4d9d4f] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_unlock_op_phase+0x24e) [0x7f10ad4d9b9e] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_syncop_mgmt_unlock+0xa3) [0x7f10ad4d7d33]))) 0-: Assertion failed: rpc
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-03-13 01:17:41
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.44rhs
/lib64/libc.so.6[0x3961032960]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_syncop_submit_request+0xd2)[0x7f10ad4d73a2]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_syncop_mgmt_unlock+0xa3)[0x7f10ad4d7d33]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_unlock_op_phase+0x24e)[0x7f10ad4d9b9e]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(gd_sync_task_begin+0xdf)[0x7f10ad4d9d4f]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b)[0x7f10ad4da06b]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(__glusterd_handle_cli_start_volume+0x1b6)[0x7f10ad4ce176]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7f10ad465fff]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x3961c49ad2]
/lib64/libc.so.6[0x3961043bb0]
---------
Core was generated by `/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f10ad4d73a2 in gd_syncop_submit_request () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
Missing separate debuginfos, use: debuginfo-install glusterfs-server-3.4.0.44rhs-1.el6rhs.x86_64
(gdb) bt
#0  0x00007f10ad4d73a2 in gd_syncop_submit_request () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#1  0x00007f10ad4d7d33 in gd_syncop_mgmt_unlock () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#2  0x00007f10ad4d9b9e in gd_unlock_op_phase () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#3  0x00007f10ad4d9d4f in gd_sync_task_begin () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#4  0x00007f10ad4da06b in glusterd_op_begin_synctask () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#5  0x00007f10ad4ce176 in __glusterd_handle_cli_start_volume () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#6  0x00007f10ad465fff in glusterd_big_locked_handler () from /usr/lib64/glusterfs/3.4.0.44rhs/xlator/mgmt/glusterd.so
#7  0x0000003961c49ad2 in synctask_wrap () from /usr/lib64/libglusterfs.so.0
#8  0x0000003961043bb0 in ?? () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()
(gdb) 

After the segfault, glusterd fails to start:

[2014-03-13 01:22:35.691970] E [glusterd-store.c:2548:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2014-03-13 01:22:35.691996] E [xlator.c:423:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2014-03-13 01:22:35.692006] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2014-03-13 01:22:35.692013] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed

[2014-03-13 01:23:51.040766] D [glusterd-utils.c:5166:glusterd_friend_find_by_hostname] 0-management: Unable to find friend: dhcp47-4.lab.bos.redhat.com
[2014-03-13 01:23:51.041399] D [common-utils.c:2812:gf_is_local_addr] 0-management: 10.16.47.4 
[2014-03-13 01:23:51.045983] D [common-utils.c:2812:gf_is_local_addr] 0-management: 10.16.47.4 
[2014-03-13 01:23:51.046028] D [common-utils.c:2812:gf_is_local_addr] 0-management: 10.16.47.4 
[2014-03-13 01:23:51.046061] D [common-utils.c:2825:gf_is_local_addr] 0-management: dhcp47-4.lab.bos.redhat.com is not local
[2014-03-13 01:23:51.046069] D [glusterd-utils.c:5201:glusterd_hostname_to_uuid] 0-management: returning -1
[2014-03-13 01:23:51.046075] D [glusterd-utils.c:615:glusterd_resolve_brick] 0-management: Returning -1
[2014-03-13 01:23:51.046080] E [glusterd-store.c:2548:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2014-03-13 01:23:51.046086] D [glusterd-store.c:2555:glusterd_resolve_all_bricks] 0-: Returning with -1
[2014-03-13 01:23:51.046091] D [glusterd-store.c:2588:glusterd_restore] 0-: Returning -1
[2014-03-13 01:23:51.046100] E [xlator.c:423:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2014-03-13 01:23:51.046109] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2014-03-13 01:23:51.046115] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
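The debug log shows the restore failing one step earlier than the crash: `glusterd_resolve_all_bricks` cannot match the brick host `dhcp47-4.lab.bos.redhat.com` to the local node or any known peer, so `xlator_init` aborts. The name-resolution portion of that check can be exercised in isolation with `getaddrinfo`; the sketch below is illustrative, not glusterd's actual resolution path, and `no-such-host.invalid` is a deliberately unresolvable name:

```c
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>

/* Resolve a brick hostname the way a management daemon might before
 * matching it against known peers; returns 0 on success. */
static int resolve_host(const char *host)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;    /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "resolve %s failed: %s\n", host, gai_strerror(rc));
        return -1;
    }
    freeaddrinfo(res);
    printf("resolved %s\n", host);
    return 0;
}

int main(void)
{
    /* localhost should always resolve; a name under the reserved
     * .invalid TLD never should. */
    int ok  = resolve_host("localhost");
    int bad = resolve_host("no-such-host.invalid");
    printf("ok=%d bad=%d\n", ok, bad);
    return 0;
}
```

If the brick's hostname resolves here but glusterd still refuses to start, the mismatch is in the peer list rather than DNS, which matches the `Unable to find friend` message above.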

Version-Release number of selected component (if applicable):
3.4.0.44rhs

How reproducible:
Often

Topology for volume HadoopVol:

Distribute set
 |     
 +---- Replica set 0
 |      |     
 |      +---- Brick 0: dhcp71-70.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp71-65.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 1
 |      |     
 |      +---- Brick 0: dhcp71-69.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp71-66.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 2
 |      |     
 |      +---- Brick 0: dhcp71-62.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp71-64.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 3
 |      |     
 |      +---- Brick 0: dhcp71-63.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp71-59.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 4
 |      |     
 |      +---- Brick 0: dhcp71-60.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp71-61.rhts.eng.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 5
 |      |     
 |      +---- Brick 0: dhcp47-11.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp47-8.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 6
 |      |     
 |      +---- Brick 0: dhcp47-9.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp47-10.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 7
 |      |     
 |      +---- Brick 0: dhcp47-7.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp47-6.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 8
 |      |     
 |      +---- Brick 0: dhcp47-5.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |      |     
 |      +---- Brick 1: dhcp47-4.lab.bos.redhat.com:/mnt/brick1/HadoopVol
 |     
 +---- Replica set 9
        |     
        +---- Brick 0: dhcp47-3.lab.bos.redhat.com:/mnt/brick1/HadoopVol
        |     
        +---- Brick 1: dhcp47-240.lab.bos.redhat.com:/mnt/brick1/HadoopVol

Comment 2 Harshavardhana 2014-03-13 01:38:00 UTC
Created attachment 873777 [details]
Coredump

Comment 3 Harshavardhana 2014-03-13 01:38:52 UTC
Created attachment 873779 [details]
sosreport from the glusterd crash

Comment 6 Vivek Agarwal 2015-12-03 17:11:36 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release against which you requested review has now reached End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.
