Created attachment 822041 [details]
glusterd log file from the node where it crashed

Description of problem:
I was running "geo-rep status detail" in a while loop, and after some time glusterd crashed. glusterd had been started in -LDEBUG mode.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.42rhs-1.el6rhs.x86_64

How reproducible:
Hit twice in 2 tries.

Steps to Reproduce:
1. Create and start a geo-rep session between a 2*2 dist-rep master and 2*2 slave nodes.
2. Turn on the use_tarssh option to sync the files.
3. In a while loop, keep running "geo-rep status detail" with a 20-30 second sleep.

Actual results:
Core was generated by `glusterd -LDEBUG'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003b3a081361 in __strlen_sse2 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install device-mapper-event-libs-1.02.77-9.el6.x86_64 device-mapper-libs-1.02.77-9.el6.x86_64 glibc-2.12-1.107.el6_4.5.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.6.x86_64 libcom_err-1.41.12-14.el6_4.2.x86_64 libgcc-4.4.7-3.el6.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64 libsepol-2.0.41-4.el6.x86_64 libudev-147-2.46.el6.x86_64 libxml2-2.7.6-12.el6_4.1.x86_64 lvm2-libs-2.02.98-9.el6.x86_64 openssl-1.0.0-27.el6_4.2.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x0000003b3a081361 in __strlen_sse2 () from /lib64/libc.so.6
#1  0x00007f20832919be in glusterd_parse_gsync_status (volinfo=0x7f207c017730, slave=0x7f206c116f00 "falcon::slave", conf_path=<value optimized out>, dict=0x7f20854cb7d4, node=0x1ce35d0 "spitfire.blr.redhat.com") at glusterd-geo-rep.c:2576
#2  glusterd_read_status_file (volinfo=0x7f207c017730, slave=0x7f206c116f00 "falcon::slave", conf_path=<value optimized out>, dict=0x7f20854cb7d4, node=0x1ce35d0 "spitfire.blr.redhat.com") at glusterd-geo-rep.c:2733
#3  0x00007f2083293a66 in glusterd_get_gsync_status_mst_slv (volinfo=0x7f207c017730, slave=0x7f206c116f00 "falcon::slave", conf_path=0x7f206c10c040 "/var/lib/glusterd/geo-replication/master_falcon_slave/gsyncd.conf", rsp_dict=0x7f20854cb7d4, node=0x1ce35d0 "spitfire.blr.redhat.com") at glusterd-geo-rep.c:2966
#4  0x00007f2083293d9c in glusterd_get_gsync_status (dict=0x7f20854cb400, op_errstr=0x1ce42b8, rsp_dict=0x7f20854cb7d4) at glusterd-geo-rep.c:3067
#5  0x00007f20832943b6 in glusterd_op_gsync_set (dict=0x7f20854cb400, op_errstr=0x1ce42b8, rsp_dict=0x7f20854cb7d4) at glusterd-geo-rep.c:3518
#6  0x00007f2083254066 in glusterd_op_commit_perform (op=GD_OP_GSYNC_SET, dict=0x7f20854cb400, op_errstr=0x1ce42b8, rsp_dict=0x7f20854cb7d4) at glusterd-op-sm.c:3933
#7  0x00007f20832acc7e in gd_commit_op_phase (peers=0x18da740, op=GD_OP_GSYNC_SET, op_ctx=0x7f20854cc120, req_dict=0x7f20854cb400, op_errstr=0x1ce42b8, npeers=3) at glusterd-syncop.c:958
#8  0x00007f20832ae8c2 in gd_sync_task_begin (op_ctx=0x7f20854cc120, req=0x7f2082ce1920) at glusterd-syncop.c:1240
#9  0x00007f20832ae9fb in glusterd_op_begin_synctask (req=0x7f2082ce1920, op=<value optimized out>, dict=0x7f20854cc120) at glusterd-syncop.c:1274
#10 0x00007f2083295d6f in __glusterd_handle_gsync_set (req=0x7f2082ce1920) at glusterd-geo-rep.c:319
#11 0x00007f208323ae7f in glusterd_big_locked_handler (req=0x7f2082ce1920, actor_fn=0x7f2083295c00 <__glusterd_handle_gsync_set>) at glusterd-handler.c:77
#12 0x00007f2086cfaad2 in synctask_wrap (old_task=<value optimized out>) at syncop.c:132
#13 0x0000003b3a043bb0 in ?? () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()

Expected results:
glusterd should not crash.
Additional info:
Part of the log file:

pending frames:
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-11-08 22:39:46
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.42rhs
/lib64/libc.so.6[0x3b3a032960]
/lib64/libc.so.6[0x3b3a081361]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(glusterd_read_status_file+0xc2e)[0x7f20832919be]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(+0x7da66)[0x7f2083293a66]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(+0x7dd9c)[0x7f2083293d9c]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(glusterd_op_gsync_set+0x396)[0x7f20832943b6]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(glusterd_op_commit_perform+0x2b6)[0x7f2083254066]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(gd_commit_op_phase+0xbe)[0x7f20832acc7e]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x2c2)[0x7f20832ae8c2]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b)[0x7f20832ae9fb]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(__glusterd_handle_gsync_set+0x16f)[0x7f2083295d6f]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7f208323ae7f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x7f2086cfaad2]
/lib64/libc.so.6[0x3b3a043bb0]

I have attached the glusterd log file.
Fixed with https://code.engineering.redhat.com/gerrit/#/c/15501/
I ran the same steps again many times, and glusterd has not crashed so far, so I am moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1769.html