Created attachment 822041 [details]
glusterd log file from the node where it crashed

Description of problem:
I was running "geo-rep status detail" in a while loop, and after some time glusterd crashed. glusterd had been started in -LDEBUG mode.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.42rhs-1.el6rhs.x86_64

How reproducible:
Hit twice in 2 tries.

Steps to Reproduce:
1. Create and start a geo-rep session between a 2*2 dist-rep master and 2*2 slave nodes.
2. Turn on the use_tarssh option to sync the files.
3. In a while loop, keep running "geo-rep status detail" with a 20-30 second sleep.

Actual results:
Core was generated by `glusterd -LDEBUG'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003b3a081361 in __strlen_sse2 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install device-mapper-event-libs-1.02.77-9.el6.x86_64 device-mapper-libs-1.02.77-9.el6.x86_64 glibc-2.12-1.107.el6_4.5.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.6.x86_64 libcom_err-1.41.12-14.el6_4.2.x86_64 libgcc-4.4.7-3.el6.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64 libsepol-2.0.41-4.el6.x86_64 libudev-147-2.46.el6.x86_64 libxml2-2.7.6-12.el6_4.1.x86_64 lvm2-libs-2.02.98-9.el6.x86_64 openssl-1.0.0-27.el6_4.2.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x0000003b3a081361 in __strlen_sse2 () from /lib64/libc.so.6
#1  0x00007f20832919be in glusterd_parse_gsync_status (volinfo=0x7f207c017730, slave=0x7f206c116f00 "falcon::slave", conf_path=<value optimized out>, dict=0x7f20854cb7d4, node=0x1ce35d0 "spitfire.blr.redhat.com") at glusterd-geo-rep.c:2576
#2  glusterd_read_status_file (volinfo=0x7f207c017730, slave=0x7f206c116f00 "falcon::slave", conf_path=<value optimized out>, dict=0x7f20854cb7d4, node=0x1ce35d0 "spitfire.blr.redhat.com") at glusterd-geo-rep.c:2733
#3  0x00007f2083293a66 in glusterd_get_gsync_status_mst_slv (volinfo=0x7f207c017730, slave=0x7f206c116f00 "falcon::slave", conf_path=0x7f206c10c040 "/var/lib/glusterd/geo-replication/master_falcon_slave/gsyncd.conf", rsp_dict=0x7f20854cb7d4, node=0x1ce35d0 "spitfire.blr.redhat.com") at glusterd-geo-rep.c:2966
#4  0x00007f2083293d9c in glusterd_get_gsync_status (dict=0x7f20854cb400, op_errstr=0x1ce42b8, rsp_dict=0x7f20854cb7d4) at glusterd-geo-rep.c:3067
#5  0x00007f20832943b6 in glusterd_op_gsync_set (dict=0x7f20854cb400, op_errstr=0x1ce42b8, rsp_dict=0x7f20854cb7d4) at glusterd-geo-rep.c:3518
#6  0x00007f2083254066 in glusterd_op_commit_perform (op=GD_OP_GSYNC_SET, dict=0x7f20854cb400, op_errstr=0x1ce42b8, rsp_dict=0x7f20854cb7d4) at glusterd-op-sm.c:3933
#7  0x00007f20832acc7e in gd_commit_op_phase (peers=0x18da740, op=GD_OP_GSYNC_SET, op_ctx=0x7f20854cc120, req_dict=0x7f20854cb400, op_errstr=0x1ce42b8, npeers=3) at glusterd-syncop.c:958
#8  0x00007f20832ae8c2 in gd_sync_task_begin (op_ctx=0x7f20854cc120, req=0x7f2082ce1920) at glusterd-syncop.c:1240
#9  0x00007f20832ae9fb in glusterd_op_begin_synctask (req=0x7f2082ce1920, op=<value optimized out>, dict=0x7f20854cc120) at glusterd-syncop.c:1274
#10 0x00007f2083295d6f in __glusterd_handle_gsync_set (req=0x7f2082ce1920) at glusterd-geo-rep.c:319
#11 0x00007f208323ae7f in glusterd_big_locked_handler (req=0x7f2082ce1920, actor_fn=0x7f2083295c00 <__glusterd_handle_gsync_set>) at glusterd-handler.c:77
#12 0x00007f2086cfaad2 in synctask_wrap (old_task=<value optimized out>) at syncop.c:132
#13 0x0000003b3a043bb0 in ?? () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()

Expected results:
glusterd should not crash.
Additional info:
Part of the log file:

pending frames:
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-11-08 22:39:46
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.42rhs
/lib64/libc.so.6[0x3b3a032960]
/lib64/libc.so.6[0x3b3a081361]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(glusterd_read_status_file+0xc2e)[0x7f20832919be]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(+0x7da66)[0x7f2083293a66]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(+0x7dd9c)[0x7f2083293d9c]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(glusterd_op_gsync_set+0x396)[0x7f20832943b6]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(glusterd_op_commit_perform+0x2b6)[0x7f2083254066]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(gd_commit_op_phase+0xbe)[0x7f20832acc7e]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x2c2)[0x7f20832ae8c2]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b)[0x7f20832ae9fb]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(__glusterd_handle_gsync_set+0x16f)[0x7f2083295d6f]
/usr/lib64/glusterfs/3.4.0.42rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7f208323ae7f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x7f2086cfaad2]
/lib64/libc.so.6[0x3b3a043bb0]

I have attached the glusterd log file.
Fixed with https://code.engineering.redhat.com/gerrit/#/c/15501/
I ran the same steps again many times, and glusterd has not crashed so far, so I am moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1769.html