Bug 1119223 - [glusterd] glusterd crashed when it failed to create geo-rep status file.
Summary: [glusterd] glusterd crashed when it failed to create geo-rep status file.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: Avra Sengupta
QA Contact: Bhaskar Bandari
URL:
Whiteboard:
Depends On:
Blocks: 1119256
Reported: 2014-07-14 10:13 UTC by Vijaykumar Koppad
Modified: 2015-05-13 16:54 UTC

Fixed In Version: glusterfs-3.6.0.27-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1119256
Environment:
Last Closed: 2014-09-22 19:44:19 UTC


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:1278 normal SHIPPED_LIVE Red Hat Storage Server 3.0 bug fix and enhancement update 2014-09-22 23:26:55 UTC

Description Vijaykumar Koppad 2014-07-14 10:13:09 UTC
Description of problem:  glusterd crashed when it failed to create geo-rep status file. 

bt from core file
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  __gf_free (free_ptr=0x7f06c4045c70) at mem-pool.c:252
#1  0x00007f06d8159cd4 in _local_gsyncd_start (this=<value optimized out>, key=<value optimized out>, 
    value=<value optimized out>, data=0xda8110) at glusterd-utils.c:6958
#2  0x0000003723a18ce5 in dict_foreach (dict=0x7f06e028e308, fn=0x7f06d8159ad0 <_local_gsyncd_start>, data=0xda8110)
    at dict.c:1127
#3  0x00007f06d813e6af in glusterd_volume_restart_gsyncds (volinfo=0xda8110) at glusterd-utils.c:6984
#4  0x00007f06d813e718 in glusterd_restart_gsyncds (conf=0xd9e980) at glusterd-utils.c:6995
#5  0x00007f06d8156dd8 in glusterd_spawn_daemons (opaque=<value optimized out>) at glusterd-utils.c:3579
#6  0x0000003723a5b742 in synctask_wrap (old_task=<value optimized out>) at syncop.c:333
#7  0x0000003e09c43bf0 in ?? () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

bt in log file 
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
[2014-07-14 08:41:24.424692] D [socket.c:499:__socket_rwv] 0-socket.management: EOF on socket
[2014-07-14 08:41:24.424732] D [socket.c:2246:socket_event_handler] 0-transport: disconnecting now
[2014-07-14 08:41:24.529252] E [glusterd-geo-rep.c:1942:glusterd_create_status_file] 0-: Creating status file failed.
[2014-07-14 08:41:24.529300] D [glusterd-geo-rep.c:1949:glusterd_create_status_file] 0-: returning -1
[2014-07-14 08:41:24.529326] E [glusterd-utils.c:6967:_local_gsyncd_start] 0-: Unable to create status file. Error : Bad file descriptor
[2014-07-14 08:41:24.529490] E [mem-pool.c:242:__gf_free] (-->/usr/lib64/glusterfs/3.6.0.24/xlator/mgmt/glusterd.so(glusterd_volume_restart_gsyncds+0x1f) [0x7f06d813e6af] (-->/usr/lib64/libglusterfs.so.0(dict_foreach+0x45) [0x3723a18ce5] 
(-->/usr/lib64/glusterfs/3.6.0.24/xlator/mgmt/glusterd.so(_local_gsyncd_start+0x204) [0x7f06d8159cd4]))) 0-: Assertion failed: GF_MEM_HEADER_MAGIC == *(uint32_t *)ptr
[2014-07-14 08:41:24.529531] D [logging.c:1805:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 5 extra log messages
[2014-07-14 08:41:24.529558] D [logging.c:1808:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 5 extra log messages
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2014-07-14 08:41:24
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.24
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3723a1fe56]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3723a3a28f]
/lib64/libc.so.6[0x3e09c329a0]
/usr/lib64/libglusterfs.so.0(__gf_free+0xcc)[0x3723a4d17c]
/usr/lib64/glusterfs/3.6.0.24/xlator/mgmt/glusterd.so(_local_gsyncd_start+0x204)[0x7f06d8159cd4]
/usr/lib64/libglusterfs.so.0(dict_foreach+0x45)[0x3723a18ce5]
/usr/lib64/glusterfs/3.6.0.24/xlator/mgmt/glusterd.so(glusterd_volume_restart_gsyncds+0x1f)[0x7f06d813e6af]
/usr/lib64/glusterfs/3.6.0.24/xlator/mgmt/glusterd.so(glusterd_restart_gsyncds+0x28)[0x7f06d813e718]
/usr/lib64/glusterfs/3.6.0.24/xlator/mgmt/glusterd.so(glusterd_spawn_daemons+0x38)[0x7f06d8156dd8]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x3723a5b742]
/lib64/libc.so.6[0x3e09c43bf0]
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
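The assertion in the log above, GF_MEM_HEADER_MAGIC == *(uint32_t *)ptr, suggests that the pointer handed to __gf_free was not preceded by the allocator's accounting header, which is what one would expect if the pointer did not come from that allocator. The following is only a minimal sketch of how such a header check works in general. It is not the libglusterfs code, and toy_alloc, toy_owns, toy_free and TOY_MAGIC are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAGIC 0xCAFEBABEu /* hypothetical marker value */

/* Header stored immediately before every block returned by toy_alloc(). */
struct toy_header {
    uint32_t magic;
    size_t   size;
};

/* Allocate size bytes with a header in front; returns NULL on failure. */
void *toy_alloc(size_t size)
{
    struct toy_header *hdr = malloc(sizeof(*hdr) + size);

    if (hdr == NULL)
        return NULL;
    hdr->magic = TOY_MAGIC;
    hdr->size = size;
    return hdr + 1; /* caller sees only the payload */
}

/* Return 1 if the bytes in front of ptr carry the marker, 0 otherwise. */
int toy_owns(const void *ptr)
{
    struct toy_header hdr;

    if (ptr == NULL)
        return 0;
    memcpy(&hdr, (const char *)ptr - sizeof(hdr), sizeof(hdr));
    return hdr.magic == TOY_MAGIC;
}

/* Free a block from toy_alloc(); asserts if the header marker is missing. */
void toy_free(void *ptr)
{
    if (ptr == NULL)
        return;
    assert(toy_owns(ptr));
    free((struct toy_header *)ptr - 1);
}
```

In this sketch, calling toy_free on a pointer into a stack buffer or into the middle of another allocation would trip the assertion, which is analogous to the "Assertion failed" line logged just before signal 11.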


Version-Release number of selected component (if applicable):glusterfs-3.6.0.24-1.el6rhs


How reproducible: Didn't try to reproduce. 

This crash happened while restoring a snapshot in a geo-rep setup. It was a side effect of something unknown that happened on the setup, and I am not sure how to reproduce this glusterd crash. The following steps therefore only describe what was being done when it happened.

Steps to Reproduce:
1. create geo-rep setup. 
2. take multiple snapshots with geo-rep while IOs are happening on the master (follow steps to create snapshot with geo-rep)
3. restore to 2 immediate snapshots then restore one of the older snapshots. 
(follow the steps to restore snaps with geo-rep)
4. In the case where this crash happened, after the third snapshot, geo-rep start failed with:
Staging failed on 10.70.43.107. Error: state-file entry missing in the config file(/var/lib/glusterd/geo-replication/master_10.70.43.170_slave/gsyncd.conf)
Staging failed on 10.70.43.162. Error: state-file entry missing in the config file(/var/lib/glusterd/geo-replication/master_10.70.43.170_slave/gsyncd.conf)

5. After this, restarting glusterd on the node where all the commands were executed resulted in a glusterd crash on both the nodes where staging failed. The crash was actually caused by the failure to create the geo-rep status file.


Actual results: glusterd crashed when it failed to create the geo-rep status file.


Expected results: glusterd shouldn't crash. 
 

Additional info:

Comment 2 Avra Sengupta 2014-07-14 11:38:45 UTC
On a node where the status file fails to be created during glusterd startup for any reason (it may be a runner_run failure or any other issue), this bug will cause glusterd to crash. The issue persists for as long as status file creation fails, so glusterd will keep crashing.
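A crash in the free call on an error path, as seen here, commonly arises when cleanup code frees a pointer that the allocator never handed out on that path. As an illustration only (this is not the actual fix, and start_worker and create_status_file are hypothetical stand-ins rather than glusterd functions), a defensive pattern is to initialise every owned pointer to NULL and route all exits through a single cleanup label:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for status file creation: fails on an empty path. */
static int create_status_file(const char *path)
{
    if (path == NULL || path[0] == '\0') {
        errno = EBADF;
        return -1;
    }
    return 0;
}

/* Hypothetical per-session start routine. Returns 0 on success, -1 on error. */
int start_worker(const char *session, const char *status_path)
{
    char  *session_copy = NULL; /* stays NULL until we own memory */
    char  *path_copy = NULL;
    int    ret = -1;
    size_t len;

    assert(session != NULL);

    len = strlen(session) + 1;
    session_copy = malloc(len);
    if (session_copy == NULL)
        goto out;
    memcpy(session_copy, session, len);

    if (create_status_file(status_path) != 0) {
        fprintf(stderr, "Unable to create status file. Error : %s\n",
                strerror(errno));
        goto out; /* path_copy is still NULL on this path */
    }

    len = strlen(status_path) + 1;
    path_copy = malloc(len);
    if (path_copy == NULL)
        goto out;
    memcpy(path_copy, status_path, len);

    ret = 0;
out:
    /* Only pointers obtained from malloc (or NULL) ever reach free(). */
    free(path_copy);
    free(session_copy);
    return ret;
}
```

With this layout a status file failure returns -1 to the caller instead of freeing an unowned pointer, so the daemon can log the error and continue with the remaining sessions.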

Comment 3 Avra Sengupta 2014-07-22 11:02:04 UTC
Fix at https://code.engineering.redhat.com/gerrit/#/c/29533/

Comment 4 Vijaykumar Koppad 2014-08-13 07:49:34 UTC
Verified in the build glusterfs-3.6.0.27-1. 

Steps.
1. Since there are no proper steps to reproduce, verification was done using gdb, with a breakpoint set at _local_gsyncd_start.
2. This can be verified by making glusterd restart the gsyncd processes and setting a few variables to hit the particular scenario.

Comment 8 errata-xmlrpc 2014-09-22 19:44:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

