Description of problem:
glusterd crashed when it failed to create the geo-rep status file.

bt from core file
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
(gdb) bt
#0  __gf_free (free_ptr=0x7f06c4045c70) at mem-pool.c:252
#1  0x00007f06d8159cd4 in _local_gsyncd_start (this=<value optimized out>, key=<value optimized out>, value=<value optimized out>, data=0xda8110) at glusterd-utils.c:6958
#2  0x0000003723a18ce5 in dict_foreach (dict=0x7f06e028e308, fn=0x7f06d8159ad0 <_local_gsyncd_start>, data=0xda8110) at dict.c:1127
#3  0x00007f06d813e6af in glusterd_volume_restart_gsyncds (volinfo=0xda8110) at glusterd-utils.c:6984
#4  0x00007f06d813e718 in glusterd_restart_gsyncds (conf=0xd9e980) at glusterd-utils.c:6995
#5  0x00007f06d8156dd8 in glusterd_spawn_daemons (opaque=<value optimized out>) at glusterd-utils.c:3579
#6  0x0000003723a5b742 in synctask_wrap (old_task=<value optimized out>) at syncop.c:333
#7  0x0000003e09c43bf0 in ?? () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

bt in log file
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
[2014-07-14 08:41:24.424692] D [socket.c:499:__socket_rwv] 0-socket.management: EOF on socket
[2014-07-14 08:41:24.424732] D [socket.c:2246:socket_event_handler] 0-transport: disconnecting now
[2014-07-14 08:41:24.529252] E [glusterd-geo-rep.c:1942:glusterd_create_status_file] 0-: Creating status file failed.
[2014-07-14 08:41:24.529300] D [glusterd-geo-rep.c:1949:glusterd_create_status_file] 0-: returning -1
[2014-07-14 08:41:24.529326] E [glusterd-utils.c:6967:_local_gsyncd_start] 0-: Unable to create status file. Error : Bad file descriptor
[2014-07-14 08:41:24.529490] E [mem-pool.c:242:__gf_free] (-->/usr/lib64/glusterfs/3.6.0.24/xlator/mgmt/glusterd.so(glusterd_volume_restart_gsyncds+0x1f) [0x7f06d813e6af] (-->/usr/lib64/libglusterfs.so.0(dict_foreach+0x45) [0x3723a18ce5] (-->/usr/lib64/glusterfs/3.6.0.24/xlator/mgmt/glusterd.so(_local_gsyncd_start+0x204) [0x7f06d8159cd4]))) 0-: Assertion failed: GF_MEM_HEADER_MAGIC == *(uint32_t *)ptr
[2014-07-14 08:41:24.529531] D [logging.c:1805:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 5 extra log messages
[2014-07-14 08:41:24.529558] D [logging.c:1808:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 5 extra log messages
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-07-14 08:41:24
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.24
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3723a1fe56]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3723a3a28f]
/lib64/libc.so.6[0x3e09c329a0]
/usr/lib64/libglusterfs.so.0(__gf_free+0xcc)[0x3723a4d17c]
/usr/lib64/glusterfs/3.6.0.24/xlator/mgmt/glusterd.so(_local_gsyncd_start+0x204)[0x7f06d8159cd4]
/usr/lib64/libglusterfs.so.0(dict_foreach+0x45)[0x3723a18ce5]
/usr/lib64/glusterfs/3.6.0.24/xlator/mgmt/glusterd.so(glusterd_volume_restart_gsyncds+0x1f)[0x7f06d813e6af]
/usr/lib64/glusterfs/3.6.0.24/xlator/mgmt/glusterd.so(glusterd_restart_gsyncds+0x28)[0x7f06d813e718]
/usr/lib64/glusterfs/3.6.0.24/xlator/mgmt/glusterd.so(glusterd_spawn_daemons+0x38)[0x7f06d8156dd8]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x3723a5b742]
/lib64/libc.so.6[0x3e09c43bf0]
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.24-1.el6rhs

How reproducible:
Did not try to reproduce. The crash happened while restoring a snapshot in a geo-rep setup and was a side effect of something unknown that happened on the setup, so it is not clear how to reproduce the glusterd crash. The steps below describe what was being done when it happened.

Steps to Reproduce:
1. Create a geo-rep setup.
2. Take multiple snapshots with geo-rep while I/O is happening on the master (follow the steps to create a snapshot with geo-rep).
3. Restore the two most recent snapshots, then restore one of the older snapshots (follow the steps to restore snapshots with geo-rep).
4. In the case where this crash happened, geo-rep start failed after the third snapshot restore with:
   "Staging failed on 10.70.43.107. Error: state-file entry missing in the config file(/var/lib/glusterd/geo-replication/master_10.70.43.170_slave/gsyncd.conf)"
   "Staging failed on 10.70.43.162. Error: state-file entry missing in the config file(/var/lib/glusterd/geo-replication/master_10.70.43.170_slave/gsyncd.conf)"
5. After this, restarting glusterd on the nodes where staging had failed (including the node where all the commands were executed) resulted in a glusterd crash on both of them; the crash was caused by the failure to create the geo-rep status file.

Actual results:
glusterd crashed when it failed to create the geo-rep status file.

Expected results:
glusterd should not crash.

Additional info:
On a node where the status file fails to be created during glusterd startup for any reason (for example, a runner_run failure), this bug causes glusterd to crash. The problem persists for as long as status-file creation keeps failing, so glusterd will keep crashing on every restart.
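The assertion in the log ("GF_MEM_HEADER_MAGIC == *(uint32_t *)ptr") indicates that __gf_free() was handed a pointer that does not carry the header written by GF_CALLOC/GF_MALLOC. As an illustration only (this is not the glusterd source; create_status_file and statefile are made-up stand-ins), the minimal C sketch below shows that kind of error-path mistake and a safer cleanup pattern:

/* sketch.c: error path frees a pointer it never allocated */
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a status-file creation step that fails and leaves the
 * output pointer aimed at memory that was never heap-allocated. */
static int create_status_file(char **statefile)
{
        static char scratch[] = "/var/lib/glusterd/geo-replication/...";
        *statefile = scratch;   /* not from malloc()/GF_CALLOC() */
        return -1;              /* creation failed */
}

int main(void)
{
        char *statefile = NULL;
        int   ret = create_status_file(&statefile);

        if (ret) {
                fprintf(stderr, "Unable to create status file.\n");
                /* Buggy cleanup: freeing a non-heap pointer; with the
                 * glusterfs allocator this trips the header-magic check
                 * in __gf_free() and the daemon segfaults.
                 *
                 *     free(statefile);
                 */
                /* Safer pattern: free only buffers this function itself
                 * allocated, then reset the pointer so any later cleanup
                 * does not touch it again. */
                statefile = NULL;
        }
        return ret ? EXIT_FAILURE : EXIT_SUCCESS;
}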
Fix at https://code.engineering.redhat.com/gerrit/#/c/29533/
Verified in build glusterfs-3.6.0.27-1.

Steps:
1. Since there are no reliable steps to reproduce, verification was done with gdb by setting a breakpoint at _local_gsyncd_start.
2. The scenario can be hit by making glusterd restart the gsyncd processes and setting a few variables to force the status-file creation failure path.
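As a rough illustration of that approach (a hypothetical gdb session; the variable named ret here is an assumption and depends on the source revision), the error path can be forced roughly like this:

# run glusterd in the foreground under gdb
gdb --args /usr/sbin/glusterd -N
(gdb) break _local_gsyncd_start
(gdb) run
# breakpoint hits while glusterd_restart_gsyncds() spawns gsyncd at startup
(gdb) next                     # step until the status-file creation call has returned
(gdb) set variable ret = -1    # pretend status-file creation failed
(gdb) continue
# unpatched build: crash in __gf_free(); fixed build: the error is logged and glusterd keeps running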
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1278.html