1576392 – Glusterd crashed on a few (master) nodes

Summary: Glusterd crashed on a few (master) nodes

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	glusterd
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Kotresh HR
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1570586
Blocks:	1577868 1611106

Reported:	2018-05-09 11:04 UTC by Kotresh HR
Modified:	2018-10-23 15:07 UTC
CC List:	12 users
Fixed In Version:	glusterfs-5.0
Clone Of:	1570586
Clones:	1577868 1611106
Environment:
Last Closed:	2018-10-23 15:07:59 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Kotresh HR 2018-05-09 11:04:09 UTC

Description of problem:
=======================

Glusterd crashed on a few nodes.
Geo-replication status was Created/Active instead of the expected Active/Passive.

Geo-replication session was started and the following was shown as the status of the session:
----------------------------------------------------------------------------------------------
[root@dhcp41-226 scripts]# gluster volume geo-replication master 10.70.41.160::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED                  
-----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.41.226    master        /rhs/brick3/b7    root          10.70.41.160::slave    N/A             Created    N/A                N/A                          
10.70.41.226    master        /rhs/brick1/b1    root          10.70.41.160::slave    N/A             Created    N/A                N/A                          
10.70.41.230    master        /rhs/brick2/b5    root          10.70.41.160::slave    N/A             Created    N/A                N/A                          
10.70.41.229    master        /rhs/brick2/b4    root          10.70.41.160::slave    N/A             Created    N/A                N/A                          
10.70.41.219    master        /rhs/brick2/b6    root          10.70.41.160::slave    N/A             Created    N/A                N/A                          
10.70.41.227    master        /rhs/brick3/b8    root          10.70.41.160::slave    N/A             Created    N/A                N/A                          
10.70.41.227    master        /rhs/brick1/b2    root          10.70.41.160::slave    N/A             Created    N/A                N/A                          
10.70.41.228    master        /rhs/brick3/b9    root          10.70.41.160::slave    10.70.41.160    Active     Changelog Crawl    2018-04-23 06:13:53          
10.70.41.228    master        /rhs/brick1/b3    root          10.70.41.160::slave    10.70.42.79     Active     Changelog Crawl    2018-04-23 06:13:53        


glusterd logs:
-------------
[2018-04-23 07:34:16.850166] E [mem-pool.c:307:__gf_free] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x419cf) [0x7f98a9e619cf] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x44ca5) [0x7f98a9e64ca5] -->/lib64/libglusterfs.so.0(__gf_free+0xac) [0x7f98b53e268c] ) 0-: Assertion failed: GF_MEM_HEADER_MAGIC == header->magic
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash: 
2018-04-23 07:34:16
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f98b53ba4d0]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f98b53c4414]
/lib64/libc.so.6(+0x36280)[0x7f98b3a19280]
/lib64/libc.so.6(gsignal+0x37)[0x7f98b3a19207]
/lib64/libc.so.6(abort+0x148)[0x7f98b3a1a8f8]
/lib64/libc.so.6(+0x78cc7)[0x7f98b3a5bcc7]
/lib64/libc.so.6(+0x7f574)[0x7f98b3a62574]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x44ca5)[0x7f98a9e64ca5]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x419cf)[0x7f98a9e619cf]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x1bdc2)[0x7f98a9e3bdc2]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x23b6e)[0x7f98a9e43b6e]
/lib64/libglusterfs.so.0(synctask_wrap+0x10)[0x7f98b53f3250]
/lib64/libc.so.6(+0x47fc0)[0x7f98b3a2afc0]
---------



Version-Release number of selected component (if applicable):
=============================================================
mainline



How reproducible:
=================
1/1


Steps to Reproduce:
===================
1. Create Master and a Slave cluster from 6 nodes (each)
2. Create and Start master volume (Tiered: cold-tier 1x(4+2)  and hot-tier 1x3)
4. Create and Start slave volume (Tiered: cold-tier 1x(4+2)  and hot-tier 1x3)
5. Enable quota on master volume 
6. Enable shared storage on master volume
7. Setup geo-rep session between master and slave volume 
8. Mount master volume on client 
9. Create data from master client

Actual results:
================
Glusterd crashed on a few nodes
Geo-rep session was in Created/ACTIVE state

Expected results:
=================
Glusterd should not crash
A geo-rep session which was started should be in ACTIVE/PASSIVE state.


(gdb) bt
#0  0x00007f3fbd4d7e4d in __gf_free () from /lib64/libglusterfs.so.0
#1  0x00007f3fb1ff63de in gd_sync_task_begin () from /usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so
#2  0x00007f3fb1ff6c50 in glusterd_op_begin_synctask () from /usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so
#3  0x00007f3fb1fc3d98 in __glusterd_handle_gsync_set () from /usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so
#4  0x00007f3fb1f38b1e in glusterd_big_locked_handler () from /usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so
#5  0x00007f3fbd4e8ad0 in synctask_wrap () from /lib64/libglusterfs.so.0
#6  0x00007f3fbbb1ffc0 in ?? () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()
(gdb)

Comment 1 Worker Ant 2018-05-09 11:12:51 UTC

REVIEW: https://review.gluster.org/19993 (glusterd/geo-rep: Fix glusterd crash) posted (#1) for review on master by Kotresh HR

Comment 2 Worker Ant 2018-05-12 09:07:02 UTC

COMMIT: https://review.gluster.org/19993 committed in master by "Amar Tumballi" <amarts> with a commit message- glusterd/geo-rep: Fix glusterd crash

Using strdup instead of gf_strdup crashes
during free if mempool is being used.
gf_free checks the magic number in the
header, which is never written when
strdup is used.

fixes: bz#1576392
Change-Id: Iab36496554b838a036af9d863e3f5fd07fd9780e
Signed-off-by: Kotresh HR <khiremat>
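
The allocator mismatch described in the commit message can be illustrated with a minimal sketch. Everything named demo_* below is hypothetical and only mimics the idea of a header-carrying allocator; it is not the GlusterFS implementation. One deliberate difference: the real gf_free trips an assertion and aborts (as in the log above), whereas this sketch returns -1 so the mismatch can be observed without crashing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for an accounted allocator; not the GlusterFS API. */
#define DEMO_HEADER_MAGIC 0xCAFEBABEu

struct demo_header {
    uint32_t magic; /* written by demo_malloc, checked by demo_free */
    size_t size;    /* payload size, kept for accounting */
};

/* Allocate size bytes preceded by a header carrying the magic number. */
static void *demo_malloc(size_t size)
{
    struct demo_header *h = malloc(sizeof(*h) + size);
    if (h == NULL)
        return NULL;
    h->magic = DEMO_HEADER_MAGIC;
    h->size = size;
    return h + 1; /* caller sees only the payload */
}

/* strdup counterpart that goes through demo_malloc, so the header exists. */
static char *demo_strdup(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s) + 1;
    copy = demo_malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/*
 * Free a pointer obtained from demo_malloc/demo_strdup.
 * Returns 0 on success and -1 when the bytes in front of ptr do not
 * hold the expected magic, i.e. ptr came from a different allocator.
 */
static int demo_free(void *ptr)
{
    struct demo_header *h;

    if (ptr == NULL)
        return 0;
    h = (struct demo_header *)ptr - 1;
    if (h->magic != DEMO_HEADER_MAGIC)
        return -1;
    h->magic = 0; /* a repeated free would then fail the check */
    free(h);
    return 0;
}
```

A string created with demo_strdup passes the check in demo_free, while a pointer that was not produced by demo_malloc (which is what a plain strdup result amounts to) has no header in front of it and is rejected. Pairing each allocation with the free routine of the same allocator avoids the problem.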

Comment 3 Worker Ant 2018-05-15 03:05:03 UTC

REVISION POSTED: https://review.gluster.org/20019 (glusterd/geo-rep: Fix glusterd crash) posted (#2) for review on release-3.12 by Kotresh HR

Comment 4 Shyamsundar 2018-10-23 15:07:59 UTC

This bug is being closed because a release that should address the reported issue is now available. If the problem persists with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/