Bug 1397667 - coredumps found for disperse volume on all nodes hosting the bricks (most likely due to forceful umount of brick)
Keywords:
Status: CLOSED DUPLICATE of bug 1385606
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: posix
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-23 06:19 UTC by Nag Pavan Chilakam
Modified: 2016-11-23 07:11 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-23 07:11:47 UTC
Embargoed:


Attachments:

Description Nag Pavan Chilakam 2016-11-23 06:19:02 UTC
Description of problem:
=======================
Many of the nodes hosting bricks of my ecvolume have coredumps.
I looked into some of them and almost all show the same backtrace.
Trying to recall what could have caused this, I strongly suspect it is the result of forcefully unmounting the brick with umount -l.


core was generated by `/usr/sbin/glusterfsd -s 10.70.35.239 --volfile-id ecvol.10.70.35.239.rhs-brick1'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f1f01148694 in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.8.4-5.el7rhgs.x86_64
(gdb) bt
#0  0x00007f1f01148694 in vfprintf () from /lib64/libc.so.6
#1  0x00007f1f0120c8d5 in __vsnprintf_chk () from /lib64/libc.so.6
#2  0x00007f1f02a77a18 in gf_vasprintf () from /lib64/libglusterfs.so.0
#3  0x00007f1f02ac688a in gf_event () from /lib64/libglusterfs.so.0
#4  0x00007f1ef4e475f0 in posix_fs_health_check ()
   from /usr/lib64/glusterfs/3.8.4/xlator/storage/posix.so
#5  0x00007f1ef4e47774 in posix_health_check_thread_proc ()
   from /usr/lib64/glusterfs/3.8.4/xlator/storage/posix.so
#6  0x00007f1f018b2dc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f1f011f773d in clone () from /lib64/libc.so.6
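The backtrace shows the crash inside vfprintf() while gf_event() formats its message through gf_vasprintf()/vsnprintf() (frames #0-#3). A common way for a variadic formatter to segfault like this is a %s argument that is NULL or points to memory that is no longer valid, for example state tied to a brick whose filesystem has gone away after a lazy unmount. The standalone C sketch below only illustrates that failure mode; format_event() and its format string are illustrative stand-ins and are not the actual GlusterFS code.

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for the gf_event() -> gf_vasprintf() path seen in
 * frames #2-#3 of the backtrace: a variadic wrapper that formats its
 * arguments with vsnprintf(). */
static char *
format_event(const char *fmt, ...)
{
    va_list ap;
    char *msg = NULL;
    int len;

    /* First pass: compute the required length. vsnprintf() walks every
     * argument here, so a %s argument that points to freed or unmapped
     * memory faults inside vfprintf(), matching frame #0. */
    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (len < 0)
        return NULL;

    msg = malloc((size_t)len + 1);
    if (!msg)
        return NULL;

    va_start(ap, fmt);
    vsnprintf(msg, (size_t)len + 1, fmt, ap);
    va_end(ap);
    return msg;
}

int
main(void)
{
    const char *brick = "/rhs/brick1";

    /* Safe call: every %s argument is a valid string. */
    char *ok = format_event("op=stat;brick=%s;error=%s", brick, strerror(5));
    if (ok) {
        printf("%s\n", ok);
        free(ok);
    }

    /* A call like the one below is the suspected failure mode: if the
     * string behind a %s argument has been freed or its backing mapping
     * dropped (e.g. state tied to a lazily unmounted brick), the
     * formatting pass dereferences a dangling pointer and the process
     * receives SIGSEGV.
     *
     * char *gone = strdup(brick); free(gone);
     * format_event("op=stat;brick=%s", gone);   // undefined behaviour
     */
    return 0;
}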




Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Nag Pavan Chilakam 2016-11-23 07:01:47 UTC
[root@dhcp35-37 ~]# rpm -qa|grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.8.4-5.el7rhgs.x86_64
python-gluster-3.8.4-5.el7rhgs.noarch
glusterfs-server-3.8.4-5.el7rhgs.x86_64
glusterfs-events-3.8.4-5.el7rhgs.x86_64
glusterfs-libs-3.8.4-5.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-5.el7rhgs.x86_64
glusterfs-api-3.8.4-5.el7rhgs.x86_64
glusterfs-cli-3.8.4-5.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-5.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
glusterfs-fuse-3.8.4-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.3.1-8.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
[root@dhcp35-37 ~]# cat /etc/redhat-*
cat: /etc/redhat-access-insights: Is a directory
Red Hat Enterprise Linux Server release 7.3 (Maipo)
Red Hat Gluster Storage Server 3.2.0
[root@dhcp35-37 ~]#

Comment 4 Ashish Pandey 2016-11-23 07:09:16 UTC
I just checked the version and did an initial investigation.

It looks like the core is being generated mainly from posix_fs_health_check,
at the point where the EVENT_POSIX_HEALTH_CHECK_FAILED event is raised.
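For orientation, here is a minimal sketch of the kind of health-check thread involved (frames #4-#5 of the backtrace), assuming the posix xlator periodically stats the brick root and emits an event when the check fails; the names used here (brick_health_loop, emit_health_event, struct brick_ctx) are hypothetical and are not the actual posix xlator code.

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical brick context; the real posix xlator keeps equivalent
 * state in its private structure. */
struct brick_ctx {
    const char *path;      /* brick root, e.g. "/rhs/brick1" */
    const char *hostname;  /* node hosting the brick */
    int interval;          /* seconds between checks */
    volatile int running;
};

/* Stand-in for gf_event(EVENT_POSIX_HEALTH_CHECK_FAILED, fmt, ...):
 * here we only log, but the real helper formats its arguments with a
 * vsnprintf()-style call, which is where the reported crash occurs if
 * any %s argument is not a valid string. */
static void
emit_health_event(const char *op, const char *path,
                  const char *err, const char *host)
{
    fprintf(stderr, "HEALTH_CHECK_FAILED op=%s;path=%s;error=%s;host=%s\n",
            op, path, err, host);
}

/* Sketch of the health-check thread: after `umount -l` on the brick
 * filesystem, stat() on the brick root starts failing, which drives the
 * code down the event-emission path seen in the backtrace. */
static void *
brick_health_loop(void *arg)
{
    struct brick_ctx *brick = arg;
    struct stat stbuf;

    while (brick->running) {
        if (stat(brick->path, &stbuf) != 0) {
            emit_health_event("stat", brick->path,
                              strerror(errno), brick->hostname);
            break;  /* real code marks the brick unhealthy */
        }
        sleep((unsigned int)brick->interval);
    }
    return NULL;
}

int
main(void)
{
    struct brick_ctx brick = {
        .path = "/tmp", .hostname = "node1", .interval = 1, .running = 1,
    };
    pthread_t tid;

    pthread_create(&tid, NULL, brick_health_loop, &brick);
    sleep(3);
    brick.running = 0;
    pthread_join(tid, NULL);
    return 0;
}

Built with gcc -pthread, this loops cleanly against an existing directory; pointing brick.path at a lazily unmounted mount point is what would exercise the failure path described above.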


[root@dhcp35-37 /]# gluster --version
glusterfs 3.8.4 built on Nov 11 2016 06:45:08
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.


I think this is a known issue; a patch has already been sent by Pranith for it -
http://review.gluster.org/#/c/15671/

This BZ is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1386097

I would like to close it as a duplicate.

Comment 5 Ashish Pandey 2016-11-23 07:11:47 UTC

*** This bug has been marked as a duplicate of bug 1385606 ***

