Bug 1397667

Summary: coredumps found for disperse volume on all nodes hosting the bricks (most likely due to forceful umount of brick)
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: posix
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED DUPLICATE
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: urgent
Priority: unspecified
Version: rhgs-3.2
CC: aspandey, rhs-bugs, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2016-11-23 07:11:47 UTC
Type: Bug

Description Nag Pavan Chilakam 2016-11-23 06:19:02 UTC
Description of problem:
=======================
Many of the nodes hosting bricks of my EC (disperse) volume have coredumps.
I looked into some of them and almost all show the same backtrace.
Trying to recall what could have caused this, I strongly suspect it happened because the bricks were forcefully unmounted using umount -l.


core was generated by `/usr/sbin/glusterfsd -s 10.70.35.239 --volfile-id ecvol.10.70.35.239.rhs-brick1'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f1f01148694 in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.8.4-5.el7rhgs.x86_64
(gdb) bt
#0  0x00007f1f01148694 in vfprintf () from /lib64/libc.so.6
#1  0x00007f1f0120c8d5 in __vsnprintf_chk () from /lib64/libc.so.6
#2  0x00007f1f02a77a18 in gf_vasprintf () from /lib64/libglusterfs.so.0
#3  0x00007f1f02ac688a in gf_event () from /lib64/libglusterfs.so.0
#4  0x00007f1ef4e475f0 in posix_fs_health_check ()
   from /usr/lib64/glusterfs/3.8.4/xlator/storage/posix.so
#5  0x00007f1ef4e47774 in posix_health_check_thread_proc ()
   from /usr/lib64/glusterfs/3.8.4/xlator/storage/posix.so
#6  0x00007f1f018b2dc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f1f011f773d in clone () from /lib64/libc.so.6
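
To make the top frames easier to read, here is a minimal, hypothetical sketch of a gf_vasprintf()-style helper (not the actual libglusterfs source): the event message is built with vsnprintf(), so if any string argument handed to it is a dangling or garbage pointer (for example, state that went away when the brick filesystem was lazily unmounted), the faulting dereference happens inside the C library, which matches frames #0-#2 above.

/* Minimal sketch of a gf_vasprintf()-style helper; hypothetical, not the
 * shipped libglusterfs code.  It formats once to size the buffer, then
 * again to fill it.  If a %s argument is a dangling/garbage pointer, the
 * dereference happens inside vsnprintf()/vfprintf(), i.e. in libc, which
 * is why the crash surfaces there rather than in posix.so. */
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

static int
sketch_vasprintf(char **out, const char *fmt, va_list ap)
{
        va_list ap2;
        int     len;

        va_copy(ap2, ap);
        len = vsnprintf(NULL, 0, fmt, ap2);  /* a bad pointer argument faults here */
        va_end(ap2);
        if (len < 0)
                return -1;

        *out = malloc(len + 1);
        if (*out == NULL)
                return -1;

        return vsnprintf(*out, len + 1, fmt, ap);
}

static int
sketch_asprintf(char **out, const char *fmt, ...)
{
        va_list ap;
        int     ret;

        va_start(ap, fmt);
        ret = sketch_vasprintf(out, fmt, ap);
        va_end(ap);
        return ret;
}

int
main(void)
{
        char *msg = NULL;

        /* With valid arguments this is fine; with a freed or corrupted
         * string for %s it would segfault inside libc, producing the
         * vfprintf()/__vsnprintf_chk() frames seen in the core. */
        if (sketch_asprintf(&msg, "path=%s;op=%s", "/rhs/brick1", "stat") >= 0) {
                printf("%s\n", msg);
                free(msg);
        }
        return 0;
}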




Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Nag Pavan Chilakam 2016-11-23 07:01:47 UTC
[root@dhcp35-37 ~]# rpm -qa|grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.8.4-5.el7rhgs.x86_64
python-gluster-3.8.4-5.el7rhgs.noarch
glusterfs-server-3.8.4-5.el7rhgs.x86_64
glusterfs-events-3.8.4-5.el7rhgs.x86_64
glusterfs-libs-3.8.4-5.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-5.el7rhgs.x86_64
glusterfs-api-3.8.4-5.el7rhgs.x86_64
glusterfs-cli-3.8.4-5.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-5.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
glusterfs-fuse-3.8.4-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.3.1-8.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
[root@dhcp35-37 ~]# cat /etc/redhat-*
cat: /etc/redhat-access-insights: Is a directory
Red Hat Enterprise Linux Server release 7.3 (Maipo)
Red Hat Gluster Storage Server 3.2.0
[root@dhcp35-37 ~]#

Comment 4 Ashish Pandey 2016-11-23 07:09:16 UTC
I just checked the version and did an initial investigation.

It looks like the core is being generated from posix_fs_health_check,
when the EVENT_POSIX_HEALTH_CHECK_FAILED event is raised.
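
For context, a simplified sketch of the code path named in frames #4/#5 follows; only the two function names from the backtrace are real, everything else (file names, message format, interval) is an assumption for illustration. The health-check thread periodically stats a file under the brick; when the check fails (for example after the brick filesystem is forcefully unmounted), it raises the health-check-failed event, and it is the formatting of that event message that ends up in gf_vasprintf()/vfprintf(), where the crash occurs.

/* Simplified, assumed sketch of the health-check failure path; not the
 * shipped posix xlator code. */
#include <errno.h>
#include <limits.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* stand-in for gf_event(EVENT_POSIX_HEALTH_CHECK_FAILED, fmt, ...) */
static void
raise_health_check_failed_event(const char *fmt, ...)
{
        va_list ap;

        va_start(ap, fmt);
        /* In the core, it is this formatting step (gf_event ->
         * gf_vasprintf -> vfprintf) that segfaults. */
        vfprintf(stderr, fmt, ap);
        va_end(ap);
}

/* One iteration of what posix_fs_health_check() roughly does; in the
 * real code posix_health_check_thread_proc() invokes it in a loop. */
static int
posix_fs_health_check_sketch(const char *brick_path)
{
        char        file[PATH_MAX];
        struct stat st;

        snprintf(file, sizeof(file), "%s/.glusterfs/health_check", brick_path);

        /* After "umount -l" pulls the filesystem out from under the brick,
         * this check fails and the event/error path below is exercised. */
        if (stat(file, &st) < 0) {
                raise_health_check_failed_event("path=%s;op=%s;error=%s\n",
                                                file, "stat", strerror(errno));
                return -1;
        }
        return 0;
}

int
main(int argc, char **argv)
{
        const char *brick = (argc > 1) ? argv[1] : "/rhs/brick1";

        return (posix_fs_health_check_sketch(brick) == 0) ? 0 : 1;
}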


[root@dhcp35-37 /]# gluster --version
glusterfs 3.8.4 built on Nov 11 2016 06:45:08
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.


I think this is a known issue, and Pranith has already sent a patch for it -
http://review.gluster.org/#/c/15671/

This BZ is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1386097

I would like to close it as a duplicate.

Comment 5 Ashish Pandey 2016-11-23 07:11:47 UTC

*** This bug has been marked as a duplicate of bug 1385606 ***