Bug 1238118 - nfs-ganesha: coredump for ganesha process post executing the volume start twice
Summary: nfs-ganesha: coredump for ganesha process post executing the volume start twice
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.1
Assignee: Jiffin
QA Contact: Apeksha
URL:
Whiteboard:
Depends On:
Blocks: 1216951 1251815
 
Reported: 2015-07-01 09:00 UTC by Saurabh
Modified: 2016-01-19 06:15 UTC
CC: 14 users

Fixed In Version: glusterfs-3.7.1-12
Doc Type: Bug Fix
Doc Text:
Previously, sending DBus signals multiple times in succession for a volume that was already exported caused the NFS-Ganesha service to crash. With this fix, the NFS-Ganesha service no longer crashes in this scenario.
Clone Of:
Environment:
Last Closed: 2015-10-05 07:17:20 UTC
Embargoed:


Attachments
coredump of nfs-ganesha process (4.41 MB, application/x-xz)
2015-07-01 09:00 UTC, Saurabh


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:1845 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.1 update 2015-10-05 11:06:22 UTC

Description Saurabh 2015-07-01 09:00:21 UTC
Created attachment 1044954 [details]
coredump of nfs-ganesha process

Description of problem:
I executed volume start twice for the same volume and saw an nfs-ganesha coredump.

(gdb) bt
#0  0x00000030ba632625 in raise () from /lib64/libc.so.6
#1  0x00000030ba633e05 in abort () from /lib64/libc.so.6
#2  0x00000030ba62b74e in __assert_fail_base () from /lib64/libc.so.6
#3  0x00000030ba62b810 in __assert_fail () from /lib64/libc.so.6
#4  0x000000000051a2c1 in free_export ()
#5  0x00000000005070b9 in export_init ()
#6  0x0000000000534597 in proc_block ()
#7  0x000000000053526d in load_config_from_node ()
#8  0x000000000051c393 in gsh_export_addexport ()
#9  0x000000000052ed50 in dbus_message_entrypoint ()
#10 0x00000030bda1cefe in ?? () from /lib64/libdbus-1.so.3
#11 0x00000030bda10b4c in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#12 0x00000030bda10dd9 in ?? () from /lib64/libdbus-1.so.3
#13 0x000000000052f913 in gsh_dbus_thread ()
#14 0x00000030baa07a51 in start_thread () from /lib64/libpthread.so.0
#15 0x00000030ba6e896d in clone () from /lib64/libc.so.6


Version-Release number of selected component (if applicable):
glusterfs-3.7.1-6.el6rhs.x86_64
nfs-ganesha-2.2.0-3.el6rhs.x86_64

How reproducible:
always


Actual results:
coredump, as shown in the backtrace in the Description section.

ganesha.log,
01/07/2015 13:59:32 : epoch 559386bb : nfs11 : ganesha.nfsd-13136[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume vol2 exported at : '/'
01/07/2015 13:59:37 : epoch 559386bb : nfs11 : ganesha.nfsd-13136[dbus_heartbeat] export_commit_common :CONFIG :CRIT :Pseudo path (/vol2) is a duplicate
01/07/2015 13:59:37 : epoch 559386bb : nfs11 : ganesha.nfsd-13136[dbus_heartbeat] export_commit_common :CONFIG :CRIT :Duplicate export id = 12


Expected results:
Running volume start again is not supposed to cause any issue with the nfs-ganesha process.
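One way to meet this expectation is to make the export operation idempotent at the hook-script level, so a second volume start never sends a duplicate AddExport DBus signal. A minimal sketch of such a guard (the real hook script would query the running ganesha.nfsd via DBus or showmount; here the check is stubbed with a temp file so the control flow is self-contained, and all names are illustrative):

```shell
#!/bin/sh
# Idempotent export guard (sketch). Skip the AddExport call when the
# volume is already exported, instead of sending the signal twice.
EXPORT_LIST=$(mktemp)

is_exported() {
    grep -qx "$1" "$EXPORT_LIST"
}

export_volume() {
    vol=$1
    if is_exported "$vol"; then
        echo "volume $vol already exported, skipping AddExport"
        return 0
    fi
    # the real script would issue the DBus AddExport call here
    echo "$vol" >> "$EXPORT_LIST"
    echo "exported $vol"
}

export_volume vol2
export_volume vol2   # second volume start: duplicate signal is never sent
rm -f "$EXPORT_LIST"
```

With this guard in place the second invocation is a no-op, so the daemon never sees the duplicate export id that triggers the assert.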

Additional info:

Comment 2 Apeksha 2015-07-02 12:40:12 UTC
Hit the same backtrace again, but in a different scenario:
(gdb) bt
#0  0x0000003a9dc32625 in raise () from /lib64/libc.so.6
#1  0x0000003a9dc33e05 in abort () from /lib64/libc.so.6
#2  0x0000003a9dc2b74e in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003a9dc2b810 in __assert_fail () from /lib64/libc.so.6
#4  0x000000000051a2c1 in free_export ()
#5  0x00000000005070b9 in export_init ()
#6  0x0000000000534597 in proc_block ()
#7  0x000000000053526d in load_config_from_node ()
#8  0x000000000051c393 in gsh_export_addexport ()
#9  0x000000000052ed50 in dbus_message_entrypoint ()
#10 0x0000003aa041cefe in ?? () from /lib64/libdbus-1.so.3
#11 0x0000003aa0410b4c in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#12 0x0000003aa0410dd9 in ?? () from /lib64/libdbus-1.so.3
#13 0x000000000052f913 in gsh_dbus_thread ()
#14 0x0000003a9e007a51 in start_thread () from /lib64/libpthread.so.0
#15 0x0000003a9dce896d in clone () from /lib64/libc.so.6


Steps:
Run an automated test for selfheal:

1. Create a 6x2 distributed-replicate volume, enable nfs-ganesha, and mount it
2. While creating some directories/files, kill one brick process from each replica pair
3. Allow I/O to complete and start self-heal
4. Self-heal completes successfully
5. Create a new volume again; the mount fails
6. The ganesha process crashes on all the nodes


Not raising a new bug since the backtrace is the same.

Comment 3 Soumya Koduri 2015-07-06 10:01:00 UTC
This issue is fixed as part of bug 1237053

Comment 4 Meghana 2015-07-06 10:07:31 UTC
This issue is fixed, but the fix works only when SELinux is in permissive mode.

Comment 6 Meghana 2015-07-09 07:28:02 UTC
With the SELinux workaround, Apeksha has been able to verify the bug. Keeping the bug state the same until the SELinux fix is available in a build for RHEL 6.7.

Comment 7 Niels de Vos 2015-07-09 07:41:34 UTC
(In reply to Soumya Koduri from comment #3)
> This issue is fixed as part of bug1237053

Can this be closed as a duplicate?

Comment 8 Apeksha 2015-07-09 08:26:45 UTC
In enforcing mode, after applying the workaround mentioned in bug https://bugzilla.redhat.com/show_bug.cgi?id=1239017, I don't see any AVC showmount errors.
Also tried executing volume force multiple times; the ganesha process didn't crash.

Comment 9 Meghana 2015-07-13 06:11:24 UTC
The fix is already available downstream, but it does not work when SELinux is in enforcing mode. It works fine in permissive mode. The fix will be available when we have the next SELinux build.

Comment 10 monti lawrence 2015-07-23 14:45:50 UTC
Doc text is edited. Please sign off to be included in Known Issues.

Comment 11 Soumya Koduri 2015-07-27 09:29:56 UTC
Updated the doc text. Kindly verify the same.

Comment 12 Anjana Suparna Sriram 2015-07-27 18:24:00 UTC
Doc text is edited. Please sign off to be included in Known Issues.

Comment 13 Soumya Koduri 2015-07-28 07:36:32 UTC
doc text looks good to me.

Comment 15 Rejy M Cyriac 2015-08-10 04:13:28 UTC
Attached to RHGS 3.1 Update 1 (z-stream) Tracker BZ

Comment 16 Meghana 2015-08-13 07:24:21 UTC
Since the SELinux fixes are available, moving it to ON_QA.

Comment 17 Apeksha 2015-08-27 06:57:31 UTC
Verified on glusterfs-3.7.1-12.el7rhgs.x86_64

Comment 19 Divya 2015-09-29 06:02:39 UTC
Jiffin,

Could you review and sign-off the edited doc text?

Comment 20 Jiffin 2015-09-29 06:24:14 UTC
Looks good to me, verified

Comment 22 errata-xmlrpc 2015-10-05 07:17:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html

