Description of problem:
-----------------------
After setting up Ganesha (i.e., after installing the latest rpms, pcs auth, ganesha enable, and export), nfs-ganesha crashed on 2 of 4 servers when I tried to restart the ganesha service. The process came back alive, so my guess is that it dumped core when the Ganesha process was stopped.

************* BT from crash *************

(gdb) bt
#0  0x00007fb6f39e780c in ?? ()
#1  0x0000000000000000 in ?? ()
(gdb)

The signature of the BT looks similar to the one reported in BZ#1380619. client-io-threads was on during my testing. I'll update the BZ soon with results after setting it to off as well.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
[root@gqas013 tmp]# rpm -qa | grep ganesha
glusterfs-ganesha-3.8.4-3.el7rhgs.x86_64
nfs-ganesha-2.4.1-1.el7rhgs.x86_64
[root@gqas013 tmp]#

How reproducible:
-----------------
2/4

Steps to Reproduce:
-------------------
> After a fresh install, perform the steps to set up Ganesha - install rpms, pcs auth, enable Ganesha, and export.
> Start the volume, then restart glusterd, rpcbind, and nfs-ganesha.

Actual results:
---------------
Ganesha crashed and dumped core on 2 of 4 servers. The process was alive afterwards, so the core was dumped when Ganesha was stopped during the restart.

Expected results:
-----------------
No crashes while restarting system services.
Additional info:
----------------
OS: RHEL 7.3

*Vol config*:

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 7b413fd4-9775-44a2-bfa8-23d206db9dfe
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.stat-prefetch: off
server.allow-insecure: on
features.cache-invalidation: off
ganesha.enable: on
cluster.enable-shared-storage: enable
nfs-ganesha: enable
[root@gqas013 tmp]#
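For reference, the setup and restart sequence from the report can be collected into one sketch. This is untested here and only illustrative: the node names and volume name come from the report above, the hacluster password placeholder and the `setup_and_restart` function name are my own.

```shell
# Sketch of the reproduction sequence from the report (RHGS 3.x nodes assumed).
setup_and_restart() {
    yum -y install glusterfs-ganesha nfs-ganesha
    # Authenticate the cluster nodes for pcs; '<password>' is a placeholder.
    pcs cluster auth gqas013 gqas005 gqas006 gqas011 -u hacluster -p '<password>'
    gluster nfs-ganesha enable                       # bring up the ganesha cluster
    gluster volume set testvol ganesha.enable on     # export the volume
    gluster volume start testvol
    systemctl restart glusterd rpcbind nfs-ganesha   # the step that triggered the crash
}
```

Note that in the report the export happens before the volume is started, which turns out to matter (see comments below).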
Ambarish, if you happen to reproduce the issue, please capture a core (using gdb) before running the service stop/restart, so we can compare the threads before and after the crash. Thanks!
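One possible way to capture that pre-restart state without stopping the daemon is sketched below. The `dump_threads` helper name and output paths are my own; `gcore` and the gdb batch run are standard gdb tooling.

```shell
# Hypothetical helper: snapshot a live process. gcore writes a core file
# (to "${out}.<pid>") without killing the process; the gdb batch run
# records every thread's backtrace for later comparison.
dump_threads() {
    pid="$1"; out="$2"
    gcore -o "${out}" "${pid}"
    gdb -p "${pid}" -batch -ex "thread apply all bt" > "${out}.bt.txt"
}

# Before restarting the service, e.g.:
#   dump_threads "$(pidof ganesha.nfsd)" /tmp/ganesha-pre-restart
```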
I tried it twice, but I could not reproduce the issue after setting client-io-threads to "off". The issue is a bit intermittent, though, so it's hard to say with certainty whether that is or is not the culprit.
Soumya, I tried the steps twice on fresh setups, this time keeping my volume in the "Started" state before setting up the Ganesha cluster and exporting the volume, and I could not reproduce the crash across multiple system service restarts.
Thanks Ambarish. That almost confirms the theory that this crash is hit only if a volume is exported via nfs-ganesha before it is even started. Since this is not a recommended configuration, lowering the priority of the bug for now. I suspect that when the volume is not started, the flow is glfs_init() -> xlator_init() of all the child subvols -> then the rpc connection to the brick, which fails. After that, glfs_fini() is called. Since glfs_init() itself failed, the graph would not have been fully set up and PARENT_DOWN may never have been sent to the io-threads xlator, leaving a dangling thread. This is just a theory off the top of my head; I'll look through the code a bit. CCing Pranith too.
I could not reproduce this crash on multiple tries.

gluster : glusterfs-3.8.4-10
ganesha : 2.4.1-3

Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html