Bug 1393526 - [Ganesha] : Ganesha crashes intermittently during nfs-ganesha restarts.
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: io-threads
Version: 3.2
Hardware: x86_64 Linux
Priority: medium Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assigned To: Pranith Kumar K
QA Contact: Ambarish
Depends On: 1396793
Blocks: 1351528
Reported: 2016-11-09 13:24 EST by Ambarish
Modified: 2017-03-28 02:56 EDT

See Also:
Fixed In Version: glusterfs-3.8.4-6
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1396793 (view as bug list)
Environment:
Last Closed: 2017-03-23 02:17:50 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2017:0486
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update
Last Updated: 2017-03-23 05:18:45 EDT

Description Ambarish 2016-11-09 13:24:51 EST
Description of problem:
----------------------

After setting up Ganesha (installing the latest rpms, pcs auth, ganesha enable, and export), nfs-ganesha crashed on 2 of 4 servers when I tried to restart the ganesha service.
The process came back alive, so my guess is that it dumped core when the Ganesha process was stopped.

*************
BT from crash
*************

(gdb) bt
#0  0x00007fb6f39e780c in ?? ()
#1  0x0000000000000000 in ?? ()
(gdb) 

The signature of the BT looks similar to the one reported in BZ#1380619.

client-io-threads was on during my testing. I'll update the BZ soon with the result after setting it to off as well.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
[root@gqas013 tmp]# rpm -qa|grep ganesha
glusterfs-ganesha-3.8.4-3.el7rhgs.x86_64
nfs-ganesha-2.4.1-1.el7rhgs.x86_64
[root@gqas013 tmp]# 


How reproducible:
-----------------

2/4

Steps to Reproduce:
------------------

> After a fresh install, perform the steps to set up Ganesha: install rpms, pcs auth, enable Ganesha, and export.

> Start the volume, then restart glusterd, rpcbind, and nfs-ganesha.


Actual results:
---------------

Ganesha crashed and dumped core on 2/4 servers.
The process was alive afterwards, so the core was dumped when Ganesha was stopped during the restart.

Expected results:
-----------------

No crashes while restarting system services.

Additional info:
----------------

OS : RHEL 7.3

*Vol config* :

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 7b413fd4-9775-44a2-bfa8-23d206db9dfe
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.stat-prefetch: off
server.allow-insecure: on
features.cache-invalidation: off
ganesha.enable: on
cluster.enable-shared-storage: enable
nfs-ganesha: enable
[root@gqas013 tmp]#
Comment 3 Soumya Koduri 2016-11-09 13:33:18 EST
Ambarish,
If you happen to reproduce the issue, please take the core (using gdb) before running service stop/restart so as to compare the threads before and after the crash. Thanks!
Comment 5 Ambarish 2016-11-10 06:34:54 EST
I tried twice, but I could not reproduce the issue after setting client-io-threads to "off".

The issue is intermittent, though, so it's hard to say with certainty whether client-io-threads is the culprit.
Comment 8 Ambarish 2016-11-11 00:05:50 EST
Soumya,

I tried the steps with my volume already in the "Started" state before setting up the Ganesha cluster and exporting the volume. I did this twice on fresh setups and could not reproduce the crash across multiple system service restarts.
Comment 9 Soumya Koduri 2016-11-11 01:03:25 EST
Thanks Ambarish. That almost confirms the theory that this crash is hit only if a volume is exported via nfs-ganesha before it is even started.

Since this is not a recommended configuration, lowering the priority of the bug for now.

I suspect that when the volume is not started, the flow is:

glfs_init() -> xlator_init() of all the child subvols -> rpc connection to the brick, which fails.

After that, glfs_fini() is called. Maybe, since glfs_init() itself failed, the graph was never set up and PARENT_DOWN was not sent to the io-threads xlator, resulting in the dangling thread.

This is just a theory off the top of my head. I will look through the code a bit. CCing Pranith too.
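
The theory above can be illustrated with a toy model. This is only a sketch in Python, not glusterfs code: the classes IoThreadsModel and GlfsModel, the graph_active flag, and the helper worker_survives_fini are all hypothetical names made up for illustration, and the real glfs_init()/glfs_fini() paths may differ. It assumes the worker thread is created during init, and that teardown only notifies the worker when the graph became active.

```python
import threading


class IoThreadsModel:
    """Hypothetical stand-in for the io-threads xlator: owns one worker thread."""

    def __init__(self):
        self._stop = threading.Event()
        self.worker = None

    def init(self):
        # Assumption: the worker is started at init time, before any
        # brick connection has succeeded. It blocks until told to stop.
        self.worker = threading.Thread(target=self._stop.wait, daemon=True)
        self.worker.start()

    def parent_down(self):
        # Models receiving PARENT_DOWN: ask the worker to exit and wait for it.
        self._stop.set()
        self.worker.join()


class GlfsModel:
    """Hypothetical stand-in for the glfs_init()/glfs_fini() flow."""

    def __init__(self, volume_started):
        self.volume_started = volume_started
        self.iot = IoThreadsModel()
        self.graph_active = False

    def glfs_init(self):
        self.iot.init()
        if not self.volume_started:
            return -1  # models the brick connection failing
        self.graph_active = True
        return 0

    def glfs_fini(self):
        # Assumption under test: PARENT_DOWN is sent only for an active graph.
        if self.graph_active:
            self.iot.parent_down()


def worker_survives_fini(volume_started):
    """Return True if the worker thread is still running after glfs_fini()."""
    fs = GlfsModel(volume_started)
    fs.glfs_init()
    fs.glfs_fini()
    alive = fs.iot.worker.is_alive()
    fs.iot.parent_down()  # clean up the model's thread either way
    return alive
```

In this model, worker_survives_fini(volume_started=False) reports a thread that outlives teardown, while worker_survives_fini(volume_started=True) does not, which matches the observation that the crash was seen only when the volume was exported before being started.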
Comment 19 Ambarish 2016-12-26 22:57:27 EST
I could not reproduce this crash on multiple tries

gluster : glusterfs-3.8.4-10
ganesha : 2.4.1-3

Verified.
Comment 21 errata-xmlrpc 2017-03-23 02:17:50 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
