Bug 1399988 - [Ganesha + Multi-Volume] : Ganesha crashes on nfs-ganesha restarts [NEEDINFO]
Summary: [Ganesha + Multi-Volume] : Ganesha crashes on nfs-ganesha restarts
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kaleb KEITHLEY
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-30 08:19 UTC by Ambarish
Modified: 2018-11-19 08:21 UTC (History)
CC: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-19 08:21:10 UTC
Target Upstream Version:
skoduri: needinfo? (asoman)


Attachments

Description Ambarish 2016-11-30 08:19:17 UTC
Description of problem:
------------------------

4-node cluster.

I had 3 volumes in the cluster - 2 single-brick and 1 replicate.

Restarting the Ganesha service made it crash and dump core on all nodes.

Two of the cores were unique:

*******
Core 1
*******

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007f197ff643b1 in shutdown_export (export=0x7f1981b10a70)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_destroyer.c:152
#2  destroy_fsals () at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_destroyer.c:194
#3  0x00007f197ff8c27f in do_shutdown () at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_admin_thread.c:446
#4  admin_thread (UnusedArg=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_admin_thread.c:466
#5  0x00007f197e4eadc5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f197dbb973d in clone () from /lib64/libc.so.6
(gdb) q


******
Core 2
******

(gdb) bt
#0  0x00007f9385a75e61 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f92fa58e48f in export_release (exp_hdl=0x7f92f40d1e20)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/export.c:74
#2  0x00007f93874ee3b1 in shutdown_export (export=0x7f92f40d1e20)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_destroyer.c:152
#3  destroy_fsals () at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_destroyer.c:194
#4  0x00007f938751627f in do_shutdown () at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_admin_thread.c:446
#5  admin_thread (UnusedArg=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_admin_thread.c:466
#6  0x00007f9385a74dc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f938514373d in clone () from /lib64/libc.so.6
(gdb) q

This really looks specific to a multi-volume scenario.
I restarted the Ganesha service a couple of times with a single volume in the cluster, and I hit the crash reported in https://bugzilla.redhat.com/show_bug.cgi?id=1393526. The ones mentioned above look new/unreported.
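Frame #0 in Core 1 is address 0x0000000000000000, which is the signature of calling a function pointer that was never filled in. A minimal sketch of that failure mode and a defensive guard follows; the struct and function names here are hypothetical stand-ins invented for illustration, not Ganesha's real types from fsal_api.h / fsal_destroyer.c:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for an FSAL export object. */
struct fake_export;

struct fake_export_ops {
	void (*release)(struct fake_export *exp);
};

struct fake_export {
	struct fake_export_ops ops;
	int released;
};

/* Helper used by fully initialized exports. */
static void do_release(struct fake_export *exp)
{
	exp->released = 1;
}

/* Teardown mirroring the crash: if ops.release was never assigned
 * (the export failed part-way through setup), calling it jumps to
 * address 0 -- matching frame #0 == 0x0000000000000000 in Core 1. */
int shutdown_export_unguarded(struct fake_export *exp)
{
	exp->ops.release(exp);	/* crashes when release == NULL */
	return 0;
}

/* Defensive variant: skip half-initialized exports instead of crashing. */
int shutdown_export_guarded(struct fake_export *exp)
{
	if (exp->ops.release == NULL) {
		fprintf(stderr, "skipping half-initialized export\n");
		return -1;	/* nothing to release */
	}
	exp->ops.release(exp);
	return 0;
}
```

This matches Comment 4's reading: a shutdown running against an export that never finished initializing.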

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

nfs-ganesha-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64


How reproducible:
-----------------

2/2

Steps to Reproduce:
-------------------

1. Create a 4-node cluster with more than one volume.

2. Restart Ganesha service.


Actual results:
---------------

Ganesha crashes and dumps core.

Expected results:
-------------------

Ganesha shouldn't crash.

Additional info:
-----------------

OS : RHEL 7.3

*Vol Config* :

 
Volume Name: testvol1
Type: Distribute
Volume ID: 547da16a-b56e-46d4-ab69-98b56b6aa07e
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gqas001.sbu.lab.eng.bos.redhat.com:/bricks/testvol1_brick0
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
 
Volume Name: testvol2
Type: Distribute
Volume ID: d73c4b60-b3f6-492f-9825-02021f625cee
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gqas014.sbu.lab.eng.bos.redhat.com:/bricks/testvol2_brick1
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
 
Volume Name: testvol3
Type: Replicate
Volume ID: e7e6890e-87aa-486a-b164-732973b4b179
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gqas009.sbu.lab.eng.bos.redhat.com:/bricks/testvol3_brick2
Brick2: gqas015.sbu.lab.eng.bos.redhat.com:/bricks/testvol3_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Comment 4 Daniel Gryniewicz 2016-11-30 14:40:52 UTC
This looks like a failure to start up, causing a shutdown without being fully initialized.  Are there Ganesha logs from this run?

Comment 5 Jiffin 2016-11-30 15:08:09 UTC
(In reply to Daniel Gryniewicz from comment #4)
> This looks like a failure to start up, causing a shutdown without being
> fully initialized.  Are there Ganesha logs from this run?
The logs are at the location mentioned in comment #2; the issue happened on node gqas014.

Reason for crash:
There are three export shares here: testvol1, testvol2, and testvol3. Out of the three, testvol1 and testvol3 ended up with the same export ID, so exporting testvol3 failed. During unexport there appears to be a stale export entry for testvol3, and that triggers the crash. Ideally, on an export failure, Ganesha should remove the export entry during cleanup (IMO that is not happening properly).

There is another bug open for the export ID issue, BZ#1398257, with a fix posted for the conflicting export IDs, so that we won't end up in this kind of situation.
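The idea behind the BZ#1398257 fix, as described above, is to reject a conflicting Export_Id up front instead of leaving a stale, half-exported entry behind for shutdown to trip over. A toy sketch of that check (editor's illustration; Ganesha's real code keeps exports in its own indexed structures, and these names are invented):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat registry of in-use export IDs. */
#define MAX_EXPORTS 64

static int export_ids[MAX_EXPORTS];
static size_t export_count;

/* Refuse a duplicate Export_Id at insertion time, so no partially
 * built export entry is ever left registered. */
bool insert_export_id(int id)
{
	for (size_t i = 0; i < export_count; i++)
		if (export_ids[i] == id)
			return false;	/* duplicate: reject the export */
	if (export_count == MAX_EXPORTS)
		return false;		/* registry full */
	export_ids[export_count++] = id;
	return true;
}
```

With the IDs from this bug (4, 3, 4), the third insert would be refused rather than creating the stale testvol3 entry.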

Comment 6 Daniel Gryniewicz 2016-11-30 15:30:43 UTC
So I was right, and this is a minor bug.  Ganesha will exit anyway, so the crash won't cause more problems than the shutdown would.  I'll work on a fix.

Comment 7 Daniel Gryniewicz 2016-11-30 17:52:39 UTC
How are the exports added and removed here?  My testing is unable to reproduce a crash.

Comment 8 Jiffin 2016-12-01 05:56:56 UTC
(In reply to Daniel Gryniewicz from comment #7)
> How are the exports added and removed here?  My testing is unable to
> reproduce a crash.

I didn't try to reproduce it on my setup either.
IMO the following should result in a crash on nfs-ganesha-2.4.1:
Create three export blocks, two of which have the same Export_Id (in the case above it was 4, 3, 4), and start the Ganesha process. It should dump core during stop.
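For reference, a minimal ganesha.conf fragment matching that description might look like the following. This is an illustrative sketch only: the paths, Pseudo entries, and hostname are assumptions, not taken from the original cluster's config.

```
# Three EXPORT blocks with a deliberate Export_Id collision (4, 3, 4).
EXPORT {
    Export_Id = 4;
    Path = "/testvol1";
    Pseudo = "/testvol1";
    FSAL { Name = GLUSTER; Hostname = localhost; Volume = testvol1; }
}
EXPORT {
    Export_Id = 3;
    Path = "/testvol2";
    Pseudo = "/testvol2";
    FSAL { Name = GLUSTER; Hostname = localhost; Volume = testvol2; }
}
EXPORT {
    Export_Id = 4;   # duplicates testvol1's id -- the failed export
    Path = "/testvol3";
    Pseudo = "/testvol3";
    FSAL { Name = GLUSTER; Hostname = localhost; Volume = testvol3; }
}
```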

Comment 9 Daniel Gryniewicz 2016-12-01 13:38:29 UTC
This is exactly what I did, and I saw no crash, with several attempts.

Comment 10 Jiffin 2016-12-01 13:58:59 UTC
(In reply to Daniel Gryniewicz from comment #9)
> This is exactly what I did, and I saw no crash, with several attempts.

I will try to reproduce it in my environment and let you know.

