Bug 1456129
| Summary: | [Ganesha] : Ganesha dumps core on restarts,possible memory corruption. | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ambarish <asoman> |
| Component: | nfs-ganesha | Assignee: | Soumya Koduri <skoduri> |
| Status: | CLOSED ERRATA | QA Contact: | Ambarish <asoman> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.3 | CC: | amukherj, bturner, jthottan, kkeithle, mbenjamin, rhinduja, rhs-bugs, skoduri, storage-qa-internal |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | RHGS 3.3.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | nfs-ganesha-2.4.4-8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-09-21 04:47:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1417151 | | |
I've got the steps to reproduce this:

> While setting up the cluster, keep the volume in stopped state.
> Enable Ganesha.
> Start the volume at the end.

The volume won't be exported on a few nodes. Restart Ganesha, and it will dump a core. I'll fetch whatever logs you need after bumping up the log level. (A command-level sketch of these steps is included after these comments.)

I've set up Ganesha on stopped volumes before. It had bugs in 3.2, which got fixed: https://bugzilla.redhat.com/show_bug.cgi?id=1393526. Marking this as a Regression.

From the bug description, we think showmount was probably issued too quickly, while the export was still being initialized; that may be why an empty list was returned. When nfs-ganesha is stopped at that point, there is still an active export reference, and that results in a double free while unloading the Gluster FSAL.

A fix for the reported crash has been posted upstream for review: https://review.gerrithub.io/#/c/364217/

If the issue of 'showmount' not showing the exports (even after a delay) persists, please file a new bug. Thanks!

Cannot be verified until the Test Blocker (https://bugzilla.redhat.com/show_bug.cgi?id=1460514) is fixed.

Works fine on nfs-ganesha-2.4.4-10. Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2779
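A minimal shell sketch of the reproduction sequence quoted above. It assumes a pre-existing volume named testvol (as in the description below) and the standard RHGS 3.x tooling (`gluster nfs-ganesha enable`, `systemctl`); the cluster/pacemaker prerequisites for Ganesha are assumed to be in place already.

```sh
# Reproduction sketch (assumption: volume "testvol" already exists;
# ganesha-ha prerequisites are already configured on all nodes).

gluster volume stop testvol                    # keep the volume in the stopped state
gluster nfs-ganesha enable                     # enable NFS-Ganesha across the cluster
gluster volume set testvol ganesha.enable on   # export the volume via Ganesha
gluster volume start testvol                   # start the volume only at the end

# On each node, check whether the export is actually visible.
showmount -e localhost

# On a node where the export list comes back empty, restart Ganesha;
# per this bug, ganesha.nfsd then aborts with "double free or corruption".
systemctl restart nfs-ganesha
```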
Description of problem:
-----------------------
While freshly building a cluster on my physical machines, I saw that post 'gluster ganesha enable', and exporting my already existing volume via Ganesha, 3/4 nodes did not show any exports via showmount.

In an attempt to get the export, I restarted Ganesha. It dumped a core. Here's the backtrace:

(gdb) bt
#0  0x00007f829b3c21d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f829b3c38c8 in __GI_abort () at abort.c:90
#2  0x00007f829b401f07 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f829b50cb48 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f829b409503 in malloc_printerr (ar_ptr=0x7f820c000020, ptr=<optimized out>, str=0x7f829b50cbb8 "double free or corruption (fasttop)", action=3) at malloc.c:5013
#4  _int_free (av=0x7f820c000020, p=<optimized out>, have_lock=0) at malloc.c:3835
#5  0x00007f829d82a091 in gsh_free (p=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.4/src/include/abstract_mem.h:271
#6  unregister_fsal (fsal_hdl=fsal_hdl@entry=0x7f8210ade3d0 <GlusterFS+112>) at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/fsal_manager.c:466
#7  0x00007f82108cdfa6 in glusterfs_unload () at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/main.c:172
#8  0x00007f829d5e085a in _dl_fini () at dl-fini.c:253
#9  0x00007f829b3c5a49 in __run_exit_handlers (status=status@entry=2, listp=0x7f829b7476c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#10 0x00007f829b3c5a95 in __GI_exit (status=status@entry=2) at exit.c:99
#11 0x00007f829d8e6335 in Fatal () at /usr/src/debug/nfs-ganesha-2.4.4/src/log/log_functions.c:312
#12 0x00007f829d8e6f68 in display_log_component_level (component=COMPONENT_FSAL, file=0x7f82108d9670 "/builddir/build/BUILD/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/main.c", line=181, function=0x7f82108d9800 <__func__.23734> "glusterfs_unload", level=NIV_FATAL, format=<optimized out>, arguments=arguments@entry=0x7f8211adeed0) at /usr/src/debug/nfs-ganesha-2.4.4/src/log/log_functions.c:1514
#13 0x00007f829d8e6ffa in DisplayLogComponentLevel (component=component@entry=COMPONENT_FSAL, file=file@entry=0x7f82108d9670 "/builddir/build/BUILD/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/main.c", line=line@entry=181, function=function@entry=0x7f82108d9800 <__func__.23734> "glusterfs_unload", level=level@entry=NIV_FATAL, format=format@entry=0x7f82108d9738 "FSAL Gluster still contains active shares. Dying.. \n") at /usr/src/debug/nfs-ganesha-2.4.4/src/log/log_functions.c:1688
#14 0x00007f82108ce02c in glusterfs_unload () at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/main.c:180
#15 0x00007f829d5e54b9 in _dl_close_worker (map=map@entry=0x7f820c0030a0) at dl-close.c:266
#16 0x00007f829d5e603c in _dl_close (_map=0x7f820c0030a0) at dl-close.c:776
#17 0x00007f829d5dfff4 in _dl_catch_error (objname=0x7f81cc007c40, errstring=0x7f81cc007c48, mallocedp=0x7f81cc007c38, operate=0x7f829c936070 <dlclose_doit>, args=0x7f820c0030a0) at dl-error.c:177
#18 0x00007f829c9365bd in _dlerror_run (operate=operate@entry=0x7f829c936070 <dlclose_doit>, args=0x7f820c0030a0) at dlerror.c:163
#19 0x00007f829c93609f in __dlclose (handle=<optimized out>) at dlclose.c:47
#20 0x00007f829d82ddef in unload_fsal (fsal_hdl=0x7f8210ade3d0 <GlusterFS+112>) at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/default_methods.c:111
#21 0x00007f829d82f30d in destroy_fsals () at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/fsal_destroyer.c:222
#22 0x00007f829d856ebf in do_shutdown () at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_admin_thread.c:446
#23 admin_thread (UnusedArg=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_admin_thread.c:466
#24 0x00007f829bdb5dc5 in start_thread (arg=0x7f8211ae0700) at pthread_create.c:308
#25 0x00007f829b48473d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb)

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
nfs-ganesha-2.4.4-6.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-25.el7rhgs.x86_64

How reproducible:
-----------------
Fairly

Additional info:
----------------
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: b7c40c38-fa47-4e18-b296-1bed9b963bd9
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas013 tmp]#