Bug 1480947 - [Ganesha] : Ganesha crashed during service restarts.
Status: VERIFIED
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
3.3
x86_64 Linux
unspecified Severity high
: ---
: RHGS 3.4.0
Assigned To: Soumya Koduri
Manisha Saini
:
Depends On:
Blocks: 1503134
Reported: 2017-08-13 01:34 EDT by Ambarish
Modified: 2018-05-11 09:55 EDT
12 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Ambarish 2017-08-13 01:34:25 EDT
Description of problem:
-----------------------

Ganesha crashed on one of my nodes during Ganesha restart and dumped a core.

This was the BT :

(gdb) bt
#0  0x00007f4f423341f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f4f423358e8 in __GI_abort () at abort.c:90
#2  0x00007f4f3fdc20d6 in glusterfs_unload () at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/main.c:183
#3  0x00007f4f4455b7d9 in _dl_close_worker (map=map@entry=0x563253cf01b0) at dl-close.c:266
#4  0x00007f4f4455c35c in _dl_close (_map=0x563253cf01b0) at dl-close.c:776
#5  0x00007f4f44556314 in _dl_catch_error (objname=0x7f4e80007c40, errstring=0x7f4e80007c48, mallocedp=0x7f4e80007c38, operate=0x7f4f438ab070 <dlclose_doit>, args=0x563253cf01b0) at dl-error.c:177
#6  0x00007f4f438ab5bd in _dlerror_run (operate=operate@entry=0x7f4f438ab070 <dlclose_doit>, args=0x563253cf01b0) at dlerror.c:163
#7  0x00007f4f438ab09f in __dlclose (handle=<optimized out>) at dlclose.c:47
#8  0x0000563251b75def in unload_fsal (fsal_hdl=0x7f4f3ffd23d0 <GlusterFS+112>) at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/default_methods.c:111
#9  0x0000563251b7730d in destroy_fsals () at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/fsal_destroyer.c:222
#10 0x0000563251b9ec7f in do_shutdown () at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_admin_thread.c:446
#11 admin_thread (UnusedArg=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_admin_thread.c:466
#12 0x00007f4f42d29e25 in start_thread (arg=0x7f4e8aefd700) at pthread_create.c:308
#13 0x00007f4f423f734d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) 


I am not sure I have seen this before, but feel free to close this as a DUP of another D/S bug if it looks similar to an older core.


Version-Release number of selected component (if applicable):
-------------------------------------------------------------

nfs-ganesha-gluster-2.4.4-16.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-40.el7rhgs.x86_64

How reproducible:
-----------------

Hit it only once.

Additional info:
---------------

The core will be copied to Qe-repo.

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 1ba32c0a-5631-4b06-b8d3-6bdf6a1fb110
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
ganesha.enable: on
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
Comment 4 Soumya Koduri 2017-08-14 06:34:06 EDT
(gdb) bt
#0  0x00007f4f423341f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f4f423358e8 in __GI_abort () at abort.c:90
#2  0x00007f4f3fdc20d6 in glusterfs_unload ()
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/main.c:183
#3  0x00007f4f4455b7d9 in _dl_close_worker (map=map@entry=0x563253cf01b0) at dl-close.c:266
#4  0x00007f4f4455c35c in _dl_close (_map=0x563253cf01b0) at dl-close.c:776
#5  0x00007f4f44556314 in _dl_catch_error (objname=0x7f4e80007c40, errstring=0x7f4e80007c48, 
    mallocedp=0x7f4e80007c38, operate=0x7f4f438ab070 <dlclose_doit>, args=0x563253cf01b0)
    at dl-error.c:177
#6  0x00007f4f438ab5bd in _dlerror_run (operate=operate@entry=0x7f4f438ab070 <dlclose_doit>, 
    args=0x563253cf01b0) at dlerror.c:163
#7  0x00007f4f438ab09f in __dlclose (handle=<optimized out>) at dlclose.c:47
#8  0x0000563251b75def in unload_fsal (fsal_hdl=0x7f4f3ffd23d0 <GlusterFS+112>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/default_methods.c:111
#9  0x0000563251b7730d in destroy_fsals ()
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/fsal_destroyer.c:222
#10 0x0000563251b9ec7f in do_shutdown ()
    at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_admin_thread.c:446
#11 admin_thread (UnusedArg=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_admin_thread.c:466
#12 0x00007f4f42d29e25 in start_thread (arg=0x7f4e8aefd700) at pthread_create.c:308
#13 0x00007f4f423f734d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) f 2
#2  0x00007f4f3fdc20d6 in glusterfs_unload ()
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/main.c:183
183		PTHREAD_MUTEX_destroy(&GlusterFS.lock);
(gdb) l
178		/* All the shares should have been unexported */
179		if (!glist_empty(&GlusterFS.fs_obj)) {
180			LogWarn(COMPONENT_FSAL,
181				"FSAL Gluster still contains active shares.");
182		}
183		PTHREAD_MUTEX_destroy(&GlusterFS.lock);
184		LogDebug(COMPONENT_FSAL, "FSAL Gluster unloaded");
185	}
(gdb) 


PTHREAD_MUTEX_destroy(&GlusterFS.lock) failed, resulting in this crash. It failed because that mutex was still held and in use by another thread (the dbus thread), which was trying to export a volume (as can be seen below).

(gdb) t a a bt

Thread 15 (Thread 0x7f4e8b6fe700 (LWP 24760)):
#0  0x00007f4f423be1ad in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f4f423be044 in __sleep (seconds=0, seconds@entry=1)
    at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007f4f3fdca79e in initiate_up_thread (gl_fs=gl_fs@entry=0x7f4e8400b9b0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/gluster_internal.c:468
#3  0x00007f4f3fdc376b in glusterfs_get_fs (params=..., up_ops=up_ops@entry=0x7f4e840129e0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/export.c:709
#4  0x00007f4f3fdc3c74 in glusterfs_create_export (fsal_hdl=0x7f4f3ffd23d0 <GlusterFS+112>, 
    parse_node=0x7f4e84016c50, err_type=<optimized out>, up_ops=0x7f4e840129e0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/export.c:778
#5  0x0000563251c37b3f in mdcache_fsal_create_export (sub_fsal=0x7f4f3ffd23d0 <GlusterFS+112>, 
    parse_node=parse_node@entry=0x7f4e84016c50, err_type=err_type@entry=0x7f4e8b6fd1c0, 
    super_up_ops=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_main.c:281
#6  0x0000563251c17e6f in fsal_cfg_commit (node=0x7f4e84016c50, link_mem=0x7f4e840127c8, 
    self_struct=<optimized out>, err_type=0x7f4e8b6fd1c0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/support/exports.c:755
#7  0x0000563251c512a8 in proc_block (node=<optimized out>, item=<optimized out>, 
    link_mem=<optimized out>, err_type=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/config_parsing/config_parsing.c:1337
#8  0x0000563251c50720 in do_block_load (err_type=<optimized out>, param_struct=<optimized out>, 
    relax=<optimized out>, params=<optimized out>, blk=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/config_parsing/config_parsing.c:1195
#9  proc_block (node=<optimized out>, item=<optimized out>, link_mem=<optimized out>, 
    err_type=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/config_parsing/config_parsing.c:1321
#10 0x0000563251c51a09 in load_config_from_node (tree_node=0x7f4e840173b0, 
    conf_blk=0x563251ea5240 <add_export_param>, param=param@entry=0x0, unique=unique@entry=false, 
    err_type=err_type@entry=0x7f4e8b6fd1c0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/config_parsing/config_parsing.c:1836
#11 0x0000563251c27557 in gsh_export_addexport (args=<optimized out>, reply=0x563253db7020, 
    error=0x7f4e8b6fd2e0) at /usr/src/debug/nfs-ganesha-2.4.4/src/support/export_mgr.c:967
#12 0x0000563251c4c9a9 in dbus_message_entrypoint (conn=0x563253ce7ed0, msg=msg@entry=0x563253db71d0, 
    user_data=user_data@entry=0x563251ea6ce0 <export_interfaces>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/dbus/dbus_server.c:512
#13 0x00007f4f44114c76 in _dbus_object_tree_dispatch_and_unlock (tree=0x563253cee990, 
    message=message@entry=0x563253db71d0, found_object=found_object@entry=0x7f4e8b6fd484)
    at dbus-object-tree.c:862
#14 0x00007f4f44106e49 in dbus_connection_dispatch (connection=connection@entry=0x563253ce7ed0)
    at dbus-connection.c:4672
#15 0x00007f4f441070e2 in _dbus_connection_read_write_dispatch (connection=0x563253ce7ed0, 
    timeout_milliseconds=timeout_milliseconds@entry=100, dispatch=dispatch@entry=1)
    at dbus-connection.c:3646
#16 0x00007f4f44107180 in dbus_connection_read_write_dispatch (connection=<optimized out>, 
    timeout_milliseconds=timeout_milliseconds@entry=100) at dbus-connection.c:3729
---Type <return> to continue, or q <return> to quit---
#17 0x0000563251c4da71 in gsh_dbus_thread (arg=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/dbus/dbus_server.c:737
#18 0x00007f4f42d29e25 in start_thread (arg=0x7f4e8b6fe700) at pthread_create.c:308
#19 0x00007f4f423f734d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) f 3
#3  0x00007f4f3fdc376b in glusterfs_get_fs (params=..., up_ops=up_ops@entry=0x7f4e840129e0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/export.c:709
709		rc = initiate_up_thread(gl_fs);
(gdb) p *gl_fs
$1 = {fs_obj = {next = 0x7f4e8400b9b0, prev = 0x7f4e8400b9b0}, volname = 0x7f4e84059730 "testvol", 
  fs = 0x7f4e84012c80, up_ops = 0x7f4e840129e0, refcnt = 0, up_thread = 139978102456960, 
  destroy_mode = 0 '\000'}
(gdb) 


It's strange that a dbus signal to export the volume ('testvol') was issued while the process was going down.

@Ambarish,
Were you doing any vol set or glusterd restarts at the same time?
Comment 5 Soumya Koduri 2017-08-14 07:43:30 EDT
Submitted a potential fix upstream so that ganesha handles such (though unlikely) cases - https://review.gerrithub.io/#/c/374130/
Comment 6 Ambarish 2017-08-15 06:35:16 EDT
(In reply to Soumya Koduri from comment #4)
> [backtrace analysis from comment #4 trimmed]
> 
> @Ambarish,
> Were you doing any vol set or glusterd restarts at the same time?



Not glusterd restarts.

But I stopped the volume and then restarted Ganesha (to flush the internal gluster/ganesha caches for my perf tests), which is when it _possibly_ crashed.

I did this multiple times for all of my metadata tests, though I hit it only once, which is why I did not propose this as a blocker.
Comment 7 Ambarish 2017-08-15 06:36:20 EDT
(In reply to Ambarish from comment #6)
> > [backtrace analysis from comment #4 trimmed]
> 
> Not glusterd restarts.
> 
> But I stopped the volume and then restarted Ganesha (to flush the internal
> gluster/ganesha caches for my perf tests), which is when it _possibly_
> crashed.



I meant I restarted the volume, which would have caused the export/unexport that you see.
