Bug 1480947

Summary: [Ganesha] : Ganesha crashed during service restarts.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Ambarish <asoman>
Component: nfs-ganesha    Assignee: Soumya Koduri <skoduri>
Status: CLOSED ERRATA QA Contact: Manisha Saini <msaini>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.3    CC: bturner, dang, ffilz, jthottan, kkeithle, mbenjamin, msaini, rhinduja, rhs-bugs, sheggodu, skoduri, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.4.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-04 06:53:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1503134    

Description Ambarish 2017-08-13 05:34:25 UTC
Description of problem:
-----------------------

Ganesha crashed on one of my nodes during Ganesha restart and dumped a core.

This was the backtrace (BT):

(gdb) bt
#0  0x00007f4f423341f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f4f423358e8 in __GI_abort () at abort.c:90
#2  0x00007f4f3fdc20d6 in glusterfs_unload () at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/main.c:183
#3  0x00007f4f4455b7d9 in _dl_close_worker (map=map@entry=0x563253cf01b0) at dl-close.c:266
#4  0x00007f4f4455c35c in _dl_close (_map=0x563253cf01b0) at dl-close.c:776
#5  0x00007f4f44556314 in _dl_catch_error (objname=0x7f4e80007c40, errstring=0x7f4e80007c48, mallocedp=0x7f4e80007c38, operate=0x7f4f438ab070 <dlclose_doit>, args=0x563253cf01b0) at dl-error.c:177
#6  0x00007f4f438ab5bd in _dlerror_run (operate=operate@entry=0x7f4f438ab070 <dlclose_doit>, args=0x563253cf01b0) at dlerror.c:163
#7  0x00007f4f438ab09f in __dlclose (handle=<optimized out>) at dlclose.c:47
#8  0x0000563251b75def in unload_fsal (fsal_hdl=0x7f4f3ffd23d0 <GlusterFS+112>) at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/default_methods.c:111
#9  0x0000563251b7730d in destroy_fsals () at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/fsal_destroyer.c:222
#10 0x0000563251b9ec7f in do_shutdown () at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_admin_thread.c:446
#11 admin_thread (UnusedArg=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_admin_thread.c:466
#12 0x00007f4f42d29e25 in start_thread (arg=0x7f4e8aefd700) at pthread_create.c:308
#13 0x00007f4f423f734d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) 


I am not sure I have seen this before, but feel free to close this as a DUP of another downstream (D/S) bug if it looks similar to an older core.


Version-Release number of selected component (if applicable):
-------------------------------------------------------------

nfs-ganesha-gluster-2.4.4-16.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-40.el7rhgs.x86_64

How reproducible:
-----------------

Hit it only once.

Additional info:
---------------

The core will be copied to the QE repo.

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 1ba32c0a-5631-4b06-b8d3-6bdf6a1fb110
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
ganesha.enable: on
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Comment 4 Soumya Koduri 2017-08-14 10:34:06 UTC
(gdb) bt
#0  0x00007f4f423341f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f4f423358e8 in __GI_abort () at abort.c:90
#2  0x00007f4f3fdc20d6 in glusterfs_unload ()
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/main.c:183
#3  0x00007f4f4455b7d9 in _dl_close_worker (map=map@entry=0x563253cf01b0) at dl-close.c:266
#4  0x00007f4f4455c35c in _dl_close (_map=0x563253cf01b0) at dl-close.c:776
#5  0x00007f4f44556314 in _dl_catch_error (objname=0x7f4e80007c40, errstring=0x7f4e80007c48, 
    mallocedp=0x7f4e80007c38, operate=0x7f4f438ab070 <dlclose_doit>, args=0x563253cf01b0)
    at dl-error.c:177
#6  0x00007f4f438ab5bd in _dlerror_run (operate=operate@entry=0x7f4f438ab070 <dlclose_doit>, 
    args=0x563253cf01b0) at dlerror.c:163
#7  0x00007f4f438ab09f in __dlclose (handle=<optimized out>) at dlclose.c:47
#8  0x0000563251b75def in unload_fsal (fsal_hdl=0x7f4f3ffd23d0 <GlusterFS+112>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/default_methods.c:111
#9  0x0000563251b7730d in destroy_fsals ()
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/fsal_destroyer.c:222
#10 0x0000563251b9ec7f in do_shutdown ()
    at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_admin_thread.c:446
#11 admin_thread (UnusedArg=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_admin_thread.c:466
#12 0x00007f4f42d29e25 in start_thread (arg=0x7f4e8aefd700) at pthread_create.c:308
#13 0x00007f4f423f734d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) f 2
#2  0x00007f4f3fdc20d6 in glusterfs_unload ()
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/main.c:183
183		PTHREAD_MUTEX_destroy(&GlusterFS.lock);
(gdb) l
178		/* All the shares should have been unexported */
179		if (!glist_empty(&GlusterFS.fs_obj)) {
180			LogWarn(COMPONENT_FSAL,
181				"FSAL Gluster still contains active shares.");
182		}
183		PTHREAD_MUTEX_destroy(&GlusterFS.lock);
184		LogDebug(COMPONENT_FSAL, "FSAL Gluster unloaded");
185	}
(gdb) 


PTHREAD_MUTEX_destroy(&GlusterFS.lock) failed, resulting in this crash. It failed because the mutex is still held and in use by another thread (the dbus thread), which is trying to export a volume (as can be seen below).

(gdb) t a a bt

Thread 15 (Thread 0x7f4e8b6fe700 (LWP 24760)):
#0  0x00007f4f423be1ad in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f4f423be044 in __sleep (seconds=0, seconds@entry=1)
    at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007f4f3fdca79e in initiate_up_thread (gl_fs=gl_fs@entry=0x7f4e8400b9b0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/gluster_internal.c:468
#3  0x00007f4f3fdc376b in glusterfs_get_fs (params=..., up_ops=up_ops@entry=0x7f4e840129e0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/export.c:709
#4  0x00007f4f3fdc3c74 in glusterfs_create_export (fsal_hdl=0x7f4f3ffd23d0 <GlusterFS+112>, 
    parse_node=0x7f4e84016c50, err_type=<optimized out>, up_ops=0x7f4e840129e0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/export.c:778
#5  0x0000563251c37b3f in mdcache_fsal_create_export (sub_fsal=0x7f4f3ffd23d0 <GlusterFS+112>, 
    parse_node=parse_node@entry=0x7f4e84016c50, err_type=err_type@entry=0x7f4e8b6fd1c0, 
    super_up_ops=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_main.c:281
#6  0x0000563251c17e6f in fsal_cfg_commit (node=0x7f4e84016c50, link_mem=0x7f4e840127c8, 
    self_struct=<optimized out>, err_type=0x7f4e8b6fd1c0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/support/exports.c:755
#7  0x0000563251c512a8 in proc_block (node=<optimized out>, item=<optimized out>, 
    link_mem=<optimized out>, err_type=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/config_parsing/config_parsing.c:1337
#8  0x0000563251c50720 in do_block_load (err_type=<optimized out>, param_struct=<optimized out>, 
    relax=<optimized out>, params=<optimized out>, blk=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/config_parsing/config_parsing.c:1195
#9  proc_block (node=<optimized out>, item=<optimized out>, link_mem=<optimized out>, 
    err_type=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/config_parsing/config_parsing.c:1321
#10 0x0000563251c51a09 in load_config_from_node (tree_node=0x7f4e840173b0, 
    conf_blk=0x563251ea5240 <add_export_param>, param=param@entry=0x0, unique=unique@entry=false, 
    err_type=err_type@entry=0x7f4e8b6fd1c0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/config_parsing/config_parsing.c:1836
#11 0x0000563251c27557 in gsh_export_addexport (args=<optimized out>, reply=0x563253db7020, 
    error=0x7f4e8b6fd2e0) at /usr/src/debug/nfs-ganesha-2.4.4/src/support/export_mgr.c:967
#12 0x0000563251c4c9a9 in dbus_message_entrypoint (conn=0x563253ce7ed0, msg=msg@entry=0x563253db71d0, 
    user_data=user_data@entry=0x563251ea6ce0 <export_interfaces>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/dbus/dbus_server.c:512
#13 0x00007f4f44114c76 in _dbus_object_tree_dispatch_and_unlock (tree=0x563253cee990, 
    message=message@entry=0x563253db71d0, found_object=found_object@entry=0x7f4e8b6fd484)
    at dbus-object-tree.c:862
#14 0x00007f4f44106e49 in dbus_connection_dispatch (connection=connection@entry=0x563253ce7ed0)
    at dbus-connection.c:4672
#15 0x00007f4f441070e2 in _dbus_connection_read_write_dispatch (connection=0x563253ce7ed0, 
    timeout_milliseconds=timeout_milliseconds@entry=100, dispatch=dispatch@entry=1)
    at dbus-connection.c:3646
#16 0x00007f4f44107180 in dbus_connection_read_write_dispatch (connection=<optimized out>, 
    timeout_milliseconds=timeout_milliseconds@entry=100) at dbus-connection.c:3729
---Type <return> to continue, or q <return> to quit---
#17 0x0000563251c4da71 in gsh_dbus_thread (arg=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/dbus/dbus_server.c:737
#18 0x00007f4f42d29e25 in start_thread (arg=0x7f4e8b6fe700) at pthread_create.c:308
#19 0x00007f4f423f734d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) f 3
#3  0x00007f4f3fdc376b in glusterfs_get_fs (params=..., up_ops=up_ops@entry=0x7f4e840129e0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/export.c:709
709		rc = initiate_up_thread(gl_fs);
(gdb) p *gl_fs
$1 = {fs_obj = {next = 0x7f4e8400b9b0, prev = 0x7f4e8400b9b0}, volname = 0x7f4e84059730 "testvol", 
  fs = 0x7f4e84012c80, up_ops = 0x7f4e840129e0, refcnt = 0, up_thread = 139978102456960, 
  destroy_mode = 0 '\000'}
(gdb) 
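To illustrate the failure mode in isolation (a standalone sketch, not ganesha code): POSIX leaves destroying a mutex that another thread still holds undefined, and glibc typically reports EBUSY; as the backtrace shows, ganesha's PTHREAD_MUTEX_destroy() wrapper treats that failure as fatal and calls abort(), which is how the shutdown thread ends up in frames #0-#2.

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for the dbus/export thread that is still holding GlusterFS.lock. */
static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    sleep(5);                    /* still inside the critical section */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, holder, NULL);
    sleep(1);                    /* let the holder take the lock */

    /* Stands in for glusterfs_unload() destroying GlusterFS.lock. */
    int rc = pthread_mutex_destroy(&lock);
    printf("pthread_mutex_destroy -> %d%s\n", rc, rc == EBUSY ? " (EBUSY)" : "");

    /* ganesha's wrapper aborts on a non-zero return; this assert plays
       the same role and fires here, mirroring the crash above. */
    assert(rc == 0);

    pthread_join(t, NULL);
    return 0;
}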


It's strange that a dbus signal to export the volume ('testvol') was issued while the process was going down.

@Ambarish,
Were you doing any vol set or glusterd restarts at the same time?

Comment 5 Soumya Koduri 2017-08-14 11:43:30 UTC
Submitted a potential fix upstream so that ganesha handles such cases (however unlikely) - https://review.gerrithub.io/#/c/374130/
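For reference only, a purely hypothetical sketch of the kind of guard such a fix could add (the names shutting_down and add_export below are illustrative and are not taken from the upstream patch): once shutdown begins, new dbus add-export requests are refused, so the FSAL cannot be re-entered while glusterfs_unload() is tearing it down.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag: set once shutdown starts, checked before any new export. */
static atomic_bool shutting_down;

static bool add_export(const char *volname)
{
    if (atomic_load(&shutting_down)) {
        fprintf(stderr, "refusing export of %s: shutdown in progress\n", volname);
        return false;
    }
    printf("exporting %s\n", volname);
    return true;
}

int main(void)
{
    add_export("testvol");               /* accepted                        */
    atomic_store(&shutting_down, true);  /* the shutdown path would set it  */
    add_export("testvol");               /* rejected, avoiding the race     */
    return 0;
}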

Comment 6 Ambarish 2017-08-15 10:35:16 UTC
(In reply to Soumya Koduri from comment #4)
> [...]
> @Ambarish,
> Were you doing any vol set or glusterd restarts at the same time?



Not glusterd restarts.

But I stopped the volume and then restarted Ganesha (to flush the internal gluster/ganesha caches for my perf tests), which is when it _possibly_ crashed.

I did this multiple times for all of my metadata tests. I hit it only once, though, which is why I did not propose this as a blocker.

Comment 7 Ambarish 2017-08-15 10:36:20 UTC
(In reply to Ambarish from comment #6)
> (In reply to Soumya Koduri from comment #4)
> > [...]
> > @Ambarish,
> > Were you doing any vol set or glusterd restarts at the same time?
> 
> 
> 
> Not glusterd restarts.
> 
> But I stopped the volume and then restarted Ganesha (to flush the internal
> gluster/ganesha caches for my perf tests), which is when it _possibly_ crashed.
> 
> I did this multiple times for all of my metadata tests. I hit it only once,
> though, which is why I did not propose this as a blocker.



I meant that I restarted the volume, which would have caused the export/unexport that you see.

Comment 13 errata-xmlrpc 2018-09-04 06:53:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2610