Description of problem:
refresh-config fails and crashes ganesha when mdcache is enabled on the volume.

Version-Release number of selected component (if applicable):

[root@dhcp43-92 ~]# rpm -qa | grep glusterfs
glusterfs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-rdma-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-server-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-api-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-libs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-fuse-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-cli-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-2.26.git0a405a4.el7rhgs.x86_64

[root@dhcp43-92 ~]# rpm -qa | grep ganesha
nfs-ganesha-debuginfo-2.4.0-2.el7rhgs.x86_64
nfs-ganesha-2.4.0-2.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.0-2.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a ganesha cluster and create a volume.
2. Enable ganesha on the volume and enable the md-cache related parameters:
   # gluster volume set <volname> features.cache-invalidation on
   # gluster volume set <volname> features.cache-invalidation-timeout 600
   # gluster volume set <volname> performance.stat-prefetch on
   # gluster volume set <volname> performance.cache-invalidation on
   # gluster volume set <volname> performance.md-cache-timeout 600
3. Disable performance.client-io-threads on the volume.
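For convenience, the five settings in step 2 can be applied in one loop. This is a sketch, not part of the original report; the volume name and the DRY_RUN guard are illustrative assumptions. With DRY_RUN=1 (the default here) the commands are only printed; set DRY_RUN=0 on a cluster node to actually run them.

```shell
#!/usr/bin/env bash
# Apply the md-cache related options from step 2 in one loop.
VOL="${1:-mdcache}"
DRY_RUN="${DRY_RUN:-1}"

# Option/value pairs, exactly as in step 2 of the reproduction.
settings=(
    "features.cache-invalidation on"
    "features.cache-invalidation-timeout 600"
    "performance.stat-prefetch on"
    "performance.cache-invalidation on"
    "performance.md-cache-timeout 600"
)

for s in "${settings[@]}"; do
    if [ "$DRY_RUN" = 1 ]; then
        # Dry run: print what would be executed.
        echo "gluster volume set $VOL $s"
    else
        # Word-split $s deliberately: it holds "option value".
        gluster volume set "$VOL" $s
    fi
done
```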
[root@dhcp43-92 ~]# gluster vol get mdcache all | grep client-io
performance.client-io-threads            off

[root@dhcp43-92 ~]# gluster vol info mdcache

Volume Name: mdcache
Type: Distributed-Replicate
Volume ID: 8669b7b9-209c-4530-a9cb-bb4f3a6f370c
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.43.92:/bricks/brick0/b0
Brick2: 10.70.42.170:/bricks/brick0/b0
Brick3: 10.70.43.145:/bricks/brick0/b0
Brick4: 10.70.42.183:/bricks/brick0/b0
Brick5: 10.70.43.92:/bricks/brick1/b1
Brick6: 10.70.42.170:/bricks/brick1/b1
Brick7: 10.70.43.145:/bricks/brick1/b1
Brick8: 10.70.42.183:/bricks/brick1/b1
Brick9: 10.70.43.92:/bricks/brick2/b2
Brick10: 10.70.42.170:/bricks/brick2/b2
Brick11: 10.70.43.145:/bricks/brick2/b2
Brick12: 10.70.42.183:/bricks/brick2/b2
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.cache-invalidation: on
ganesha.enable: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
performance.client-io-threads: off
cluster.enable-shared-storage: enable
nfs-ganesha: enable

4. Perform refresh-config from one of the nodes:

[root@dhcp43-92 ~]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /var/run/gluster/shared_storage/nfs-ganesha/ mdcache
Refresh-config completed on dhcp42-170.
Error: refresh-config failed on dhcp42-183.

Observe that ganesha crashes on the node where refresh-config fails, with the following backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f4eebc5c700 (LWP 19435)]
loc_wipe (loc=loc@entry=0x7f4ee074ada8) at xlator.c:694
694             if (loc->inode) {
(gdb) bt
#0  loc_wipe (loc=loc@entry=0x7f4ee074ada8) at xlator.c:694
#1  0x00007f4ee1b9297e in dht_local_wipe (this=0x7f4ed401d150, local=0x7f4ee074ada0) at dht-helper.c:573
#2  0x00007f4ee1bb3848 in dht_ipc_cbk (frame=0x7f4eeb38706c, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, xdata=<optimized out>) at dht-common.c:8569
#3  0x00007f4ee1e664fa in afr_ipc_cbk (frame=0x7f4eeb3881d0, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, xdata=<optimized out>) at afr-common.c:4074
#4  0x00007f4ee20a5b91 in client3_3_ipc_cbk (req=req@entry=0x7f4ed0ae699c, iov=iov@entry=0x0, count=count@entry=0, myframe=myframe@entry=0x7f4eeb388378) at client-rpc-fops.c:2161
#5  0x00007f4ef03e3e48 in rpc_clnt_submit (rpc=0x7f4ed409ea40, prog=prog@entry=0x7f4ee22dde20 <clnt3_3_fop_prog>, procnum=procnum@entry=47, cbkfn=cbkfn@entry=0x7f4ee20a59c0 <client3_3_ipc_cbk>, proghdr=proghdr@entry=0x7f4ed4471330, proghdrcount=<optimized out>, progpayload=progpayload@entry=0x0, progpayloadcount=progpayloadcount@entry=0, iobref=iobref@entry=0x7f4ebc001610, frame=frame@entry=0x7f4eeb388378, rsphdr=0x0, rsphdr_count=rsphdr_count@entry=0, rsp_payload=rsp_payload@entry=0x0, rsp_payload_count=rsp_payload_count@entry=0, rsp_iobref=rsp_iobref@entry=0x0) at rpc-clnt.c:1687
#6  0x00007f4ee2096ea2 in client_submit_request (this=this@entry=0x7f4ed4015b50, req=req@entry=0x7f4ed4471600, frame=frame@entry=0x7f4eeb388378, prog=0x7f4ee22dde20 <clnt3_3_fop_prog>, procnum=procnum@entry=47, cbkfn=cbkfn@entry=0x7f4ee20a59c0 <client3_3_ipc_cbk>, iobref=iobref@entry=0x0, rsphdr=rsphdr@entry=0x0, rsphdr_count=rsphdr_count@entry=0, rsp_payload=rsp_payload@entry=0x0, rsp_payload_count=rsp_payload_count@entry=0, rsp_iobref=rsp_iobref@entry=0x0, xdrproc=0x7f4ef01c7510 <xdr_gfs3_ipc_req>) at client.c:316
#7  0x00007f4ee20b234e in client3_3_ipc (frame=0x7f4eeb388378, this=0x7f4ed4015b50, data=<optimized out>) at client-rpc-fops.c:6027
#8  0x00007f4ee2095052 in client_ipc (frame=0x7f4eeb388378, this=<optimized out>, op=<optimized out>, xdata=<optimized out>) at client.c:2039
#9  0x00007f4ee1e66dfe in afr_ipc (frame=0x7f4eeb3881d0, this=<optimized out>, op=<optimized out>, xdata=0x7f4eeab277f0) at afr-common.c:4116
#10 0x00007f4ee1bd8703 in dht_ipc (frame=0x7f4eeb38706c, this=<optimized out>, op=<optimized out>, xdata=0x7f4eeab277f0) at dht-common.c:8611
#11 0x00007f4ef069174f in default_ipc (frame=0x7f4eeb38706c, this=0x7f4ed401ea20, op=2, xdata=0x7f4eeab277f0) at defaults.c:2234
#12 0x00007f4ef069174f in default_ipc (frame=0x7f4eeb38706c, this=0x7f4ed4020360, op=2, xdata=0x7f4eeab277f0) at defaults.c:2234
#13 0x00007f4ef069174f in default_ipc (frame=0x7f4eeb38706c, this=0x7f4ed40217b0, op=2, xdata=0x7f4eeab277f0) at defaults.c:2234
#14 0x00007f4ef069174f in default_ipc (frame=0x7f4eeb38706c, this=0x7f4ed4022d10, op=2, xdata=0x7f4eeab277f0) at defaults.c:2234
#15 0x00007f4ef069174f in default_ipc (frame=0x7f4eeb38706c, this=0x7f4ed4024440, op=2, xdata=0x7f4eeab277f0) at defaults.c:2234
#16 0x00007f4ef069174f in default_ipc (frame=0x7f4eeb38706c, this=0x7f4ed4025900, op=2, xdata=0x7f4eeab277f0) at defaults.c:2234
#17 0x00007f4ef066147c in syncop_ipc (subvol=0x7f4ed4025900, op=op@entry=2, xdata_in=0x7f4eeab277f0, xdata_out=xdata_out@entry=0x0) at syncop.c:2819
#18 0x00007f4ee0d2ba63 in mdc_send_xattrs (data=0x7f4ed4010a00) at md-cache.c:2641
#19 0x00007f4ef064f862 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
#20 0x00007f4ef36c2110 in ?? () from /lib64/libc.so.6
#21 0x0000000000000000 in ?? ()

******************************************************************

[root@dhcp43-92 ~]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /var/run/gluster/shared_storage/nfs-ganesha/ mdcache
Error: refresh-config failed on dhcp42-170.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ff5589f7700 (LWP 25293)]
loc_wipe (loc=loc@entry=0x7ff54c3e5adc) at xlator.c:694
694             if (loc->inode) {
(gdb) bt
#0  loc_wipe (loc=loc@entry=0x7ff54c3e5adc) at xlator.c:694
#1  0x00007ff54e12c97e in dht_local_wipe (this=0x7ff54001d150, local=0x7ff54c3e5ad4) at dht-helper.c:573
#2  0x00007ff54e14d848 in dht_ipc_cbk (frame=0x7ff557922870, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, xdata=<optimized out>) at dht-common.c:8569
#3  0x00007ff54e400a3a in afr_ipc (frame=0x7ff557921214, this=<optimized out>, op=<optimized out>, xdata=<optimized out>) at afr-common.c:4129
#4  0x00007ff54e172703 in dht_ipc (frame=0x7ff557922870, this=<optimized out>, op=<optimized out>, xdata=0x7ff5570c1698) at dht-common.c:8611
#5  0x00007ff55cc2b74f in default_ipc (frame=0x7ff557922870, this=0x7ff54001ea20, op=2, xdata=0x7ff5570c1698) at defaults.c:2234
#6  0x00007ff55cc2b74f in default_ipc (frame=0x7ff557922870, this=0x7ff540020360, op=2, xdata=0x7ff5570c1698) at defaults.c:2234
#7  0x00007ff55cc2b74f in default_ipc (frame=0x7ff557922870, this=0x7ff5400217b0, op=2, xdata=0x7ff5570c1698) at defaults.c:2234
#8  0x00007ff55cc2b74f in default_ipc (frame=0x7ff557922870, this=0x7ff540022d10, op=2, xdata=0x7ff5570c1698) at defaults.c:2234
#9  0x00007ff55cc2b74f in default_ipc (frame=0x7ff557922870, this=0x7ff540024440, op=2, xdata=0x7ff5570c1698) at defaults.c:2234
#10 0x00007ff55cc2b74f in default_ipc (frame=0x7ff557922870, this=0x7ff540025900, op=2, xdata=0x7ff5570c1698) at defaults.c:2234
#11 0x00007ff55cbfb47c in syncop_ipc (subvol=0x7ff540025900, op=op@entry=2, xdata_in=0x7ff5570c1698, xdata_out=xdata_out@entry=0x0) at syncop.c:2819
#12 0x00007ff54d2c5a63 in mdc_send_xattrs (data=0x7ff540003770) at md-cache.c:2641
#13 0x00007ff55cbe9862 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
#14 0x00007ff55fc5c110 in ?? () from /lib64/libc.so.6
#15 0x0000000000000000 in ?? ()
(gdb)

***********************************************************

[root@dhcp43-92 ~]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /var/run/gluster/shared_storage/nfs-ganesha/ mdcache
Refresh-config completed on dhcp42-170.
Refresh-config completed on dhcp42-183.
Error: refresh-config failed on dhcp43-145.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f99c831b700 (LWP 28373)]
0x00007f99be51c3ce in afr_local_transaction_cleanup (local=local@entry=0x7f99ad9f29c0, this=this@entry=0x7f99b0016ec0) at afr-common.c:1484
1484            afr_matrix_cleanup (local->pending, priv->child_count);
(gdb) bt
#0  0x00007f99be51c3ce in afr_local_transaction_cleanup (local=local@entry=0x7f99ad9f29c0, this=this@entry=0x7f99b0016ec0) at afr-common.c:1484
#1  0x00007f99be51c52a in afr_local_cleanup (local=0x7f99ad9f29c0, this=0x7f99b0016ec0) at afr-common.c:1574
#2  0x00007f99be525aa3 in afr_ipc (frame=0x7f99c7a47c94, this=0x7f99b0016ec0, op=<optimized out>, xdata=0x7f99c71e6540) at afr-common.c:4099
#3  0x00007f99be297703 in dht_ipc (frame=0x7f99c7a47bc0, this=<optimized out>, op=<optimized out>, xdata=0x7f99c71e6540) at dht-common.c:8611
#4  0x00007f99ccd5074f in default_ipc (frame=0x7f99c7a47bc0, this=0x7f99b001ea20, op=2, xdata=0x7f99c71e6540) at defaults.c:2234
#5  0x00007f99ccd5074f in default_ipc (frame=0x7f99c7a47bc0, this=0x7f99b0020360, op=2, xdata=0x7f99c71e6540) at defaults.c:2234
#6  0x00007f99ccd5074f in default_ipc (frame=0x7f99c7a47bc0, this=0x7f99b00217b0, op=2, xdata=0x7f99c71e6540) at defaults.c:2234
#7  0x00007f99ccd5074f in default_ipc (frame=0x7f99c7a47bc0, this=0x7f99b0022d10, op=2, xdata=0x7f99c71e6540) at defaults.c:2234
#8  0x00007f99ccd5074f in default_ipc (frame=0x7f99c7a47bc0, this=0x7f99b0024440, op=2, xdata=0x7f99c71e6540) at defaults.c:2234
#9  0x00007f99ccd5074f in default_ipc (frame=0x7f99c7a47bc0, this=0x7f99b0025900, op=2, xdata=0x7f99c71e6540) at defaults.c:2234
#10 0x00007f99ccd2047c in syncop_ipc (subvol=0x7f99b0025900, op=op@entry=2, xdata_in=0x7f99c71e6540, xdata_out=xdata_out@entry=0x0) at syncop.c:2819
#11 0x00007f99bd3eaa63 in mdc_send_xattrs (data=0x7f99b0000d30) at md-cache.c:2641
#12 0x00007f99ccd0e862 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
#13 0x00007f99cfd81110 in ?? () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()

The following messages are seen in the ganesha-gfapi logs:

[2016-10-14 13:42:44.337670] I [io-stats.c:3822:fini] 0-mdcache: io-stats translator unloaded
[2016-10-14 13:44:01.611611] W [afr-common.c:4096:afr_ipc] (-->/lib64/libglusterfs.so.0(default_ipc+0xcf) [0x7f99ccd5074f] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x4e703) [0x7f99be297703] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x56ed1) [0x7f99be525ed1] ) 0-mdcache-replicate-0: invalid argument: this->private [Invalid argument]

Actual results:
refresh-config fails and crashes ganesha when mdcache is enabled on the volume.
Expected results:
There should not be any crash.

Additional info:
No crash was seen if mdcache is not enabled on the volume, with the same build:

[root@dhcp43-92 ~]# gluster vol info mdcache

Volume Name: mdcache
Type: Distributed-Replicate
Volume ID: 2c4eb3ed-6d56-46fd-a53d-0db4ae8c5e69
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.43.92:/bricks/brick0/b0
Brick2: 10.70.42.170:/bricks/brick0/b0
Brick3: 10.70.43.145:/bricks/brick0/b0
Brick4: 10.70.42.183:/bricks/brick0/b0
Brick5: 10.70.43.92:/bricks/brick1/b1
Brick6: 10.70.42.170:/bricks/brick1/b1
Brick7: 10.70.43.145:/bricks/brick1/b1
Brick8: 10.70.42.183:/bricks/brick1/b1
Brick9: 10.70.43.92:/bricks/brick2/b2
Brick10: 10.70.42.170:/bricks/brick2/b2
Brick11: 10.70.43.145:/bricks/brick2/b2
Brick12: 10.70.42.183:/bricks/brick2/b2
Options Reconfigured:
performance.client-io-threads: off
ganesha.enable: on
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.enable-shared-storage: enable
nfs-ganesha: enable

[root@dhcp43-92 ~]# gluster vol get mdcache all | grep client-io
performance.client-io-threads            off

[root@dhcp43-92 ~]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /var/run/gluster/shared_storage/nfs-ganesha/ mdcache
Refresh-config completed on dhcp42-170.
Refresh-config completed on dhcp42-183.
Refresh-config completed on dhcp43-145.
Success: refresh-config completed.

[root@dhcp43-92 ~]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /var/run/gluster/shared_storage/nfs-ganesha/ mdcache
Refresh-config completed on dhcp42-170.
Refresh-config completed on dhcp42-183.
Refresh-config completed on dhcp43-145.
Success: refresh-config completed.

[root@dhcp43-92 ~]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /var/run/gluster/shared_storage/nfs-ganesha/ mdcache
Refresh-config completed on dhcp42-170.
Refresh-config completed on dhcp42-183.
Refresh-config completed on dhcp43-145.
Success: refresh-config completed.

sosreports and ganesha logs will be attached.
sosreports and logs can be accessed at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1384993
This issue is also seen during unexport of the volume, where ganesha sometimes crashes and the unexport fails with the message below:

[root@dhcp43-92 exports]# gluster vol set mdcache ganesha.enable off
volume set: failed: Staging failed on dhcp43-145.lab.eng.blr.redhat.com. Error: Dynamic export addition/deletion failed. Please see log file for details

bt from gdb:

Thread 15 (Thread 0x7fc6616fa700 (LWP 10230)):
#0  0x00007fc71ba0c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fc717fb6c3b in syncenv_destroy (env=0x7fc65c030610) at syncop.c:779
#2  0x00007fc71824f605 in pub_glfs_fini (fs=0x7fc65c008550) at glfs.c:1215
#3  0x00007fc71867b5a1 in export_release (exp_hdl=0x7fc65c001e10) at /usr/src/debug/nfs-ganesha-2.4.0/src/FSAL/FSAL_GLUSTER/export.c:88
#4  0x00007fc71d54292d in mdcache_exp_release (exp_hdl=0x7fc65c0342d0) at /usr/src/debug/nfs-ganesha-2.4.0/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_export.c:170
#5  0x00007fc71d5223cb in free_export_resources (export=0x7fc65cc1bac8) at /usr/src/debug/nfs-ganesha-2.4.0/src/support/exports.c:2064
#6  0x00007fc71d531fa3 in free_export (export=0x7fc65cc1bac8) at /usr/src/debug/nfs-ganesha-2.4.0/src/support/export_mgr.c:252
#7  0x00007fc71d533c54 in gsh_export_removeexport (args=<optimized out>, reply=<optimized out>, error=0x7fc6616f92e0) at /usr/src/debug/nfs-ganesha-2.4.0/src/support/export_mgr.c:1096
#8  0x00007fc71d555319 in dbus_message_entrypoint (conn=0x7fc71db4ac90, msg=0x7fc71db4af70, user_data=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.0/src/dbus/dbus_server.c:512
#9  0x00007fc71cdecc86 in _dbus_object_tree_dispatch_and_unlock () from /lib64/libdbus-1.so.3
#10 0x00007fc71cddee49 in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#11 0x00007fc71cddf0e2 in _dbus_connection_read_write_dispatch () from /lib64/libdbus-1.so.3
#12 0x00007fc71d556390 in gsh_dbus_thread (arg=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.0/src/dbus/dbus_server.c:737
#13 0x00007fc71ba08dc5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fc71b0d5ced in clone () from /lib64/libc.so.6
Could you please give a brief on what refresh config does?
(In reply to Poornima G from comment #4)
> Could you please give a brief on what refresh config does?

It just unexports and re-exports the volume.
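In a bit more detail, per node that unexport/re-export goes over ganesha's D-Bus export manager. A rough sketch follows; the export file path and Export_Id below are illustrative assumptions (the real logic, including parsing Export_Id out of the export file, lives in ganesha-ha.sh). With DRY_RUN=1 (the default here) the dbus-send commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch of the per-node refresh: RemoveExport, then AddExport from the
# (possibly edited) export file on shared storage.
HA_CONFDIR="${HA_CONFDIR:-/var/run/gluster/shared_storage/nfs-ganesha}"
VOL="${1:-mdcache}"
CONF="$HA_CONFDIR/exports/export.$VOL.conf"
DRY_RUN="${DRY_RUN:-1}"

# Normally parsed from $CONF; hard-coded here for illustration.
EXPORT_ID="${EXPORT_ID:-2}"

# Drop the current export of the volume.
remove_cmd=(dbus-send --print-reply --system --dest=org.ganesha.nfsd
    /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.RemoveExport
    "uint16:$EXPORT_ID")

# Re-add it from the export file.
add_cmd=(dbus-send --print-reply --system --dest=org.ganesha.nfsd
    /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport
    "string:$CONF" "string:EXPORT(Export_Id=$EXPORT_ID)")

if [ "$DRY_RUN" = 1 ]; then
    echo "${remove_cmd[*]}"
    echo "${add_cmd[*]}"
else
    "${remove_cmd[@]}" && "${add_cmd[@]}"
fi
```

The mdcache crash above happens inside this window: RemoveExport triggers glfs_fini on the old export's glfs instance while md-cache's synctask (mdc_send_xattrs) is still sending an IPC down the graph being torn down.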
Calls init and fini is it? I also see that afr_ipc fop failed for some reason.
(In reply to Poornima G from comment #6)
> Calls init and fini is it? I also see that afr_ipc fop failed for some
> reason.

As far as I remember, there were no I/Os. It should only call init and fini.
Fix posted upstream http://review.gluster.org/#/c/15764/2
Fix posted:
Downstream 3.2: https://code.engineering.redhat.com/gerrit/#/c/90692/
Master: http://review.gluster.org/#/c/15764/
3.9: http://review.gluster.org/#/c/15890/
Verified the fix in build:
nfs-ganesha-2.4.1-2.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-2.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-8.el7rhgs.x86_64

Tested both scenarios (refresh-config and disabling nfs-ganesha on the volume) multiple times with mdcache enabled and with client-io-threads on/off; both work fine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html