Description of problem:
============================
8-node Ganesha cluster, 8*3 Distributed-Replicate volume mounted on 6 clients via v3 and v4. Ganesha crashed while running I/O and readdir operations from multiple clients.
----------
Core was generated by `/usr/bin/ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.c'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fc232d5f5f5 in syncop_stat () from /lib64/libglusterfs.so.0
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 dbus-libs-1.10.24-13.el7_6.x86_64 elfutils-libelf-0.176-2.el7.x86_64 elfutils-libs-0.176-2.el7.x86_64 glibc-2.17-292.el7.x86_64 glusterfs-6.0-7.el7rhgs.x86_64 glusterfs-api-6.0-7.el7rhgs.x86_64 glusterfs-client-xlators-6.0-7.el7rhgs.x86_64 glusterfs-libs-6.0-7.el7rhgs.x86_64 gssproxy-0.7.0-26.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libacl-2.2.51-14.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-61.el7.x86_64 libcap-2.22-10.el7.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libgcrypt-1.5.3-14.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libnfsidmap-0.25-19.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-61.el7.x86_64 libwbclient-4.9.1-6.el7.x86_64 lz4-1.7.5-3.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 samba-client-libs-4.9.1-6.el7.x86_64 systemd-libs-219-67.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64

(gdb) bt
#0  0x00007fc232d5f5f5 in syncop_stat () from /lib64/libglusterfs.so.0
#1  0x00007fc2600b48d3 in glfs_h_stat () from /lib64/libgfapi.so.0
#2  0x00007fc2602ca2a6 in getattrs (obj_hdl=0x7fc22c0dcd88, attrs=0x7fc1fb7f51f0) at /usr/src/debug/nfs-ganesha-2.7.3/src/FSAL/FSAL_GLUSTER/handle.c:866
#3  0x000055dcc0a7ec03 in mdcache_refresh_attrs (entry=entry@entry=0x7fc22c0ddba0, need_acl=<optimized out>, need_fslocations=<optimized out>, invalidate=invalidate@entry=true) at /usr/src/debug/nfs-ganesha-2.7.3/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:836
#4  0x000055dcc0a80257 in mdcache_getattrs (obj_hdl=0x7fc22c0ddbd8, attrs_out=0x7fc1fb7f5500) at /usr/src/debug/nfs-ganesha-2.7.3/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:903
#5  0x000055dcc09ec3ef in file_To_Fattr (data=data@entry=0x7fc1fb7f5720, request_mask=1433550, attr=attr@entry=0x7fc1fb7f5500, Fattr=Fattr@entry=0x7fc1c000fb20, Bitmap=Bitmap@entry=0x7fc1c004ac58) at /usr/src/debug/nfs-ganesha-2.7.3/src/Protocols/NFS/nfs_proto_tools.c:3511
#6  0x000055dcc09c6e8b in nfs4_op_getattr (op=0x7fc1c004ac50, data=0x7fc1fb7f5720, resp=0x7fc1c000fb10) at /usr/src/debug/nfs-ganesha-2.7.3/src/Protocols/NFS/nfs4_op_getattr.c:108
#7  0x000055dcc09c06f3 in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7fc1c0043d60) at /usr/src/debug/nfs-ganesha-2.7.3/src/Protocols/NFS/nfs4_Compound.c:942
#8  0x000055dcc09b3b0f in nfs_rpc_process_request (reqdata=0x7fc1c0008720) at /usr/src/debug/nfs-ganesha-2.7.3/src/MainNFSD/nfs_worker_thread.c:1328
#9  0x000055dcc09b2fba in nfs_rpc_decode_request (xprt=0x7fc1940015e0, xdrs=0x7fc1c004a420) at /usr/src/debug/nfs-ganesha-2.7.3/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1345
#10 0x00007fc26bf7162d in svc_rqst_xprt_task () from /lib64/libntirpc.so.1.7
#11 0x00007fc26bf71b6a in svc_rqst_run_task () from /lib64/libntirpc.so.1.7
#12 0x00007fc26bf79c0b in work_pool_thread () from /lib64/libntirpc.so.1.7
#13 0x00007fc26a30fea5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fc269c1a8cd in clone () from /lib64/libc.so.6
----------------

Version-Release number of selected component (if applicable):
=============================
# rpm -qa | grep ganesha
nfs-ganesha-2.7.3-5.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.7.3-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.7.3-5.el7rhgs.x86_64
glusterfs-ganesha-6.0-7.el7rhgs.x86_64

How reproducible:
================
1/1

Steps to Reproduce:
=================
1. Create an 8-node Ganesha cluster.
2. Create an 8*3 Distributed-Replicate volume.
3. Export the volume via Ganesha.
4. Add the option "enable_upcall = yes;" to the volume's export file and run refresh-config.
5. Perform volume start and stop and check whether the volume is exported. The volume was exported successfully.
6. Mount the volume on 6 clients via v3 and v4.1 using a single server VIP.
7. Run the following workload:
   Client 1: (v3) Linux untars of empty dirs
   Client 2: (v3) Bonnie
   Client 3: (v4) Bonnie
   Client 4: (v4) dbench
   Client 5: (v4) ls -lRt in loop
   Client 6: (v4) du -sh (single iteration)

Actual results:
==================
Ganesha crashed on the node whose VIP was used to mount the volume on the clients.

Expected results:
===================
Ganesha should not crash.

Additional info:
===================
[root@f07-h33-000-1029u exports]# service nfs-ganesha status
Redirecting to /bin/systemctl status nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Mon 2019-07-01 09:15:47 UTC; 46min ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 228321 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
 Main PID: 59700 (code=killed, signal=SEGV)

Jul 01 05:52:07 f07-h33-000-1029u.rdu2.scalelab.redhat.com systemd[1]: Starting NFS-Ganesha file server...
Jul 01 05:52:09 f07-h33-000-1029u.rdu2.scalelab.redhat.com systemd[1]: Started NFS-Ganesha file server.
Jul 01 09:15:47 f07-h33-000-1029u.rdu2.scalelab.redhat.com systemd[1]: nfs-ganesha.service: main process exited, code=killed, status=11/SEGV
Jul 01 09:15:47 f07-h33-000-1029u.rdu2.scalelab.redhat.com systemd[1]: Unit nfs-ganesha.service entered failed state.
Jul 01 09:15:47 f07-h33-000-1029u.rdu2.scalelab.redhat.com systemd[1]: nfs-ganesha.service failed.
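Step 4 of the reproduction adds the upcall option to the volume's export file. As a rough illustration only (export id, path, hostname, and volume name are placeholders, not taken from this setup, and the exact block layout can differ between ganesha versions), the option typically sits inside the FSAL block of an FSAL_GLUSTER export:

```
EXPORT {
    Export_Id = 2;               # placeholder id
    Path = "/testvol";           # placeholder path
    Pseudo = "/testvol";
    Access_Type = RW;
    FSAL {
        Name = "GLUSTER";
        Hostname = "localhost";  # placeholder
        Volume = "testvol";      # placeholder volume name
        enable_upcall = yes;     # option added in step 4
    }
}
```

After editing the export file, refresh-config (step 4) re-reads the export without restarting the whole cluster.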
--------------------------
# pcs status
Cluster name: ganesha-ha
Stack: corosync
Current DC: f12-h02-000-1029u.rdu2.scalelab.redhat.com (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Mon Jul 1 10:04:39 2019
Last change: Mon Jul 1 09:15:56 2019 by root via crm_attribute on f07-h33-000-1029u.rdu2.scalelab.redhat.com

8 nodes configured
48 resources configured

Online: [ f07-h33-000-1029u.rdu2.scalelab.redhat.com f07-h34-000-1029u.rdu2.scalelab.redhat.com f07-h35-000-1029u.rdu2.scalelab.redhat.com f07-h36-000-1029u.rdu2.scalelab.redhat.com f12-h02-000-1029u.rdu2.scalelab.redhat.com f12-h03-000-1029u.rdu2.scalelab.redhat.com f12-h04-000-1029u.rdu2.scalelab.redhat.com f12-h05-000-1029u.rdu2.scalelab.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ f07-h33-000-1029u.rdu2.scalelab.redhat.com f07-h34-000-1029u.rdu2.scalelab.redhat.com f07-h35-000-1029u.rdu2.scalelab.redhat.com f07-h36-000-1029u.rdu2.scalelab.redhat.com f12-h02-000-1029u.rdu2.scalelab.redhat.com f12-h03-000-1029u.rdu2.scalelab.redhat.com f12-h04-000-1029u.rdu2.scalelab.redhat.com f12-h05-000-1029u.rdu2.scalelab.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ f07-h33-000-1029u.rdu2.scalelab.redhat.com f07-h34-000-1029u.rdu2.scalelab.redhat.com f07-h35-000-1029u.rdu2.scalelab.redhat.com f07-h36-000-1029u.rdu2.scalelab.redhat.com f12-h02-000-1029u.rdu2.scalelab.redhat.com f12-h03-000-1029u.rdu2.scalelab.redhat.com f12-h04-000-1029u.rdu2.scalelab.redhat.com f12-h05-000-1029u.rdu2.scalelab.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ f07-h34-000-1029u.rdu2.scalelab.redhat.com f07-h35-000-1029u.rdu2.scalelab.redhat.com f07-h36-000-1029u.rdu2.scalelab.redhat.com f12-h02-000-1029u.rdu2.scalelab.redhat.com f12-h03-000-1029u.rdu2.scalelab.redhat.com f12-h04-000-1029u.rdu2.scalelab.redhat.com f12-h05-000-1029u.rdu2.scalelab.redhat.com ]
     Stopped: [ f07-h33-000-1029u.rdu2.scalelab.redhat.com ]
 Resource Group: f07-h33-000-1029u.rdu2.scalelab.redhat.com-group
     f07-h33-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f12-h04-000-1029u.rdu2.scalelab.redhat.com
     f07-h33-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f12-h04-000-1029u.rdu2.scalelab.redhat.com
     f07-h33-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f12-h04-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f07-h36-000-1029u.rdu2.scalelab.redhat.com-group
     f07-h36-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f07-h36-000-1029u.rdu2.scalelab.redhat.com
     f07-h36-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f07-h36-000-1029u.rdu2.scalelab.redhat.com
     f07-h36-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f07-h36-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f07-h35-000-1029u.rdu2.scalelab.redhat.com-group
     f07-h35-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f07-h35-000-1029u.rdu2.scalelab.redhat.com
     f07-h35-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f07-h35-000-1029u.rdu2.scalelab.redhat.com
     f07-h35-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f07-h35-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f07-h34-000-1029u.rdu2.scalelab.redhat.com-group
     f07-h34-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f07-h34-000-1029u.rdu2.scalelab.redhat.com
     f07-h34-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f07-h34-000-1029u.rdu2.scalelab.redhat.com
     f07-h34-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f07-h34-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f12-h05-000-1029u.rdu2.scalelab.redhat.com-group
     f12-h05-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f12-h05-000-1029u.rdu2.scalelab.redhat.com
     f12-h05-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f12-h05-000-1029u.rdu2.scalelab.redhat.com
     f12-h05-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f12-h05-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f12-h02-000-1029u.rdu2.scalelab.redhat.com-group
     f12-h02-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f12-h02-000-1029u.rdu2.scalelab.redhat.com
     f12-h02-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f12-h02-000-1029u.rdu2.scalelab.redhat.com
     f12-h02-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f12-h02-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f12-h03-000-1029u.rdu2.scalelab.redhat.com-group
     f12-h03-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f12-h03-000-1029u.rdu2.scalelab.redhat.com
     f12-h03-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f12-h03-000-1029u.rdu2.scalelab.redhat.com
     f12-h03-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f12-h03-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f12-h04-000-1029u.rdu2.scalelab.redhat.com-group
     f12-h04-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f12-h04-000-1029u.rdu2.scalelab.redhat.com
     f12-h04-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f12-h04-000-1029u.rdu2.scalelab.redhat.com
     f12-h04-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f12-h04-000-1029u.rdu2.scalelab.redhat.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
----------------------
# showmount -e
rpc mount export: RPC: Unable to receive; errno = Connection refused
Verified this BZ with:
# rpm -qa | grep ganesha
glusterfs-ganesha-6.0-51.el7rhgs.x86_64
nfs-ganesha-gluster-3.4-1.el7rhgs.x86_64
nfs-ganesha-3.4-1.el7rhgs.x86_64
nfs-ganesha-selinux-3.4-1.el7rhgs.noarch

Steps performed for verification:
1. Create a 4-node Ganesha cluster.
2. Create an 8*3 Distributed-Replicate volume.
3. Export the volume via Ganesha.
4. Perform volume start and stop.
5. Mount the volume on 6 clients via v3 and v4.1 using a single server VIP.
6. Run the following workload:
   Client 1: (v3) Linux untars of empty dirs
   Client 2: (v3) Bonnie
   Client 3: (v4) Bonnie
   Client 4: (v4) dbench
   Client 5: (v4) ls -lRt in loop
   Client 6: (v4) du -sh (single iteration)
7. Stop I/O from all the clients.
8. Unexport and re-export the volume via Ganesha.
9. Run rm -rf from all the clients.

No crashes were observed. Moving this BZ to verified state.
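An 8*3 Distributed-Replicate volume is 8 replica-3 subvolumes, i.e. 24 bricks; on the 4-node verification cluster that works out to six bricks per node. A minimal sketch of composing such a create command (node names and brick paths are placeholders, not the lab hosts above; gluster groups each run of 3 consecutive bricks into a replica set, so brick ordering matters on a real cluster):

```shell
# Placeholder node names -- the real cluster used different hosts.
nodes="node1 node2 node3 node4"

bricks=""
count=0
# 6 brick directories per node: 4 nodes * 6 = 24 bricks = 8 replica-3 subvolumes.
for i in 1 2 3 4 5 6; do
  for n in $nodes; do
    bricks="$bricks $n:/bricks/brick$i/testvol"
    count=$((count + 1))
  done
done

echo "brick count: $count"
# The create command this layout would produce (shown, not executed here):
echo "gluster volume create testvol replica 3$bricks"
```

The same shape with 8 nodes and 3 bricks per node gives the 8-node reproduction cluster's volume.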
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (nfs-ganesha bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1463