Description of problem:
============================
8-node Ganesha cluster, 8*3 Distributed-Replicate volume mounted on 6 clients via v3 and v4. Ganesha crashed while running I/O and readdir operations from multiple clients.
----------
Core was generated by `/usr/bin/ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.c'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fc232d5f5f5 in syncop_stat () from /lib64/libglusterfs.so.0
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 dbus-libs-1.10.24-13.el7_6.x86_64 elfutils-libelf-0.176-2.el7.x86_64 elfutils-libs-0.176-2.el7.x86_64 glibc-2.17-292.el7.x86_64 glusterfs-6.0-7.el7rhgs.x86_64 glusterfs-api-6.0-7.el7rhgs.x86_64 glusterfs-client-xlators-6.0-7.el7rhgs.x86_64 glusterfs-libs-6.0-7.el7rhgs.x86_64 gssproxy-0.7.0-26.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libacl-2.2.51-14.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-61.el7.x86_64 libcap-2.22-10.el7.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libgcrypt-1.5.3-14.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libnfsidmap-0.25-19.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-61.el7.x86_64 libwbclient-4.9.1-6.el7.x86_64 lz4-1.7.5-3.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 samba-client-libs-4.9.1-6.el7.x86_64 systemd-libs-219-67.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64

(gdb) bt
#0  0x00007fc232d5f5f5 in syncop_stat () from /lib64/libglusterfs.so.0
#1  0x00007fc2600b48d3 in glfs_h_stat () from /lib64/libgfapi.so.0
#2  0x00007fc2602ca2a6 in getattrs (obj_hdl=0x7fc22c0dcd88, attrs=0x7fc1fb7f51f0) at /usr/src/debug/nfs-ganesha-2.7.3/src/FSAL/FSAL_GLUSTER/handle.c:866
#3  0x000055dcc0a7ec03 in mdcache_refresh_attrs (entry=entry@entry=0x7fc22c0ddba0, need_acl=<optimized out>, need_fslocations=<optimized out>, invalidate=invalidate@entry=true) at /usr/src/debug/nfs-ganesha-2.7.3/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:836
#4  0x000055dcc0a80257 in mdcache_getattrs (obj_hdl=0x7fc22c0ddbd8, attrs_out=0x7fc1fb7f5500) at /usr/src/debug/nfs-ganesha-2.7.3/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:903
#5  0x000055dcc09ec3ef in file_To_Fattr (data=data@entry=0x7fc1fb7f5720, request_mask=1433550, attr=attr@entry=0x7fc1fb7f5500, Fattr=Fattr@entry=0x7fc1c000fb20, Bitmap=Bitmap@entry=0x7fc1c004ac58) at /usr/src/debug/nfs-ganesha-2.7.3/src/Protocols/NFS/nfs_proto_tools.c:3511
#6  0x000055dcc09c6e8b in nfs4_op_getattr (op=0x7fc1c004ac50, data=0x7fc1fb7f5720, resp=0x7fc1c000fb10) at /usr/src/debug/nfs-ganesha-2.7.3/src/Protocols/NFS/nfs4_op_getattr.c:108
#7  0x000055dcc09c06f3 in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7fc1c0043d60) at /usr/src/debug/nfs-ganesha-2.7.3/src/Protocols/NFS/nfs4_Compound.c:942
#8  0x000055dcc09b3b0f in nfs_rpc_process_request (reqdata=0x7fc1c0008720) at /usr/src/debug/nfs-ganesha-2.7.3/src/MainNFSD/nfs_worker_thread.c:1328
#9  0x000055dcc09b2fba in nfs_rpc_decode_request (xprt=0x7fc1940015e0, xdrs=0x7fc1c004a420) at /usr/src/debug/nfs-ganesha-2.7.3/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1345
#10 0x00007fc26bf7162d in svc_rqst_xprt_task () from /lib64/libntirpc.so.1.7
#11 0x00007fc26bf71b6a in svc_rqst_run_task () from /lib64/libntirpc.so.1.7
#12 0x00007fc26bf79c0b in work_pool_thread () from /lib64/libntirpc.so.1.7
#13 0x00007fc26a30fea5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fc269c1a8cd in clone () from /lib64/libc.so.6
----------------

Version-Release number of selected component (if applicable):
=============================
# rpm -qa | grep ganesha
nfs-ganesha-2.7.3-5.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.7.3-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.7.3-5.el7rhgs.x86_64
glusterfs-ganesha-6.0-7.el7rhgs.x86_64

How reproducible:
================
1/1

Steps to Reproduce:
=================
1. Create an 8-node Ganesha cluster.
2. Create an 8*3 Distributed-Replicate volume.
3. Export the volume via Ganesha.
4. Add the option "enable_upcall = yes;" to the volume's export file and run refresh-config.
5. Perform volume start and stop and check whether the volume is exported. The volume was exported successfully.
6. Mount the volume on 6 clients via v3 and v4.1 using a single server VIP.
7. Run the following workload:
   Client 1: (v3) Linux untars of empty dirs
   Client 2: (v3) Bonnie
   Client 3: (v4) Bonnie
   Client 4: (v4) dbench
   Client 5: (v4) ls -lRt in loop
   Client 6: (v4) du -sh (single iteration)

Actual results:
==================
Ganesha crashed on the node whose VIP was used to mount the volume on the clients.

Expected results:
===================
Ganesha should not crash.

Additional info:
===================
[root@f07-h33-000-1029u exports]# service nfs-ganesha status
Redirecting to /bin/systemctl status nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Mon 2019-07-01 09:15:47 UTC; 46min ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 228321 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
 Main PID: 59700 (code=killed, signal=SEGV)

Jul 01 05:52:07 f07-h33-000-1029u.rdu2.scalelab.redhat.com systemd[1]: Starting NFS-Ganesha file server...
Jul 01 05:52:09 f07-h33-000-1029u.rdu2.scalelab.redhat.com systemd[1]: Started NFS-Ganesha file server.
Jul 01 09:15:47 f07-h33-000-1029u.rdu2.scalelab.redhat.com systemd[1]: nfs-ganesha.service: main process exited, code=killed, status=11/SEGV
Jul 01 09:15:47 f07-h33-000-1029u.rdu2.scalelab.redhat.com systemd[1]: Unit nfs-ganesha.service entered failed state.
Jul 01 09:15:47 f07-h33-000-1029u.rdu2.scalelab.redhat.com systemd[1]: nfs-ganesha.service failed.
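Step 4 of the reproduction adds the upcall option to the volume's export file. As a rough illustration only (export id, path, hostname, and volume name are placeholders, not taken from this setup, and the exact block layout can differ between ganesha versions), the option typically sits inside the FSAL block of an FSAL_GLUSTER export:

```
EXPORT {
    Export_Id = 2;               # placeholder id
    Path = "/testvol";           # placeholder path
    Pseudo = "/testvol";
    Access_Type = RW;
    FSAL {
        Name = "GLUSTER";
        Hostname = "localhost";  # placeholder
        Volume = "testvol";      # placeholder volume name
        enable_upcall = yes;     # option added in step 4
    }
}
```

After editing the export file, refresh-config (step 4) re-reads the export without restarting the whole cluster.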
--------------------------
# pcs status
Cluster name: ganesha-ha
Stack: corosync
Current DC: f12-h02-000-1029u.rdu2.scalelab.redhat.com (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Mon Jul 1 10:04:39 2019
Last change: Mon Jul 1 09:15:56 2019 by root via crm_attribute on f07-h33-000-1029u.rdu2.scalelab.redhat.com

8 nodes configured
48 resources configured

Online: [ f07-h33-000-1029u.rdu2.scalelab.redhat.com f07-h34-000-1029u.rdu2.scalelab.redhat.com f07-h35-000-1029u.rdu2.scalelab.redhat.com f07-h36-000-1029u.rdu2.scalelab.redhat.com f12-h02-000-1029u.rdu2.scalelab.redhat.com f12-h03-000-1029u.rdu2.scalelab.redhat.com f12-h04-000-1029u.rdu2.scalelab.redhat.com f12-h05-000-1029u.rdu2.scalelab.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ f07-h33-000-1029u.rdu2.scalelab.redhat.com f07-h34-000-1029u.rdu2.scalelab.redhat.com f07-h35-000-1029u.rdu2.scalelab.redhat.com f07-h36-000-1029u.rdu2.scalelab.redhat.com f12-h02-000-1029u.rdu2.scalelab.redhat.com f12-h03-000-1029u.rdu2.scalelab.redhat.com f12-h04-000-1029u.rdu2.scalelab.redhat.com f12-h05-000-1029u.rdu2.scalelab.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ f07-h33-000-1029u.rdu2.scalelab.redhat.com f07-h34-000-1029u.rdu2.scalelab.redhat.com f07-h35-000-1029u.rdu2.scalelab.redhat.com f07-h36-000-1029u.rdu2.scalelab.redhat.com f12-h02-000-1029u.rdu2.scalelab.redhat.com f12-h03-000-1029u.rdu2.scalelab.redhat.com f12-h04-000-1029u.rdu2.scalelab.redhat.com f12-h05-000-1029u.rdu2.scalelab.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ f07-h34-000-1029u.rdu2.scalelab.redhat.com f07-h35-000-1029u.rdu2.scalelab.redhat.com f07-h36-000-1029u.rdu2.scalelab.redhat.com f12-h02-000-1029u.rdu2.scalelab.redhat.com f12-h03-000-1029u.rdu2.scalelab.redhat.com f12-h04-000-1029u.rdu2.scalelab.redhat.com f12-h05-000-1029u.rdu2.scalelab.redhat.com ]
     Stopped: [ f07-h33-000-1029u.rdu2.scalelab.redhat.com ]
 Resource Group: f07-h33-000-1029u.rdu2.scalelab.redhat.com-group
     f07-h33-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f12-h04-000-1029u.rdu2.scalelab.redhat.com
     f07-h33-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f12-h04-000-1029u.rdu2.scalelab.redhat.com
     f07-h33-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f12-h04-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f07-h36-000-1029u.rdu2.scalelab.redhat.com-group
     f07-h36-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f07-h36-000-1029u.rdu2.scalelab.redhat.com
     f07-h36-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f07-h36-000-1029u.rdu2.scalelab.redhat.com
     f07-h36-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f07-h36-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f07-h35-000-1029u.rdu2.scalelab.redhat.com-group
     f07-h35-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f07-h35-000-1029u.rdu2.scalelab.redhat.com
     f07-h35-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f07-h35-000-1029u.rdu2.scalelab.redhat.com
     f07-h35-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f07-h35-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f07-h34-000-1029u.rdu2.scalelab.redhat.com-group
     f07-h34-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f07-h34-000-1029u.rdu2.scalelab.redhat.com
     f07-h34-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f07-h34-000-1029u.rdu2.scalelab.redhat.com
     f07-h34-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f07-h34-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f12-h05-000-1029u.rdu2.scalelab.redhat.com-group
     f12-h05-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f12-h05-000-1029u.rdu2.scalelab.redhat.com
     f12-h05-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f12-h05-000-1029u.rdu2.scalelab.redhat.com
     f12-h05-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f12-h05-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f12-h02-000-1029u.rdu2.scalelab.redhat.com-group
     f12-h02-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f12-h02-000-1029u.rdu2.scalelab.redhat.com
     f12-h02-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f12-h02-000-1029u.rdu2.scalelab.redhat.com
     f12-h02-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f12-h02-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f12-h03-000-1029u.rdu2.scalelab.redhat.com-group
     f12-h03-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f12-h03-000-1029u.rdu2.scalelab.redhat.com
     f12-h03-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f12-h03-000-1029u.rdu2.scalelab.redhat.com
     f12-h03-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f12-h03-000-1029u.rdu2.scalelab.redhat.com
 Resource Group: f12-h04-000-1029u.rdu2.scalelab.redhat.com-group
     f12-h04-000-1029u.rdu2.scalelab.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Started f12-h04-000-1029u.rdu2.scalelab.redhat.com
     f12-h04-000-1029u.rdu2.scalelab.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):       Started f12-h04-000-1029u.rdu2.scalelab.redhat.com
     f12-h04-000-1029u.rdu2.scalelab.redhat.com-nfs_unblock  (ocf::heartbeat:portblock):    Started f12-h04-000-1029u.rdu2.scalelab.redhat.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
----------------------
# showmount -e
rpc mount export: RPC: Unable to receive; errno = Connection refused
Verified this BZ with:
# rpm -qa | grep ganesha
glusterfs-ganesha-6.0-51.el7rhgs.x86_64
nfs-ganesha-gluster-3.4-1.el7rhgs.x86_64
nfs-ganesha-3.4-1.el7rhgs.x86_64
nfs-ganesha-selinux-3.4-1.el7rhgs.noarch

Steps performed for verification:
1. Create a 4-node Ganesha cluster.
2. Create an 8*3 Distributed-Replicate volume.
3. Export the volume via Ganesha.
4. Perform volume start and stop.
5. Mount the volume on 6 clients via v3 and v4.1 using a single server VIP.
6. Run the following workload:
   Client 1: (v3) Linux untars of empty dirs
   Client 2: (v3) Bonnie
   Client 3: (v4) Bonnie
   Client 4: (v4) dbench
   Client 5: (v4) ls -lRt in loop
   Client 6: (v4) du -sh (single iteration)
7. Stop I/O from all the clients.
8. Unexport and re-export the volume via Ganesha.
9. Run rm -rf from all the clients.

No crashes were observed. Moving this BZ to verified state.
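An 8*3 Distributed-Replicate volume is 8 replica-3 subvolumes, i.e. 24 bricks; on the 4-node verification cluster that works out to six bricks per node. A minimal sketch of composing such a create command (node names and brick paths are placeholders, not the lab hosts above; gluster groups each run of 3 consecutive bricks into a replica set, so brick ordering matters on a real cluster):

```shell
# Placeholder node names -- the real cluster used different hosts.
nodes="node1 node2 node3 node4"

bricks=""
count=0
# 6 brick directories per node: 4 nodes * 6 = 24 bricks = 8 replica-3 subvolumes.
for i in 1 2 3 4 5 6; do
  for n in $nodes; do
    bricks="$bricks $n:/bricks/brick$i/testvol"
    count=$((count + 1))
  done
done

echo "brick count: $count"
# The create command this layout would produce (shown, not executed here):
echo "gluster volume create testvol replica 3$bricks"
```

The same shape with 8 nodes and 3 bricks per node gives the 8-node reproduction cluster's volume.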
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (nfs-ganesha bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1463