Description of problem:
6 node ganesha cluster. Distributed-Disperse volume, mounted to 4 different clients via 4 different server VIPs. Root-squash enabled.

IO pattern:
1st client - Linux untars (completed before the crash was hit)
2nd client - ls -lrt in a loop
3rd client - subdir mounted (uid:nfsnobody) - du -sh in a loop
4th client - Bonnie

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f167ca79700 (LWP 31185)]
mdcache_readdir_chunked (directory=directory@entry=0x7f14980a9070, whence=0, dir_state=dir_state@entry=0x7f167ca77e30, cb=cb@entry=0x55907cbc51f0 <populate_dirent>,
    attrmask=attrmask@entry=122830, eod_met=eod_met@entry=0x7f167ca77f1b)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3156
3156            if (dirent->ck == whence) {
(gdb) bt
#0  mdcache_readdir_chunked (directory=directory@entry=0x7f14980a9070, whence=0, dir_state=dir_state@entry=0x7f167ca77e30, cb=cb@entry=0x55907cbc51f0 <populate_dirent>,
    attrmask=attrmask@entry=122830, eod_met=eod_met@entry=0x7f167ca77f1b)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3156
#1  0x000055907cc98924 in mdcache_readdir (dir_hdl=0x7f14980a90a8, whence=<optimized out>, dir_state=0x7f167ca77e30, cb=0x55907cbc51f0 <populate_dirent>, attrmask=122830,
    eod_met=0x7f167ca77f1b) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:640
#2  0x000055907cbc70e4 in fsal_readdir (directory=directory@entry=0x7f14980a90a8, cookie=cookie@entry=0, nbfound=nbfound@entry=0x7f167ca77f1c, eod_met=eod_met@entry=0x7f167ca77f1b,
    attrmask=122830, cb=cb@entry=0x55907cc037f0 <nfs4_readdir_callback>, opaque=opaque@entry=0x7f167ca77f20) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/fsal_helper.c:1500
#3  0x000055907cc047bb in nfs4_op_readdir (op=0x7f15d4043920, data=0x7f167ca78150, resp=0x7f1498275670) at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_op_readdir.c:627
#4  0x000055907cbf015f in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f14982dcc90) at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_Compound.c:752
#5  0x000055907cbe03cb in nfs_rpc_execute (reqdata=reqdata@entry=0x7f15d40008c0) at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1290
#6  0x000055907cbe1a2a in worker_run (ctx=0x55907d5ddd50) at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1562
#7  0x000055907cc721a9 in fridgethr_start_routine (arg=0x55907d5ddd50) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:550
#8  0x00007f16b2540dd5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f16b1c0cb3d in clone () from /lib64/libc.so.6
(gdb) generate-core-file

Version-Release number of selected component (if applicable):
# rpm -qa | grep ganesha
nfs-ganesha-2.5.5-8.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-14.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-8.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-8.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Create a 6 node ganesha cluster.
2. Create a 6 x (4 + 2) Distributed-Disperse volume. Export the volume via Ganesha.
3. Mount the volume on the 1st client using the 1st server VIP.
4. Change the mount point permissions: chmod 777 /mnt/mount_point
5. Enable root-squash. Run refresh-config.
6. Create a directory named "mani".
7. Mount the subdir "mani" on the 2nd client using the 2nd server VIP.
8. Mount the volume on 2 more clients using different VIPs.
9. Run IO and lookups from all 4 clients:
   1st client - Linux untars (completed before the crash was hit)
   2nd client - ls -lrt in a loop
   3rd client - subdir mounted (uid:nfsnobody) - du -sh in a loop
   4th client - Bonnie

Actual results:
The Linux untar completed successfully. ls -lrt got stuck for a while on one of the clients. Bonnie, which was running from another client, was stopped. Ganesha crashed on the server that the client running "ls -lrt" was mapped to.

Expected results:
No crash should be observed.
Additional info:
Attaching sosreport and core dump shortly.

[root@rhs-client6 test]# ls -lrt
total 12
drwxr-xr-x. 2 root      root      4096 Jul 25  2018 dir1
drwxr-xr-x. 3 nfsnobody nfsnobody 4096 Jul 25  2018 mani
drwxr-xr-x. 3 nfsnobody nfsnobody 4096 Jul 25  2018 run18512

]# df
Filesystem                          1K-blocks       Used  Available Use% Mounted on
/dev/mapper/rhel_rhs--client6-root   52403200    1771672   50631528   4% /
devtmpfs                              8105756          0    8105756   0% /dev
tmpfs                                 8117824          0    8117824   0% /dev/shm
tmpfs                                 8117824       9680    8108144   1% /run
tmpfs                                 8117824          0    8117824   0% /sys/fs/cgroup
/dev/sda1                             1038336     219484     818852  22% /boot
/dev/mapper/rhel_rhs--client6-home 1890846652     151868 1890694784   1% /home
tmpfs                                 1623568          0    1623568   0% /run/user/0
10.70.34.91:/mani1                 3251634176   34838528 3216795648   2% /mnt/test
This appears to be a use-after-free on the dirent, since there is a NULL check on the dirent immediately above that line. This code was heavily changed by the readdir_plus work, and I believe those changes may have fixed this. Can it be re-tested on the next build?
Verified this with (readdir disabled in ganesha.conf):
# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-2.5.5-10.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-16.el7rhgs.x86_64

Steps performed for verification:
1. Create a 6 node ganesha cluster.
2. Create a 6 x (4 + 2) Distributed-Disperse volume. Export the volume via Ganesha.
3. Mount the volume on the 1st client using the 1st server VIP.
4. Change the mount point permissions: chmod 777 /mnt/mount_point
5. Enable root-squash. Run refresh-config.
6. Create a directory named "mani".
7. Mount the subdir "mani" on the 2nd client using the 2nd server VIP.
8. Mount the volume on 2 more clients using different VIPs.
9. Run IO and lookups from all 4 clients. All operations were run in the "mani" directory:
   1st client - Linux untars
   2nd client - ls -lrt in a loop
   3rd client - subdir mounted (uid:nfsnobody) - du -sh in a loop
   4th client - Bonnie

No crashes were observed. Moving this BZ to the verified state.
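A note on the "readdir disabled" setting used for verification: in upstream nfs-ganesha, chunked readdir is governed by the Dir_Chunk cache parameter, and setting it to 0 disables chunking. A sketch of what that looks like in ganesha.conf (the block and parameter names below are taken from upstream documentation; the exact knob in this downstream 2.5.5 build may differ, so check the build's own config samples):

```
# Assumed upstream syntax: Dir_Chunk = 0 disables chunked readdir.
MDCACHE {
	Dir_Chunk = 0;
}
```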
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2610