Bug 1564481

Summary:	[Ganesha] ls -lrt command is stuck on NFS mount point for more than 1 hour with around 10 lakh files
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Manisha Saini <msaini>
Component:	nfs-ganesha	Assignee:	Jiffin <jthottan>
Status:	CLOSED ERRATA	QA Contact:	Manisha Saini <msaini>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.4	CC:	dang, ffilz, grajoria, jthottan, kkeithle, rhinduja, rhs-bugs, storage-qa-internal
Target Milestone:	---
Target Release:	RHGS 3.4.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	nfs-ganesha-2.5.5-7	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-09-04 06:55:16 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1403648, 1503137

Description Manisha Saini 2018-04-06 12:19:53 UTC

Description of problem:

The ls -lrt command hangs on the NFS mount point for more than 1 hour when the mount point holds around 10 lakh (1 million) files.


Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha
glusterfs-ganesha-3.12.2-7.el7rhgs.x86_64
nfs-ganesha-2.5.5-3.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-3.el7rhgs.x86_64


How reproducible:
Reporting 1st instance

Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Create a 4 x (2 + 1) = 12 arbiter volume.
3. Export the volume via nfs-ganesha.
4. Mount the volume on 4 different clients using 4 different VIPs.
5. Run the following workload:
Client 1 - Create 10 lakh files in a loop.
Client 2 - Wait for 10 mins, then start renaming the files created by client 1.
Client 3 - Wait for 10 mins, then rename the files renamed by client 2 back to their original names.
Client 4 - Perform lookups in a loop:
           while true; do ls -lRt; done

(After some time, I stopped the rename from client 3.)

Kept the setup running overnight to let the I/O complete. After the I/O completed, observed that lookups hung on the mount point for more than 1 hour.
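
For reference, the workload in the steps above can be approximated on a single machine with a short Python sketch. This is only an illustration, not the script used in the test: it runs the four client roles one after another instead of concurrently, it uses a temporary local directory in place of the NFS mount point, and the function names and the `file_` / `renamed_` prefixes are made up for the example.

```python
import os
import tempfile


def create_files(directory, count):
    """Client 1: create `count` empty files named file_0 .. file_{count-1}."""
    for i in range(count):
        with open(os.path.join(directory, f"file_{i}"), "w"):
            pass


def rename_files(directory, old_prefix, new_prefix):
    """Clients 2 and 3: rename every file whose name starts with old_prefix."""
    renamed = 0
    for name in os.listdir(directory):
        if name.startswith(old_prefix):
            new_name = new_prefix + name[len(old_prefix):]
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, new_name))
            renamed += 1
    return renamed


def list_with_stat(directory):
    """Client 4: rough equivalent of `ls -l`, one stat per directory entry."""
    return {entry.name: entry.stat().st_size for entry in os.scandir(directory)}


def run_workload(directory, count):
    """Run the four client roles sequentially (the real test ran them in parallel)."""
    create_files(directory, count)
    rename_files(directory, "file_", "renamed_")   # client 2
    rename_files(directory, "renamed_", "file_")   # client 3
    return list_with_stat(directory)               # client 4


if __name__ == "__main__":
    # Assumption: a temporary local directory stands in for the NFS mount point.
    with tempfile.TemporaryDirectory() as mount_point:
        listing = run_workload(mount_point, 1000)
        print(len(listing), "entries listed")
```

To get closer to the reported scenario, `run_workload` would be pointed at the actual mount point with `count` set to 1,000,000, and the roles would be started from separate clients.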


Actual results:
The ls -lrt command is stuck on the mount point for more than 1 hour with around 10 lakh files on the NFS mount.


Expected results:
ls -lrt command should not hang

Additional info:

I have kept the setup in the same state for live debugging. Will be attaching core dumps and tcpdumps shortly.

Comment 6 Daniel Gryniewicz 2018-04-09 13:35:45 UTC

Are all the files in a single directory?  I have a complicated directory tree with 500k files in it, and it works fine.

Looking at the pcap, there are 237 READDIR round trips of 50 entries each in 119 seconds.  This works out to 100 entries per second, which means 1M entries should take 166 minutes, not 41 hours.  (Note, this is still too slow...)

So, there must have been some very large stalls that are not reflected in the pcap.  I'm not entirely sure where to go from here.  I'll try to reproduce locally, but I don't know if it'll work.
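
The throughput estimate from the pcap can be reproduced with a few lines of Python. This is just the arithmetic from the numbers quoted above (237 READDIR round trips, 50 entries each, 119 seconds, 1M entries); the function name is made up for the example.

```python
def estimate_listing_time(round_trips, entries_per_trip, capture_seconds, total_entries):
    """Return (entries per second, seconds needed to list total_entries)."""
    rate = round_trips * entries_per_trip / capture_seconds
    return rate, total_entries / rate


rate, seconds = estimate_listing_time(237, 50, 119, 1_000_000)
print(f"{rate:.1f} entries/s, {seconds / 60:.0f} minutes")
# prints "99.6 entries/s, 167 minutes"
```

Without rounding the rate up to 100 entries per second, the estimate comes out at roughly 167 minutes, which matches the figure above to within a minute.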

Comment 8 Jiffin 2018-04-09 17:03:32 UTC

(In reply to Daniel Gryniewicz from comment #6)
> Are all the files in a single directory?  I have a complicated directory
> tree with 500k files in it, and it works fine.
> 
> Looking at the pcap, there are 237 READDIR round trips of 50 entries each in
> 119 seconds.  This works out to 100 entries per second, which means 1M
> entries should take 166 minutes, not 41 hours.  (Note, this is still too
> slow...)
> 
> So, there must have been some very large stalls that are not reflected in
> the pcap.  I'm not entirely sure where to go from here.  I'll try to
> reproduce locally, but I don't know if it'll work.

Hi Dan, 
I checked the gluster packets as well. Between each READDIR call and reply, lookups are sent from the gluster layer (either in the FSAL_GLUSTER layer or in any layer above it in the gluster client). There were around 488 glusterfs packets (including calls and replies) for 50 entries. This is an arbiter volume, so each file is replicated among three bricks (glusterfs servers). I will check the code tomorrow. Do we need to send lookups as part of the READDIR call? (AFAIR, Soumya's fix for gfapi was that when a readdir call is sent to the server, it fetches the stat information of each entry as well.)

--
Jiffin
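
To put the packet count in perspective, here is a rough per-entry breakdown in Python. It uses only the two numbers quoted above (488 glusterfs packets for 50 entries). Treating every round trip as exactly one call plus one reply, and spreading the traffic evenly over the three replica bricks, are simplifying assumptions for the example, not something read from the capture.

```python
def packets_per_entry(total_packets, entries, replicas=3):
    """Return (packets, round trips, round trips per replica) for one directory entry."""
    per_entry = total_packets / entries
    round_trips = per_entry / 2            # assumption: one call + one reply per round trip
    per_replica = round_trips / replicas   # assumption: even spread over the replica bricks
    return per_entry, round_trips, per_replica


per_entry, round_trips, per_replica = packets_per_entry(488, 50)
print(f"{per_entry:.2f} packets, {round_trips:.2f} round trips, "
      f"{per_replica:.2f} round trips per brick, per entry")
```

Under those assumptions each directory entry costs between one and two backend round trips per brick, which is in line with a per-entry lookup being issued on top of the READDIR itself.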

Comment 9 Daniel Gryniewicz 2018-04-09 17:44:00 UTC

We need to have the equivalent of a lookup.  There is readdirplus code in FSAL_GLUSTER (protected by USE_GLUSTER_XREADDIRPLUS, which is automatically turned on if it's available).  Otherwise, it does lookup().

Maybe the library it was built against doesn't provide glfs_xreaddirplus_r()?
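
The difference between the two paths can be illustrated with a toy model in Python. This is not the FSAL_GLUSTER code: `ToyBackend` and `list_directory` are hypothetical names, the backend is an in-memory dict, and the compile-time USE_GLUSTER_XREADDIRPLUS switch is modelled as a runtime flag. It only shows why the fallback costs one extra round trip per entry.

```python
class ToyBackend:
    """Hypothetical in-memory stand-in for a storage backend; counts round trips."""

    def __init__(self, entries, supports_readdirplus):
        self.entries = dict(entries)              # name -> attributes
        self.supports_readdirplus = supports_readdirplus
        self.round_trips = 0

    def readdir(self):
        """Return names only."""
        self.round_trips += 1
        return list(self.entries)

    def lookup(self, name):
        """Return the attributes of a single entry."""
        self.round_trips += 1
        return self.entries[name]

    def readdirplus(self):
        """Return names together with their attributes in one call."""
        self.round_trips += 1
        return list(self.entries.items())


def list_directory(backend):
    """Use readdirplus when available, otherwise readdir plus one lookup per entry."""
    if backend.supports_readdirplus:
        return dict(backend.readdirplus())
    return {name: backend.lookup(name) for name in backend.readdir()}


entries = {f"file_{i}": {"size": i} for i in range(50)}
fast = ToyBackend(entries, supports_readdirplus=True)
slow = ToyBackend(entries, supports_readdirplus=False)
list_directory(fast)
list_directory(slow)
print(fast.round_trips, slow.round_trips)  # prints "1 51"
```

For a batch of 50 entries the toy fallback path needs 51 round trips instead of 1, so a build without the readdirplus path would be expected to show many more backend calls per READDIR.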

Comment 16 Manisha Saini 2018-08-23 09:54:23 UTC

Verified this with Readdir disabled:

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-2.5.5-10.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-16.el7rhgs.x86_64


Steps performed for verification-

1. Create a 6-node ganesha cluster.
2. Create a 4 x (2 + 1) = 12 arbiter volume.
3. Export the volume via nfs-ganesha.
4. Mount the volume on 4 different clients using 4 different VIPs.
5. Run the following workload:
Client 1 - Create 10 lakh files in a loop.
Client 2 - Wait for 10 mins, then start renaming the files created by client 1.
Client 3 - Wait for 10 mins, then rename the files renamed by client 2 back to their original names.
Client 4 - Perform lookups in a loop:
           while true; do ls -lRt; done

After the test completed, ran ls -lrt on the data set to check the time taken to display 10 lakh files.

real    24m38.787s
user    0m6.861s
sys     0m21.269s


It took around 25 mins for 10 lakh files.
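
The listing rate implied by this timing can be worked out with a small Python sketch. It parses the `real` value reported above and divides the file count (10 lakh = 1,000,000) by it; the helper name is made up for the example.

```python
import re


def parse_real_time(text):
    """Convert a `time`-style value such as '24m38.787s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


elapsed = parse_real_time("24m38.787s")
rate = 1_000_000 / elapsed
print(f"{elapsed:.3f} s, {rate:.0f} entries/s")
# prints "1478.787 s, 676 entries/s"
```

That is roughly 676 entries per second for the full listing.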

Moving this BZ to verified state.

Comment 18 errata-xmlrpc 2018-09-04 06:55:16 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2610