Bug 1415608 - [Stress] : Recursive ls and find hang on EC-backed volumes over Ganesha mounts under a pure metadata-intensive workload.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.5.0
Assignee: Frank Filz
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On: 1475699 1695078
Blocks: 1422761 1696807
 
Reported: 2017-01-23 08:58 UTC by Ambarish
Modified: 2019-10-30 12:16 UTC
CC List: 13 users

Fixed In Version: nfs-ganesha-2.7.3-3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-30 12:15:39 UTC
Embargoed:




Links:
Red Hat Product Errata RHEA-2019:3252 (last updated 2019-10-30 12:16:15 UTC)

Description Ambarish 2017-01-23 08:58:29 UTC
Description of problem:
-----------------------

4-node cluster, 1 EC volume exported via Ganesha.

Mounted the EC volume on 4 clients via v3/v4.

Ran find, du -sh, and ll -R on each mount.

On one of the clients (gqac024, mounted via gqas014), find and ll -R were hung for roughly 2 hours or more.

I took a packet trace on the client, and it showed no packets being sent from the client itself.

sosreports and packet traces are in the comments.
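
For anyone re-running this, a minimal capture sketch for the client-side packet trace mentioned above; the interface name, server address, and output path below are placeholders, not the values used in this report.

# capture NFS traffic between the client and the Ganesha server (NFS uses port 2049)
tcpdump -i eth0 -s 0 -w /tmp/client_nfs.pcap host <ganesha-server> and port 2049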

Version-Release number of selected component (if applicable):
------------------------------------------------------------

nfs-ganesha-2.4.1-6.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.4.1-6.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-12.el7rhgs.x86_64


How reproducible:
------------------

1/1

Steps to Reproduce:
------------------
1. Mount the EC volume via v3/v4 on multiple clients.

2. Run no new writes; only ls, stat, du -sh, and find.

3. Monitor the mounts (a workload sketch follows below).
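
For illustration only, a minimal sketch of the reproduction workload; the VIP, export name, and mount point below are placeholders, not the actual values from this setup.

# mount the Ganesha export (NFSv3 on some clients, NFSv4 on the others)
mount -t nfs -o vers=3 <vip>:/testvol /mnt/ec
mount -t nfs -o vers=4.0 <vip>:/testvol /mnt/ec

# pure metadata workload, no new writes
ls -lR /mnt/ec > /dev/null      # ll -R equivalent
find /mnt/ec > /dev/null
du -sh /mnt/ec
stat /mnt/ec/* > /dev/null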

Actual results:
--------------- 

ll and find hang.

Expected results:
----------------

No hangs on mount point.

Additional info:
----------------
[root@gqas009 ~]# gluster v status
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gqas015.sbu.lab.eng.bos.redhat.com:/v
ar/lib/glusterd/ss_brick                    49152     0          Y       26457
Brick gqas014.sbu.lab.eng.bos.redhat.com:/v
ar/lib/glusterd/ss_brick                    49152     0          Y       25391
Brick gqas009.sbu.lab.eng.bos.redhat.com:/v
ar/lib/glusterd/ss_brick                    49152     0          Y       25747
Self-heal Daemon on localhost               N/A       N/A        Y       17960
Self-heal Daemon on gqas010.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       13756
Self-heal Daemon on gqas015.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       17415
Self-heal Daemon on gqas014.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       17200
 
Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: replicate
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks12/bricknew                            49153     0          Y       27931
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks12/bricknew                            49152     0          Y       27177
Self-heal Daemon on localhost               N/A       N/A        Y       17960
Self-heal Daemon on gqas015.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       17415
Self-heal Daemon on gqas010.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       13756
Self-heal Daemon on gqas014.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       17200
 
Task Status of Volume replicate
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks1/brick1                               49158     0          Y       29725
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks1/brick                                49164     0          Y       24750
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks1/brick                                49164     0          Y       24867
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks1/brick                                49164     0          Y       25931
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks3/brick                                49165     0          Y       25201
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks3/brick                                49165     0          Y       24769
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks2/brick1                               49153     0          Y       29771
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks2/brick                                49166     0          Y       24788
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks2/brick                                49165     0          Y       24886
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks2/brick                                49165     0          Y       25950
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks3/brick                                49166     0          Y       24905
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks3/brick                                49166     0          Y       25969
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks4/brick1                               49154     0          Y       29827
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks4/brick                                49167     0          Y       24807
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks4/brick                                49167     0          Y       25988
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks4/brick                                49167     0          Y       24924
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks5/brick                                49168     0          Y       25258
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks5/brick                                49168     0          Y       24826
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks6/brick                                49169     0          Y       25277
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks6/brick                                49169     0          Y       24845
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks6/brick                                49168     0          Y       26007
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks6/brick                                49168     0          Y       24943
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks5/brick                                49169     0          Y       24962
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks5/brick                                49169     0          Y       26026
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks7/brick1                               49155     0          Y       29909
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks7/brick                                49170     0          Y       24864
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks7/brick                                49170     0          Y       26045
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks7/brick                                49170     0          Y       24981
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks8/brick                                49171     0          Y       24883
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks8/brick                                49171     0          Y       25315
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks9/brick                                49172     0          Y       25336
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks9/brick                                49172     0          Y       24902
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks9/brick                                49171     0          Y       26064
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks9/brick                                49171     0          Y       25000
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks8/brick                                49172     0          Y       25019
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks8/brick                                49172     0          Y       26083
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks10/brick                               49173     0          Y       25355
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks10/brick                               49173     0          Y       24921
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks10/brick                               49173     0          Y       26102
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks10/brick                               49173     0          Y       25038
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks11/brick1                              49156     0          Y       30009
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks11/brick                               49174     0          Y       24940
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks12/brick                               49175     0          Y       25393
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks12/brick                               49175     0          Y       24959
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks12/brick                               49174     0          Y       26121
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks12/brick                               49174     0          Y       25057
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks11/brick                               49175     0          Y       25076
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks11/brick                               49175     0          Y       26140
Self-heal Daemon on localhost               N/A       N/A        Y       17960
Self-heal Daemon on gqas010.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       13756
Self-heal Daemon on gqas015.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       17415
Self-heal Daemon on gqas014.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       17200
 
Task Status of Volume testvol
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 60d3e3e4-661c-4520-9f5f-482d95d81a82
Status               : in progress         
 
[root@gqas009 ~]#

Comment 2 Ambarish 2017-01-23 08:59:55 UTC
Proposing this as a 3.2 blocker since the application side is impacted.

Comment 3 Ambarish 2017-01-23 09:00:58 UTC
*Sidenote*: The same test passed on EC over gNFS.

Comment 5 Ambarish 2017-01-23 10:01:01 UTC
The setup is in the same state in case someone wants to take a look.

Comment 13 Manisha Saini 2018-08-25 09:53:29 UTC
Verified this with the readdir chunking code disabled:


# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-2.5.5-10.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-16.el7rhgs.x86_64

Steps performed to verify the issue:

1. Create a 6-node Ganesha cluster.
2. Create a 6 x (4 + 2) distributed-disperse volume. Enable Ganesha on the volume.
3. Mount the volume on 4 clients via v3/v4 using the same VIP.
4. Create a huge data set consisting of small, large, and empty directories.

Detailed:

The large and small directory sets held an approximately equal number of files, about 1.1 million, averaging 8 KB per file. The small directory set had 12.5k directories with at most 100 files per directory, and the large directory set comprised 50 directories with approximately 20k files per directory. The empty directory set consisted of 12.5k directories.

5. Once the data set is created, trigger recursive find, du -sh, and ll -R from 3 clients (a generation sketch for the data set follows below).
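
A rough generation sketch for a data set of this shape; the directory names, exact counts, and the use of dd here are illustrative assumptions and do not reproduce the ~1.1 million-file mix exactly.

# small set: 12,500 directories with up to 100 files of ~8 KB each
for d in $(seq 1 12500); do
    mkdir -p /mnt/ec/small/dir_$d
    for f in $(seq 1 100); do
        dd if=/dev/urandom of=/mnt/ec/small/dir_$d/file_$f bs=8k count=1 status=none
    done
done

# large set: 50 directories with ~20,000 files of ~8 KB each
for d in $(seq 1 50); do
    mkdir -p /mnt/ec/large/dir_$d
    for f in $(seq 1 20000); do
        dd if=/dev/urandom of=/mnt/ec/large/dir_$d/file_$f bs=8k count=1 status=none
    done
done

# empty set: 12,500 empty directories
for d in $(seq 1 12500); do mkdir -p /mnt/ec/empty/dir_$d; done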



No hangs were observed for ll -R and find running recursively.

However, du -sh took a long time (~2.5 hours) for a data set of ~134 GB.

This is being tracked as part of BZ https://bugzilla.redhat.com/show_bug.cgi?id=1622281.

Moving this bug to the verified state, since the ll -R and find hang issue appears to be resolved.

Comment 14 Daniel Gryniewicz 2018-08-27 12:24:27 UTC
This should be moved out of 3.4, since dirent chunking has been removed.

Comment 24 Manisha Saini 2019-09-02 18:18:16 UTC
Verified this BZ with

# rpm -qa | grep ganesha
nfs-ganesha-2.7.3-7.el7rhgs.x86_64
glusterfs-ganesha-6.0-11.el7rhgs.x86_64
nfs-ganesha-gluster-2.7.3-7.el7rhgs.x86_64

Steps performed for verification:

1. Create a 4-node Ganesha cluster.
2. Create a 2 x (4 + 2) distributed-disperse volume. Enable Ganesha on the volume.
3. Mount the volume on 4 clients via v3/v4.1 using the same VIP.
4. Create a huge data set consisting of small, large, and empty directories.
5. Once the data set is created, trigger recursive find, du -sh, and ll -R from 4 clients (a setup sketch for steps 1-3 follows below).
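
A rough CLI sketch of steps 1-3, assuming the Ganesha HA cluster from step 1 is already up, hypothetical server names server1-server4, brick paths under /bricks, a volume named dispvol, and a cluster VIP placeholder <vip>. It also assumes the volume is exported with the ganesha.enable volume option; the exact commands used in this run are not recorded here.

# 2 x (4 + 2) distributed-disperse volume, 3 bricks per node across 4 nodes
gluster volume create dispvol disperse-data 4 redundancy 2 \
    server1:/bricks/b1/brick server2:/bricks/b1/brick server3:/bricks/b1/brick \
    server4:/bricks/b1/brick server1:/bricks/b2/brick server2:/bricks/b2/brick \
    server3:/bricks/b2/brick server4:/bricks/b2/brick server1:/bricks/b3/brick \
    server2:/bricks/b3/brick server3:/bricks/b3/brick server4:/bricks/b3/brick force
gluster volume start dispvol

# export the volume through nfs-ganesha and mount it via the cluster VIP
gluster volume set dispvol ganesha.enable on
mount -t nfs -o vers=4.1 <vip>:/dispvol /mnt/dispvol    # vers=3 on the other clients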



No hangs were observed for ll -R and find running recursively. Moving this BZ to the verified state.

Comment 26 errata-xmlrpc 2019-10-30 12:15:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3252

