Bug 1379673

Summary: Creation of file hangs while doing ls from another mount.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Shashank Raj <sraj>
Component: nfs-ganeshaAssignee: Daniel Gryniewicz <dang>
Status: CLOSED ERRATA QA Contact: Ambarish <asoman>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.2CC: aloganat, amukherj, asoman, bugs, ffilz, jthottan, kkeithle, mzywusko, ndevos, rcyriac, rhs-bugs, sbhaloth, skoduri, storage-qa-internal
Target Milestone: ---Keywords: Triaged
Target Release: RHGS 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: nfs-ganesha-2.4.1-3 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1373498
: 1403648 (view as bug list) Environment:
Last Closed: 2017-03-23 06:23:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1373498    
Bug Blocks: 1351528, 1382912, 1403648    

Description Shashank Raj 2016-09-27 12:20:10 UTC
+++ This bug was initially created as a clone of Bug #1373498 +++

Description of problem:

Creation of files hang while doing ls from another mount.

Version-Release number of selected component (if applicable):

[root@dhcp43-116 ~]# rpm -qa|grep glusterfs
glusterfs-geo-replication-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-api-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-fuse-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-server-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-libs-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-client-xlators-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-ganesha-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-cli-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-debuginfo-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-3.8.3-0.6.git7956718.el7.centos.x86_64

[root@dhcp43-116 ~]# rpm -qa|grep ganesha
nfs-ganesha-gluster-next.20160827.7641daf-1.el7.centos.x86_64
glusterfs-ganesha-3.8.3-0.6.git7956718.el7.centos.x86_64
nfs-ganesha-debuginfo-next.20160827.7641daf-1.el7.centos.x86_64
nfs-ganesha-next.20160827.7641daf-1.el7.centos.x86_64


How reproducible:

Always

Steps to Reproduce:
1.Create a volume and mount it via ganesha on 2 clients.
2.Start creating files from both the clients.
3.Once creation is done on one client, do an ls -ltr on that client.
4.Observe that as soon as you issue an ls on one client, file creations gets hanged on other client.
5.Also, ls -ltr doesn't gives any output. See below:

[root@dhcp46-206 nfs1]# ls -ltr
total 0
[root@dhcp46-206 nfs1]# 


Actual results:

So there are 2 issues here.
1) ls on one client hangs file creation on other client.
2) ls -ltr doesn't give expected output.

Expected results:

There should not be any hangs and ls output should be as expected.

Additional info:

--- Additional comment from Soumya Koduri on 2016-09-06 08:30:50 EDT ---

Both the operations (readdir and creation) are operating on the same parent directory. Looks like md-cache takes contect_lock on the parent directory for the entire duration of readdir operation resulting in blocking other operations like creation of files in this case. This needs to be fixed.

Not sure why 'ls -l' returned zero entries. Have to debug. Shashank does 'ls' works fine when there are no other client operating on the same directory?

--- Additional comment from Shashank Raj on 2016-09-06 10:57:16 EDT ---

while running ls -ltr from one client without any other client operation, i waited for around 2.5 hours (there are around 150000 files) and ls didnt list anything and was kind of stuck and below continuous messages are seen in /var/log/messages on client side:

Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xa4c58ae66322418c
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x90099f91be45f6ea
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x891ca18c9c967f9f
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x8ae1457af5d444fd
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x8e18aaf9ef56a681
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb1049f34b8bbadf0
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xabf94baa3d664b6f
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb5a3d7a5e893770e
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb0cc9ca8578c4b04
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xbe9707239b56ddc2
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xa3d88ed35013d15b
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x82de97c20198419b
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xae467bc51845ad16
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xaf861e5fe53bfff3
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x804100c9a8de9e4f
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xad8c318e8637d603
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x901a467f239015aa
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xbfe501169d543d3c
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb3defa4672d5ce34
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xbd1cad8c38ef73e8
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb529f6477843ca2e
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb6af80b6e5d40fd7
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x9601ff9ce075783b
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x9db37de222a44875
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x965793242be5b9c0

Packet trace from both client and server side and /var/log/messages can be accessed at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1373498

--- Additional comment from Soumya Koduri on 2016-09-12 01:56:27 EDT ---

This is expected behaviour as per the current design. We had discussion regarding this issue within the community. This is part of the existing design and is expected to be addressed as part of performance enhancements targetted for upstream 2.5 release.

Comment 2 Shashank Raj 2016-10-14 09:27:01 UTC
So while testing this again, i found out 2 things here:

1) file creations are not completely hanged, it is creating files during ls from another mount but the file creations are very very slow and it looks like it is hanged.

2) Even when there are just 40000 (100KB) files, ls is hanged for more than 2 hours and still going on.

So if not the file creation part, then definitely we need to look at why ls is not listing anything for so long.

packet traces can be accessed at: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1379673 

based on the above observation, can we change internal whiteboard from 3.2.0 beyond to 3.2.0?

Comment 3 Shashank Raj 2016-10-14 11:35:52 UTC
As per discussion with Soumya, moving it back to 3.2.0

Comment 4 Shashank Raj 2016-10-14 11:40:52 UTC
Just an update:

Till the file creations were going on ls was hanged (~4 hours) and as soon as i stopped file creation, ls listed the files.

Comment 9 Arthy Loganathan 2016-11-23 06:14:21 UTC
I have executed the test with md-cache setting configured on the volume and with 2 clients, I am still seeing the issues.

1) When there are ~40000 files, ls gets hung for more than two hours.

2) File creation(each file 100k size) is very slow when doing ls on mountpoint from other client.

3) ls takes more time to list files even after stopping the file creation on one client.

[root@dhcp46-42 ~]# cat /run/gluster/shared_storage/nfs-ganesha/exports/export.vol1.conf 
# WARNING : Using Gluster CLI will overwrite manual
# changes made to this file. To avoid it, edit the
# file and run ganesha-ha.sh --refresh-config.
EXPORT{
      Export_Id = 2;
      Path = "/vol1";
      FSAL {
           name = GLUSTER;
           hostname="localhost";
          volume="vol1";
           }
      Access_type = RW;
      Disable_ACL = true;
      Squash="No_root_squash";
      Pseudo="/vol1";
      Protocols = "3", "4" ;
      Transports = "UDP","TCP";
      SecType = "sys";
      Attr_Expiration_Time = 600;
     }

[root@dhcp46-42 ~]# gluster vol info vol1
 
Volume Name: vol1
Type: Distributed-Replicate
Volume ID: b5cc5139-d581-4454-a97a-b296bbce6f28
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: dhcp46-42.lab.eng.blr.redhat.com:/bricks/brick0/br0
Brick2: dhcp46-101.lab.eng.blr.redhat.com:/bricks/brick0/br0
Brick3: dhcp47-155.lab.eng.blr.redhat.com:/bricks/brick0/br0
Brick4: dhcp47-167.lab.eng.blr.redhat.com:/bricks/brick0/br0
Brick5: dhcp46-42.lab.eng.blr.redhat.com:/bricks/brick1/br1
Brick6: dhcp46-101.lab.eng.blr.redhat.com:/bricks/brick1/br1
Brick7: dhcp47-155.lab.eng.blr.redhat.com:/bricks/brick1/br1
Brick8: dhcp47-167.lab.eng.blr.redhat.com:/bricks/brick1/br1
Brick9: dhcp46-42.lab.eng.blr.redhat.com:/bricks/brick2/br2
Brick10: dhcp46-101.lab.eng.blr.redhat.com:/bricks/brick2/br2
Brick11: dhcp47-155.lab.eng.blr.redhat.com:/bricks/brick2/br2
Brick12: dhcp47-167.lab.eng.blr.redhat.com:/bricks/brick2/br2
Options Reconfigured:
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
ganesha.enable: on
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

gdb logs are at, http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1379673/

Comment 10 Soumya Koduri 2016-11-23 06:42:32 UTC
Thanks for re-checking Arthy.

Interval1: 

Thread 198 (Thread 0x7f56e78ab700 (LWP 20792)):
#0  0x00007f571a4db6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f568df26b1b in syncop_lookup (subvol=subvol@entry=0x7f5648026b40, loc=loc@entry=0x7f56e78a95c0, iatt=iatt@entry=0x7f56e78a9600, parent=parent@entry=0x0, xdata_in=xdata_in@entry=0x7f56889dfc54, xdata_out=xdata_out@entry=0x0) at syncop.c:1223
#2  0x00007f568e1ceb27 in glfs_resolve_component (fs=fs@entry=0x7f5688098b20, subvol=subvol@entry=0x7f5648026b40, parent=parent@entry=0x7f563c00a47c, component=component@entry=0x7f561083dd10 "def_4911787", iatt=iatt@entry=0x7f56e78a9780, force_lookup=<optimized out>) at glfs-resolve.c:311
#3  0x00007f568e1cf2d3 in priv_glfs_resolve_at (fs=fs@entry=0x7f5688098b20, subvol=subvol@entry=0x7f5648026b40, at=at@entry=0x7f563c00a47c, origpath=origpath@entry=0x7f56e78a9be3 "def_4911787", loc=loc@entry=0x7f56e78a9880, iatt=iatt@entry=0x7f56e78a98c0, follow=follow@entry=0, reval=reval@entry=0) at glfs-resolve.c:417
#4  0x00007f568e1d0c18 in pub_glfs_h_lookupat (fs=0x7f5688098b20, parent=<optimized out>, path=path@entry=0x7f56e78a9be3 "def_4911787", stat=stat@entry=0x7f56e78a99a0, follow=follow@entry=0) at glfs-handleops.c:102
#5  0x00007f568e1d0d18 in pub_glfs_h_lookupat34 (fs=<optimized out>, parent=<optimized out>, path=path@entry=0x7f56e78a9be3 "def_4911787", stat=stat@entry=0x7f56e78a99a0) at glfs-handleops.c:133
#6  0x00007f568e5ec8bb in lookup (parent=parent@entry=0x7f5688c9dc08, path=path@entry=0x7f56e78a9be3 "def_4911787", handle=handle@entry=0x7f56e78a9ae8, attrs_out=attrs_out@entry=0x7f56e78a9af0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:112
#7  0x00007f568e5ecc38 in read_dirents (dir_hdl=0x7f5688c9dc08, whence=<optimized out>, dir_state=0x7f56e78a9d60, cb=0x7f571c024690 <mdc_populate_dirent>, attrmask=9223372036855160782, eof=0x7f56e78a9d4f) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:201
#8  0x00007f571c024c70 in mdcache_dirent_populate (dir=dir@entry=0x7f5688c9dec0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1475
#9  0x00007f571c01a016 in mdcache_readdir (dir_hdl=0x7f5688c9def8, whence=0x7f56e78a9e78, dir_state=0x7f56e78a9e90, cb=0x7f571bf519a0 <populate_dirent>, attrmask=<optimized out>, eod_met=0x7f56e78a9f5b) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:577
#10 0x00007f571bf537ad in fsal_readdir (directory=directory@entry=0x7f5688c9def8, cookie=cookie@entry=578627923506410052, nbfound=nbfound@entry=0x7f56e78a9f5c, eod_met=eod_met@entry=0x7f56e78a9f5b, attrmask=122830, cb=cb@entry=0x7f571bf8d520 <nfs4_readdir_callback>, opaque=opaque@entry=0x7f56e78a9f60) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:1505
#11 0x00007f571bf8e4eb in nfs4_op_readdir (op=0x7f5640592b70, data=0x7f56e78aa180, resp=0x7f56108b1130) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_readdir.c:631
#12 0x00007f571bf7af7d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f5610038770) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734
#13 0x00007f571bf6c12c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f56405894a0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281
#14 0x00007f571bf6d78a in worker_run (ctx=0x7f571d00f5e0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#15 0x00007f571bff7189 in fridgethr_start_routine (arg=0x7f571d00f5e0) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#16 0x00007f571a4d7dc5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f5719ba673d in clone () from /lib64/libc.so.6
Thread 197 (Thread 0x7f56e70aa700 (LWP 20793)):


Interval 2:
Thread 30 (Thread 0x7f5693803700 (LWP 20960)):
#0  0x00007f571a4db6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f568df26b1b in syncop_lookup (subvol=subvol@entry=0x7f5648026b40, loc=loc@entry=0x7f56938015c0, iatt=iatt@entry=0x7f5693801600, parent=parent@entry=0x0, xdata_in=xdata_in@entry=0x7f56889e7090, xdata_out=xdata_out@entry=0x0) at syncop.c:1223
#2  0x00007f568e1ceb27 in glfs_resolve_component (fs=fs@entry=0x7f5688098b20, subvol=subvol@entry=0x7f5648026b40, parent=parent@entry=0x7f563c00a47c, component=component@entry=0x7f563060cc10 "def7468", iatt=iatt@entry=0x7f5693801780, force_lookup=<optimized out>) at glfs-resolve.c:311
#3  0x00007f568e1cf2d3 in priv_glfs_resolve_at (fs=fs@entry=0x7f5688098b20, subvol=subvol@entry=0x7f5648026b40, at=at@entry=0x7f563c00a47c, origpath=origpath@entry=0x7f5693801be3 "def7468", loc=loc@entry=0x7f5693801880, iatt=iatt@entry=0x7f56938018c0, follow=follow@entry=0, reval=reval@entry=0) at glfs-resolve.c:417
#4  0x00007f568e1d0c18 in pub_glfs_h_lookupat (fs=0x7f5688098b20, parent=<optimized out>, path=path@entry=0x7f5693801be3 "def7468", stat=stat@entry=0x7f56938019a0, follow=follow@entry=0) at glfs-handleops.c:102
#5  0x00007f568e1d0d18 in pub_glfs_h_lookupat34 (fs=<optimized out>, parent=<optimized out>, path=path@entry=0x7f5693801be3 "def7468", stat=stat@entry=0x7f56938019a0) at glfs-handleops.c:133
#6  0x00007f568e5ec8bb in lookup (parent=parent@entry=0x7f5688c9dc08, path=path@entry=0x7f5693801be3 "def7468", handle=handle@entry=0x7f5693801ae8, attrs_out=attrs_out@entry=0x7f5693801af0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:112
#7  0x00007f568e5ecc38 in read_dirents (dir_hdl=0x7f5688c9dc08, whence=<optimized out>, dir_state=0x7f5693801d60, cb=0x7f571c024690 <mdc_populate_dirent>, attrmask=9223372036855160782, eof=0x7f5693801d4f) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:201
#8  0x00007f571c024c70 in mdcache_dirent_populate (dir=dir@entry=0x7f5688c9dec0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1475
#9  0x00007f571c01a016 in mdcache_readdir (dir_hdl=0x7f5688c9def8, whence=0x7f5693801e78, dir_state=0x7f5693801e90, cb=0x7f571bf519a0 <populate_dirent>, attrmask=<optimized out>, eod_met=0x7f5693801f5b) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:577
#10 0x00007f571bf537ad in fsal_readdir (directory=directory@entry=0x7f5688c9def8, cookie=cookie@entry=1729522091335920037, nbfound=nbfound@entry=0x7f5693801f5c, eod_met=eod_met@entry=0x7f5693801f5b, attrmask=122830, cb=cb@entry=0x7f571bf8d520 <nfs4_readdir_callback>, opaque=opaque@entry=0x7f5693801f60) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:1505
#11 0x00007f571bf8e4eb in nfs4_op_readdir (op=0x7f567c1900e0, data=0x7f5693802180, resp=0x7f5630171b20) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_readdir.c:631
#12 0x00007f571bf7af7d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f563004bf60) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734
#13 0x00007f571bf6c12c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f567c26af50) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281
#14 0x00007f571bf6d78a in worker_run (ctx=0x7f571d040a60) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#15 0x00007f571bff7189 in fridgethr_start_routine (arg=0x7f571d040a60) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#16 0x00007f571a4d7dc5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f5719ba673d in clone () from /lib64/libc.so.6
Thread 29 (Thread 0x7f5693002700 (LWP 20961)):

Thread 76 (Thread 0x7f56aa831700 (LWP 20914)):
#0  0x00007f571a4dae24 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
#1  0x00007f571c023080 in mdc_lookup (mdc_parent=0x7f5688c9dec0, name=0x7f571d22f170 "def_176420", uncached=uncached@entry=true, new_entry=new_entry@entry=0x7f56aa82f4c0, attrs_out=0x0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:916
#2  0x00007f571c01a9eb in mdcache_lookup (parent=<optimized out>, name=<optimized out>, handle=0x7f56aa82f578, attrs_out=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:166
#3  0x00007f571bf52c97 in fsal_lookup (parent=parent@entry=0x7f5688c9def8, name=0x7f571d22f170 "def_176420", obj=obj@entry=0x7f56aa82f578, attrs_out=attrs_out@entry=0x0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:712
#4  0x00007f571bf3fe0c in open4_ex (arg=arg@entry=0x7f567c36ad28, data=data@entry=0x7f56aa830180, res_OPEN4=res_OPEN4@entry=0x7f571d055668, clientid=<optimized out>, owner=0x7f56840f7d90, file_state=file_state@entry=0x7f56aa82ffa0, new_state=new_state@entry=0x7f56aa82ff8f) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_open.c:1257
#5  0x00007f571bf88a39 in nfs4_op_open (op=0x7f567c36ad20, data=0x7f56aa830180, resp=0x7f571d055660) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_open.c:1844
#6  0x00007f571bf7af7d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f571d0c3360) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734
#7  0x00007f571bf6c12c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f567c21a0c0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281
#8  0x00007f571bf6d78a in worker_run (ctx=0x7f571d032fe0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#9  0x00007f571bff7189 in fridgethr_start_routine (arg=0x7f571d032fe0) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#10 0x00007f571a4d7dc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f5719ba673d in clone () from /lib64/libc.so.6
Thread 75 (Thread 0x7f56aa030700 (LWP 20915)):


Interval 3:
Thread 109 (Thread 0x7f56bb052700 (LWP 20881)):
#0  0x00007f571a4dae24 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
#1  0x00007f571c023080 in mdc_lookup (mdc_parent=0x7f5688c9dec0, name=0x7f568016d790 "def_176inew7", uncached=uncached@entry=true, new_entry=new_entry@entry=0x7f56bb0504c0, attrs_out=0x0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:916
#2  0x00007f571c01a9eb in mdcache_lookup (parent=<optimized out>, name=<optimized out>, handle=0x7f56bb050578, attrs_out=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:166
#3  0x00007f571bf52c97 in fsal_lookup (parent=parent@entry=0x7f5688c9def8, name=0x7f568016d790 "def_176inew7", obj=obj@entry=0x7f56bb050578, attrs_out=attrs_out@entry=0x0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:712
#4  0x00007f571bf3fe0c in open4_ex (arg=arg@entry=0x7f564008d5a8, data=data@entry=0x7f56bb051180, res_OPEN4=res_OPEN4@entry=0x7f56801fe808, clientid=<optimized out>, owner=0x7f56840f7d90, file_state=file_state@entry=0x7f56bb050fa0, new_state=new_state@entry=0x7f56bb050f8f) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_open.c:1257
#5  0x00007f571bf88a39 in nfs4_op_open (op=0x7f564008d5a0, data=0x7f56bb051180, resp=0x7f56801fe800) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_open.c:1844
#6  0x00007f571bf7af7d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f5680229bd0) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734
#7  0x00007f571bf6c12c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f56405894a0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281
#8  0x00007f571bf6d78a in worker_run (ctx=0x7f571d029320) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#9  0x00007f571bff7189 in fridgethr_start_routine (arg=0x7f571d029320) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#10 0x00007f571a4d7dc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f5719ba673d in clone () from /lib64/libc.so.6
Thread 108 (Thread 0x7f56ba851700 (LWP 20882)):


Thread 85 (Thread 0x7f56af03a700 (LWP 20905)):
#0  0x00007f571a4db6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f568df26b1b in syncop_lookup (subvol=subvol@entry=0x7f5648026b40, loc=loc@entry=0x7f56af0385c0, iatt=iatt@entry=0x7f56af038600, parent=parent@entry=0x0, xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1223
#2  0x00007f568e1ceb27 in glfs_resolve_component (fs=fs@entry=0x7f5688098b20, subvol=subvol@entry=0x7f5648026b40, parent=parent@entry=0x7f563c00a47c, component=component@entry=0x7f566c4c7b50 "def_492600", iatt=iatt@entry=0x7f56af038780, force_lookup=<optimized out>) at glfs-resolve.c:311
#3  0x00007f568e1cf2d3 in priv_glfs_resolve_at (fs=fs@entry=0x7f5688098b20, subvol=subvol@entry=0x7f5648026b40, at=at@entry=0x7f563c00a47c, origpath=origpath@entry=0x7f56af038be3 "def_492600", loc=loc@entry=0x7f56af038880, iatt=iatt@entry=0x7f56af0388c0, follow=follow@entry=0, reval=reval@entry=0) at glfs-resolve.c:417
#4  0x00007f568e1d0c18 in pub_glfs_h_lookupat (fs=0x7f5688098b20, parent=<optimized out>, path=path@entry=0x7f56af038be3 "def_492600", stat=stat@entry=0x7f56af0389a0, follow=follow@entry=0) at glfs-handleops.c:102
#5  0x00007f568e1d0d18 in pub_glfs_h_lookupat34 (fs=<optimized out>, parent=<optimized out>, path=path@entry=0x7f56af038be3 "def_492600", stat=stat@entry=0x7f56af0389a0) at glfs-handleops.c:133
#6  0x00007f568e5ec8bb in lookup (parent=parent@entry=0x7f5688c9dc08, path=path@entry=0x7f56af038be3 "def_492600", handle=handle@entry=0x7f56af038ae8, attrs_out=attrs_out@entry=0x7f56af038af0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:112
#7  0x00007f568e5ecc38 in read_dirents (dir_hdl=0x7f5688c9dc08, whence=<optimized out>, dir_state=0x7f56af038d60, cb=0x7f571c024690 <mdc_populate_dirent>, attrmask=9223372036855160782, eof=0x7f56af038d4f) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:201
#8  0x00007f571c024c70 in mdcache_dirent_populate (dir=dir@entry=0x7f5688c9dec0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1475
#9  0x00007f571c01a016 in mdcache_readdir (dir_hdl=0x7f5688c9def8, whence=0x7f56af038e78, dir_state=0x7f56af038e90, cb=0x7f571bf519a0 <populate_dirent>, attrmask=<optimized out>, eod_met=0x7f56af038f5b) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:577
#10 0x00007f571bf537ad in fsal_readdir (directory=directory@entry=0x7f5688c9def8, cookie=cookie@entry=3995323130471280627, nbfound=nbfound@entry=0x7f56af038f5c, eod_met=eod_met@entry=0x7f56af038f5b, attrmask=122830, cb=cb@entry=0x7f571bf8d520 <nfs4_readdir_callback>, opaque=opaque@entry=0x7f56af038f60) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:1505
#11 0x00007f571bf8e4eb in nfs4_op_readdir (op=0x7f564046dc40, data=0x7f56af039180, resp=0x7f566c29db00) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_readdir.c:631
#12 0x00007f571bf7af7d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f566c115cb0) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734
#13 0x00007f571bf6c12c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f5640489ed0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281
#14 0x00007f571bf6d78a in worker_run (ctx=0x7f571d030520) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#15 0x00007f571bff7189 in fridgethr_start_routine (arg=0x7f571d030520) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#16 0x00007f571a4d7dc5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f5719ba673d in clone () from /lib64/libc.so.6
Thread 84 (Thread 0x7f56ae839700 (LWP 20906)):



readdir+lookup consuming lot of time.

Comment 12 surabhi 2016-11-29 10:08:46 UTC
As per the triaging we all have the agreement that this BZ has to be fixed in rhgs-3.2.0. Providing qa_ack

Comment 15 Daniel Gryniewicz 2016-11-30 17:43:47 UTC
Submitted a fix upstream:

https://review.gerrithub.io/304278
https://review.gerrithub.io/304279

With this fix, I'm able to run 'ls' on a directory of 1 million entries (it takes 1:20 - 2:00 minutes to run) while simultaneously operating on the directory (create/read/write/delete) from another client with no noticeable latency.

I had my dircache limit set to 100,000 for this test.  Default is ~1/2 million.

Comment 16 Ambarish 2017-01-28 14:13:46 UTC
ls still hangs/stalls during file creation,which is by design.to be tackled in Ganesha 2.5(tracked via BZ#https://bugzilla.redhat.com/show_bug.cgi?id=1403648)

New writes aren't hanging anymore.

Verified on 2.4.1-6/3.8.4-13.

Comment 18 errata-xmlrpc 2017-03-23 06:23:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0493.html