+++ This bug was initially created as a clone of Bug #1373498 +++ Description of problem: Creation of files hang while doing ls from another mount. Version-Release number of selected component (if applicable): [root@dhcp43-116 ~]# rpm -qa|grep glusterfs glusterfs-geo-replication-3.8.3-0.6.git7956718.el7.centos.x86_64 glusterfs-api-3.8.3-0.6.git7956718.el7.centos.x86_64 glusterfs-fuse-3.8.3-0.6.git7956718.el7.centos.x86_64 glusterfs-server-3.8.3-0.6.git7956718.el7.centos.x86_64 glusterfs-libs-3.8.3-0.6.git7956718.el7.centos.x86_64 glusterfs-client-xlators-3.8.3-0.6.git7956718.el7.centos.x86_64 glusterfs-ganesha-3.8.3-0.6.git7956718.el7.centos.x86_64 glusterfs-cli-3.8.3-0.6.git7956718.el7.centos.x86_64 glusterfs-debuginfo-3.8.3-0.6.git7956718.el7.centos.x86_64 glusterfs-3.8.3-0.6.git7956718.el7.centos.x86_64 [root@dhcp43-116 ~]# rpm -qa|grep ganesha nfs-ganesha-gluster-next.20160827.7641daf-1.el7.centos.x86_64 glusterfs-ganesha-3.8.3-0.6.git7956718.el7.centos.x86_64 nfs-ganesha-debuginfo-next.20160827.7641daf-1.el7.centos.x86_64 nfs-ganesha-next.20160827.7641daf-1.el7.centos.x86_64 How reproducible: Always Steps to Reproduce: 1.Create a volume and mount it via ganesha on 2 clients. 2.Start creating files from both the clients. 3.Once creation is done on one client, do an ls -ltr on that client. 4.Observe that as soon as you issue an ls on one client, file creations gets hanged on other client. 5.Also, ls -ltr doesn't gives any output. See below: [root@dhcp46-206 nfs1]# ls -ltr total 0 [root@dhcp46-206 nfs1]# Actual results: So there are 2 issues here. 1) ls on one client hangs file creation on other client. 2) ls -ltr doesn't give expected output. Expected results: There should not be any hangs and ls output should be as expected. Additional info: --- Additional comment from Soumya Koduri on 2016-09-06 08:30:50 EDT --- Both the operations (readdir and creation) are operating on the same parent directory. Looks like md-cache takes contect_lock on the parent directory for the entire duration of readdir operation resulting in blocking other operations like creation of files in this case. This needs to be fixed. Not sure why 'ls -l' returned zero entries. Have to debug. Shashank does 'ls' works fine when there are no other client operating on the same directory? --- Additional comment from Shashank Raj on 2016-09-06 10:57:16 EDT --- while running ls -ltr from one client without any other client operation, i waited for around 2.5 hours (there are around 150000 files) and ls didnt list anything and was kind of stuck and below continuous messages are seen in /var/log/messages on client side: Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xa4c58ae66322418c Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x90099f91be45f6ea Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x891ca18c9c967f9f Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x8ae1457af5d444fd Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x8e18aaf9ef56a681 Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb1049f34b8bbadf0 Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xabf94baa3d664b6f Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb5a3d7a5e893770e Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb0cc9ca8578c4b04 Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xbe9707239b56ddc2 Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xa3d88ed35013d15b Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x82de97c20198419b Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xae467bc51845ad16 Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xaf861e5fe53bfff3 Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x804100c9a8de9e4f Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xad8c318e8637d603 Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x901a467f239015aa Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xbfe501169d543d3c Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb3defa4672d5ce34 Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xbd1cad8c38ef73e8 Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb529f6477843ca2e Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb6af80b6e5d40fd7 Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x9601ff9ce075783b Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x9db37de222a44875 Sep 6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x965793242be5b9c0 Packet trace from both client and server side and /var/log/messages can be accessed at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1373498 --- Additional comment from Soumya Koduri on 2016-09-12 01:56:27 EDT --- This is expected behaviour as per the current design. We had discussion regarding this issue within the community. This is part of the existing design and is expected to be addressed as part of performance enhancements targetted for upstream 2.5 release.
So while testing this again, i found out 2 things here: 1) file creations are not completely hanged, it is creating files during ls from another mount but the file creations are very very slow and it looks like it is hanged. 2) Even when there are just 40000 (100KB) files, ls is hanged for more than 2 hours and still going on. So if not the file creation part, then definitely we need to look at why ls is not listing anything for so long. packet traces can be accessed at: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1379673 based on the above observation, can we change internal whiteboard from 3.2.0 beyond to 3.2.0?
As per discussion with Soumya, moving it back to 3.2.0
Just an update: Till the file creations were going on ls was hanged (~4 hours) and as soon as i stopped file creation, ls listed the files.
I have executed the test with md-cache setting configured on the volume and with 2 clients, I am still seeing the issues. 1) When there are ~40000 files, ls gets hung for more than two hours. 2) File creation(each file 100k size) is very slow when doing ls on mountpoint from other client. 3) ls takes more time to list files even after stopping the file creation on one client. [root@dhcp46-42 ~]# cat /run/gluster/shared_storage/nfs-ganesha/exports/export.vol1.conf # WARNING : Using Gluster CLI will overwrite manual # changes made to this file. To avoid it, edit the # file and run ganesha-ha.sh --refresh-config. EXPORT{ Export_Id = 2; Path = "/vol1"; FSAL { name = GLUSTER; hostname="localhost"; volume="vol1"; } Access_type = RW; Disable_ACL = true; Squash="No_root_squash"; Pseudo="/vol1"; Protocols = "3", "4" ; Transports = "UDP","TCP"; SecType = "sys"; Attr_Expiration_Time = 600; } [root@dhcp46-42 ~]# gluster vol info vol1 Volume Name: vol1 Type: Distributed-Replicate Volume ID: b5cc5139-d581-4454-a97a-b296bbce6f28 Status: Started Snapshot Count: 0 Number of Bricks: 6 x 2 = 12 Transport-type: tcp Bricks: Brick1: dhcp46-42.lab.eng.blr.redhat.com:/bricks/brick0/br0 Brick2: dhcp46-101.lab.eng.blr.redhat.com:/bricks/brick0/br0 Brick3: dhcp47-155.lab.eng.blr.redhat.com:/bricks/brick0/br0 Brick4: dhcp47-167.lab.eng.blr.redhat.com:/bricks/brick0/br0 Brick5: dhcp46-42.lab.eng.blr.redhat.com:/bricks/brick1/br1 Brick6: dhcp46-101.lab.eng.blr.redhat.com:/bricks/brick1/br1 Brick7: dhcp47-155.lab.eng.blr.redhat.com:/bricks/brick1/br1 Brick8: dhcp47-167.lab.eng.blr.redhat.com:/bricks/brick1/br1 Brick9: dhcp46-42.lab.eng.blr.redhat.com:/bricks/brick2/br2 Brick10: dhcp46-101.lab.eng.blr.redhat.com:/bricks/brick2/br2 Brick11: dhcp47-155.lab.eng.blr.redhat.com:/bricks/brick2/br2 Brick12: dhcp47-167.lab.eng.blr.redhat.com:/bricks/brick2/br2 Options Reconfigured: performance.md-cache-timeout: 600 performance.cache-invalidation: on performance.stat-prefetch: on features.cache-invalidation-timeout: 600 ganesha.enable: on features.cache-invalidation: on transport.address-family: inet performance.readdir-ahead: on nfs.disable: on nfs-ganesha: enable cluster.enable-shared-storage: enable gdb logs are at, http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1379673/
Thanks for re-checking Arthy. Interval1: Thread 198 (Thread 0x7f56e78ab700 (LWP 20792)): #0 0x00007f571a4db6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f568df26b1b in syncop_lookup (subvol=subvol@entry=0x7f5648026b40, loc=loc@entry=0x7f56e78a95c0, iatt=iatt@entry=0x7f56e78a9600, parent=parent@entry=0x0, xdata_in=xdata_in@entry=0x7f56889dfc54, xdata_out=xdata_out@entry=0x0) at syncop.c:1223 #2 0x00007f568e1ceb27 in glfs_resolve_component (fs=fs@entry=0x7f5688098b20, subvol=subvol@entry=0x7f5648026b40, parent=parent@entry=0x7f563c00a47c, component=component@entry=0x7f561083dd10 "def_4911787", iatt=iatt@entry=0x7f56e78a9780, force_lookup=<optimized out>) at glfs-resolve.c:311 #3 0x00007f568e1cf2d3 in priv_glfs_resolve_at (fs=fs@entry=0x7f5688098b20, subvol=subvol@entry=0x7f5648026b40, at=at@entry=0x7f563c00a47c, origpath=origpath@entry=0x7f56e78a9be3 "def_4911787", loc=loc@entry=0x7f56e78a9880, iatt=iatt@entry=0x7f56e78a98c0, follow=follow@entry=0, reval=reval@entry=0) at glfs-resolve.c:417 #4 0x00007f568e1d0c18 in pub_glfs_h_lookupat (fs=0x7f5688098b20, parent=<optimized out>, path=path@entry=0x7f56e78a9be3 "def_4911787", stat=stat@entry=0x7f56e78a99a0, follow=follow@entry=0) at glfs-handleops.c:102 #5 0x00007f568e1d0d18 in pub_glfs_h_lookupat34 (fs=<optimized out>, parent=<optimized out>, path=path@entry=0x7f56e78a9be3 "def_4911787", stat=stat@entry=0x7f56e78a99a0) at glfs-handleops.c:133 #6 0x00007f568e5ec8bb in lookup (parent=parent@entry=0x7f5688c9dc08, path=path@entry=0x7f56e78a9be3 "def_4911787", handle=handle@entry=0x7f56e78a9ae8, attrs_out=attrs_out@entry=0x7f56e78a9af0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:112 #7 0x00007f568e5ecc38 in read_dirents (dir_hdl=0x7f5688c9dc08, whence=<optimized out>, dir_state=0x7f56e78a9d60, cb=0x7f571c024690 <mdc_populate_dirent>, attrmask=9223372036855160782, eof=0x7f56e78a9d4f) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:201 #8 0x00007f571c024c70 in mdcache_dirent_populate (dir=dir@entry=0x7f5688c9dec0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1475 #9 0x00007f571c01a016 in mdcache_readdir (dir_hdl=0x7f5688c9def8, whence=0x7f56e78a9e78, dir_state=0x7f56e78a9e90, cb=0x7f571bf519a0 <populate_dirent>, attrmask=<optimized out>, eod_met=0x7f56e78a9f5b) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:577 #10 0x00007f571bf537ad in fsal_readdir (directory=directory@entry=0x7f5688c9def8, cookie=cookie@entry=578627923506410052, nbfound=nbfound@entry=0x7f56e78a9f5c, eod_met=eod_met@entry=0x7f56e78a9f5b, attrmask=122830, cb=cb@entry=0x7f571bf8d520 <nfs4_readdir_callback>, opaque=opaque@entry=0x7f56e78a9f60) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:1505 #11 0x00007f571bf8e4eb in nfs4_op_readdir (op=0x7f5640592b70, data=0x7f56e78aa180, resp=0x7f56108b1130) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_readdir.c:631 #12 0x00007f571bf7af7d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f5610038770) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734 #13 0x00007f571bf6c12c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f56405894a0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281 #14 0x00007f571bf6d78a in worker_run (ctx=0x7f571d00f5e0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548 #15 0x00007f571bff7189 in fridgethr_start_routine (arg=0x7f571d00f5e0) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550 #16 0x00007f571a4d7dc5 in start_thread () from /lib64/libpthread.so.0 #17 0x00007f5719ba673d in clone () from /lib64/libc.so.6 Thread 197 (Thread 0x7f56e70aa700 (LWP 20793)): Interval 2: Thread 30 (Thread 0x7f5693803700 (LWP 20960)): #0 0x00007f571a4db6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f568df26b1b in syncop_lookup (subvol=subvol@entry=0x7f5648026b40, loc=loc@entry=0x7f56938015c0, iatt=iatt@entry=0x7f5693801600, parent=parent@entry=0x0, xdata_in=xdata_in@entry=0x7f56889e7090, xdata_out=xdata_out@entry=0x0) at syncop.c:1223 #2 0x00007f568e1ceb27 in glfs_resolve_component (fs=fs@entry=0x7f5688098b20, subvol=subvol@entry=0x7f5648026b40, parent=parent@entry=0x7f563c00a47c, component=component@entry=0x7f563060cc10 "def7468", iatt=iatt@entry=0x7f5693801780, force_lookup=<optimized out>) at glfs-resolve.c:311 #3 0x00007f568e1cf2d3 in priv_glfs_resolve_at (fs=fs@entry=0x7f5688098b20, subvol=subvol@entry=0x7f5648026b40, at=at@entry=0x7f563c00a47c, origpath=origpath@entry=0x7f5693801be3 "def7468", loc=loc@entry=0x7f5693801880, iatt=iatt@entry=0x7f56938018c0, follow=follow@entry=0, reval=reval@entry=0) at glfs-resolve.c:417 #4 0x00007f568e1d0c18 in pub_glfs_h_lookupat (fs=0x7f5688098b20, parent=<optimized out>, path=path@entry=0x7f5693801be3 "def7468", stat=stat@entry=0x7f56938019a0, follow=follow@entry=0) at glfs-handleops.c:102 #5 0x00007f568e1d0d18 in pub_glfs_h_lookupat34 (fs=<optimized out>, parent=<optimized out>, path=path@entry=0x7f5693801be3 "def7468", stat=stat@entry=0x7f56938019a0) at glfs-handleops.c:133 #6 0x00007f568e5ec8bb in lookup (parent=parent@entry=0x7f5688c9dc08, path=path@entry=0x7f5693801be3 "def7468", handle=handle@entry=0x7f5693801ae8, attrs_out=attrs_out@entry=0x7f5693801af0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:112 #7 0x00007f568e5ecc38 in read_dirents (dir_hdl=0x7f5688c9dc08, whence=<optimized out>, dir_state=0x7f5693801d60, cb=0x7f571c024690 <mdc_populate_dirent>, attrmask=9223372036855160782, eof=0x7f5693801d4f) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:201 #8 0x00007f571c024c70 in mdcache_dirent_populate (dir=dir@entry=0x7f5688c9dec0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1475 #9 0x00007f571c01a016 in mdcache_readdir (dir_hdl=0x7f5688c9def8, whence=0x7f5693801e78, dir_state=0x7f5693801e90, cb=0x7f571bf519a0 <populate_dirent>, attrmask=<optimized out>, eod_met=0x7f5693801f5b) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:577 #10 0x00007f571bf537ad in fsal_readdir (directory=directory@entry=0x7f5688c9def8, cookie=cookie@entry=1729522091335920037, nbfound=nbfound@entry=0x7f5693801f5c, eod_met=eod_met@entry=0x7f5693801f5b, attrmask=122830, cb=cb@entry=0x7f571bf8d520 <nfs4_readdir_callback>, opaque=opaque@entry=0x7f5693801f60) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:1505 #11 0x00007f571bf8e4eb in nfs4_op_readdir (op=0x7f567c1900e0, data=0x7f5693802180, resp=0x7f5630171b20) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_readdir.c:631 #12 0x00007f571bf7af7d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f563004bf60) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734 #13 0x00007f571bf6c12c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f567c26af50) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281 #14 0x00007f571bf6d78a in worker_run (ctx=0x7f571d040a60) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548 #15 0x00007f571bff7189 in fridgethr_start_routine (arg=0x7f571d040a60) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550 #16 0x00007f571a4d7dc5 in start_thread () from /lib64/libpthread.so.0 #17 0x00007f5719ba673d in clone () from /lib64/libc.so.6 Thread 29 (Thread 0x7f5693002700 (LWP 20961)): Thread 76 (Thread 0x7f56aa831700 (LWP 20914)): #0 0x00007f571a4dae24 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0 #1 0x00007f571c023080 in mdc_lookup (mdc_parent=0x7f5688c9dec0, name=0x7f571d22f170 "def_176420", uncached=uncached@entry=true, new_entry=new_entry@entry=0x7f56aa82f4c0, attrs_out=0x0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:916 #2 0x00007f571c01a9eb in mdcache_lookup (parent=<optimized out>, name=<optimized out>, handle=0x7f56aa82f578, attrs_out=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:166 #3 0x00007f571bf52c97 in fsal_lookup (parent=parent@entry=0x7f5688c9def8, name=0x7f571d22f170 "def_176420", obj=obj@entry=0x7f56aa82f578, attrs_out=attrs_out@entry=0x0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:712 #4 0x00007f571bf3fe0c in open4_ex (arg=arg@entry=0x7f567c36ad28, data=data@entry=0x7f56aa830180, res_OPEN4=res_OPEN4@entry=0x7f571d055668, clientid=<optimized out>, owner=0x7f56840f7d90, file_state=file_state@entry=0x7f56aa82ffa0, new_state=new_state@entry=0x7f56aa82ff8f) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_open.c:1257 #5 0x00007f571bf88a39 in nfs4_op_open (op=0x7f567c36ad20, data=0x7f56aa830180, resp=0x7f571d055660) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_open.c:1844 #6 0x00007f571bf7af7d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f571d0c3360) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734 #7 0x00007f571bf6c12c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f567c21a0c0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281 #8 0x00007f571bf6d78a in worker_run (ctx=0x7f571d032fe0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548 #9 0x00007f571bff7189 in fridgethr_start_routine (arg=0x7f571d032fe0) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550 #10 0x00007f571a4d7dc5 in start_thread () from /lib64/libpthread.so.0 #11 0x00007f5719ba673d in clone () from /lib64/libc.so.6 Thread 75 (Thread 0x7f56aa030700 (LWP 20915)): Interval 3: Thread 109 (Thread 0x7f56bb052700 (LWP 20881)): #0 0x00007f571a4dae24 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0 #1 0x00007f571c023080 in mdc_lookup (mdc_parent=0x7f5688c9dec0, name=0x7f568016d790 "def_176inew7", uncached=uncached@entry=true, new_entry=new_entry@entry=0x7f56bb0504c0, attrs_out=0x0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:916 #2 0x00007f571c01a9eb in mdcache_lookup (parent=<optimized out>, name=<optimized out>, handle=0x7f56bb050578, attrs_out=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:166 #3 0x00007f571bf52c97 in fsal_lookup (parent=parent@entry=0x7f5688c9def8, name=0x7f568016d790 "def_176inew7", obj=obj@entry=0x7f56bb050578, attrs_out=attrs_out@entry=0x0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:712 #4 0x00007f571bf3fe0c in open4_ex (arg=arg@entry=0x7f564008d5a8, data=data@entry=0x7f56bb051180, res_OPEN4=res_OPEN4@entry=0x7f56801fe808, clientid=<optimized out>, owner=0x7f56840f7d90, file_state=file_state@entry=0x7f56bb050fa0, new_state=new_state@entry=0x7f56bb050f8f) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_open.c:1257 #5 0x00007f571bf88a39 in nfs4_op_open (op=0x7f564008d5a0, data=0x7f56bb051180, resp=0x7f56801fe800) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_open.c:1844 #6 0x00007f571bf7af7d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f5680229bd0) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734 #7 0x00007f571bf6c12c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f56405894a0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281 #8 0x00007f571bf6d78a in worker_run (ctx=0x7f571d029320) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548 #9 0x00007f571bff7189 in fridgethr_start_routine (arg=0x7f571d029320) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550 #10 0x00007f571a4d7dc5 in start_thread () from /lib64/libpthread.so.0 #11 0x00007f5719ba673d in clone () from /lib64/libc.so.6 Thread 108 (Thread 0x7f56ba851700 (LWP 20882)): Thread 85 (Thread 0x7f56af03a700 (LWP 20905)): #0 0x00007f571a4db6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f568df26b1b in syncop_lookup (subvol=subvol@entry=0x7f5648026b40, loc=loc@entry=0x7f56af0385c0, iatt=iatt@entry=0x7f56af038600, parent=parent@entry=0x0, xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1223 #2 0x00007f568e1ceb27 in glfs_resolve_component (fs=fs@entry=0x7f5688098b20, subvol=subvol@entry=0x7f5648026b40, parent=parent@entry=0x7f563c00a47c, component=component@entry=0x7f566c4c7b50 "def_492600", iatt=iatt@entry=0x7f56af038780, force_lookup=<optimized out>) at glfs-resolve.c:311 #3 0x00007f568e1cf2d3 in priv_glfs_resolve_at (fs=fs@entry=0x7f5688098b20, subvol=subvol@entry=0x7f5648026b40, at=at@entry=0x7f563c00a47c, origpath=origpath@entry=0x7f56af038be3 "def_492600", loc=loc@entry=0x7f56af038880, iatt=iatt@entry=0x7f56af0388c0, follow=follow@entry=0, reval=reval@entry=0) at glfs-resolve.c:417 #4 0x00007f568e1d0c18 in pub_glfs_h_lookupat (fs=0x7f5688098b20, parent=<optimized out>, path=path@entry=0x7f56af038be3 "def_492600", stat=stat@entry=0x7f56af0389a0, follow=follow@entry=0) at glfs-handleops.c:102 #5 0x00007f568e1d0d18 in pub_glfs_h_lookupat34 (fs=<optimized out>, parent=<optimized out>, path=path@entry=0x7f56af038be3 "def_492600", stat=stat@entry=0x7f56af0389a0) at glfs-handleops.c:133 #6 0x00007f568e5ec8bb in lookup (parent=parent@entry=0x7f5688c9dc08, path=path@entry=0x7f56af038be3 "def_492600", handle=handle@entry=0x7f56af038ae8, attrs_out=attrs_out@entry=0x7f56af038af0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:112 #7 0x00007f568e5ecc38 in read_dirents (dir_hdl=0x7f5688c9dc08, whence=<optimized out>, dir_state=0x7f56af038d60, cb=0x7f571c024690 <mdc_populate_dirent>, attrmask=9223372036855160782, eof=0x7f56af038d4f) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:201 #8 0x00007f571c024c70 in mdcache_dirent_populate (dir=dir@entry=0x7f5688c9dec0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1475 #9 0x00007f571c01a016 in mdcache_readdir (dir_hdl=0x7f5688c9def8, whence=0x7f56af038e78, dir_state=0x7f56af038e90, cb=0x7f571bf519a0 <populate_dirent>, attrmask=<optimized out>, eod_met=0x7f56af038f5b) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:577 #10 0x00007f571bf537ad in fsal_readdir (directory=directory@entry=0x7f5688c9def8, cookie=cookie@entry=3995323130471280627, nbfound=nbfound@entry=0x7f56af038f5c, eod_met=eod_met@entry=0x7f56af038f5b, attrmask=122830, cb=cb@entry=0x7f571bf8d520 <nfs4_readdir_callback>, opaque=opaque@entry=0x7f56af038f60) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:1505 #11 0x00007f571bf8e4eb in nfs4_op_readdir (op=0x7f564046dc40, data=0x7f56af039180, resp=0x7f566c29db00) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_readdir.c:631 #12 0x00007f571bf7af7d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f566c115cb0) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734 #13 0x00007f571bf6c12c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f5640489ed0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281 #14 0x00007f571bf6d78a in worker_run (ctx=0x7f571d030520) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548 #15 0x00007f571bff7189 in fridgethr_start_routine (arg=0x7f571d030520) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550 #16 0x00007f571a4d7dc5 in start_thread () from /lib64/libpthread.so.0 #17 0x00007f5719ba673d in clone () from /lib64/libc.so.6 Thread 84 (Thread 0x7f56ae839700 (LWP 20906)): readdir+lookup consuming lot of time.
As per the triaging we all have the agreement that this BZ has to be fixed in rhgs-3.2.0. Providing qa_ack
Submitted a fix upstream: https://review.gerrithub.io/304278 https://review.gerrithub.io/304279 With this fix, I'm able to run 'ls' on a directory of 1 million entries (it takes 1:20 - 2:00 minutes to run) while simultaneously operating on the directory (create/read/write/delete) from another client with no noticeable latency. I had my dircache limit set to 100,000 for this test. Default is ~1/2 million.
ls still hangs/stalls during file creation,which is by design.to be tackled in Ganesha 2.5(tracked via BZ#https://bugzilla.redhat.com/show_bug.cgi?id=1403648) New writes aren't hanging anymore. Verified on 2.4.1-6/3.8.4-13.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2017-0493.html