Bug 1373498

Summary: Creation of file hangs while doing ls from another mount.
Product: [Retired] nfs-ganesha Reporter: Shashank Raj <sraj>
Component: Cache Inode Assignee: Soumya Koduri <skoduri>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: devel CC: bugs, ffilz, jthottan, kkeithle, ndevos, pasik, skoduri, storage-qa-internal
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1379673 Environment:
Last Closed: 2019-11-22 15:23:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1379673, 1403648    

Description Shashank Raj 2016-09-06 12:24:13 UTC
Description of problem:

File creation hangs on one client while ls is run from another mount.

Version-Release number of selected component (if applicable):

[root@dhcp43-116 ~]# rpm -qa|grep glusterfs
glusterfs-geo-replication-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-api-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-fuse-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-server-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-libs-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-client-xlators-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-ganesha-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-cli-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-debuginfo-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-3.8.3-0.6.git7956718.el7.centos.x86_64

[root@dhcp43-116 ~]# rpm -qa|grep ganesha
nfs-ganesha-gluster-next.20160827.7641daf-1.el7.centos.x86_64
glusterfs-ganesha-3.8.3-0.6.git7956718.el7.centos.x86_64
nfs-ganesha-debuginfo-next.20160827.7641daf-1.el7.centos.x86_64
nfs-ganesha-next.20160827.7641daf-1.el7.centos.x86_64


How reproducible:

Always

Steps to Reproduce:
1. Create a volume and mount it via ganesha on 2 clients.
2. Start creating files from both the clients.
3. Once creation is done on one client, do an ls -ltr on that client.
4. Observe that as soon as ls is issued on one client, file creation hangs on the other client.
5. Also, ls -ltr doesn't give any output. See below:

[root@dhcp46-206 nfs1]# ls -ltr
total 0
[root@dhcp46-206 nfs1]# 
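
The steps above could be scripted roughly as follows. This is only a sketch, not the procedure used for this report: it assumes the same volume is mounted at two paths reachable from one host, and the helper names (create_files, list_entries, reproduce) are made up for illustration. It creates files through one mount in a background thread while listing through the other, and records how long each side takes.

```python
import os
import threading
import time


def create_files(directory, count, prefix):
    """Create `count` empty files named <prefix>_<n>; return elapsed seconds."""
    start = time.monotonic()
    for n in range(count):
        with open(os.path.join(directory, f"{prefix}_{n}"), "w"):
            pass
    return time.monotonic() - start


def list_entries(directory):
    """Rough equivalent of `ls -l`: stat every entry, return sorted names."""
    names = []
    with os.scandir(directory) as entries:
        for entry in entries:
            entry.stat()  # per-entry attribute lookup, as ls -l would do
            names.append(entry.name)
    return sorted(names)


def reproduce(mount_a, mount_b, count):
    """Create files through mount_b while listing through mount_a."""
    timings = {}

    def writer():
        timings["create_seconds"] = create_files(mount_b, count, "client2")

    thread = threading.Thread(target=writer)
    thread.start()
    start = time.monotonic()
    names = list_entries(mount_a)
    timings["ls_seconds"] = time.monotonic() - start
    thread.join()
    return names, timings
```

A call such as reproduce("/mnt/nfs1", "/mnt/nfs2", 1000) would exercise it; both paths are hypothetical mount points. A large create_seconds value relative to a run without the listing would be consistent with the hang described here.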


Actual results:

So there are 2 issues here.
1) ls on one client hangs file creation on other client.
2) ls -ltr doesn't give expected output.

Expected results:

There should not be any hangs and ls output should be as expected.

Additional info:

Comment 1 Soumya Koduri 2016-09-06 12:30:50 UTC
Both operations (readdir and creation) operate on the same parent directory. It looks like md-cache takes content_lock on the parent directory for the entire duration of the readdir operation, which blocks other operations on that directory, such as file creation in this case. This needs to be fixed.
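
The blocking pattern described above can be illustrated with a small model. This is a hedged sketch, not nfs-ganesha code: a plain threading.Lock stands in for the directory's content_lock (the real lock type and locking mode may differ), and the Directory class and create_during_readdir helper are invented for the example. It shows that a create attempted while the listing still holds the lock cannot proceed until the listing finishes.

```python
import threading


class Directory:
    """Toy directory guarded by a single lock (stand-in for content_lock)."""

    def __init__(self, names):
        self.content_lock = threading.Lock()
        self.entries = list(names)

    def readdir_holding_lock(self, on_entry):
        """Hold the lock for the whole listing, calling on_entry per name."""
        with self.content_lock:
            for name in list(self.entries):
                on_entry(name)

    def create(self, name, timeout):
        """Add an entry; return False if the lock is not free within timeout."""
        if not self.content_lock.acquire(timeout=timeout):
            return False
        try:
            self.entries.append(name)
            return True
        finally:
            self.content_lock.release()


def create_during_readdir(directory, timeout):
    """Attempt one create from another thread for each entry being listed."""
    results = []

    def on_entry(name):
        worker = threading.Thread(
            target=lambda: results.append(directory.create("new_" + name, timeout))
        )
        worker.start()
        worker.join()  # the create gives up after `timeout` seconds

    directory.readdir_holding_lock(on_entry)
    return results
```

In this model every create attempted during the listing times out, while the same create succeeds once the listing has released the lock. With a directory of around 150000 entries, a lock held for the whole listing would stall creates for as long as the listing runs.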

Not sure why 'ls -l' returned zero entries. Have to debug. Shashank, does 'ls' work fine when no other client is operating on the same directory?

Comment 2 Shashank Raj 2016-09-06 14:57:16 UTC
While running ls -ltr from one client with no other client operating on the directory, I waited for around 2.5 hours (there are around 150000 files). ls didn't list anything and appeared to be stuck, and the messages below were logged continuously in /var/log/messages on the client side:

Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xa4c58ae66322418c
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x90099f91be45f6ea
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x891ca18c9c967f9f
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x8ae1457af5d444fd
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x8e18aaf9ef56a681
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb1049f34b8bbadf0
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xabf94baa3d664b6f
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb5a3d7a5e893770e
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb0cc9ca8578c4b04
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xbe9707239b56ddc2
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xa3d88ed35013d15b
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x82de97c20198419b
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xae467bc51845ad16
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xaf861e5fe53bfff3
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x804100c9a8de9e4f
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xad8c318e8637d603
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x901a467f239015aa
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xbfe501169d543d3c
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb3defa4672d5ce34
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xbd1cad8c38ef73e8
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb529f6477843ca2e
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb6af80b6e5d40fd7
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x9601ff9ce075783b
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x9db37de222a44875
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x965793242be5b9c0
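
To get an overview of messages like the ones above, something along these lines could be used. It is a sketch that assumes the exact message layout shown in this comment (including the "#012" newline escape written by syslog); the function names are illustrative only. It extracts the server, fsid, and the expected and received fileids, then counts how many distinct fileids were reported.

```python
import re
from collections import Counter

# Pattern for the kernel "fileid changed" message as it appears in this log.
FILEID_CHANGED = re.compile(
    r"NFS: server (?P<server>\S+) error: fileid changed(?:#012|\s+)"
    r"fsid (?P<fsid>[0-9a-f]+:[0-9a-f]+): "
    r"expected fileid (?P<expected>0x[0-9a-f]+), got (?P<got>0x[0-9a-f]+)"
)


def parse_fileid_changed(line):
    """Return the fields of one 'fileid changed' line, or None if no match."""
    match = FILEID_CHANGED.search(line)
    if match is None:
        return None
    return {
        "server": match["server"],
        "fsid": match["fsid"],
        "expected": int(match["expected"], 16),
        "got": int(match["got"], 16),
    }


def summarize(lines):
    """Count matching messages, expected fileids, and distinct 'got' fileids."""
    records = [r for r in map(parse_fileid_changed, lines) if r is not None]
    return {
        "messages": len(records),
        "expected": Counter(r["expected"] for r in records),
        "distinct_got": len({r["got"] for r in records}),
    }
```

Feeding it the lines of /var/log/messages (for example via open("/var/log/messages")) would show whether every message expects fileid 0x1 and how many different fileids the server returned in its place.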

Packet trace from both client and server side and /var/log/messages can be accessed at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1373498

Comment 3 Soumya Koduri 2016-09-12 05:56:27 UTC
This is expected behaviour under the current design. We discussed this issue within the community, and it is expected to be addressed as part of the performance enhancements targeted for the upstream 2.5 release.