Bug 1373498 - Creation of file hangs while doing ls from another mount.
Summary: Creation of file hangs while doing ls from another mount.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: nfs-ganesha
Classification: Community
Component: Cache Inode
Version: devel
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Soumya Koduri
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1379673 1403648
 
Reported: 2016-09-06 12:24 UTC by Shashank Raj
Modified: 2019-11-22 15:23 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1379673 (view as bug list)
Environment:
Last Closed: 2019-11-22 15:23:33 UTC



Description Shashank Raj 2016-09-06 12:24:13 UTC
Description of problem:

Creation of files hangs while doing ls from another mount.

Version-Release number of selected component (if applicable):

[root@dhcp43-116 ~]# rpm -qa|grep glusterfs
glusterfs-geo-replication-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-api-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-fuse-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-server-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-libs-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-client-xlators-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-ganesha-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-cli-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-debuginfo-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-3.8.3-0.6.git7956718.el7.centos.x86_64

[root@dhcp43-116 ~]# rpm -qa|grep ganesha
nfs-ganesha-gluster-next.20160827.7641daf-1.el7.centos.x86_64
glusterfs-ganesha-3.8.3-0.6.git7956718.el7.centos.x86_64
nfs-ganesha-debuginfo-next.20160827.7641daf-1.el7.centos.x86_64
nfs-ganesha-next.20160827.7641daf-1.el7.centos.x86_64


How reproducible:

Always

Steps to Reproduce:
1. Create a volume and mount it via ganesha on 2 clients.
2. Start creating files from both clients.
3. Once creation is done on one client, run ls -ltr on that client.
4. Observe that as soon as ls is issued on one client, file creation hangs on the other client.
5. Also, ls -ltr doesn't give any output. See below:

[root@dhcp46-206 nfs1]# ls -ltr
total 0
[root@dhcp46-206 nfs1]# 
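
The steps above could be scripted roughly as follows. This is only a sketch: the helper names `create_files` and `list_directory` are made up for illustration, and the directories passed to them are assumed to be the NFS mount points on each client (the report does not give them beyond `nfs1`).

```python
import os
import time


def create_files(directory, count, prefix="file"):
    """Create `count` empty files and return the time taken by each create."""
    latencies = []
    for i in range(count):
        start = time.monotonic()
        path = os.path.join(directory, f"{prefix}{i}")
        with open(path, "w"):
            pass  # an empty file is enough to exercise the create path
        latencies.append(time.monotonic() - start)
    return latencies


def list_directory(directory):
    """Rough equivalent of `ls -ltr`: stat every entry, oldest first."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            st = entry.stat()
            entries.append((st.st_mtime, entry.name, st.st_size))
    entries.sort()
    return [name for _, name, _ in entries]
```

Running `create_files` against the mount on one client while `list_directory` runs against the mount on the other would show a hang as one very large value in the returned latencies, and an empty return from `list_directory` would correspond to the `total 0` output above.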


Actual results:

So there are 2 issues here.
1) ls on one client hangs file creation on other client.
2) ls -ltr doesn't give expected output.

Expected results:

There should not be any hangs and ls output should be as expected.

Additional info:

Comment 1 Soumya Koduri 2016-09-06 12:30:50 UTC
Both operations (readdir and creation) operate on the same parent directory. It looks like md-cache takes content_lock on the parent directory for the entire duration of the readdir operation, which blocks other operations such as the file creation in this case. This needs to be fixed.

Not sure why 'ls -l' returned zero entries; this needs debugging. Shashank, does 'ls' work fine when no other client is operating on the same directory?
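
The contention described in this comment can be pictured with a small, self-contained sketch. It is not nfs-ganesha code: `dir_lock` merely stands in for the parent-directory lock mentioned above, and the function name, the two events, and the 0.2 second timeout are invented for illustration. The second branch is a hypothetical alternative, not the fix that was eventually made.

```python
import threading


def create_blocked_during_listing(hold_for_whole_listing: bool) -> bool:
    """Return True if a 'create' cannot get the directory lock mid-listing."""
    dir_lock = threading.Lock()  # stand-in for the parent directory lock
    listing_in_progress = threading.Event()
    listing_may_finish = threading.Event()

    def lister() -> None:
        if hold_for_whole_listing:
            # Lock held from the first directory entry to the last.
            with dir_lock:
                listing_in_progress.set()
                listing_may_finish.wait()
        else:
            # Hypothetical alternative: lock held only while one chunk is read.
            with dir_lock:
                pass
            listing_in_progress.set()
            listing_may_finish.wait()

    thread = threading.Thread(target=lister)
    thread.start()
    listing_in_progress.wait()

    # The 'create' path needs the same lock; give up after 0.2 seconds.
    acquired = dir_lock.acquire(timeout=0.2)
    if acquired:
        dir_lock.release()

    listing_may_finish.set()
    thread.join()
    return not acquired
```

With `hold_for_whole_listing=True` the simulated create times out while the listing is still running, which mirrors the hang seen on the second client; with `False` it acquires the lock immediately.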

Comment 2 Shashank Raj 2016-09-06 14:57:16 UTC
While running ls -ltr from one client with no other client operations, I waited around 2.5 hours (there are around 150000 files); ls didn't list anything and appeared stuck. The following messages were logged continuously in /var/log/messages on the client side:

Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xa4c58ae66322418c
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x90099f91be45f6ea
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x891ca18c9c967f9f
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x8ae1457af5d444fd
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x8e18aaf9ef56a681
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb1049f34b8bbadf0
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xabf94baa3d664b6f
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb5a3d7a5e893770e
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb0cc9ca8578c4b04
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xbe9707239b56ddc2
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xa3d88ed35013d15b
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x82de97c20198419b
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xae467bc51845ad16
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xaf861e5fe53bfff3
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x804100c9a8de9e4f
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xad8c318e8637d603
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x901a467f239015aa
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xbfe501169d543d3c
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb3defa4672d5ce34
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xbd1cad8c38ef73e8
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb529f6477843ca2e
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0xb6af80b6e5d40fd7
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x9601ff9ce075783b
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x9db37de222a44875
Sep  6 20:07:30 dhcp46-206 kernel: NFS: server 10.70.40.192 error: fileid changed#012fsid 0:39: expected fileid 0x1, got 0x965793242be5b9c0

Packet traces from both the client and server side, along with /var/log/messages, can be accessed at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1373498
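
For anyone going through the full /var/log/messages, a short script along these lines could summarize the "fileid changed" messages quoted above. It is a sketch only: the regular expression was written against the lines in this comment, and `summarize` is a made-up helper name. The `#012` in the messages is syslog's escaped form of an embedded newline.

```python
import re
from collections import Counter

# Matches the kernel message format seen in the log excerpt above.
FILEID_RE = re.compile(
    r"NFS: server (?P<server>\S+) error: fileid changed"
    r"(?:#012|\s)+fsid (?P<fsid>\S+): "
    r"expected fileid (?P<expected>0x[0-9a-f]+), got (?P<got>0x[0-9a-f]+)"
)


def summarize(lines):
    """Count messages per (server, fsid, expected fileid) and collect the
    distinct fileids the client actually received."""
    expected = Counter()
    got = set()
    for line in lines:
        match = FILEID_RE.search(line)
        if match:
            key = (match["server"], match["fsid"], match["expected"])
            expected[key] += 1
            got.add(match["got"])
    return expected, got
```

Feeding it the log would show whether every message expects the same fileid (0x1 in the excerpt) and how many different fileids were returned in its place.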

Comment 3 Soumya Koduri 2016-09-12 05:56:27 UTC
This is expected behaviour as per the current design. We discussed this issue within the community. It is part of the existing design and is expected to be addressed as part of the performance enhancements targeted for the upstream 2.5 release.

