Description of problem:
While running I/O on a FUSE or SMB mount, an add-brick operation causes the I/O to fail. The I/O also fails with remove-brick. The failure occurs on both distribute and distribute-replicate volumes.

Version-Release number of selected component (if applicable):
glusterfs-api-devel-3.4.0.55rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-167.9.el6rhs.x86_64
glusterfs-fuse-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.55rhs-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a distribute or distribute-replicate volume.
2. Mount it via FUSE or SMB.
3. Start I/O on the mount point.
4. Perform an add-brick operation.

Actual results:
I/O fails as soon as the add-brick operation is done.

Expected results:
I/O should not fail.

Additional info:
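A minimal CLI sequence for the steps above might look like this (hostnames, volume name, and brick paths are placeholders, not taken from this report; running it requires a live gluster cluster):

```
gluster volume create distvol server1:/rhs/brick1 server2:/rhs/brick1
gluster volume start distvol
mount -t glusterfs server1:/distvol /mnt/fuseMount

# start I/O on /mnt/fuseMount, then, while it is running:
gluster volume add-brick distvol server3:/rhs/brick1
```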
Created attachment 846574 [details] scripts used to run I/O
Tried the add-brick operation on a volume while I/O was running, on the following build:
glusterfs-geo-replication-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.53rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-167.9.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.40rhs-1.el6rhs.x86_64
glusterfs-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.53rhs-1.el6rhs.x86_64

My observations: on an SMB mount, the I/O does not fail on the first add-brick operation, but it fails if we add a brick a second time. With the latest build, glusterfs-3.4.0.53rhs-1.el6rhs.x86_64, the I/O fails as soon as add-brick is executed. The same is true for a FUSE mount. The following test case passed earlier because it never failed with a single add-brick operation: https://tcms.engineering.redhat.com/case/304025/?from_plan=11532
With further analysis, it looks like the issue occurs when we create files in nested directories; creating files at the top level does not fail. The error we get is:

Creating directory at /mnt/fuseMount/io//TestDir0/TestDir0
Creating files in /mnt/fuseMount/io//TestDir0/TestDir0......
Cannot open file: Invalid argument
flock() on closed filehandle FH at ./CreateDirAndFileTree.pl line 90.
Cannot lock - Bad file descriptor
I tried it on build 33 and was able to reproduce the bug. Here are the details:

Creating directory at /mnt/withreaddir//TestDir0/TestDir2/TestDir2
Creating files in /mnt/withreaddir//TestDir0/TestDir2/TestDir2......
Cannot open file: No such file or directory
flock() on closed filehandle FH at ./CreateDirAndFileTree.pl line 74.
Cannot lock - Bad file descriptor

root.42.178[Jan-08-2014- 6:30:55] >rpm -qa | grep gluster
glusterfs-fuse-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.33rhs-1.el6rhs.x86_64

Analysis as of now: Gluster fails to create/open a file when:
a. The file's hash corresponds to the new brick.
b. The file is not directly under the root (/) of the volume.
c. The directory (or chain of directories) containing the file has not yet been created on the new brick.
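To see why condition (a) occurs, consider that DHT assigns each brick a range of the 32-bit hash space and places a file on the brick whose range contains the hash of its name; after add-brick, the layout changes and some names map to the new brick. The toy model below illustrates this with equal-sized ranges and md5 (GlusterFS actually uses a Davies-Meyer hash and per-directory layouts, so this is only an illustration):

```python
import hashlib

def layout(bricks):
    """Split the 32-bit hash space into equal ranges, one per brick
    (a simplified stand-in for DHT's per-directory layout)."""
    step = 2**32 // len(bricks)
    ranges = [[i * step, (i + 1) * step - 1] for i in range(len(bricks))]
    ranges[-1][1] = 2**32 - 1  # last brick absorbs any remainder
    return [(lo, hi, b) for (lo, hi), b in zip(ranges, bricks)]

def brick_for(name, bricks):
    """Pick the brick whose range contains the hash of the file name
    (md5 here is illustrative, not GlusterFS's real hash)."""
    h = int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32
    for lo, hi, b in layout(bricks):
        if lo <= h <= hi:
            return b

old = ["brick1", "brick2"]
new = ["brick1", "brick2", "brick3"]

# After add-brick, some file names hash into a different brick's range;
# those are the files whose create/open can hit the not-yet-healed brick.
names = ["file%d" % i for i in range(100)]
moved = [n for n in names if brick_for(n, old) != brick_for(n, new)]
print("remapped:", len(moved), "of", len(names))
```

If the parent directories of such a remapped file do not yet exist on the new brick (condition c), the create fails there.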
After the initial analysis, here is the observation: the issue arises when I/O (mkdir/create) is in progress on a volume and a brick is added, which changes the graph. The dht xlator is notified of the graph change; the rpc-client xlator is also notified, so the client (glusterfs) tries to reconnect to all the bricks (glusterfsd). If all the bricks reconnect immediately, no issue is seen. Sometimes, however, all bricks except the newly added one reconnect immediately and the I/O continues on the old bricks; when the client (glusterfs) finally connects to the new brick (glusterfsd), further I/O operations may fail. The issue needs to be debugged from two aspects:
1. The delay while the rpc-client (glusterfs) tries to connect to the new brick (glusterfsd).
2. The dht xlator not healing the I/O (mkdir/create) if it fails on the new brick.
Based on the analysis, the reproduction steps have diverged from what is described in the bug. Can we please have the exact steps to reproduce, and confirm whether the failure is consistent?
The exact steps are as follows, similar to the steps to reproduce above; the only addition is to run I/O using the Perl script attached to this bz, which creates files in multiple layers of directories.

1. Create a volume of any type (distribute or distribute-replicate).
2. Mount it via SMB on a Windows client.
3. Start the Perl script to run I/O on the mount point.
4. Perform an add-brick operation.

Result: with the add-brick operation, I/O fails with the following error:

Creating directory at /mnt/fuseMount/io//TestDir0/TestDir0
Creating files in /mnt/fuseMount/io//TestDir0/TestDir0......
Cannot open file: Invalid argument
flock() on closed filehandle FH at ./CreateDirAndFileTree.pl line 90.
Cannot lock - Bad file descriptor

Tried this 3 times on glusterfs-3.4.0.55rhs-1.el6rhs.x86_64, and the issue is consistently reproducible.
Run the perl script as follows: perl Win-CreateDirTreeNFiles.pl Z:\date18 3 10000 10000 3 3
How does one recover from this situation? Do we have a workaround?
Created attachment 850358 [details] script to run on fuse and nfs mount
As mentioned above, gluster fails to create or open a file when it is not directly under the root of the volume; we hit this issue when we try to create and open files inside nested directories. Once I/O fails on that particular directory, the mount point is still accessible, and we can start I/O from a different directory, but whatever was running before has failed. To run further I/O, we can start it from a different directory or at the root of the volume.
The only workaround to recover from this situation is to remove the affected directory, create a new one with the same name, and continue the I/O.
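The recovery amounts to deleting and recreating the affected directory on the mount. A sketch of the sequence, using a scratch directory as a stand-in for the real mount point (substitute the actual FUSE/SMB mount, e.g. /mnt/fuseMount, and the directory name that failed):

```shell
MNT=$(mktemp -d)             # stand-in for the FUSE/SMB mount point
mkdir -p "$MNT/TestDir0"     # the directory whose I/O failed

rm -rf "$MNT/TestDir0"       # remove the affected directory
mkdir "$MNT/TestDir0"        # recreate it with the same name
# I/O can now be restarted under "$MNT/TestDir0"
```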
With further analysis and testing, the findings are:
1. The script used here takes an exclusive lock on a file and writes to it, so if add-brick is done while the lock is still held, the I/O may fail on that particular directory/file.
2. However, the mount point remains accessible, so we can remove the directory and start the I/O again in the same directory, or in another directory within the same mount point.
3. Tried with dd as well and we do not see I/O failing; creating directories and files without taking a lock also runs fine.
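For reference, the open/exclusive-flock/write pattern that the attached Perl script uses (and that fails with "Cannot lock - Bad file descriptor" after add-brick) can be sketched in Python; the path here is a temporary file, not the actual mount:

```python
import fcntl
import os
import tempfile

def locked_write(path, data):
    # Open (creating if needed), take an exclusive flock, write, unlock.
    # This mirrors the sequence the Perl script performs per file.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        os.write(fd, data)
        fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)

workdir = tempfile.mkdtemp()  # stand-in for a directory on the mount
target = os.path.join(workdir, "testfile")
locked_write(target, b"payload")
```

On a local filesystem this always succeeds; the bug is that on a gluster mount the open can fail mid-run after add-brick, leaving the script to flock() a closed filehandle.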
Patches posted at:
https://code.engineering.redhat.com/gerrit/19113
https://code.engineering.redhat.com/gerrit/19114
https://code.engineering.redhat.com/gerrit/19115

As per the discussion with Avati, these three upstream patches are the required fix for this bug. I compiled the downstream code with the three patches included and tried to reproduce the issue using the script in the above comment; I have not been able to reproduce it.
A duplicate of: https://bugzilla.redhat.com/show_bug.cgi?id=1278399
Fixed by: https://code.engineering.redhat.com/gerrit/#/c/61036/2
*** This bug has been marked as a duplicate of bug 1278399 ***