Bug 1049181

Summary: File creation in nested folders fails when add-brick operation is done on a volume with exclusive file lock.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: surabhi <sbhaloth>
Component: glusterfs
Assignee: Raghavendra Talur <rtalur>
Status: CLOSED DUPLICATE
QA Contact: surabhi <sbhaloth>
Severity: high
Docs Contact:
Priority: unspecified
Version: 2.1
CC: grajaiya, ira, lmohanty, nbalacha, pgurusid, rgowdapp, rtalur, sbhaloth, shmohan, spalai, vagarwal, vbellur
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: dht-add-brick
Fixed In Version: glusterfs-3.7.5-6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1279830 (view as bug list)
Environment:
Last Closed: 2015-11-27 10:32:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1278399
Bug Blocks: 1279830
Attachments:
  scripts used to run I/O (flags: none)
  script to run on fuse and nfs mount (flags: none)

Description surabhi 2014-01-07 07:42:43 UTC
Description of problem:
While running I/O on a FUSE or SMB mount, an add-brick operation causes the I/O to fail.
The I/O also fails with remove-brick.
The I/O fails on both distribute and distribute-replicate volumes.

Version-Release number of selected component (if applicable):

glusterfs-api-devel-3.4.0.55rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-167.9.el6rhs.x86_64
glusterfs-fuse-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.55rhs-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a distribute or distribute-replicate volume.
2. Mount it via FUSE or SMB.
3. Start I/O on the mount point.
4. Perform an add-brick operation (a reproduction sketch follows below).
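
For reference, the following is a minimal reproduction sketch, not the attached script; the volume, server, brick and mount names are placeholders. It drives the same sequence: create/start/mount a volume, then create files under an exclusive flock in nested directories while add-brick is run from another shell.

#!/usr/bin/perl
# Hypothetical reproduction sketch; volume/server/brick/mount names are placeholders.
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Path qw(make_path);

my $vol   = "testvol";
my $mount = "/mnt/fuseMount";
my $iodir = "$mount/io";

# One-time setup: create, start and mount the volume.
system("gluster volume create $vol replica 2 server1:/bricks/b1 server2:/bricks/b2") == 0 or die "volume create failed\n";
system("gluster volume start $vol") == 0 or die "volume start failed\n";
system("mount -t glusterfs server1:/$vol $mount") == 0 or die "mount failed\n";

# Nested-directory I/O: several directory levels deep, each file written
# under an exclusive flock (the pattern the attached script uses).
for my $d (0 .. 2) {
    my $dir = "$iodir/TestDir0/TestDir$d/TestDir$d";
    make_path($dir);
    for my $f (0 .. 99) {
        open(my $fh, '>', "$dir/file$f") or die "Cannot open file: $!\n";
        flock($fh, LOCK_EX)              or die "Cannot lock - $!\n";
        print $fh "data\n" for 1 .. 100;
        close($fh);
    }
}

# While the loop above runs, expand the volume from another shell:
#   gluster volume add-brick testvol server3:/bricks/b3 server4:/bricks/b4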

Actual results:
The I/O fails as soon as the add-brick operation is performed.

Expected results:
The I/O should not fail.

Additional info:

Comment 1 surabhi 2014-01-07 10:34:10 UTC
Created attachment 846574 [details]
scripts used to run I/O

Comment 2 surabhi 2014-01-07 10:42:03 UTC
Tried the add-brick operation on a volume while I/O was running, on the following build:
 
glusterfs-geo-replication-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.53rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-167.9.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.40rhs-1.el6rhs.x86_64
glusterfs-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.53rhs-1.el6rhs.x86_64

Following are my observations:
On the SMB mount: I/O did not fail on the first add-brick operation, but it did fail when the operation was repeated. With the latest build (glusterfs-3.4.0.53rhs-1.el6rhs.x86_64), however, I/O fails as soon as add-brick is executed.
The same is true for the FUSE mount.
The following test case passed earlier, since it never failed with a single add-brick operation:
https://tcms.engineering.redhat.com/case/304025/?from_plan=11532

Comment 3 surabhi 2014-01-07 11:54:44 UTC
Further analysis shows that the issue occurs when files are created in nested directories; creating files at the top level of the volume does not fail. The error we get is:
Creating directory at /mnt/fuseMount/io//TestDir0/TestDir0
Creating files in /mnt/fuseMount/io//TestDir0/TestDir0......
Cannot open file: Invalid argument
flock() on closed filehandle FH at ./CreateDirAndFileTree.pl line 90.
Cannot lock - Bad file descriptor
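
The three messages above fit an open() failure that the script does not treat as fatal: a later flock() on the never-opened handle then produces the "closed filehandle" warning and the EBADF error. A minimal sketch of that pattern (the attached CreateDirAndFileTree.pl is not reproduced here, so the exact code around its line 90 is an assumption):

use strict;
use warnings;
use Fcntl qw(:flock);

# Placeholder path; in the failing run this is a file in a nested directory.
my $path = "/mnt/fuseMount/io/TestDir0/TestDir0/file0";

# If open() fails (here with EINVAL returned through the client),
# execution continues with FH never having been opened ...
open(FH, '>', $path) or print "Cannot open file: $!\n";

# ... so flock() warns "flock() on closed filehandle FH" and fails with
# EBADF, printing "Cannot lock - Bad file descriptor".
flock(FH, LOCK_EX) or print "Cannot lock - $!\n";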

Comment 4 Raghavendra Talur 2014-01-08 10:45:07 UTC
I tried it on build 33 and was able to reproduce the bug on it.

Here are the details:
Creating directory at /mnt/withreaddir//TestDir0/TestDir2/TestDir2
Creating files in /mnt/withreaddir//TestDir0/TestDir2/TestDir2......
Cannot open file: No such file or directory
flock() on closed filehandle FH at ./CreateDirAndFileTree.pl line 74.
Cannot lock - Bad file descriptor


root.42.178[Jan-08-2014- 6:30:55] >rpm -qa | grep gluster
glusterfs-fuse-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.33rhs-1.el6rhs.x86_64



Analysis so far:

Gluster fails to create/open a file when:
a. The file's hash corresponds to the newly added brick.
b. The file is not directly under the root (/) of the volume.
c. The directory (or chain of directories) containing the file has not yet been created on the new brick.

Comment 5 Poornima G 2014-01-13 16:21:57 UTC
After the initial analysis, here is the observation:

The issue occurs when I/O (mkdir/create) is in progress on a volume and, meanwhile, a brick is added, which changes the graph.

The dht xlator is notified of the graph change, and the rpc-client xlator is notified as well, so the client (glusterfs) tries to reconnect to all the bricks (glusterfsd).

If all the bricks reconnect immediately, no issue is seen. Sometimes, however, every brick except the newly added one reconnects immediately and I/O continues on the old bricks; when the client (glusterfs) finally connects to the new brick (glusterfsd), subsequent I/O operations may fail.

The issue needs to be debugged from two aspects:
1. The delay while the rpc-client (glusterfs) connects to the new brick (glusterfsd).
2. The dht xlator not healing the I/O (mkdir/create) when it fails on the new brick.

Comment 6 Gowrishankar Rajaiyan 2014-01-14 10:50:48 UTC
Based on the analysis, the reproduction steps have diverged from what is described in the bug description. Can we please have the exact steps to reproduce, and confirmation of whether the issue is consistent?

Comment 7 surabhi 2014-01-15 06:46:38 UTC
The exact steps are essentially the same as those under "Steps to Reproduce"; the only difference is that the I/O is driven by the Perl script attached to this BZ, which creates files in multiple layers of directories.

Create a volume (any type: distribute or distribute-replicate).
Mount it via SMB on a Windows client.
Start the Perl script to run I/O on the mount point.
Perform an add-brick operation.

Result:
With the add-brick operation, I/O fails with the following error:
Creating directory at /mnt/fuseMount/io//TestDir0/TestDir0
Creating files in /mnt/fuseMount/io//TestDir0/TestDir0......
Cannot open file: Invalid argument
flock() on closed filehandle FH at ./CreateDirAndFileTree.pl line 90.
Cannot lock - Bad file descriptor

Tried it three times on glusterfs-3.4.0.55rhs-1.el6rhs.x86_64, and the issue is consistently reproducible.

Comment 8 surabhi 2014-01-15 06:56:52 UTC
Run the perl script as follows:
perl Win-CreateDirTreeNFiles.pl Z:\date18 3 10000 10000 3 3

Comment 9 Gowrishankar Rajaiyan 2014-01-15 07:17:00 UTC
How does one recover from this situation? Do we have a workaround?

Comment 10 surabhi 2014-01-15 07:20:35 UTC
Created attachment 850358 [details]
script to run on fuse and nfs mount

Comment 11 surabhi 2014-01-15 08:38:22 UTC
As mentioned above, gluster fails to create or open a file when it is not directly under the root of the volume; we hit this issue when creating and opening files inside nested directories. Once I/O fails in a particular directory, the mount point is still accessible and I/O can be started from a different directory, but whatever was running before has already failed.
To run further I/O, it can be started from a different directory or from the root of the volume.

Comment 12 surabhi 2014-01-15 08:54:39 UTC
The only workaround to recover from this situation is to remove that directory, create a new one with the same name, and continue the I/O.

Comment 13 surabhi 2014-01-15 11:06:25 UTC
With further analysis and testing, the findings are as follows:
1. The script used here takes an exclusive lock on a file and then writes to it, so if add-brick is done while the lock is still held, the I/O may fail on that particular directory/file.
2. However, the mount point remains accessible, so the directory can be removed and the I/O started again in the same or another directory within the same mount point.
3. Tried with dd as well and the I/O does not fail; creating directories and files without taking a lock also runs fine (see the sketch below).
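
To illustrate the difference between findings 1 and 3, the only change is whether an exclusive lock is requested before writing; a minimal sketch (the path is a placeholder, not taken from the attached script):

use strict;
use warnings;
use Fcntl qw(:flock);

my $path = "/mnt/fuseMount/io/TestDir0/TestDir0/file0";

# Plain create + write (comparable to dd): observed to keep working across add-brick.
open(my $plain, '>', $path) or die "Cannot open file: $!\n";
print $plain "data\n";
close($plain);

# Create + exclusive flock + write: the pattern seen to fail when add-brick
# happens while the lock is held.
open(my $locked, '>', $path) or die "Cannot open file: $!\n";
flock($locked, LOCK_EX)      or die "Cannot lock - $!\n";
print $locked "data\n";
close($locked);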

Comment 16 Raghavendra Talur 2014-01-30 12:57:33 UTC
Patches posted at 
https://code.engineering.redhat.com/gerrit/19113
https://code.engineering.redhat.com/gerrit/19114
https://code.engineering.redhat.com/gerrit/19115

As per discussion with Avati, these three upstream patches are the required fix for this bug.

Compiled the downstream code with the three patches included and tried to reproduce the issue using the script in the comment above; I have not been able to reproduce it.

Comment 18 Susant Kumar Palai 2015-11-27 10:32:50 UTC

*** This bug has been marked as a duplicate of bug 1278399 ***