Created attachment 891461 [details]
Stand-alone test case that reproduces the errors

Description of problem:

Creating files on an NFS mount fails while bricks are being added to the same volume.

If a volume is mounted over NFS and a write operation (for example, copying files to the mount) runs in parallel with an add-brick operation on that volume, the copy fails for files in a sub-directory of the volume:

cp: cannot create regular file `/mnt/testvol/testdir/file071': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file072': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file073': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file074': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file075': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file076': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file077': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file078': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file079': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file080': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file081': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file082': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file083': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file084': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file085': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file086': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file087': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file088': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file089': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file090': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file091': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file092': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file093': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file094': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file095': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file096': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file097': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file098': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file099': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file100': No such file or directory

Version-Release number of selected component (if applicable):

Current Gluster master branch with last commit:

> commit c52ab5eb52519d41b0ae146ec7b1276f2ffae9e9
> Author: Santosh Kumar Pradhan <spradhan>
> Date: Thu Apr 24 14:39:20 2014 +0530
>
> glusterd: Help does not show performance.nfs.* opt

and earlier releases, at least including 3.4.

How reproducible:

Always

Steps to Reproduce:
1. Create a distributed volume.
2. Mount the volume through NFS (on /mnt/testvol in this test).
3. Copy files to the volume while adding a brick.

Additional info:

Irrespective of volume topology, performing writes on an NFS mount in parallel with add-brick triggers the issue shown above.
Notably, this issue does *NOT* occur with a glusterfs-fuse mount. I have tested both distribute and distribute-replicate volume topologies.

The following sequence of events should potentially fix this:

- a LOOKUP-by-GFID for a directory is done
- DHT receives an ESTALE on LOOKUP from at least one, but not all, bricks/subvolumes
- the layout of the whole parent directory tree needs to be corrected
- repeat the first LOOKUP-by-GFID
- return the result of the LOOKUP-by-GFID, which should not be ESTALE anymore

Note: ESTALE would be a valid return value if all bricks/subvolumes return ESTALE.

Attaching the test script bug-1090446.sh from the original bug. This test case includes an option to run 'rebalance fix-layout' when the first error happens; enabling that option makes the test succeed (it is disabled by default).
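The event sequence proposed in this comment (LOOKUP, partial ESTALE, layout correction, retry) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Gluster code: `Subvolume`, `fix_layout` and `lookup_by_gfid` are hypothetical names, and the layout correction is reduced to a single directory rather than the whole parent directory tree.

```python
import errno


class Subvolume:
    """Hypothetical stand-in for one brick/subvolume (not a Gluster API)."""

    def __init__(self, name, known_gfids=()):
        self.name = name
        self.known_gfids = set(known_gfids)

    def lookup(self, gfid):
        # Return 0 on success, ESTALE when this brick does not know the GFID.
        return 0 if gfid in self.known_gfids else errno.ESTALE


def fix_layout(subvols, gfid):
    """Hypothetical heal step: make the directory known on every brick.

    Assumption: the real fix would have to correct the layout of the whole
    parent directory tree; this sketch handles one directory only.
    """
    for sv in subvols:
        sv.known_gfids.add(gfid)


def lookup_by_gfid(subvols, gfid):
    """Return 0 if the directory resolves, errno.ESTALE otherwise."""
    results = [sv.lookup(gfid) for sv in subvols]
    stale = [r == errno.ESTALE for r in results]

    if all(stale):
        # Every brick reports ESTALE: that is a valid answer, pass it on.
        return errno.ESTALE

    if any(stale):
        # Only some bricks are stale (e.g. a freshly added brick):
        # correct the layout, then repeat the first lookup once.
        fix_layout(subvols, gfid)
        results = [sv.lookup(gfid) for sv in subvols]
        if any(r == errno.ESTALE for r in results):
            return errno.ESTALE

    return 0
```

For example, with one existing brick that knows the directory and one newly added brick that does not, `lookup_by_gfid` heals the new brick and succeeds instead of passing ESTALE back to the NFS client.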
Started a discussion to get more ideas on fixing this problem:
- http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6300
Tried to reproduce the issue.

Started from 2 bricks and expanded up to 18 bricks, and copied files to the mount.

[root@vm1 upstream]# gvi

Volume Name: test1
Type: Distribute
Volume ID: 794db521-2b22-48bb-9a26-42e2c606865d
Status: Started
Snapshot Count: 0
Number of Bricks: 18
Transport-type: tcp
Bricks:
Brick1: vm1:/brick/1
Brick2: vm1:/brick/2
Brick3: vm1:/brick/3
Brick4: vm1:/brick/4
Brick5: vm1:/brick/5
Brick6: vm1:/brick/6
Brick7: vm1:/brick/7
Brick8: vm1:/brick/8
Brick9: vm1:/brick/9
Brick10: vm1:/brick/10
Brick11: vm1:/brick/11
Brick12: vm1:/brick/12
Brick13: vm1:/brick/13
Brick14: vm1:/brick/14
Brick15: vm1:/brick/15
Brick16: vm1:/brick/16
Brick17: vm1:/brick/17
Brick18: vm1:/brick/18
Options Reconfigured:
nfs.disable: off
transport.address-family: inet

IO paused during the switch but did not throw any error.

Niels, can you reproduce the issue on the latest master and update here?
(In reply to Susant Kumar Palai from comment #2)
> Tried to reproduce the issue.
>
> Started from 2 bricks and expanded up to 18 bricks. And copied files to
> mount.

Did you use the attached test script? If that does not reproduce the problem, it may have been fixed through some other patches. In that case, it would be good to have an idea which patches could have fixed it, and this bug can then be closed as a duplicate of the one that was used to merge the changes.
Thanks, Niels, for the input. Ran the script four times; it passed every time. Will close this bug as it works on the latest master. Figuring out what fixed this will be difficult, as the bug is 3 years old.