Bug 1093324 - File creation fails on the NFS mount point while adding a brick to the same volume
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Susant Kumar Palai
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1090446 1286579 1286580
 
Reported: 2014-05-01 11:36 UTC by Niels de Vos
Modified: 2017-08-31 09:59 UTC
CC: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1090446
Environment:
Last Closed: 2017-08-31 09:59:10 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Stand alone test case that reproduces the errors (2.38 KB, text/plain)
2014-05-01 11:36 UTC, Niels de Vos

Description Niels de Vos 2014-05-01 11:36:29 UTC
Created attachment 891461
Stand alone test case that reproduces the errors

Description of problem:

While files are being created on the NFS mount point, adding bricks to the same volume makes file creation fail.

Writes on the NFS mount fail while an add-brick operation is running on the same volume.

If we mount a volume over NFS and perform write operations on it, such as copying files to the NFS mount, in parallel with an add-brick operation on that volume, the copy operation (write) fails for a sub-directory of the volume.

cp: cannot create regular file `/mnt/testvol/testdir/file071': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file072': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file073': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file074': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file075': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file076': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file077': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file078': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file079': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file080': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file081': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file082': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file083': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file084': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file085': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file086': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file087': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file088': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file089': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file090': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file091': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file092': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file093': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file094': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file095': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file096': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file097': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file098': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file099': No such file or directory
cp: cannot create regular file `/mnt/testvol/testdir/file100': No such file or directory


Version-Release number of selected component (if applicable):
Current Gluster master branch with last commit:

> commit c52ab5eb52519d41b0ae146ec7b1276f2ffae9e9
> Author: Santosh Kumar Pradhan <spradhan>
> Date:   Thu Apr 24 14:39:20 2014 +0530
> 
>     glusterd: Help does not show performance.nfs.* opt

and earlier releases, including at least 3.4.

How reproducible:
Always

Steps to Reproduce:
1. Create a distributed volume.
2. Mount the volume through NFS (on /mnt/testvol in this test).
3. Copy files to the volume while adding a brick.
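The steps above could be scripted roughly as follows. This is a minimal Python sketch, not the attached bug-1090446.sh: the volume name, host, brick paths, mount point, source directory and file count are assumptions modelled on the outputs in this report, and by default it only prints the commands (dry run). It mounts with `vers=3` on the assumption that the built-in Gluster NFS server is used, which speaks NFSv3.

```python
import shlex
import subprocess
import threading

# Assumed values, modelled on the outputs in this report; adjust for your setup.
VOLNAME = "testvol"
HOST = "vm1"
MOUNTPOINT = "/mnt/testvol"
SRC_DIR = "/tmp/src"  # hypothetical directory holding file001..file100


def setup_commands():
    """Steps 1 and 2: create a 2-brick distribute volume and mount it over NFSv3."""
    return [
        # gluster may require a trailing "force" if the bricks sit on the root filesystem.
        ["gluster", "volume", "create", VOLNAME, f"{HOST}:/brick/1", f"{HOST}:/brick/2"],
        ["gluster", "volume", "start", VOLNAME],
        ["mkdir", "-p", MOUNTPOINT],
        ["mount", "-t", "nfs", "-o", "vers=3", f"{HOST}:/{VOLNAME}", MOUNTPOINT],
        ["mkdir", "-p", f"{MOUNTPOINT}/testdir"],
    ]


def copy_commands(nfiles=100):
    """Step 3a: copy files into a sub-directory of the NFS mount."""
    return [
        ["cp", f"{SRC_DIR}/file{i:03d}", f"{MOUNTPOINT}/testdir/file{i:03d}"]
        for i in range(1, nfiles + 1)
    ]


def add_brick_command(brick_no=3):
    """Step 3b: add one brick to the same volume while the copies run."""
    return ["gluster", "volume", "add-brick", VOLNAME, f"{HOST}:/brick/{brick_no}"]


def run(cmds, dry_run=True):
    """Run commands in order; return the argv lists that exited non-zero."""
    failed = []
    for argv in cmds:
        if dry_run:
            print(shlex.join(argv))
            continue
        if subprocess.run(argv).returncode != 0:
            failed.append(argv)
    return failed


def reproduce(dry_run=True):
    """Set up the volume, then copy in a background thread while adding a brick."""
    run(setup_commands(), dry_run)
    failed = []
    copier = threading.Thread(target=lambda: failed.extend(run(copy_commands(), dry_run)))
    copier.start()
    run([add_brick_command()], dry_run)
    copier.join()
    return failed  # failed cp invocations (always empty in a dry run)


if __name__ == "__main__":
    reproduce(dry_run=True)
```

Whether the race actually triggers depends on timing, so a real run may need more files or several attempts.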

Additional info:

Irrespective of the volume topology, performing writes on an NFS mount in parallel with add-brick triggers the issue shown above.

Note that this issue does *NOT* exist with a glusterfs-fuse mount; I have tested this with both distribute and distribute-replicate volume topologies.


The following sequence of events could potentially fix this:
- a LOOKUP-by-GFID for a directory is done
- DHT receives an ESTALE on LOOKUP from at least one, but not all, bricks/subvolumes
- the layout of the whole parent directory tree is corrected
- the first LOOKUP-by-GFID is repeated
- the result of the repeated LOOKUP-by-GFID is returned; it should no longer be ESTALE

Note: ESTALE would be a valid return value if all bricks/subvolumes return ESTALE.
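The proposed sequence, including the all-ESTALE rule in the note, can be modelled as a small decision function. This is a toy Python sketch, not DHT's actual C code: subvolumes are modelled as callables that return 0 or `errno.ESTALE`, and `lookup_by_gfid`, `heal_parent_layout` and `FakeBrick` are hypothetical names. Returning ESTALE when the retried lookup is still partially stale is an assumption of this sketch.

```python
import errno


def lookup_by_gfid(gfid, subvols, heal_parent_layout):
    """Return 0 on success or errno.ESTALE, following the proposed steps.

    subvols: callables, one per brick/subvolume; each takes a GFID and
        returns 0 if it knows the directory, errno.ESTALE otherwise.
    heal_parent_layout: hypothetical callback that corrects the layout of
        the whole parent directory tree for this GFID.
    """
    # First LOOKUP-by-GFID on every subvolume.
    results = [lookup(gfid) for lookup in subvols]
    stale = results.count(errno.ESTALE)

    if stale == 0:
        return 0
    if stale == len(results):
        # All subvolumes return ESTALE, so ESTALE is the valid answer.
        return errno.ESTALE

    # ESTALE from some, but not all, subvolumes (e.g. a freshly added brick):
    # correct the layout of the parent directory tree first.
    heal_parent_layout(gfid)

    # Repeat the lookup once and return its result.
    results = [lookup(gfid) for lookup in subvols]
    return errno.ESTALE if errno.ESTALE in results else 0


class FakeBrick:
    """Toy brick that only knows a fixed set of directory GFIDs."""

    def __init__(self, gfids):
        self.gfids = set(gfids)

    def lookup(self, gfid):
        return 0 if gfid in self.gfids else errno.ESTALE


if __name__ == "__main__":
    # Two old bricks know the directory; the newly added third brick does not.
    bricks = [FakeBrick({"dir-gfid"}), FakeBrick({"dir-gfid"}), FakeBrick(set())]

    def heal(gfid):
        # Pretend the heal creates the directory on every brick.
        for brick in bricks:
            brick.gfids.add(gfid)

    subvols = [b.lookup for b in bricks]
    print(lookup_by_gfid("dir-gfid", subvols, heal))  # 0 after the heal
    print(lookup_by_gfid("unknown", subvols, heal) == errno.ESTALE)  # True
```

The heal is attempted only for a partial ESTALE, so a GFID that no brick knows still fails fast without touching the layout.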

Attaching the test script bug-1090446.sh from the original bug.
This test case includes an option (disabled by default) to run 'rebalance fix-layout' when the first error happens; enabling that option makes the test succeed.
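The "fix-layout on first error" option can be illustrated with a toy model. This Python sketch does not reproduce the logic of bug-1090446.sh; `copy_all`, `copy_file`, `fix_layout` and `simulate` are hypothetical names, and the simulation simply assumes that copies from file071 onwards fail until the layout is fixed, mirroring the error log above.

```python
from typing import Callable, Iterable, List


def copy_all(names: Iterable[str],
             copy_file: Callable[[str], bool],
             fix_layout: Callable[[], None],
             fix_on_first_error: bool = False) -> List[str]:
    """Copy each file and return the names that still failed.

    With fix_on_first_error set, the first failure triggers a single
    fix-layout followed by one retry of the failed copy.
    """
    failed = []
    fixed = False
    for name in names:
        if copy_file(name):
            continue
        if fix_on_first_error and not fixed:
            fix_layout()
            fixed = True
            if copy_file(name):
                continue
        failed.append(name)
    return failed


def simulate(fix_on_first_error: bool) -> List[str]:
    """Toy model: from file071 on, the target directory looks stale."""
    state = {"layout_fixed": False}

    def copy_file(name: str) -> bool:
        return int(name[-3:]) < 71 or state["layout_fixed"]

    def fix_layout() -> None:
        state["layout_fixed"] = True

    names = [f"file{i:03d}" for i in range(1, 101)]
    return copy_all(names, copy_file, fix_layout, fix_on_first_error)


if __name__ == "__main__":
    print(len(simulate(False)))  # 30 failures, file071..file100
    print(simulate(True))        # []
```

In a real run, `gluster volume rebalance <VOLNAME> fix-layout start` returns before the fix-layout has finished, so a real script would also have to wait for it, for example by polling `gluster volume rebalance <VOLNAME> status`.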

Comment 1 Niels de Vos 2014-05-04 16:23:12 UTC
Started a discussion to get more ideas on fixing this problem:
- http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6300

Comment 2 Susant Kumar Palai 2017-08-31 09:05:31 UTC
Tried to reproduce the issue.

Started from 2 bricks and expanded up to 18 bricks while copying files to the mount.

[root@vm1 upstream]# gvi
 
Volume Name: test1
Type: Distribute
Volume ID: 794db521-2b22-48bb-9a26-42e2c606865d
Status: Started
Snapshot Count: 0
Number of Bricks: 18
Transport-type: tcp
Bricks:
Brick1: vm1:/brick/1
Brick2: vm1:/brick/2
Brick3: vm1:/brick/3
Brick4: vm1:/brick/4
Brick5: vm1:/brick/5
Brick6: vm1:/brick/6
Brick7: vm1:/brick/7
Brick8: vm1:/brick/8
Brick9: vm1:/brick/9
Brick10: vm1:/brick/10
Brick11: vm1:/brick/11
Brick12: vm1:/brick/12
Brick13: vm1:/brick/13
Brick14: vm1:/brick/14
Brick15: vm1:/brick/15
Brick16: vm1:/brick/16
Brick17: vm1:/brick/17
Brick18: vm1:/brick/18
Options Reconfigured:
nfs.disable: off
transport.address-family: inet


I/O paused during the switch but did not return any errors.

Niels, can you try to reproduce the issue on the latest master and update this bug?

Comment 3 Niels de Vos 2017-08-31 09:48:40 UTC
(In reply to Susant Kumar Palai from comment #2)
> Tried to reproduce the issue.
> 
> Started from 2 bricks and expanded up to 18 bricks. And copied files to
> mount.

Did you use the attached test script? If that does not reproduce the problem, it may have been fixed by other patches. In that case, it would be good to know which patches could have fixed it, so that this bug can be closed as a duplicate of the one that was used to merge the changes.

Comment 4 Susant Kumar Palai 2017-08-31 09:59:10 UTC
Thanks Niels for the input.

Ran the script four times; it passed every time. Closing this bug, as it works on the latest master. Figuring out what fixed it will be difficult, as the bug is 3 years old.

