Bug 1065705

Summary: ls & stat give inconsistent results after file deletion by another client
Product: [Community] GlusterFS Reporter: Louis Zuckerman <glusterbugs>
Component: core Assignee: Vinayaga Raman <vraman>
Status: CLOSED NOTABUG QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.4.2 CC: glusterbugs, gluster-bugs, rwheeler, sasundar
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-02-20 04:24:12 UTC Type: Bug
Regression: --- Mount Type: ---

Description Louis Zuckerman 2014-02-16 02:33:36 UTC
Description of problem:

Immediately after a file is deleted from one FUSE client, other clients no longer see the file in a directory listing but can still stat it for about one second.

I have tried disabling various performance xlators (see volume info below) but this behavior persists.

I have only tried this on a trivial one-brick volume, mounted locally, which I am using to test a libgfapi client.  I don't know if this problem affects multi-brick volumes or non-local clients.

Version-Release number of selected component (if applicable):

glusterfs 3.4.2 on ubuntu saucy

Linux zzbp 3.11.0-15-generic #25-Ubuntu SMP Thu Jan 30 17:22:01 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:

Consistently, on different localhost mounts of my one-brick test volume.  I first noticed this behavior between a FUSE mount where the file was being deleted and a libgfapi client which was monitoring the directory.

I then tried reproducing the problem using two FUSE clients and plain old bash commands, and it happens every time.

Steps to Reproduce:

1. Make two FUSE mounts of the same volume, /mnt/foo1 & /mnt/foo2, and open a shell in each one
2. In /mnt/foo1, create a test file
   echo test > test.file
3. In /mnt/foo2, run this command to monitor the file
   while true; do sleep 0.1; ls -la | grep test.file; stat -t test.file; done
4. Back in /mnt/foo1, delete the test file
   rm test.file
5. Check the output from the monitor in /mnt/foo2
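
The monitoring in steps 3-5 can also be scripted so that each iteration prints one timestamped line. This is only a sketch, not part of the original test: the function names ls_stat_agree and watch_until_gone are made up for illustration, and it assumes GNU ls, stat, date and sleep as found on Ubuntu.

```shell
# Report whether a directory listing and stat agree about a file's existence.
# Arguments: $1 = directory (e.g. the second mount, /mnt/foo2), $2 = file name.
# Prints "ls=<0|1> stat=<0|1>"; returns 0 when the two agree, 1 otherwise.
ls_stat_agree() {
    dir=$1
    name=$2
    in_ls=0
    in_stat=0
    # ls prints one entry per line when writing to a pipe;
    # grep -xF matches the whole line literally (no regex surprises from ".").
    if ls -a "$dir" | grep -qxF -- "$name"; then in_ls=1; fi
    if stat -t "$dir/$name" >/dev/null 2>&1; then in_stat=1; fi
    echo "ls=$in_ls stat=$in_stat"
    [ "$in_ls" -eq "$in_stat" ]
}

# Poll every 0.1 s until the file is gone from both views,
# printing "<epoch seconds> ls=.. stat=.." for each iteration.
watch_until_gone() {
    dir=$1
    name=$2
    while :; do
        # "|| true" keeps the loop alive when ls and stat disagree.
        result=$(ls_stat_agree "$dir" "$name") || true
        echo "$(date +%s.%N) $result"
        if [ "$result" = "ls=0 stat=0" ]; then
            break
        fi
        sleep 0.1
    done
}
```

To use it, run "watch_until_gone /mnt/foo2 test.file" in one shell after creating the file, then delete the file from /mnt/foo1 in the other. Any lines reading "ls=0 stat=1" mark the window in which the two views disagree, and the timestamps give its length.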

Actual results:

Notice that the output from stat changes 10 iterations after the output from ls changes; that's a full second later.

-rw-r--r-- 1 root  root     5 Feb 15 21:18 test.file
test.file 5 1 81a4 0 0 26 9449262257167296185 1 0 0 1392517108 1392517108 1392517108 0 131072
-rw-r--r-- 1 root  root     5 Feb 15 21:18 test.file
test.file 5 1 81a4 0 0 26 9449262257167296185 1 0 0 1392517108 1392517108 1392517108 0 131072
-rw-r--r-- 1 root  root     5 Feb 15 21:18 test.file
test.file 5 1 81a4 0 0 26 9449262257167296185 1 0 0 1392517108 1392517108 1392517108 0 131072
test.file 5 1 81a4 0 0 26 9449262257167296185 1 0 0 1392517108 1392517108 1392517108 0 131072
test.file 5 1 81a4 0 0 26 9449262257167296185 1 0 0 1392517108 1392517108 1392517108 0 131072
test.file 5 1 81a4 0 0 26 9449262257167296185 1 0 0 1392517108 1392517108 1392517108 0 131072
test.file 5 1 81a4 0 0 26 9449262257167296185 1 0 0 1392517108 1392517108 1392517108 0 131072
test.file 5 1 81a4 0 0 26 9449262257167296185 1 0 0 1392517108 1392517108 1392517108 0 131072
test.file 5 1 81a4 0 0 26 9449262257167296185 1 0 0 1392517108 1392517108 1392517108 0 131072
test.file 5 1 81a4 0 0 26 9449262257167296185 1 0 0 1392517108 1392517108 1392517108 0 131072
test.file 5 1 81a4 0 0 26 9449262257167296185 1 0 0 1392517108 1392517108 1392517108 0 131072
stat: cannot stat ‘test.file’: No such file or directory
stat: cannot stat ‘test.file’: No such file or directory
stat: cannot stat ‘test.file’: No such file or directory
stat: cannot stat ‘test.file’: No such file or directory



Expected results:

The file should disappear from ls & stat at the same time, not one second apart.

Additional info:

My test volume details:

Volume Name: foo
Type: Distribute
Volume ID: 4bb5d118-be31-4bdb-a645-9a045241cc37
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gluster:/var/tmp/foo
Options Reconfigured:
performance.md-cache-timeout: 0
performance.cache-refresh-timeout: 0
performance.flush-behind: off
diagnostics.brick-log-level: TRACE
diagnostics.client-log-level: DEBUG
server.allow-insecure: on
performance.quick-read: off
performance.write-behind: off

As mentioned above, I tried disabling some performance xlators to get rid of this problem, but nothing made a difference.
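
For anyone retracing this, the performance-related options listed under "Options Reconfigured" would normally be applied with "gluster volume set". This is a sketch using the volume name foo from the volume info above; the exact commands originally run were not recorded in this report.

```shell
# Volume name "foo" is taken from the volume info above.
gluster volume set foo performance.md-cache-timeout 0
gluster volume set foo performance.cache-refresh-timeout 0
gluster volume set foo performance.flush-behind off
gluster volume set foo performance.quick-read off
gluster volume set foo performance.write-behind off

# Confirm the options now appear under "Options Reconfigured".
gluster volume info foo
```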


On a single client mount, ls & stat are consistent immediately after deletion.
For example:
root@zzbp:/mnt/foo1# echo test > test.file; sleep 1; rm test.file; while true; do ls -la | grep test.file; stat -t test.file; sleep 0.1; done
stat: cannot stat ‘test.file’: No such file or directory
stat: cannot stat ‘test.file’: No such file or directory


Thank you very much,
-louis

Comment 1 Louis Zuckerman 2014-02-20 04:24:12 UTC
Vijay recommended on IRC that I try mounting the FUSE clients with "-o attribute-timeout=0", and that resolved the problem.
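
For reference, a remount with that option might look like the sketch below. Two things in it are assumptions: that the volume foo is served from the host named gluster (as in the brick path above), and that the mount points are the ones from the steps to reproduce. The likely explanation for the original behavior is that the FUSE layer caches file attributes for the attribute timeout, which I believe defaults to one second in GlusterFS, so stat can be answered from cache while the directory listing is already up to date.

```shell
# Run as root. Host "gluster" and volume "foo" are assumptions based on the
# volume info in this report; adjust them for other setups.
umount /mnt/foo1
umount /mnt/foo2

# attribute-timeout=0 stops the FUSE client from caching file attributes.
mount -t glusterfs -o attribute-timeout=0 gluster:/foo /mnt/foo1
mount -t glusterfs -o attribute-timeout=0 gluster:/foo /mnt/foo2
```

Disabling the attribute cache means every stat goes to the server, so this trades some metadata performance for consistency between clients.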

Thanks, Vijay!