Bug 1387219

Summary: SMB: [MD-CACHE]: While creating a large number of files and removing them from another client, the listing shows incorrect file permissions
Product: Red Hat Gluster Storage
Reporter: surabhi <sbhaloth>
Component: md-cache
Assignee: Poornima G <pgurusid>
Status: CLOSED NOTABUG
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: unspecified
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, nbalacha, pgurusid, pkarampu, rhinduja, rhs-bugs, rjoseph, sbhaloth
Last Closed: 2016-11-08 06:20:10 UTC
Type: Bug

Description surabhi 2016-10-20 11:26:55 UTC
Description of problem:
*********************************
While creating 100000 files from a CIFS mount, running rm -rf from another client, and listing from a third client, the listing shows "No such file or directory" for a few files and unknown permissions for a few others.

-rw-r--r--. 1 root root 15 Oct 19 09:56 file2510
-?????????? ? ?    ?     ?            ? file2511
-?????????? ? ?    ?     ?            ? file2512
-rw-r--r--. 1 root root 15 Oct 19 09:56 file2513
-?????????? ? ?    ?     ?            ? file2514
-rw-r--r--. 1 root root 15 Oct 19 09:56 file2515
-rw-r--r--. 1 root root 15 Oct 19 09:56 file2516
-rw-r--r--. 1 root root 15 Oct 19 09:56 file2517
-rw-r--r--. 1 root root 15 Oct 19 09:56 file2518
-rw-r--r--. 1 root root 15 Oct 19 09:56 file2519
-rw-r--r--. 1 root root 15 Oct 19 09:53 file252
-rw-r--r--. 1 root root 15 Oct 19 09:56 file2520
-rw-r--r--. 1 root root 15 Oct 19 09:56 file2521

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64


How reproducible:
twice

Steps to Reproduce:
1. As mentioned in the description.
2. Mount the volume over CIFS, create 100000 files, run rm -rf from another client, and run ll from a third client.

Actual results:
"No such file or directory" is reported for a few files (which may happen because the removal has not finished yet), but the listing also shows unknown file permissions.

Expected results:
The listing should not show unknown file permissions or "No such file or directory" errors.


Additional info:

Comment 2 Poornima G 2016-10-27 08:42:06 UTC
Could you please try disabling readdir-ahead and md-cache? Does that have any effect?
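
For reference, a minimal sketch of how the two translators could be switched off for a test run. It assumes that readdir-ahead is controlled by the performance.readdir-ahead volume option and md-cache by performance.stat-prefetch on this release; the volume name testvol and the APPLY switch are hypothetical. By default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical volume name; replace with the volume under test.
VOLNAME=${VOLNAME:-testvol}

# Print each command by default; execute it only when APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Assumption: these two options toggle readdir-ahead and md-cache.
cmds=$(
    run gluster volume set "$VOLNAME" performance.readdir-ahead off
    run gluster volume set "$VOLNAME" performance.stat-prefetch off
)
printf '%s\n' "$cmds"
```

Re-running the listing test with both options off would show whether either cache layer changes the behavior.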

Comment 3 rjoseph 2016-11-07 12:56:46 UTC
This does not look like an md-cache issue. If an unlink lands between readdirp and the follow-up lookup/stat call, we end up in this scenario. Cross-checked this with the DHT team (Rafi). Also simulated this scenario using gdb in a non-md-cache setup.

The same behavior is seen with a kernel NFS server and multiple NFS clients, where one client runs "rm -f" and another client runs "ls -l".
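
The ordering described above can be forced deterministically on any local filesystem. This is only an illustrative sketch, not the gdb simulation mentioned in this comment: the scratch directory, the file names, and the "missing" variable are made up for the example, and a plain existence check stands in for the stat call that ls -l issues per entry.

```shell
#!/bin/sh
# Deterministic sketch of the readdirp -> unlink -> stat ordering.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do
    : > "$dir/file$i"
done

# Step 1 (lister): read the directory entries first.
names=$(ls "$dir")

# Step 2 (other client): an unlink lands before the lister stats the entry.
rm -f "$dir/file3"

# Step 3 (lister): stat every name obtained in step 1.
missing=""
for n in $names; do
    if [ -e "$dir/$n" ]; then
        echo "stat ok:     $n"
    else
        # This is the entry ls -l would print with '?' fields.
        echo "stat ENOENT: $n"
        missing="${missing:+$missing }$n"
    fi
done

rm -rf "$dir"
```

The name file3 is still in the list from step 1, but its attributes can no longer be fetched in step 3, which is the situation that produces the "?" columns in the listing.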

Comment 4 Atin Mukherjee 2016-11-07 15:57:54 UTC
Nithya - could you have a look at it as comment 3 claims that this could be a DHT issue?

Comment 5 Nithya Balachandran 2016-11-07 16:14:51 UTC
I don't think this is anything new; it is expected with the readdirp/stat/unlink race. Was this seen in earlier releases?

Comment 6 Pranith Kumar K 2016-11-08 06:20:10 UTC
This is expected behavior. I could reproduce the same on XFS as well.

ls: cannot access 'd/6351': No such file or directory
ls: cannot access 'd/6357': No such file or directory
ls: cannot access 'd/6364': No such file or directory
ls: cannot access 'd/6366': No such file or directory
total 0
??????????? ? ?  ?  ?            ? 6229
??????????? ? ?  ?  ?            ? 6230
??????????? ? ?  ?  ?            ? 6231
??????????? ? ?  ?  ?            ? 6232
??????????? ? ?  ?  ?            ? 6233
??????????? ? ?  ?  ?            ? 6234


Use three terminals.
On the first, execute:
while true; do touch d/{1..10000}; done
On the second, execute:
while true; do ls -l d; done
On the third, execute:
while true; do rm -f d/*; done

You will see the output shown above.
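
The three loops can also be driven from a single script for a bounded time. This is a sketch, assuming a POSIX shell with mktemp and seq available; the DURATION and COUNT variables, the scratch directory, and the log file are illustrative and not part of the reproducer above. Whether any error shows up in a given run depends on timing, so the count may be zero.

```shell
#!/bin/sh
# Run creator, lister and remover concurrently for a few seconds,
# then count how many entries vanished between readdir and stat.
dir=$(mktemp -d)
log=$(mktemp)
duration=${DURATION:-2}   # seconds to let the loops race
count=${COUNT:-200}       # files created per pass

# Creator: keeps recreating the files.
( while :; do for i in $(seq 1 "$count"); do : > "$dir/$i"; done; done ) 2>/dev/null &
creator=$!

# Lister: ls -l reads the directory, then stats every entry.
( while :; do LC_ALL=C ls -l "$dir"; done ) >/dev/null 2>>"$log" &
lister=$!

# Remover: keeps deleting whatever is there.
( while :; do rm -f "$dir"/*; done ) 2>/dev/null &
remover=$!

sleep "$duration"
kill "$creator" "$lister" "$remover" 2>/dev/null || true
wait "$creator" "$lister" "$remover" 2>/dev/null || true

# Each matching line is one entry that was unlinked before ls could stat it.
errors=$(grep -c 'No such file or directory' "$log" || true)
echo "ls reported $errors vanished entries in ${duration}s"

rm -rf "$dir" "$log"
```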

Closing this as not a bug for now.

Comment 7 rjoseph 2016-11-08 06:33:48 UTC
(In reply to Atin Mukherjee from comment #4)
> Nithya - could you have a look at it as comment 3 claims that this could be
> a DHT issue?

I should have been more explicit in my statement. As Pranith mentioned, this is not a bug. That is why I mentioned that the issue can be easily reproduced with kernel NFS. A local filesystem like XFS is a little faster, which is why we may not always see the issue there, but Pranith gave an example on XFS as well.

I wanted to discuss this with QE before closing this bug and hence did not change the bug status.

Comment 8 surabhi 2016-11-17 06:22:34 UTC
As the bug is closed with sufficient data points, clearing the needinfo.