Bug 1125958 - DHT + NFS :- if Directory is created when sub-volume was down then unable to access Directory and its data when sub-volume is up again
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: Susant Kumar Palai
QA Contact: amainkar
URL:
Whiteboard:
Depends On: 1121099 1125824 1138393 1139997 1140338
Blocks:
 
Reported: 2014-08-01 12:57 UTC by Rachana Patel
Modified: 2015-05-13 16:54 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.6.0.28-1
Doc Type: Bug Fix
Doc Text:
Cause: The directory is missing on some bricks because it was created while those bricks were down. If a caller bypasses lookup and calls access directly using saved/cached inode information (as the NFS server does), dht_access fails the operation when ENOENT is returned. Fix: If the directory is not found on one sub-volume, fetch the information from the next sub-volume instead of failing the operation.
Clone Of:
Environment:
Last Closed: 2014-09-22 19:45:15 UTC
Embargoed:


Attachments
Brief test case (1.26 KB, application/x-perl)
2014-08-13 17:32 UTC, Shyamsundar


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:1278 0 normal SHIPPED_LIVE Red Hat Storage Server 3.0 bug fix and enhancement update 2014-09-22 23:26:55 UTC

Description Rachana Patel 2014-08-01 12:57:27 UTC
Description of problem:
======================
Bring a sub-volume down and create directories and files while it is down.
Once the sub-volume is up again, accessing those directories fails with a 'No such file or directory' error:

[root@OVM3 new]# ls
abc  down  new
[root@OVM3 new]# cd down
-bash: cd: down: No such file or directory
[root@OVM3 new]# ls down
ls: cannot open directory down: No such file or directory


Version-Release number of selected component (if applicable):
=============================================================
3.6.0.25-1.el6rhs.x86_64


How reproducible:
=================
always


Steps to Reproduce:
===================
1. Create a distributed volume and bring one or more sub-volumes down:

[root@OVM3 ~]# gluster volume status new
Status of volume: new
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.35.198:/brick3/n1				49183	Y	17907
Brick 10.70.35.198:/brick3/n2				49184	Y	17918
Brick 10.70.35.198:/brick3/n3				49185	Y	17929
Brick 10.70.35.198:/brick3/n4				49186	Y	17940
NFS Server on localhost					2049	Y	17953
NFS Server on 10.70.35.172				2049	Y	11804
NFS Server on 10.70.35.240				2049	Y	16587
 
Task Status of Volume new
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@OVM3 ~]# kill -9 17907


2. Create a directory and files inside it from the mount point:

[root@OVM3 new]# mkdir down
[root@OVM3 new]# touch down/f{1..100}

3. Bring all sub-volumes back up.

4. Access the directory from the mount point:
[root@OVM3 new]# cd down
-bash: cd: down: No such file or directory
[root@OVM3 new]# ls down
ls: cannot open directory down: No such file or directory


Actual results:
===============
The directory and the data inside it are not accessible.
Lookup does not heal the directory on the previously down sub-volume.


Expected results:
=================
The directory and its data should be accessible once all sub-volumes are up.
Lookup should heal the directory on the previously down sub-volume.
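
For background on the expected healing behaviour: in DHT, a lookup that finds a directory on some sub-volumes but not on others is expected to re-create it on the sub-volumes where it is missing. Below is a minimal, purely illustrative C sketch of that idea; all names and helpers (dir_present, selfheal_mkdir, dht_lookup_dir) are hypothetical and this is not glusterfs source.

/* Illustrative sketch of lookup-triggered directory self-heal -- not
 * glusterfs source. A lookup that sees the directory on some
 * sub-volumes but not others re-creates it where it is missing. */
#include <stdio.h>

#define NSUBVOLS 4

/* 1 = directory present on that sub-volume, 0 = missing (brick was down). */
static int dir_present[NSUBVOLS] = { 0, 1, 1, 1 };

/* Hypothetical helper: pretend mkdir on one sub-volume. */
static void selfheal_mkdir(int subvol, const char *path)
{
    printf("self-heal: creating %s on sub-volume %d\n", path, subvol);
    dir_present[subvol] = 1;
}

/* Lookup: if the directory exists on any sub-volume, heal the
 * sub-volumes where it is missing, then report success. */
static int dht_lookup_dir(const char *path)
{
    int found = 0;

    for (int i = 0; i < NSUBVOLS; i++)
        found |= dir_present[i];
    if (!found)
        return -1;                   /* genuinely does not exist */

    for (int i = 0; i < NSUBVOLS; i++)
        if (!dir_present[i])
            selfheal_mkdir(i, path); /* heal the missing copies */
    return 0;
}

int main(void)
{
    return dht_lookup_dir("/down") == 0 ? 0 : 1;
}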



Additional info:
=================
[2014-08-01 10:27:21.738422] W [client-rpc-fops.c:1357:client3_3_access_cbk] 0-new-client-0: remote operation failed: Stale file handle
[2014-08-01 10:27:21.738565] W [client-rpc-fops.c:1357:client3_3_access_cbk] 0-new-client-0: remote operation failed: Stale file handle
[2014-08-01 10:27:21.740668] W [client-rpc-fops.c:2761:client3_3_lookup_cbk] 0-new-client-0: remote operation failed: No such file or directory. Path: <gfid:97616c4c-2aea-473e-8fae-3a53576439e3> (97616c4c-2aea-473e-8fae-3a53576439e3)
[2014-08-01 10:27:21.740836] W [client-rpc-fops.c:2761:client3_3_lookup_cbk] 0-new-client-0: remote operation failed: No such file or directory. Path: <gfid:97616c4c-2aea-473e-8fae-3a53576439e3> (97616c4c-2aea-473e-8fae-3a53576439e3)
[2014-08-01 10:27:21.740884] E [dht-helper.c:813:dht_migration_complete_check_task] 0-new-dht: <gfid:97616c4c-2aea-473e-8fae-3a53576439e3>: failed to lookup the file on new-client-0
[2014-08-01 10:27:21.740950] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 198c87cc: <gfid:97616c4c-2aea-473e-8fae-3a53576439e3> => -1 (No such file or directory)
[2014-08-01 10:27:21.740974] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 198c87cc, ACCESS: NFS: 2(No such file or directory), POSIX: 2(No such file or directory)
[2014-08-01 10:27:21.741160] E [dht-helper.c:813:dht_migration_complete_check_task] 0-new-dht: <gfid:97616c4c-2aea-473e-8fae-3a53576439e3>: failed to lookup the file on new-client-0
[2014-08-01 10:27:21.741191] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 188c87cc: <gfid:97616c4c-2aea-473e-8fae-3a53576439e3> => -1 (No such file or directory)
[2014-08-01 10:27:21.741230] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 188c87cc, ACCESS: NFS: 2(No such file or directory), POSIX: 2(No such file or directory)
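
The log above shows the NFS ACCESS call failing as soon as new-client-0 (the previously down sub-volume) returns ENOENT. The fix described in the Doc Text makes the access fop fall through to the next sub-volume in that case instead of failing the whole operation. Below is a minimal, purely illustrative C sketch of that retry pattern; the names (subvol_t, subvol_access, dht_access_with_fallback) are hypothetical and this is not the actual patch.

/* Illustrative sketch only -- NOT the actual glusterfs patch.
 * When access() hits ENOENT on one DHT sub-volume (the directory is
 * missing because the brick was down at mkdir time), retry the call
 * on the next sub-volume instead of failing the operation. */
#include <errno.h>
#include <stdio.h>

typedef struct {
    const char *name;
    int         has_dir;  /* 1 if the directory exists on this sub-volume */
} subvol_t;

/* Pretend per-sub-volume access() call: 0 on success, -ENOENT if missing. */
static int subvol_access(const subvol_t *sv, const char *path)
{
    (void)path;
    return sv->has_dir ? 0 : -ENOENT;
}

/* Old behaviour: fail as soon as the first sub-volume reports ENOENT.
 * Fixed behaviour: on ENOENT, wind the call to the next sub-volume. */
static int dht_access_with_fallback(const subvol_t *subvols, int nsubvols,
                                    const char *path)
{
    for (int i = 0; i < nsubvols; i++) {
        int ret = subvol_access(&subvols[i], path);

        if (ret == 0) {
            printf("access(%s) succeeded on %s\n", path, subvols[i].name);
            return 0;
        }
        if (ret == -ENOENT) {
            printf("%s: ENOENT, trying next sub-volume\n", subvols[i].name);
            continue;      /* the essence of the fix */
        }
        return ret;        /* any other error is fatal */
    }
    return -ENOENT;        /* missing on every sub-volume */
}

int main(void)
{
    /* Sub-volume 0 was down when "down/" was created, so it lacks the dir. */
    subvol_t subvols[] = { { "new-client-0", 0 }, { "new-client-1", 1 } };

    return dht_access_with_fallback(subvols, 2, "/down") == 0 ? 0 : 1;
}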

Comment 3 Shyamsundar 2014-08-13 17:32:06 UTC
Created attachment 926543 [details]
Brief test case

Tested this using the attached test script after the fix presented here (http://review.gluster.org/#/c/8462/) was applied to the upstream code.

The test case passed. A lot of the test case is commented out because the kill was not working properly, so I performed the remaining steps manually from the point where things were commented out.

Susant, can we try this test case before and after the dht_access fix, so that we know we have fixed the regression?

Comment 5 Susant Kumar Palai 2014-08-25 11:30:57 UTC
Shyam,
  Here is an update on the patch.

Tried without the patch; here is the result:

[root@vm50 mnt1]# kill -9 5881
[root@vm50 mnt1]# mkdir down
[root@vm50 mnt1]# ls
down
[root@vm50 mnt1]# touch down/f{1..100}
[root@vm50 mnt1]# gluster v start test1 force
volume start: test1: success
[root@vm50 mnt1]# cd down
[root@vm50 down]# ls
ls: cannot open directory .: No such file or directory
[root@vm50 down]# ls
ls: cannot open directory .: No such file or directory
[root@vm50 down]# ls
ls: cannot open directory .: No such file or directory


And with the patch:
 [root@vm50 mnt1]# kill -9 10866
[root@vm50 mnt1]# mkdir down
[root@vm50 mnt1]# touch down/f{1..100}
[root@vm50 mnt1]# ls
down
[root@vm50 mnt1]# cd down/^C
[root@vm50 mnt1]# gluster v start test1 force
volume start: test1: success
[root@vm50 mnt1]# cd down/
[root@vm50 down]# ls
f1    f12  f16  f2   f23  f27  f30  f34  f38  f41  f45  f49  f52  f56  f6   f63  f67  f70  f74  f78  f81  f85  f89  f92  f96
f10   f13  f17  f20  f24  f28  f31  f35  f39  f42  f46  f5   f53  f57  f60  f64  f68  f71  f75  f79  f82  f86  f9   f93  f97
f100  f14  f18  f21  f25  f29  f32  f36  f4   f43  f47  f50  f54  f58  f61  f65  f69  f72  f76  f8   f83  f87  f90  f94  f98
f11   f15  f19  f22  f26  f3   f33  f37  f40  f44  f48  f51  f55  f59  f62  f66  f7   f73  f77  f80  f84  f88  f91  f95  f99
[root@vm50 down]# 


So everything looks good :)

Comment 6 Rachana Patel 2014-09-19 10:28:21 UTC
Verified with 3.6.0.28-1.el6rhs.x86_64; working as expected, hence moving to VERIFIED.

Comment 8 errata-xmlrpc 2014-09-22 19:45:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

