Bug 1030309
| Field | Value |
|---|---|
| Summary | DHT + NFS mount: trusted.glusterfs.dht xattr was not created for the directory on any sub-volume, and file creation inside that directory failed with the error 'No such file or directory' |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter | Rachana Patel <racpatel> |
| Component | distribute |
| Assignee | Nithya Balachandran <nbalacha> |
| Status | CLOSED DEFERRED |
| QA Contact | Matt Zywusko <mzywusko> |
| Severity | high |
| Docs Contact | |
| Priority | urgent |
| Version | 2.1 |
| CC | mzywusko, nsathyan, rgowdapp, sanandpa, smohan, spalai, vbellur |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | glusterfs-3.6.0.19-1 |
| Doc Type | Bug Fix |
Doc Text:
Cause:
------
During directory creation attempted by geo-replication, an mkdir can fail with EEXIST while the directory does not yet have a complete layout. This can happen when there is a parallel mkdir attempt on the same name: until the other mkdir completes, no layout is set on the directory. Without a layout, entry creations within that directory can fail.
Consequence:
------------
A new directory creation fails with "Directory exists" (EEXIST), as expected; however, entry creations within that directory can then also fail, because the layout has not yet been set.
Fix:
-----
Set the layout on those subvolumes where the directory has already been created by the parallel mkdir, before failing the current mkdir with EEXIST.
Result:
--------
This is not a complete fix, as the other mkdir might not have created the directory on all subvolumes. However, on those subvolumes where the directory has already been created, the layout is set. Any file or directory names which hash to the subvolumes on which the layout is set can be created successfully.
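As a toy illustration of the fix described above, here is a minimal Python model (not GlusterFS source; all class and function names are hypothetical): one client's mkdir stalls in the race window after creating the directory but before assigning the layout xattr, a second client's mkdir hits EEXIST, and the fix writes the layout on the subvolumes where the directory already exists before failing, so that subsequent entry creation can succeed.

```python
# Toy model (hypothetical names, not GlusterFS code) of the DHT mkdir
# race described in the Doc Text above.

class Subvolume:
    def __init__(self, name):
        self.name = name
        self.dirs = {}          # dirname -> xattrs dict

def mkdir_interrupted(subvols, dirname):
    """Client A: creates the directory on every subvolume, but is
    'paused' before it assigns the layout xattr (the race window)."""
    for sv in subvols:
        sv.dirs[dirname] = {}   # directory exists, no layout yet

def mkdir_with_fix(subvols, dirname):
    """Client B: hits EEXIST, but (per the fix) first sets the layout on
    the subvolumes where the directory already exists."""
    if any(dirname in sv.dirs for sv in subvols):
        for sv in subvols:
            if dirname in sv.dirs:
                sv.dirs[dirname]["trusted.glusterfs.dht"] = "layout"
        raise FileExistsError(dirname)   # still fails with EEXIST

def create_file(subvols, dirname, fname):
    """Entry creation needs a layout on the subvolume the name hashes to."""
    target = subvols[hash(fname) % len(subvols)]
    xattrs = target.dirs.get(dirname)
    if xattrs is None or "trusted.glusterfs.dht" not in xattrs:
        raise FileNotFoundError(fname)   # the ENOENT seen in this bug
    return f"{target.name}:{dirname}/{fname}"

subvols = [Subvolume("subvol-0"), Subvolume("subvol-1")]
mkdir_interrupted(subvols, "dir2")        # client A stalls mid-mkdir
try:
    mkdir_with_fix(subvols, "dir2")       # client B gets EEXIST...
except FileExistsError:
    pass
print(create_file(subvols, "dir2", "f1")) # ...yet creation now succeeds
```

Without the layout-setting step in `mkdir_with_fix`, the final `create_file` would raise, mirroring the 'No such file or directory' failure in the bug.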
| Field | Value |
|---|---|
| Story Points | --- |
| Clone Of | |
| Clones | 1088231, 1286208 (view as bug list) |
| Environment | |
| Last Closed | 2015-11-27 12:30:32 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1087818, 1088231, 1102550, 1126358, 1286208, 1286584 |
Description (Rachana Patel, 2013-11-14 10:35:52 UTC)
Exact steps to reproduce are known, so updating the bug for the same.

Steps to Reproduce:
===================
1. Create a Distributed volume and mount it on multiple clients (NFS).
2. From one client, start creating directories.
3. From another mount point, create files inside those directories, making sure the file creation request is sent when the parent directory has been created on all (or more than one) sub-volume but before the hash layout (i.e. trusted.glusterfs.dht) is assigned. (It is a race condition, so it can be hit by putting the creation in a loop, or by adding a breakpoint before the layout is created and sending the creation from another mount point.)
4. File creation fails with an error, as the layout is not present.

Verified with 3.6.0.19-1.el6rhs.x86_64. There is some race condition, but I was unable to figure it out; tried 5-7 times and got this error only once.

Steps followed:
1. Create a Distributed volume and mount it on multiple clients (NFS and FUSE).
2. From one client (FUSE), start creating a directory: mkdir dir2
3. From another mount point (NFS), create files inside that directory, making sure the file creation request is sent when the parent directory has been created on all (or more than one) sub-volume but before the hash layout (i.e. trusted.glusterfs.dht) is assigned:

```
[root@OVM3 nfs]# touch dir2/f1
touch: cannot touch `dir2/f1': Stale file handle
[root@OVM3 nfs]# touch dir2/a
touch: cannot touch `dir2/a': No such file or directory
```

Hence moving the bug back to ASSIGNED.

Log snippet:

```
[2014-06-27 07:41:06.643011] W [nfs3-helpers.c:3470:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: 285db38b, LOOKUP: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000
[2014-06-27 07:41:06.644619] W [dht-layout.c:180:dht_layout_search] 0-snap-dht: no subvolume for hash (value) = 3551819610
[2014-06-27 07:41:06.645413] W [nfs3.c:1230:nfs3svc_lookup_cbk] 0-nfs: 2b5db38b: <gfid:c0a48017-ec23-4c93-b6bd-311a8a814ae8>/f1 => -1 (Stale file handle)
[2014-06-27 07:41:06.645450] W [nfs3-helpers.c:3470:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: 2b5db38b, LOOKUP: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000
[2014-06-27 07:41:20.424140] W [dht-layout.c:180:dht_layout_search] 0-snap-dht: no subvolume for hash (value) = 974644454
[2014-06-27 07:41:20.425465] W [dht-layout.c:180:dht_layout_search] 0-snap-dht: no subvolume for hash (value) = 974644454
[2014-06-27 07:41:20.425747] E [fd.c:536:fd_unref] (-->/usr/lib64/glusterfs/3.6.0.19/xlator/cluster/distribute.so(dht_create+0x393) [0x7fc53757e203] (-->/usr/lib64/glusterfs/3.6.0.19/xlator/debug/io-stats.so(io_stats_create_cbk+0x27d) [0x7fc53713e9ed] (-->/usr/lib64/glusterfs/3.6.0.19/xlator/nfs/server.so(nfs_fop_create_cbk+0x99) [0x7fc536eee3c9]))) 0-fd: fd is NULL
[2014-06-27 07:41:20.425769] W [nfs3.c:2370:nfs3svc_create_cbk] 0-nfs: 2f5db38b: <gfid:c0a48017-ec23-4c93-b6bd-311a8a814ae8>/a => -1 (No such file or directory)
```

Rachana, I need answers to the following questions:

1. Is it guaranteed, from the perspective of the application, that creation of directory dir2 was successful before you start creating files within it? This guarantee can be obtained either by:
   * On the FUSE mount, mkdir completes successfully, and you start creating files on the NFS mount _after_ you get that confirmation.
   * On the NFS mount, you attempt to create directory dir2; this creation either succeeds or fails with EEXIST, and then you start creating files within dir2.

Only if we guarantee that the creation of dir2 was successful from the application's perspective can we treat this as a bug. There are many internal states within dht where you can run into this issue, but the question is whether these internal states are hit when the application is trying to do any legal operations (like the ones explained above).

regards,
Raghavendra.

Cloning this to 3.1. To be fixed in a future release.
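The application-level ordering Raghavendra asks about (files are created only after mkdir has returned, with either success or EEXIST) can be sketched in Python against a local temporary directory standing in for the NFS mount; the paths here are illustrative, not from the bug:

```python
# Sketch of the "legal operation" ordering: mkdir is observed to
# complete (success or EEXIST) before any file creation is attempted.
import errno, os, tempfile

mnt = tempfile.mkdtemp()          # stand-in for the NFS mount point
path = os.path.join(mnt, "dir2")

try:
    os.mkdir(path)                # succeeded: dir2 now exists
except OSError as e:
    if e.errno != errno.EEXIST:   # EEXIST is also an acceptable outcome
        raise
# Either way, dir2 exists from the application's perspective, so
# creating files inside it is a legal operation.
with open(os.path.join(path, "f1"), "w"):
    pass
print(sorted(os.listdir(path)))   # → ['f1']
```

On a healthy volume this sequence must succeed; under the race in this bug, the final create can still fail with ENOENT even though mkdir has returned, which is what makes it a valid bug per the criteria above.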