Bug 761880 (GLUSTER-148)

Summary: replicate: Returns st_dev from different subvols resulting in ESTALE thru unfs3booster
Product: [Community] GlusterFS
Reporter: Shehjar Tikoo <shehjart>
Component: libglusterfsclient
Assignee: Shehjar Tikoo <shehjart>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: low
Version: mainline
CC: gluster-bugs
Hardware: All   
OS: Linux   
Doc Type: Bug Fix

Description Vikas Gorur 2009-07-16 11:28:26 UTC
All replicate FOPs that return a stat buf "remember" the inode number that was passed to them (taken from {loc,fd}->inode->ino) and return that same number. The device number is not remembered this way, so st_dev can come from a different subvolume each time.

To keep the dev number consistent, we would need to add a "dev" field to inode_t.

Comment 1 Shehjar Tikoo 2009-07-16 13:46:59 UTC
AFR can return a different device number for the same export across two instances of unfs3booster. For example: at time T we start the first instance and a client mounts it; at T+5 we kill that instance; at T+7 a second instance of unfs3booster is started with the same configuration. If the client then runs ls at T+8, it gets stale file handle errors, because the device number embedded in the file handles the client holds may not match the device number AFR returns to the second instance.
    
I have a patch that forces unfs3 to generate an artificial fsid. The fsid is derived from the export path, which is expected to stay constant. This approach is a short-term workaround and will only work for Linux clients, because I have a feeling they only use the file handle returned by LOOKUP and not by other calls such as readdirplus. There are many other corner cases in unfs3 where the device number of an export point is compared with the device number of a file on that export point. Such checks are bound to fail one day. The alternative is to remove these checks from unfs3, on the assumption that the device ID is generated by a hard-coded algorithm. I am not keen on doing so because I do not understand unfs3 well enough to remove such checks safely.

I think the ideal place to fix this is replicate, which should return the device number from the first subvolume, as it does for inode numbers.

Comment 2 Shehjar Tikoo 2009-07-17 07:07:49 UTC
I spoke to Avati about this. He is of the opinion that the fix belongs in libglusterfsclient, where every mount would be assigned a device number/fsid generated there.

I quite agree. We've seen how messy it is to fix this in unfs3, and adding a dev number to inode_t doesn't sound like it's worth it.

Comment 3 Shehjar Tikoo 2009-07-18 09:32:43 UTC
Proposed fixes:
release-2.0: http://patches.gluster.com/patch/782/
mainline: http://patches.gluster.com/patch/783/

Comment 4 Anand Avati 2009-07-20 18:29:33 UTC
PATCH: http://patches.gluster.com/patch/786 in master (libglusterfsclient: Fake a fsid for every VMP)

Comment 5 Anand Avati 2009-07-20 18:29:44 UTC
PATCH: http://patches.gluster.com/patch/787 in release-2.0 (libglusterfsclient: Fake a fsid for every VMP)