Bug 1488120
| Summary: | Moving multiple temporary files to the same destination concurrently causes ESTALE error | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Raghavendra G <rgowdapp> |
| Component: | distribute | Assignee: | Raghavendra G <rgowdapp> |
| Status: | CLOSED ERRATA | QA Contact: | Prasad Desala <tdesala> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | rhgs-3.3 | CC: | amukherj, bugs, couture.danny, jgalvez, nbalacha, pkarampu, rbhat, rgowdapp, rhinduja, rhs-bugs, sheggodu, simon.turcotte-langevin, storage-qa-internal, tdesala |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | RHGS 3.4.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.12.2-10 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1378550 | | |
| : | 1543279 (view as bug list) | Environment: | |
| Last Closed: | 2018-09-04 06:35:11 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1378550 | | |
| Bug Blocks: | 1425421, 1503134, 1543279, 1576291 | | |
|
Description (Raghavendra G, 2017-09-04 11:23:28 UTC)
The initial RCA of failing inodelks, while necessary, is not sufficient to get the use case working. Over the course of getting the test case working, I found the following issues:

1. Entrylk has to be acquired on the hashed subvol, as the cached subvol can change due to migration of the file. Credit for identifying this during code review goes to Nithya. This part of the solution is done.
2. After locking, the lookup done on dst should handle the ESTALE error scenario, as the gfid associated with the path can change due to rename(s) that happened between the lookup and the time the lock is acquired. This part of the solution is done.
3. dht_lookup has to handle the scenario of the gfids of the linkto and data files differing during a fresh lookup, as the lookup might be done in the middle of a rename. While I have a hacky way of resolving this (by making fuse-bridge retry even a fresh lookup on ESTALE), I need to implement a proper fix (make dht_lookup retry the lookup under locks when it encounters this scenario).
4. Linkfile creation during lookup has to be done under a lock which synchronizes linkfile creation with any renames involving the file. This part of the solution is implemented. But I am still thinking through whether this locking is actually required; IOW, I am not able to find the RCA which requires this solution. But having this lock gets the test working.
5. server_link has to handle the scenario of a stale dentry left in the inode table due to a racing lookup and rename involving the dentry. I have a hacky implementation which changes the resolve type to RESOLVE_MAY from RESOLVE_NOT in server_link. But the correct solution would be to enhance the RESOLVE_NOT implementation in the server resolver to check the backend for the existence of the file (and then fail only if the file exists on the backend too) before failing the fop on finding an inode in the itable. This part of the solution is still pending.
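Points 1 and 2 above boil down to a classic "revalidate under lock" pattern: the gfid observed by a lookup done before the lock may be stale by the time the lock is held, so the operation must re-check and return ESTALE rather than act on the cached view. The sketch below is illustrative only (the real DHT code is C inside glusterfs); `Volume`, `lookup`, and `rename` here are hypothetical stand-ins, with a plain dict playing the backend and a single `threading.Lock` playing the entrylk on the hashed subvol.

```python
import threading

# Illustrative sketch (not the real DHT API) of the revalidate-under-lock
# pattern from points 1 and 2 above. All names here are hypothetical.

ESTALE = "ESTALE"

class Volume:
    def __init__(self):
        self.gfid_of = {}                 # path -> gfid, stands in for backend state
        self.entrylk = threading.Lock()   # stands in for the entrylk taken on
                                          # the hashed subvol of the basename

    def lookup(self, path):
        return self.gfid_of.get(path)

    def rename(self, src, dst, cached_dst_gfid):
        # The caller did lookup(dst) before taking the lock; a concurrent
        # rename may have changed the gfid behind dst since then.
        with self.entrylk:
            if self.lookup(dst) != cached_dst_gfid:
                return ESTALE             # stale view: caller must re-lookup
            self.gfid_of[dst] = self.gfid_of.pop(src)
            return "OK"
```

A caller that sees `ESTALE` simply repeats the lookup and retries, which mirrors the proposed fix of retrying the lookup under locks.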
With the above set of solutions, I am able to get the test working (with 4 clients simultaneously executing the above script and one client continuously doing lookups on the contents of the directory in which the renames are being done) for a couple of hours. But after that I end up with the rename failing and two dst data files in the volume. I am in the process of debugging this.

(In reply to Raghavendra G from comment #6)
> 4. linkfile creation during lookup has to be done under lock which
> synchronizes linkfile creation with any renames involving the file. This
> part of the solution is implemented. But, I am still thinking through
> whether this locking is actually required. IOW, I am not able to find the
> RCA which requires this solution. But, having this lock gets test working.

The reason we need locks here is that a half-done rename can result in multiple gfids for the same path (dst). (This is transient and gets corrected once the rename completes, either successfully or with a failure; the exception is a client crashing in the middle of a rename.) The gfid of the cached file at the time of lookup (outside locks) can be different by the time the linkfile is created. This results in a permanent condition of the linkto file having a different gfid than the data file. So, before attempting linkto creation, lookup must:

* acquire an entrylk on the parent, so that renames are blocked;
* check whether the conditions for linkto creation are still valid, e.g. the data file has the same gfid as the inode in the glusterfs process, the linkto file is absent, etc.

If any of these checks fail, abandon linkto creation.

> With the above set of solutions, I am able to get the test working (with 4
> clients simultaneously executing the above script and on client continuously
> doing lookup on the contents of directory in which renames are being done)
> for couple of hours. But after that I end up with rename failing and two
> dst data files in the volume. I am in the process of debugging this.
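The two-step checklist above (block renames via the parent entrylk, then re-verify the preconditions) can be sketched as follows. This is a hypothetical illustration, not the glusterfs/DHT API: `maybe_create_linkto`, the `backend` dict, and the `linkto_exists` callback are all invented names, and a racing rename is simulated simply by whatever state the caller passes in.

```python
import threading

# Hypothetical sketch of the linkto-creation checklist above; none of these
# names are the real glusterfs/DHT API.

parent_entrylk = threading.Lock()   # stands in for the entrylk on the parent dir

def maybe_create_linkto(backend, path, inode_gfid, linkto_exists):
    """Decide, under the parent entrylk, whether linkto creation is still valid."""
    with parent_entrylk:
        # Re-check the preconditions *after* acquiring the lock:
        # the data file must still carry the gfid cached in the inode,
        # and no linkto file may have appeared meanwhile.
        if backend.get(path) != inode_gfid or linkto_exists(path):
            return "abandon"        # a racing rename got in first
        return "create-linkto"
```

The key design point, matching the comment above, is that both checks happen while renames are blocked; checking before the lock is exactly the bug described in the next comment.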
Previously I was not verifying that the conditions for creation of the linkto are still valid _after_ acquiring the entrylk. This resulted in the lookup of dst failing with ESTALE and dst-cached getting set to NULL. Subsequent renames would then result in more than one data file, each having a different gfid. Tests have been running successfully for the past hour and I am optimistic that they'll continue to run successfully.

Tests were running overnight and no errors are seen. I also reverted fixes 3 and 5, as I suspected they were not contributing to the failures. So, the final fix will have 1, 2 and 4, along with the added checks in linkfile creation during lookup.

*** Bug 1425421 has been marked as a duplicate of this bug. ***

Do we have a patch posted in upstream against comment 8?

(In reply to Atin Mukherjee from comment #10)
> Do we have a patch posted in upstream against comment 8?

https://review.gluster.org/#/c/19547/

Verified this BZ on glusterfs version 3.12.2-14.el7rhgs.x86_64. From 8 clients, started moving multiple temp files to the same destination. It ran for almost 3 hrs without any issues, and then rename started throwing EEXIST errors on the client mount points. Also, have seen issues while doing some fops on the destination file. Filed separate issues to track those:

https://bugzilla.redhat.com/show_bug.cgi?id=1609210
https://bugzilla.redhat.com/show_bug.cgi?id=1609224
https://bugzilla.redhat.com/show_bug.cgi?id=1610258

Considering that ESTALE/ENOENT errors are no longer seen while moving multiple temp files to the same destination, I am moving this BZ to Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607