Bug 1471031
Summary: | dht_(f)xattrop does not implement migration checks | |||
---|---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Nithya Balachandran <nbalacha> | |
Component: | distribute | Assignee: | Nithya Balachandran <nbalacha> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | ||
Severity: | unspecified | Docs Contact: | ||
Priority: | unspecified | |||
Version: | mainline | CC: | bugs, kdhananj, rgowdapp, stefano.stagnaro | |
Target Milestone: | --- | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | glusterfs-4.0.0 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1498081 1515434 1530146 1540224 | Environment: | ||
Last Closed: | 2018-03-15 11:17:12 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1498081, 1515434, 1530146, 1540224 |
Description
Nithya Balachandran
2017-07-14 09:45:06 UTC
REVIEW: https://review.gluster.org/17776 (cluster/dht: Add migration checks to dht_(f)xattrop) posted (#1) for review on master by N Balachandran (nbalacha)

REVIEW: https://review.gluster.org/17776 (cluster/dht: Add migration checks to dht_(f)xattrop) posted (#2) for review on master by N Balachandran (nbalacha)

REVIEW: https://review.gluster.org/17776 (cluster/dht: Add migration checks to dht_(f)xattrop) posted (#3) for review on master by N Balachandran (nbalacha)

REVIEW: https://review.gluster.org/17776 (cluster/dht: Add migration checks to dht_(f)xattrop) posted (#4) for review on master by N Balachandran (nbalacha)

We may run into issues with dht_migrate_file and xattrops sent by the shard xlator: as the xattrop is an ADD operation, the wrong value could end up being set.

file  = data file
file' = linkto file

Time   Source                          Target
-----------------------------------------------------------------------

t0     file                            file'
       (shard.size = x1)

t1     file (listxattr)                file'
       (shard.size = x1)

t2     file                            file' (setxattr)
       shard.size = x1                 shard.size = x1

t3     file (xattrop(ADD))             file'
       shard.size = x2                 (shard.size = x1)
[operation not performed on the target as this is not marked PHASE1 yet]

Now, a write + xattrop is performed after the S+T bits have been set.

t4     file (S+T) (xattrop(ADD))       file'
       shard.size = x3                 (shard.size = x3')
The operation is now performed on the target as well, but the value will be different as it is an ADD performed on a different initial value.

Convert target to data file

t5     file (S+T)                      file
       shard.size = x3                 (shard.size = x3')

A write + xattrop is now performed but is sent only to the target file (as it is now a data file on the hashed subvol).

t6     file (S+T)                      file (xattrop(ADD))
       shard.size = x3                 shard.size = x4

Copy xattrs again from src to target

t7     file (S+T) (listxattr)          file (setxattr)
       shard.size = x3                 shard.size = x3

Convert source to linkto

t8     file'                           file
       shard.size = x3                 (shard.size = x3)

The listxattr+setxattr at t7 would fix the race at t3.
However, it introduces its own race and hence possible file corruption.

A possible solution would be to:

1. Set the S+T bits before performing the initial listxattr + setxattr. This does not fix the race where xattrops may reach the dst out of order.
2. Set only the POSIX ACLs in the second setxattr.

Initially I thought the f/xattrops need only check for PHASE2 and that the xattrs could be copied across from the second listxattr+setxattr values. However, that leaves a window where the values could go out of sync. A write could hit the src before the dst linkto has been converted to a data file. After the dst has been converted to a data file but before the listxattr+setxattr, a lookup from somewhere could update the inode ctx to point to the new hashed subvol (dst) and the xattrop would be sent only to the dst. However, if phase1 checks were not implemented, this add would be on the wrong value and the listxattr+setxattr would then overwrite it with the wrong value.

@Krutika, is the above understanding of how xattrops work with shard correct?

(In reply to Nithya Balachandran from comment #5)
> We may run into issues with dht_migrate_file and xattrops sent by the shard
> xlator where ,as the xattrop is an ADD operation, so this the wrong value
> could end up being set.
>
> file  = data file
> file' = linkto file
>
> Time   Source                          Target
> -----------------------------------------------------------------------
>
> t0     file                            file'
>        (shard.size = x1)
>
> t1     file (listxattr)                file'
>        (shard.size = x1)
>
> t2     file                            file' (setxattr)
>        shard.size=x1                   shard.size=x1
>
> t3     file (xattrop (ADD))            file'
>        shard.size = x2                 (shard.size = x1)
> [operation not performed on the target as this is not marked PHASE1 yet]
>
> Now, a write + xattrop is performed after the S+T bits have been set.
>
> t4     file (S+T) (xattrop(ADD))       file'
>        shard.size = x3                 (shard.size = x3')
> The operation is now performed on the target as well but the value will be
> different as it is an ADD performed in a different initial value.
>
>
> Convert target to data file
> t5     file (S+T)                      file
>        shard.size = x3                 (shard.size = x3')
>
> A write + xattrop is now performed but is sent only to the target file (as
> it is now a data file on the hashed subvol)
>
> t6     file (S +T)                     file xattrop(ADD))
>        shard.size = x3                 shard.size = x4
>
> Copy xattrs again from src to target
>
> t7     file (S + T ) listxattr         file (setxattr
>        shard.size = x3                 shard.size=x3
>
>
> Convert source to linkto
>
> t8     file'                           file
>        shard.size= x3                  (shard.size = x3)
>
>
>
> The listxattr+setxattr at t7 would fix the race at t3. However, it
> introduces its own race and hence possible file corruption.
>
>
> A possible solution would be to:
>
> 1. Set the S+T bits before performing the initial listxattr + setxattr. This
> does not fix the race where xattrops may reach the dst out of order.

That's correct. This can still lead to a race where setxattr and xattrop reach the dst out of order. This is very similar to the read from rebalance racing with writes from a client discussed in [1][2]. The solution discussed in [2], when extended to setxattr and xattrop, can solve this race.

[1] https://github.com/gluster/glusterfs/issues/308
[2] https://github.com/gluster/glusterfs/issues/347

COMMIT: https://review.gluster.org/17776 committed in master by "N Balachandran" <nbalacha> with a commit message-

cluster/dht: Add migration checks to dht_(f)xattrop

The dht_(f)xattrop implementation did not implement migration phase1/phase2 checks which could cause issues with rebalance on sharded volumes. This does not solve the issue where fops may reach the target out of order.
Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c
BUG: 1471031
Signed-off-by: N Balachandran <nbalacha>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.

glusterfs-4.0.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html
[2] https://www.gluster.org/pipermail/gluster-users/

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
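
For anyone re-reading the t0..t8 timeline from comment #5, the arithmetic can be checked with a small toy model. This is only a sketch and not GlusterFS code: the function name, its parameters, and the deltas d3, d4 and d6 (the amounts added by the xattrops at t3, t4 and t6) are hypothetical names introduced here, and the model assumes every operation arrives in order, so it says nothing about the out-of-order race mentioned in the commit message.

```python
def final_target_size(x1, d3, d4, d6, *, bits_before_first_copy, second_copy_overwrites):
    """Toy model of the t0..t8 timeline; returns shard.size on the target.

    All names are illustrative assumptions, not GlusterFS identifiers.
    """
    src = x1   # t0: the source data file holds shard.size = x1
    dst = src  # t1/t2: listxattr on the source, setxattr on the target

    # t3: an ADD arrives. It reaches the target only if the S+T bits
    # were already set before the first copy.
    src += d3
    if bits_before_first_copy:
        dst += d3

    # t4: the S+T bits are set, so the ADD is applied to both files.
    src += d4
    dst += d4

    # t5/t6: the target is now a data file; the ADD is sent to the target only.
    dst += d6

    # t7: the second listxattr + setxattr copies the source value to the target.
    if second_copy_overwrites:
        dst = src

    # t8: the source becomes the linkto file; the target value is what remains.
    return dst


# Timeline as described: the t7 copy discards the t6 ADD.
as_described = final_target_size(100, 10, 20, 30,
                                 bits_before_first_copy=False,
                                 second_copy_overwrites=True)   # 130

# Without the t7 copy: the t3 ADD never reaches the target.
no_second_copy = final_target_size(100, 10, 20, 30,
                                   bits_before_first_copy=False,
                                   second_copy_overwrites=False)  # 150

# Both proposed changes: every ADD is reflected on the target.
both_changes = final_target_size(100, 10, 20, 30,
                                 bits_before_first_copy=True,
                                 second_copy_overwrites=False)   # 160
```

With x1 = 100 and ADDs of 10, 20 and 30, the expected size is 160. Only the combination of the two proposed changes (S+T bits set before the first copy, and shard.size left out of the second setxattr) reaches it in this in-order model.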