Bug 1376757
| Summary: | Data corruption in write ordering of rebalance and application writes | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Karthik U S <ksubrahm> |
| Component: | distribute | Assignee: | Karthik U S <ksubrahm> |
| Status: | CLOSED WONTFIX | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | mainline | CC: | atumball, bugs, nbalacha, rgowdapp |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-08-30 10:21:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Karthik U S
2016-09-16 10:41:41 UTC
Currently the rebalance process does:

1. read (src)
2. write (dst)

To ensure that src and dst stay identical, the combined transaction of 1 and 2 must be atomic. Otherwise, with parallel application writes landing on the same region during rebalance, writes on dst can go out of order relative to src, and dst can end up different from src, which is data corruption.

Consider the following sequence of events on an overlapping/same region of a file:

1. The rebalance process reads a region (say, one last written by an earlier write w1).
2. The application issues a write (w2) to the same region. w2 completes on both src and dst.
3. The rebalance process then writes to dst the data it read in step 1, so w1's data is sent to dst.

After these steps, the order of writes on src is (w1, w2), but on dst it is (w2, w1). Hence w2 is lost on dst, resulting in corruption (w2 was reported as successful to the application).

To make the transaction atomic, we need to:

* lock (src) the region of the file being read, before step 1
* unlock (src) that region, after step 2

and make sure this lock blocks new application writes until an unlock is issued. Combine this with the approach that application writes are serially written to src first and then to dst, and we have a solution.
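The proposed fix can be sketched as follows. This is a minimal, hypothetical simulation in Python, not GlusterFS code: the `Migrator` class, a plain `threading.Lock` standing in for a byte-range lock on src, and the in-memory `File` buffers are all illustrative assumptions. The point it shows is that holding the region lock across the read-src/write-dst pair forces any concurrent application write to land either entirely before or entirely after the migration copy, so stale rebalance data can never overwrite a newer application write on dst.

```python
import threading


class File:
    """Toy stand-in for a brick file: just an in-memory byte buffer."""
    def __init__(self, data=b""):
        self.data = bytearray(data)


class Migrator:
    """Hypothetical sketch of the proposed ordering fix."""

    def __init__(self, src, dst):
        self.src = src
        self.dst = dst
        # Stand-in for a byte-range lock on the src region; a real
        # implementation would lock only the region being migrated.
        self.region_lock = threading.Lock()

    def app_write(self, offset, buf):
        # Application writes go serially: src first, then dst.
        # They block while the migrator holds the region lock, so they
        # cannot interleave between the rebalance read and dst write.
        with self.region_lock:
            self.src.data[offset:offset + len(buf)] = buf
            self.dst.data[offset:offset + len(buf)] = buf

    def migrate_region(self, offset, length):
        # lock(src) -> read(src) -> write(dst) -> unlock(src):
        # the read and the write form one atomic unit.
        with self.region_lock:
            chunk = bytes(self.src.data[offset:offset + length])
            self.dst.data[offset:offset + length] = chunk
```

Without the lock, a thread calling `app_write` could run between the read and the write inside `migrate_region`, reproducing exactly the (w2, w1) ordering on dst described above.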
REVIEW: http://review.gluster.org/15698 (cluster/dht: Lack of atomicity b/w read-src and write-dst of rebalance process) posted (revisions #1 through #10) for review on master by Karthik U S (ksubrahm).

This issue is being tracked by https://github.com/gluster/glusterfs/issues/308. Since there is no active work going on it, this bug is being closed for now.
Feel free to reopen this or open a new bug when needed.