Bug 1400092 - Increasing replica count while I/O is in progress can lead to replica inconsistency
Status: NEW
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: hari gowtham
QA Contact: Bala Konda Reddy M
Keywords: ZStream
Depends On:
Blocks: 1632148 1351530
Reported: 2016-11-30 07:49 EST by Ravishankar N
Modified: 2018-10-31 05:48 EDT
CC List: 14 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Performing add-brick to increase replica count while I/O is going on can lead to data loss. Workaround: Ensure that increasing replica count is done offline, i.e. without clients accessing the volume.
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Ravishankar N 2016-11-30 07:49:00 EST
Steps (a rough CLI sketch of these follows the list):
1. Created a 2x2 volume (bricks b1 to b4) using a 2-node cluster and fuse-mounted it on a client.
2. Brought down brick b1.
3. Started small-file creation on the mount.
4. Performed add-brick to convert the volume to arbiter, i.e. from 2x2 to 2x(2+1), using a 3rd node for the newly added bricks. Let the new bricks be b5 and b6.
5. Ran `volume start force` to bring b1 back up.
6. I/O was still going on.
7. After I/O and self-heal completed, a few files were found to be missing on the newly added brick b5 (but present on the other bricks of the replica, i.e. b1 and b2). heal-info showed zero entries.
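
For reference, a rough CLI sketch of the above steps; the volume name, host names and brick paths are illustrative, not taken from the actual setup:

  # 1. Create the 2x2 volume on a 2-node cluster and fuse-mount it on a client
  gluster volume create testvol replica 2 \
      node1:/bricks/b1 node2:/bricks/b2 node1:/bricks/b3 node2:/bricks/b4
  gluster volume start testvol
  mount -t glusterfs node1:/testvol /mnt/testvol

  # 2. Bring down brick b1 (e.g. kill the brick PID shown by `gluster volume status testvol`)
  # 3. Start small-file creation on /mnt/testvol from the client
  # 4. Convert to arbiter (2x2 -> 2x(2+1)), using a 3rd node for the new bricks
  gluster volume add-brick testvol replica 3 arbiter 1 \
      node3:/bricks/b5 node3:/bricks/b6

  # 5. Bring b1 back up
  gluster volume start testvol force

  # 7. After I/O and self-heal complete, heal-info reports zero entries
  gluster volume heal testvol info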

Problem:
When add-brick was performed, the self-heal daemon (shd) got the updated volfile first; it did a conservative merge (as expected) and reset the pending xattrs for entry-heal.

The fuse mount was still operating on the old graph (with replica 2), so the creates did not happen on b5 until the fuse mount also got the new graph, after which the creates went to all bricks.
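
A possible way to observe this race (hedged: the .meta interface, log path and log message below are assumptions and may vary between gluster versions) is to check which graph the fuse client is still using while the add-brick is in flight:

  # Virtual .meta directory exposed by the fuse mount lists the client-side graphs
  ls -l /mnt/testvol/.meta/graphs/
  # The client log records when the mount finally switches to the new graph
  grep -i "switched to graph" /var/log/glusterfs/mnt-testvol.log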


This is a gluster infra problem, but it becomes serious when replication comes into the picture:

- If it were a plain distribute volume, the effect of the fuse client doing I/O on the old graph is only that files may get hashed based on the old layout.

- When replication is involved, this can lead to data loss:
In the above example the files were present on b1 and b2 but not on b5. If, for some reason, an I/O happens *later on* that makes b5 the source for entry heal, it will delete the files from b1 and b2 (a quick check for such missing files is sketched below).
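
A quick check for this condition (host names and brick paths reuse the illustrative reproducer above): compare the backend listing of an original data brick with that of the newly added brick, since heal-info alone reported zero entries here:

  # Files present on b1/b2 but absent on the newly added b5 indicate the problem
  diff <(ssh node1 'ls /bricks/b1' | sort) <(ssh node3 'ls /bricks/b5' | sort)
  gluster volume heal testvol info     # may still show zero entries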


We need to document this as a known issue: an add-brick that increases the replica count should only be done offline, i.e. when no I/O is going on.
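
A sketch of the offline workaround (volume and brick names again reuse the illustrative reproducer above):

  # Stop client access first: unmount the volume on all clients
  umount /mnt/testvol
  # Now increase the replica count
  gluster volume add-brick testvol replica 3 arbiter 1 \
      node3:/bricks/b5 node3:/bricks/b6
  # Wait until heal-info reports zero entries before letting clients back in
  gluster volume heal testvol info
  # Re-mount on the clients
  mount -t glusterfs node1:/testvol /mnt/testvol
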
Comment 6 Bhavana 2017-03-13 11:33:54 EDT
Edited the doc text slightly for the release notes.
Comment 11 Anand Paladugu 2018-07-03 14:29:14 EDT
Atin: Any inputs that you can provide w.r.t. this issue? It's pretty old, but a customer enquired about it, as it is preventing them from increasing the brick count without shutting down their production environment ...
