Bug 1400092 - Increasing replica count while I/O is in progress can lead to replica inconsistency
Summary: Increasing replica count while I/O is in progress can lead to replica inconsi...
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: hari gowtham
QA Contact: Bala Konda Reddy M
Depends On:
Blocks: 1351530 1632148
Reported: 2016-11-30 12:49 UTC by Ravishankar N
Modified: 2021-12-10 14:48 UTC

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Performing add-brick to increase replica count while I/O is going on can lead to data loss. Workaround: Ensure that increasing replica count is done offline, i.e. without clients accessing the volume.
Clone Of:
Last Closed: 2018-11-20 10:08:54 UTC
Target Upstream Version:

Description Ravishankar N 2016-11-30 12:49:00 UTC
1. Created a 2x2 volume (bricks b1 to b4) on a 2-node cluster and fuse-mounted it on a client.
2. Brought down one brick, b1.
3. Started small-file creation on the mount.
4. Performed add-brick to convert the volume to arbiter, i.e. from 2x2 to 2x(2+1), using a 3rd node for the newly added bricks, b5 and b6.
5. Ran `volume start force` to bring up b1.
6. I/O was still in progress.
7. After I/O and self-heal completed, a few files were found to be missing on the newly added brick b5 (but present on the other bricks of the replica, i.e. b1 and b2). heal-info showed zero entries.
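
For reference, the CLI side of the steps above can be sketched as a dry run that only prints the commands. The volume name `testvol`, the host names `node1` to `node3`, the brick paths under `/bricks`, and the placement of b1 to b4 across the two nodes are all assumptions for illustration; the report does not give them. Bringing b1 down (step 2) and the I/O on the mount happen outside these commands.

```python
# Dry-run sketch: builds the gluster CLI invocations for the steps above.
# All names below (volume, hosts, brick paths) are hypothetical.
VOLUME = "testvol"


def repro_commands(volume=VOLUME):
    """Return the gluster commands for the reproduction, as argv lists."""
    # Consecutive bricks form a replica pair: (b1, b2) and (b3, b4).
    bricks = ["node1:/bricks/b1", "node2:/bricks/b2",
              "node1:/bricks/b3", "node2:/bricks/b4"]
    # One arbiter brick per replica pair, both on the 3rd node.
    arbiters = ["node3:/bricks/b5", "node3:/bricks/b6"]
    return [
        # Step 1: create and start the 2x2 volume.
        ["gluster", "volume", "create", volume, "replica", "2", *bricks],
        ["gluster", "volume", "start", volume],
        # Step 4: convert 2x2 to 2x(2+1) while I/O is running.
        ["gluster", "volume", "add-brick", volume,
         "replica", "3", "arbiter", "1", *arbiters],
        # Step 5: bring the downed brick b1 back up.
        ["gluster", "volume", "start", volume, "force"],
        # Step 7: check pending heals once I/O has finished.
        ["gluster", "volume", "heal", volume, "info"],
    ]


if __name__ == "__main__":
    for argv in repro_commands():
        print(" ".join(argv))
```

The sketch deliberately does not execute anything, so it can be read or adapted without a cluster at hand.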

When add-brick was performed, the shd got the updated volfile first, did a conservative merge (as expected), and reset the pending xattrs for entry-heal.

The fuse mount was still operating on the old graph (with replica 2), so the creates did not happen on b5 until the fuse mount also got the new graph, after which the creates went to all bricks.
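
The ordering described above can be illustrated with a toy model. This is not glusterfs code: bricks are plain sets of file names, `pending` stands in for the pending xattrs, and the file names `f1` to `f4` are made up. It is a simplified sketch that only tracks the replica set b1, b2, b5.

```python
# Toy model of the graph-switch race; not actual AFR/shd logic.
bricks = {"b1": set(), "b2": set(), "b5": set()}
pending = []  # (file, brick) pairs that a later heal would act on

OLD_GRAPH = ["b1", "b2"]        # replica 2: what the fuse client still uses
NEW_GRAPH = ["b1", "b2", "b5"]  # after add-brick: what the shd uses first


def create(name, graph, down=()):
    """Create a file on every brick of the graph; record a pending
    marker for graph members that are down."""
    for brick in graph:
        if brick in down:
            pending.append((name, brick))
        else:
            bricks[brick].add(name)


def conservative_merge(graph):
    """Give every brick in the graph the union of all entries, then
    reset the pending markers, as the shd does after add-brick."""
    union = set().union(*(bricks[b] for b in graph))
    for brick in graph:
        bricks[brick] |= union
    pending.clear()


create("f1", OLD_GRAPH, down={"b1"})  # b1 is down, client on old graph
conservative_merge(NEW_GRAPH)         # shd sees the new graph first
create("f2", OLD_GRAPH)               # client is still on the old graph,
create("f3", OLD_GRAPH)               # which does not contain b5 at all
create("f4", NEW_GRAPH)               # client has switched to the new graph

missing_on_b5 = bricks["b1"] - bricks["b5"]
print(sorted(missing_on_b5))  # ['f2', 'f3']
print(pending)                # []
```

Because the old graph has no notion of b5, the creates of `f2` and `f3` succeed on every brick the client knows about, so nothing is marked pending. That matches the observation that heal-info showed zero entries even though b5 was missing files.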

This is a gluster infra problem but is serious when replicate comes into the picture:

- If it were a plain distribute vol, the effect of the fuse client doing I/O on the old graph is that the files may get hashed based on the old layout.

- When replication is involved, this can lead to data loss:
In the above example, the files were present on b1 and b2 but not on b5. If for some reason, *later on*, an I/O happens which makes b5 the source for entry heal, then the heal will delete the files from b1 and b2.
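
A minimal sketch of why that is destructive, assuming entry heal simply makes each sink's directory listing match the source's. The function `entry_heal` and the file names are hypothetical; real AFR entry heal is considerably more involved.

```python
# Toy model of a source-to-sink entry heal; not actual AFR code.
def entry_heal(source, sinks, bricks):
    """Make every sink's entries match the source's.
    Returns the entries removed from each sink."""
    deleted = {}
    for sink in sinks:
        deleted[sink] = bricks[sink] - bricks[source]
        bricks[sink] = set(bricks[source])
    return deleted


# State after the race: f2 and f3 never reached b5.
bricks = {
    "b1": {"f1", "f2", "f3"},
    "b2": {"f1", "f2", "f3"},
    "b5": {"f1"},
}

# Some later I/O makes b5 the source for the directory.
deleted = entry_heal("b5", ["b1", "b2"], bricks)
print({sink: sorted(names) for sink, names in deleted.items()})
# {'b1': ['f2', 'f3'], 'b2': ['f2', 'f3']}
```

In this model the only copies of `f2` and `f3` are on the sinks, so once they are removed there, no brick holds them any more.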

We need to document this as a known issue, i.e. an add-brick that increases the replica count should only be done offline, when no I/O is going on.

Comment 6 Bhavana 2017-03-13 15:33:54 UTC
Edited the doc text slightly for the release notes.

Comment 11 Anand Paladugu 2018-07-03 18:29:14 UTC
Atin: Can you provide any inputs w.r.t. this issue? It's pretty old, but a customer enquired about it, as it's preventing them from increasing bricks without shutting down the production environment ...
