--- Additional comment from Karthik U S on 2017-08-16 07:55:21 EDT ---

Description of problem:
When the data bricks in an arbiter volume are brought down in a cyclic manner, the arbiter brick becomes the source for heal. This should not happen, because the arbiter brick contains only metadata.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Hit it once

Steps to Reproduce:
1. Install the HC stack on arbiter volumes.
2. Start doing I/O on the VMs.
3. While I/O is going on, bring down one of the bricks; after some time, bring that brick back up and bring down another data brick.
4. After some time, bring up the down brick. I observed that a few VMs get paused and the arbiter brick becomes the source for the other two bricks.

Actual results:
VMs get paused and the arbiter brick becomes the source for the other two bricks.

Expected results:
The arbiter brick should not become the source for the other two bricks, as it does not hold any data.

--- Additional comment from Worker Ant on 2017-08-16 08:24:13 EDT ---

REVIEW: https://review.gluster.org/18049 (cluster/afr: Fix for arbiter becoming source) posted (#1) for review on master by Karthik U S (ksubrahm)

--- Additional comment from Worker Ant on 2017-11-17 19:38:54 EST ---

COMMIT: https://review.gluster.org/18049 committed in master by "Karthik U S" <ksubrahm> with a commit message-

cluster/afr: Fix for arbiter becoming source

Problem:
When eager-lock is on and two writes happen in parallel on an FD, we were observing the following behaviour:
- The first write fails on one data brick.
- Since the post-op has not yet happened, the inode refresh gets both data bricks as readable and sets that in the inode context.
- The in-flight split-brain check sees both data bricks as readable and allows the second write.
- The second write fails on the other data brick.
- Now the post-op happens and marks both data bricks as bad, and the arbiter becomes the source for healing.

Fix:
Add one more variable, write_subvol, to the inode context; it holds the in-memory representation of the writable subvols. Inode refresh does not update this value, and its lifetime is from pre-op through unlock in the afr transaction. Initially the pre-op sets this value to the same as read_subvol in the inode context, and the in-flight split-brain check then uses this value instead of read_subvol. After all the checks, we update this value and set read_subvol to the same value, to avoid leaving an incorrect value in read_subvol.

Change-Id: I2ef6904524ab91af861d59690974bbc529ab1af3
BUG: 1482064
Signed-off-by: karthik-us <ksubrahm>

--- Additional comment from Worker Ant on 2017-12-18 07:52:17 EST ---

REVIEW: https://review.gluster.org/19045 (cluster/afr: Fixing the flaws in arbiter becoming source patch) posted (#1) for review on master by Karthik U S

--- Additional comment from Worker Ant on 2018-01-12 21:56:15 EST ---

COMMIT: https://review.gluster.org/19045 committed in master by "Karthik U S" <ksubrahm> with a commit message-

cluster/afr: Fixing the flaws in arbiter becoming source patch

Problem:
Setting the write_subvol value to read_subvol during pre-op in the case of a metadata transaction (commit 19f9bcff4aada589d4321356c2670ed283f02c03) might lead to the original problem of the arbiter becoming source.

Scenario:
1) All bricks are up and good.
2) Two writes, w1 and w2, are in progress in parallel.
3) ctx->read_subvol is good for all the subvolumes.
4) w1 succeeds on brick0 and fails on brick1; the post-op is yet to be done on the disk.
5) A read/lookup comes on the same file and refreshes read_subvol back to all good.
6) A metadata transaction happens, which assigns ctx->read_subvol (all good) to ctx->write_subvol.
7) w2 succeeds on brick1 and fails on brick0; this will update the bricks in reverse order, leading to the arbiter becoming source.

Fix:
Instead of setting ctx->write_subvol to ctx->read_subvol in the pre-op stage when there is a metadata transaction, check in the function __afr_set_in_flight_sb_status() whether it is a data or metadata transaction. Use the value of ctx->write_subvol if it is a data transaction, and the value of ctx->read_subvol for other transactions. With this patch we assign the value of ctx->write_subvol in afr_transaction_perform_fop() with the on-disk value, instead of assigning it in afr_changelog_pre_op() with the in-memory value.

Change-Id: Id2025a7e965f0578af35b1abaac793b019c43cc4
BUG: 1482064
Signed-off-by: karthik-us <ksubrahm>

--- Additional comment from Shyamsundar on 2018-03-15 07:17:56 EDT ---

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.0, please open a new bug report. glusterfs-4.0.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html
[2] https://www.gluster.org/pipermail/gluster-users/

--- Additional comment from on 2018-04-10 07:50:25 EDT ---

Can this fix please be included in the 3.12 LTM release? Thank you very much.
REVIEW: https://review.gluster.org/19847 (cluster/afr: Fix for arbiter becoming source) posted (#1) for review on release-3.12 by Ravishankar N
REVIEW: https://review.gluster.org/19848 (cluster/afr: Fixing the flaws in arbiter becoming source patch) posted (#1) for review on release-3.12 by Ravishankar N
COMMIT: https://review.gluster.org/19847 committed in release-3.12 by "Shyamsundar Ranganathan" <srangana> with a commit message-

cluster/afr: Fix for arbiter becoming source

Backport of https://review.gluster.org/#/c/18049/

Problem:
When eager-lock is on and two writes happen in parallel on an FD, we were observing the following behaviour:
- The first write fails on one data brick.
- Since the post-op has not yet happened, the inode refresh gets both data bricks as readable and sets that in the inode context.
- The in-flight split-brain check sees both data bricks as readable and allows the second write.
- The second write fails on the other data brick.
- Now the post-op happens and marks both data bricks as bad, and the arbiter becomes the source for healing.

Fix:
Add one more variable, write_subvol, to the inode context; it holds the in-memory representation of the writable subvols. Inode refresh does not update this value, and its lifetime is from pre-op through unlock in the afr transaction. Initially the pre-op sets this value to the same as read_subvol in the inode context, and the in-flight split-brain check then uses this value instead of read_subvol. After all the checks, we update this value and set read_subvol to the same value, to avoid leaving an incorrect value in read_subvol.

Change-Id: I2ef6904524ab91af861d59690974bbc529ab1af3
BUG: 1566131
Signed-off-by: karthik-us <ksubrahm>
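
The race described in this commit message can be replayed with a small toy model. This is only a sketch under stated assumptions, not AFR code: bricks 0 and 1 are taken to be the data bricks and brick 2 the arbiter, the two in-memory views are modelled as plain Python sets, and `simulate` and its `use_write_subvol` flag are hypothetical names invented for the illustration. The check is simplified to "a failed write may only be recorded if some data brick would still be good afterwards".

```python
DATA_BRICKS = {0, 1}      # assumption: brick 2 is the arbiter (metadata only)
ALL_BRICKS = {0, 1, 2}


def simulate(use_write_subvol):
    """Replay two parallel writes; return the bricks the post-op would blame."""
    read_subvol = set(ALL_BRICKS)    # refreshed from disk by inode refresh
    write_subvol = set(ALL_BRICKS)   # never touched by inode refresh
    blamed = set()

    # Each entry: (brick the write fails on, inode refresh happens afterwards?)
    for failed, refresh_after in ((1, True), (0, False)):
        view = write_subvol if use_write_subvol else read_subvol
        remaining = (view - {failed}) & DATA_BRICKS
        if not remaining:
            # No good data brick would be left: fail the write instead
            # of blaming the last good data brick.
            continue
        blamed.add(failed)
        write_subvol.discard(failed)
        read_subvol = set(write_subvol)
        if refresh_after:
            # The post-op has not reached disk yet, so a refresh
            # still sees every brick as good.
            read_subvol = set(ALL_BRICKS)
    return blamed
```

In this model, `simulate(False)` returns both data bricks, which leaves only the arbiter as a heal source, while `simulate(True)` returns only brick 1, because the second write is failed rather than allowed to blame brick 0.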
COMMIT: https://review.gluster.org/19848 committed in release-3.12 by "Shyamsundar Ranganathan" <srangana> with a commit message-

cluster/afr: Fixing the flaws in arbiter becoming source patch

Backport of https://review.gluster.org/19045

Problem:
Setting the write_subvol value to read_subvol during pre-op in the case of a metadata transaction (commit 19f9bcff4aada589d4321356c2670ed283f02c03) might lead to the original problem of the arbiter becoming source.

Scenario:
1) All bricks are up and good.
2) Two writes, w1 and w2, are in progress in parallel.
3) ctx->read_subvol is good for all the subvolumes.
4) w1 succeeds on brick0 and fails on brick1; the post-op is yet to be done on the disk.
5) A read/lookup comes on the same file and refreshes read_subvol back to all good.
6) A metadata transaction happens, which assigns ctx->read_subvol (all good) to ctx->write_subvol.
7) w2 succeeds on brick1 and fails on brick0; this will update the bricks in reverse order, leading to the arbiter becoming source.

Fix:
Instead of setting ctx->write_subvol to ctx->read_subvol in the pre-op stage when there is a metadata transaction, check in the function __afr_set_in_flight_sb_status() whether it is a data or metadata transaction. Use the value of ctx->write_subvol if it is a data transaction, and the value of ctx->read_subvol for other transactions. With this patch we assign the value of ctx->write_subvol in afr_transaction_perform_fop() with the on-disk value, instead of assigning it in afr_changelog_pre_op() with the in-memory value.

Change-Id: Id2025a7e965f0578af35b1abaac793b019c43cc4
BUG: 1566131
Signed-off-by: karthik-us <ksubrahm>
Signed-off-by: Ravishankar N <ravishankar>
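
The seven-step scenario above can be walked through with a toy model as well. Again, this is a hedged sketch rather than the actual AFR implementation: `view_for`, `replay` and the `reset_in_metadata_pre_op` flag are hypothetical names, the subvol views are modelled as Python sets, and bricks 0 and 1 are assumed to be the data bricks. Only the selection rule (write_subvol for data transactions, read_subvol otherwise) is taken from the commit message.

```python
DATA_BRICKS = {0, 1}      # assumption: brick 2 is the arbiter
ALL_BRICKS = {0, 1, 2}


def view_for(txn_type, read_subvol, write_subvol):
    """Data transactions consult write_subvol; all others use read_subvol."""
    return write_subvol if txn_type == "data" else read_subvol


def replay(reset_in_metadata_pre_op):
    """Walk through steps 4 to 7; return the bricks blamed by the two writes."""
    ctx = {"read": set(ALL_BRICKS), "write": set(ALL_BRICKS)}
    blamed = set()

    def data_write_fails_on(brick):
        view = view_for("data", ctx["read"], ctx["write"])
        if (view - {brick}) & DATA_BRICKS:   # a good data brick would remain
            blamed.add(brick)
            ctx["write"].discard(brick)

    data_write_fails_on(1)                   # step 4: w1 fails on brick1
    ctx["read"] = set(ALL_BRICKS)            # step 5: refresh, post-op not on disk
    if reset_in_metadata_pre_op:             # step 6: behaviour before this patch
        ctx["write"] = set(ctx["read"])
    data_write_fails_on(0)                   # step 7: w2 fails on brick0
    return blamed
```

With `reset_in_metadata_pre_op=True` the model blames both data bricks, reproducing the flaw; with `False` only brick 1 is blamed, since the data transaction still sees the stale-proof write_subvol view.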
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.9, please open a new bug report. glusterfs-3.12.9 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-April/000096.html
[2] https://www.gluster.org/pipermail/gluster-users/