Backport of the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1343038 to 3.7.
REVIEW: http://review.gluster.org/15165 (gfapi: fix I/O error on consecutive graph switches) posted (#1) for review on release-3.7 by Oleksandr Natalenko (oleksandr)
REVIEW: http://review.gluster.org/15167 (gfapi: Fix IO error caused when there is consecutive graph switches) posted (#3) for review on release-3.7 by Oleksandr Natalenko (oleksandr)
REVIEW: http://review.gluster.org/14835 (gfapi: Fix IO error caused when there is consecutive graph switches) posted (#12) for review on release-3.7 by Oleksandr Natalenko (oleksandr)
COMMIT: http://review.gluster.org/14835 committed in release-3.7 by Kaushal M (kaushal)
------
commit 9cd5066226770cf3c06a21757b963d315b8fe32b
Author: Poornima G <pgurusid>
Date: Mon Jun 6 06:29:40 2016 -0400

gfapi: Fix IO error caused when there is consecutive graph switches

Issue:
Consider a simple situation where glfs_init() is done, i.e. the initial graph is up. Now perform 2 volume sets that result in 2 client-side graph changes. After this, perform some IO; the IO fails with ENOTCONN. The only way to recover such a client is another graph switch or a restart.

What actually happens, from a code perspective:
Let the initial graph be A, followed by 2 consecutive graph switches to B and C, with no IO between those two switches.
- graph_setup(A) as a result of GF_EVENT_CHILD_UP; fs->next_subvol = A
- glfs_init() results in fs->active_subvol = A, fs->next_subvol = NULL
- graph_setup(B) as a result of GF_EVENT_CHILD_UP; fs->next_subvol = B
- graph_setup(C) as a result of GF_EVENT_CHILD_UP; fs->next_subvol = C. It also sees that the previous graph B was never set as fs->active_subvol, i.e. no IO ever happened on B, so it can safely send GF_EVENT_PARENT_DOWN (by calling glfs_subvol_done(B)). This parent down on B results in child_down(B), which is fine. But child_down also triggers graph_setup(B).
- graph_setup(B) as a result of GF_EVENT_CHILD_DOWN; fs->next_subvol = B, and GF_EVENT_PARENT_DOWN on C, as explained above. This again leads to GF_EVENT_CHILD_DOWN on C.
- graph_setup(C) as a result of GF_EVENT_CHILD_DOWN; fs->next_subvol = C, and GF_EVENT_PARENT_DOWN on B, as explained above.
Thus both graphs B and C are disconnected, hence the ENOTCONN.

Solution:
Remove the call to graph_setup() when the event is GF_EVENT_CHILD_DOWN. There is no apparent reason why graph_setup() should be called on child_down; the original motivation is unclear, and git history shows the very first patch already had this call.

> Reviewed-on: http://review.gluster.org/14656
> Smoke: Gluster Build System <jenkins.org>
> CentOS-regression: Gluster Build System <jenkins.org>
> NetBSD-regression: NetBSD Build System <jenkins.org>
> Reviewed-by: Jeff Darcy <jdarcy>

BUG: 1367294
Change-Id: I9de86555f66cc94a05649ac863b40ed3426ffd4b
Signed-off-by: Poornima G <pgurusid>
Signed-off-by: Oleksandr Natalenko <oleksandr>
Reviewed-on: http://review.gluster.org/14835
Smoke: Gluster Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
Reviewed-by: Kaushal M <kaushal>
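For context, a minimal gfapi client exercising the failing sequence might look like the sketch below (it is not part of the patch). The volume name "testvol", the server "server1", and the two volume-set options are placeholders; the `gluster volume set` commands are run from another shell while the client sleeps, producing graphs B and C with no IO in between.

/* Reproducer sketch for the sequence above, under the assumptions
 * stated in the lead-in. Before this fix, the IO after two idle
 * graph switches failed with ENOTCONN.
 * Build with: gcc repro.c -o repro -lgfapi
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <glusterfs/api/glfs.h>

int
main (void)
{
        glfs_t    *fs = glfs_new ("testvol");   /* placeholder volume name */
        glfs_fd_t *fd = NULL;

        if (!fs)
                return 1;
        glfs_set_volfile_server (fs, "tcp", "server1", 24007);
        if (glfs_init (fs) != 0)                /* initial graph A is now active */
                return 1;

        /* Window for 2 graph switches with no IO in between, e.g.:
         *   gluster volume set testvol performance.write-behind off
         *   gluster volume set testvol performance.read-ahead off
         * Each set pushes a new client graph (B, then C).
         */
        sleep (60);

        fd = glfs_creat (fs, "/repro", O_RDWR, 0644);
        if (!fd) {
                /* Before the fix this failed with errno == ENOTCONN */
                fprintf (stderr, "creat: %s\n", strerror (errno));
                glfs_fini (fs);
                return 1;
        }
        if (glfs_write (fd, "x", 1, 0) < 0)
                fprintf (stderr, "write: %s\n", strerror (errno));

        glfs_close (fd);
        glfs_fini (fs);
        return 0;
}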
COMMIT: http://review.gluster.org/15167 committed in release-3.7 by Kaushal M (kaushal)
------
commit 9df752213e6f8a1cc9a5e875cf68ca8ef32f61db
Author: Poornima G <pgurusid>
Date: Tue Jul 19 15:20:09 2016 +0530

gfapi: Fix IO error caused when there is consecutive graph switches

Backport of http://review.gluster.org/#/c/14722/

This is part 2 of the fix; part 1 can be found at:
http://review.gluster.org/#/c/14656/

Problem:
=======
Consider a race between __glfs_active_subvol() and graph_setup(). Say at time T1:
    fs->active_subvol = A
    fs->next_subvol = B

__glfs_active_subvol()     // under lock fs->mutex
{
    ....
    new_subvol = fs->next_subvol     // which is B
    ....
    // start migration from A to B
    __glfs_first_lookup() {
        ....
        unlock fs->mutex     // @ time T2
        network fop
        lock fs->mutex
        ....
    }
    ....
    // migration continues on B
    fs->active_subvol = fs->next_subvol     // which is C (explained below)
    ....
}

At time T2, in another thread, graph_setup() is called with C; note that at T2, fs->mutex is unlocked:

graph_setup(C, ...)
{
    lock fs->mutex
    ....
    if (fs->next_subvol)     // which is B
        destroy subvol (fs->next_subvol)
    ....
    fs->next_subvol = C
    ....
    unlock fs->mutex
}

Thus at the end of this:
    fs->old_subvol = A
    fs->active_subvol = C
    fs->next_subvol = NULL
which is wrong: B completed migration but was destroyed by graph_setup(), and C was never migrated.

Solution:
=========
Any new graph can be in one of 2 states:
- Picked for migration, migration in progress (fs->mip_subvol)
- Not yet picked for migration (fs->next_subvol)

graph_setup() updates only fs->next_subvol. __glfs_active_subvol() moves fs->next_subvol to fs->mip_subvol and sets fs->next_subvol = NULL atomically, and then, once the migration is complete, makes the migrated graph fs->active_subvol.

> Reviewed-on: http://review.gluster.org/14722
> Smoke: Gluster Build System <jenkins.org>
> CentOS-regression: Gluster Build System <jenkins.org>
> NetBSD-regression: NetBSD Build System <jenkins.org>
> Reviewed-by: Raghavendra Talur <rtalur>
> Reviewed-by: Rajesh Joseph <rjoseph>
> Reviewed-by: Niels de Vos <ndevos>

BUG: 1367294
Change-Id: Ib6ff0565105c5eedb912a43da4017cd413243612
Signed-off-by: Poornima G <pgurusid>
Signed-off-by: Oleksandr Natalenko <oleksandr>
Reviewed-on: http://review.gluster.org/15167
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Kaushal M <kaushal>
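To make the two-slot handoff concrete, here is a simplified sketch of the scheme the Solution describes. It mirrors the fields named in the commit message (next_subvol, mip_subvol, active_subvol) but is not the actual gfapi code; the fd-migration step is elided and the function bodies are illustrative only.

/* Simplified sketch of the next_subvol -> mip_subvol -> active_subvol
 * handoff; not the real gfapi implementation. */
#include <pthread.h>
#include <stddef.h>

struct subvol;                            /* opaque graph/subvolume handle */

struct fs {
        pthread_mutex_t  mutex;
        struct subvol   *active_subvol;   /* graph IO currently runs on    */
        struct subvol   *mip_subvol;      /* migration in progress         */
        struct subvol   *next_subvol;     /* newest graph, not yet picked  */
};

/* graph_setup() only ever touches next_subvol, so it can no longer
 * destroy a graph that the active-subvol path has picked for migration. */
void
graph_setup_sketch (struct fs *fs, struct subvol *newgraph)
{
        pthread_mutex_lock (&fs->mutex);
        fs->next_subvol = newgraph;       /* replace the not-yet-picked graph */
        pthread_mutex_unlock (&fs->mutex);
}

struct subvol *
active_subvol_sketch (struct fs *fs)
{
        struct subvol *target;

        pthread_mutex_lock (&fs->mutex);
        if (fs->next_subvol) {
                /* Atomically claim the graph: a racing graph_setup() now
                 * sees next_subvol == NULL and cannot destroy it. */
                fs->mip_subvol  = fs->next_subvol;
                fs->next_subvol = NULL;
        }
        target = fs->mip_subvol ? fs->mip_subvol : fs->active_subvol;
        pthread_mutex_unlock (&fs->mutex);

        /* ... migrate open fds to 'target' outside the lock; this is the
         * window where the old code could lose graph B to graph_setup(C) ... */

        pthread_mutex_lock (&fs->mutex);
        if (target == fs->mip_subvol) {
                fs->active_subvol = target;   /* migration complete */
                fs->mip_subvol    = NULL;
        }
        pthread_mutex_unlock (&fs->mutex);
        return target;
}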
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.15, please open a new bug report.

glusterfs-3.7.15 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-September/050714.html
[2] https://www.gluster.org/pipermail/gluster-users/