948686 – Glusterd updates list of volumes and peers in a race'y manner.

Bug 948686 - Glusterd updates list of volumes and peers in a race'y manner.

Summary: Glusterd updates list of volumes and peers in a race'y manner.

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	glusterd
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	krishnan parthasarathi
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	924296 (view as bug list)
Depends On:
Blocks:	918917 959907
TreeView+	depends on / blocked

Reported:	2013-04-05 06:12 UTC by krishnan parthasarathi
Modified:	2015-11-03 23:05 UTC (History)
CC List:	4 users (show)
Fixed In Version:	glusterfs-3.4.0
Clone Of:
Clones:	959907 (view as bug list)
Environment:
Last Closed:	2013-07-24 17:35:32 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
This test script needs to be run in the regression tests framework (ref: http://www.gluster.org/community/documentation/index.php/Development_Work_Flow#Test_cases) (966 bytes, text/troff) 2013-04-05 06:12 UTC, krishnan parthasarathi	no flags	Details
View All

Description krishnan parthasarathi 2013-04-05 06:12:51 UTC

Created attachment 731831 [details]
This test script needs to be run in the regression tests framework (ref: http://www.gluster.org/community/documentation/index.php/Development_Work_Flow#Test_cases)

Description of problem:

Glusterd could end up incorrectly updating its in-memory knowledge about the cluster and/or volumes, when certain kind of network/node failures happen in conjunction with volume/peer operations.

Users don't see the effects of this (latent) issue 'often', because the chances for a race'y update depend on the relative latency between backend-disks and network.

Version-Release number of selected component (if applicable):


How reproducible:
Intermittent.

Steps to Reproduce:
1. Run the regression test script added.
2.
3.
  
Actual results:
Sometimes, glusterd may crash and at other times, it could lead to inconsistent view of the cluster among glusterds.

Expected results:
Glusterd must continue to function as expected.


Additional info:

Comment 1 Anand Avati 2013-04-05 06:27:27 UTC

REVIEW: http://review.gluster.org/4784 (glusterd: big lock - a coarse-grained locking to prevent races) posted (#1) for review on master by Krishnan Parthasarathi (kparthas)

Comment 2 Anand Avati 2013-04-05 13:24:34 UTC

REVIEW: http://review.gluster.org/4784 (glusterd: big lock - a coarse-grained locking to prevent races) posted (#2) for review on master by Krishnan Parthasarathi (kparthas)

Comment 3 Anand Avati 2013-04-09 05:02:15 UTC

REVIEW: http://review.gluster.org/4795 (syncenv: be robust against spurious wake()s) posted (#1) for review on master by Anand Avati (avati)

Comment 4 Anand Avati 2013-04-09 07:11:52 UTC

COMMIT: http://review.gluster.org/4795 committed in master by Anand Avati (avati) 
------
commit ce111f472796d027796b0cc3a4a6f78689f1172d
Author: Anand Avati <avati>
Date:   Fri Apr 5 02:18:06 2013 -0700

    syncenv: be robust against spurious wake()s
    
    In the current implementation, when the callers of synctasks perform
    a spurious wake() of a sleeping synctask (i.e, an extra wake() soon
    after a wake() which already woke up a yielded synctask), there is
    now a possibility of two sync threacs picking up the same synctask.
    This can result in a crash. The fix is to change ->slept = 0|1 and
    membership of synctask in runqueue atomically.
    
    Today we dequeue a task from the runqueue in syncenv_task(), but
    reset ->slept = 0 much later in synctask_switchto() in an unlocked
    manner -- which is safe, when there are no spurious wake()s.
    
    However, this opens a race window where, if a second wake() happens
    after the dequeue, but before setting ->slept = 0, it results in
    queueing the same synctask in the runqueue once again, and get
    picked up by a different synctask.
    
    This is has been diagnosed to be the crashes in the regression tests
    of http://review.gluster.org/4784. However that patch still has a
    spurious wake() [the trigger for this bug] which is yet to be fixed.
    
    Change-Id: I9b4b9dd5115d6e62ba45162ae90dd5e917a4f83d
    BUG: 948686
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/4795
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Gluster Build System <jenkins.com>

Comment 5 Anand Avati 2013-04-09 15:18:34 UTC

REVIEW: http://review.gluster.org/4784 (glusterd: big lock - a coarse-grained locking to prevent races) posted (#3) for review on master by Krishnan Parthasarathi (kparthas)

Comment 6 Anand Avati 2013-04-10 00:35:59 UTC

REVIEW: http://review.gluster.org/4784 (glusterd: big lock - a coarse-grained locking to prevent races) posted (#4) for review on master by Anand Avati (avati)

Comment 7 Anand Avati 2013-04-10 11:56:13 UTC

REVIEW: http://review.gluster.org/4784 (glusterd: big lock - a coarse-grained locking to prevent races) posted (#5) for review on master by Krishnan Parthasarathi (kparthas)

Comment 8 Anand Avati 2013-04-10 11:56:37 UTC

REVIEW: http://review.gluster.org/4802 (glusterd: Fixed spurious wakeups in glusterd syncops) posted (#1) for review on master by Krishnan Parthasarathi (kparthas)

Comment 9 Anand Avati 2013-04-10 11:59:45 UTC

REVIEW: http://review.gluster.org/4802 (glusterd: Fixed spurious wakeups in glusterd syncops) posted (#2) for review on master by Krishnan Parthasarathi (kparthas)

Comment 10 Anand Avati 2013-04-10 12:00:03 UTC

REVIEW: http://review.gluster.org/4784 (glusterd: big lock - a coarse-grained locking to prevent races) posted (#6) for review on master by Krishnan Parthasarathi (kparthas)

Comment 11 Anand Avati 2013-04-11 18:08:07 UTC

REVIEW: http://review.gluster.org/4802 (glusterd: Fixed spurious wakeups in glusterd syncops) posted (#3) for review on master by Krishnan Parthasarathi (kparthas)

Comment 12 Anand Avati 2013-04-11 18:08:25 UTC

REVIEW: http://review.gluster.org/4784 (glusterd: big lock - a coarse-grained locking to prevent races) posted (#7) for review on master by Krishnan Parthasarathi (kparthas)

Comment 13 Anand Avati 2013-04-12 07:23:41 UTC

COMMIT: http://review.gluster.org/4802 committed in master by Anand Avati (avati) 
------
commit 77a02c4dd8467e2a78b3ab3cdef95178ef4b1898
Author: Kaleb S. KEITHLEY <kkeithle>
Date:   Thu Apr 11 09:36:35 2013 -0400

    object-storage: rebase Swift to 1.8.0 (grizzly)
    
    Two minor tweaks found while packaging 3.4.0-0.1.alpha2 for Fedora 19
    
    BUG: 948039
    Change-Id: I97175636164702cf4042bc4a18ffead76ad386cb
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle>
    Reviewed-on: http://review.gluster.org/4807
    Reviewed-by: Jeff Darcy <jdarcy>
    Tested-by: Gluster Build System <jenkins.com>

Comment 14 Anand Avati 2013-04-12 07:26:25 UTC

REVIEW: http://review.gluster.org/4784 (glusterd: big lock - a coarse-grained locking to prevent races) posted (#8) for review on master by Krishnan Parthasarathi (kparthas)

Comment 15 Anand Avati 2013-04-12 18:56:24 UTC

REVIEW: http://review.gluster.org/4820 (glusterd: big lock - a coarse-grained locking to prevent  races) posted (#1) for review on release-3.4 by Jeff Darcy (jdarcy)

Comment 16 Anand Avati 2013-04-12 20:48:13 UTC

COMMIT: http://review.gluster.org/4784 committed in master by Anand Avati (avati) 
------
commit f34343d3751cd73e8eabe6d5544fb1f58b316595
Author: Krishnan Parthasarathi <kparthas>
Date:   Tue Apr 2 07:56:25 2013 +0530

    glusterd: big lock - a coarse-grained locking to prevent races
    
    There are primarily three lists that are part of glusterd process,
    that are concurrently accessed. Namely, priv->volumes, priv->peers
    and volinfo->bricks_list.
    
    Big-lock approach
    -----------------
    WHAT IS IT?
    Big lock is a coarse-grained lock which protects all three
    lists, mentioned above, from racy access.
    
    HOW DOES IT WORK?
    At any given point in time, glusterd's thread(s) are in execution
    _iff_ there is a preceding, inbound network event. Of course, the
    sigwaiter thread and timer thread are exceptions.
    A network event is an external trigger to glusterd, via the epoll
    thread, in the form of POLLIN and POLLERR.
    As long as we take the big-lock at all such entry points and yield
    it when we are done, we are guaranteed that all the network events,
    accessing the global lists, are serialised.
    
    This amounts to holding the big lock at
    - all the handlers of all the actors in glusterd. (POLLIN)
    - all the cbks in glusterd. (POLLIN)
    - rpc_notify (DISCONNECT event), if we access/modify
      one of the three lists. (POLLERR)
    
    In the case of synctask'ized volume operations, we must remember that,
    if we held the big lock for the entire duration of the handler,
    we may block other non-synctask rpc actors from executing.
    For eg, volume-start would block in PMAP SIGNIN, if done incorrectly.
    To prevent this, we need to yield the big lock, when we yield the
    synctask, and reacquire on waking up of the synctask.
    
    Change-Id: Ib929f9905b55fb6c3fc27fefb497a26dba058e4f
    BUG: 948686
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/4784
    Reviewed-by: Jeff Darcy <jdarcy>
    Tested-by: Gluster Build System <jenkins.com>

Comment 17 Anand Avati 2013-04-15 10:35:27 UTC

REVIEW: http://review.gluster.org/4833 (syncenv: be robust against spurious wake()s) posted (#1) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 18 Anand Avati 2013-04-15 10:35:46 UTC

REVIEW: http://review.gluster.org/4834 (glusterd: Fixed spurious wakeups in glusterd syncops) posted (#1) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 19 Anand Avati 2013-04-15 10:36:04 UTC

REVIEW: http://review.gluster.org/4835 (glusterd: big lock - a coarse-grained locking to prevent races) posted (#1) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 20 Anand Avati 2013-04-15 16:43:27 UTC

REVIEW: http://review.gluster.org/4837 (glusterd: Avoided deadlock in single node cluster, glusterd restart) posted (#1) for review on master by Krishnan Parthasarathi (kparthas)

Comment 21 Anand Avati 2013-04-15 17:05:19 UTC

REVIEW: http://review.gluster.org/4837 (glusterd: Avoided deadlock in single node cluster, glusterd restart) posted (#2) for review on master by Krishnan Parthasarathi (kparthas)

Comment 22 Anand Avati 2013-04-16 16:36:20 UTC

REVIEW: http://review.gluster.org/4834 (glusterd: Fixed spurious wakeups in glusterd syncops) posted (#2) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 23 Anand Avati 2013-04-16 16:36:39 UTC

REVIEW: http://review.gluster.org/4835 (glusterd: big lock - a coarse-grained locking to prevent races) posted (#2) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 24 Anand Avati 2013-04-16 16:37:14 UTC

REVIEW: http://review.gluster.org/4833 (syncenv: be robust against spurious wake()s) posted (#2) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 25 Anand Avati 2013-04-16 17:43:18 UTC

COMMIT: http://review.gluster.org/4837 committed in master by Vijay Bellur (vbellur) 
------
commit dcebed550ebfc878b0b3bd02ab7fe15db6764f81
Author: Krishnan Parthasarathi <kparthas>
Date:   Mon Apr 15 22:07:21 2013 +0530

    glusterd: Avoided deadlock in single node cluster, glusterd restart
    
    In a single node cluster, it is possible to deadlock on the "big
    lock", while restarting bricks. In glusterd_restart_bricks, we perform a
    glusterd_brick_connect, where we release the big lock in anticipation
    that glusterd_brick_rpc_notify could run in the same C stack (and
    deadlocking). So, in the restart code path, we could unlock before we
    have performed a lock on the big lock.
    
    To fix this, we need to take the big lock in the
    glusterd_launch_synctask 'thread' as well.
    
    Change-Id: I1abea1ca82b55c784b8a810a8194f254b32b1dcc
    BUG: 948686
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/4837
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Jeff Darcy <jdarcy>

Comment 26 Anand Avati 2013-04-17 06:09:20 UTC

REVIEW: http://review.gluster.org/4834 (glusterd: Fixed spurious wakeups in glusterd syncops) posted (#3) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 27 Anand Avati 2013-04-17 06:09:40 UTC

REVIEW: http://review.gluster.org/4835 (glusterd: big lock - a coarse-grained locking to prevent races) posted (#3) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 28 Anand Avati 2013-04-17 06:10:16 UTC

REVIEW: http://review.gluster.org/4833 (syncenv: be robust against spurious wake()s) posted (#3) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 29 Anand Avati 2013-04-17 12:46:36 UTC

COMMIT: http://review.gluster.org/4833 committed in release-3.4 by Vijay Bellur (vbellur) 
------
commit 1787debc1b6640e15a02ccac4699b92affb2bb14
Author: Krishnan Parthasarathi <kparthas>
Date:   Mon Apr 15 15:55:28 2013 +0530

    syncenv: be robust against spurious wake()s
    
    In the current implementation, when the callers of synctasks perform
    a spurious wake() of a sleeping synctask (i.e, an extra wake() soon
    after a wake() which already woke up a yielded synctask), there is
    now a possibility of two sync threacs picking up the same synctask.
    This can result in a crash. The fix is to change ->slept = 0|1 and
    membership of synctask in runqueue atomically.
    
    Today we dequeue a task from the runqueue in syncenv_task(), but
    reset ->slept = 0 much later in synctask_switchto() in an unlocked
    manner -- which is safe, when there are no spurious wake()s.
    
    However, this opens a race window where, if a second wake() happens
    after the dequeue, but before setting ->slept = 0, it results in
    queueing the same synctask in the runqueue once again, and get
    picked up by a different synctask.
    
    This is has been diagnosed to be the crashes in the regression tests
    of http://review.gluster.org/4784. However that patch still has a
    spurious wake() [the trigger for this bug] which is yet to be fixed.
    
    BUG: 948686
    Change-Id: I51858e887cad2680e46fb973629f8465f4429363
    Original-author: Anand Avati <avati>
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/4833
    Reviewed-by: Vijay Bellur <vbellur>
    Tested-by: Gluster Build System <jenkins.com>

Comment 30 Anand Avati 2013-04-17 12:47:11 UTC

COMMIT: http://review.gluster.org/4834 committed in release-3.4 by Vijay Bellur (vbellur) 
------
commit 47c118e22d9d6fb6662fe96841ed4fe3089739b5
Author: Krishnan Parthasarathi <kparthas>
Date:   Wed Apr 10 17:12:01 2013 +0530

    glusterd: Fixed spurious wakeups in glusterd syncops
    
    glusterd syncops perform a barrier_wake whenever rpc_clnt_submit returned -1.
    This is based on the wrong assumption that the cbkfn wasn't called.
    This would result in one more wakeup than there ought to be.
    
    BUG: 948686
    Change-Id: I839fd218a81255fe50c2047d67461d45360e894d
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/4834
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 31 Anand Avati 2013-04-17 12:49:17 UTC

COMMIT: http://review.gluster.org/4835 committed in release-3.4 by Vijay Bellur (vbellur) 
------
commit 92729add67e2e7b8c7589c2dfab0bde071a7faf2
Author: Krishnan Parthasarathi <kparthas>
Date:   Mon Apr 15 15:56:57 2013 +0530

    glusterd: big lock - a coarse-grained locking to prevent races
    
    There are primarily three lists that are part of glusterd process,
    that are concurrently accessed. Namely, priv->volumes, priv->peers
    and volinfo->bricks_list.
    
    Big-lock approach
    -----------------
    WHAT IS IT?
    Big lock is a coarse-grained lock which protects all three
    lists, mentioned above, from racy access.
    
    HOW DOES IT WORK?
    At any given point in time, glusterd's thread(s) are in execution
    _iff_ there is a preceding, inbound network event. Of course, the
    sigwaiter thread and timer thread are exceptions.
    A network event is an external trigger to glusterd, via the epoll
    thread, in the form of POLLIN and POLLERR.
    As long as we take the big-lock at all such entry points and yield
    it when we are done, we are guaranteed that all the network events,
    accessing the global lists, are serialised.
    
    This amounts to holding the big lock at
    - all the handlers of all the actors in glusterd. (POLLIN)
    - all the cbks in glusterd. (POLLIN)
    - rpc_notify (DISCONNECT event), if we access/modify
      one of the three lists. (POLLERR)
    
    In the case of synctask'ized volume operations, we must remember that,
    if we held the big lock for the entire duration of the handler,
    we may block other non-synctask rpc actors from executing.
    For eg, volume-start would block in PMAP SIGNIN, if done incorrectly.
    To prevent this, we need to yield the big lock, when we yield the
    synctask, and reacquire on waking up of the synctask.
    
    BUG: 948686
    Change-Id: I429832f1fed67bcac0813403d58346558a403ce9
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/4835
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 32 Anand Avati 2013-05-01 07:22:09 UTC

REVIEW: http://review.gluster.org/4921 (synctask: implement barriers around yield, not the other way) posted (#1) for review on master by Anand Avati (avati)

Comment 33 Anand Avati 2013-05-01 07:34:01 UTC

REVIEW: http://review.gluster.org/4921 (synctask: implement barriers around yield, not the other way) posted (#2) for review on master by Anand Avati (avati)

Comment 34 Anand Avati 2013-05-02 03:58:38 UTC

REVIEW: http://review.gluster.org/4922 (Revert "glusterd: Fixed spurious wakeups in glusterd syncops") posted (#1) for review on master by Krishnan Parthasarathi (kparthas)

Comment 35 Anand Avati 2013-05-02 10:51:39 UTC

REVIEW: http://review.gluster.org/4938 (glusterd: Syncop callbks should take big lock too) posted (#1) for review on master by Krishnan Parthasarathi (kparthas)

Comment 36 Anand Avati 2013-05-03 01:17:02 UTC

COMMIT: http://review.gluster.org/4938 committed in master by Anand Avati (avati) 
------
commit b224d7ffb88c274ef0a65d4b5d30b2ce320c6200
Author: Krishnan Parthasarathi <kparthas>
Date:   Thu May 2 16:13:59 2013 +0530

    glusterd: Syncop callbks should take big lock too
    
    Change-Id: I5ae71ab98f9a336dc9bbf0e7b2ec50a6ed42b0f5
    BUG: 948686
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/4938
    Reviewed-by: Amar Tumballi <amarts>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>

Comment 37 Anand Avati 2013-05-03 03:25:05 UTC

REVIEW: http://review.gluster.org/4922 (Revert "glusterd: Fixed spurious wakeups in glusterd syncops") posted (#2) for review on master by Krishnan Parthasarathi (kparthas)

Comment 38 Anand Avati 2013-05-03 11:40:18 UTC

REVIEW: http://review.gluster.org/4944 (glusterd: Removed spurious barrier_wake) posted (#1) for review on master by Krishnan Parthasarathi (kparthas)

Comment 39 Anand Avati 2013-05-04 04:40:01 UTC

REVIEW: http://review.gluster.org/4922 (Revert "glusterd: Fix spurious wakeups in glusterd syncops") posted (#3) for review on master by Krishnan Parthasarathi (kparthas)

Comment 40 Anand Avati 2013-05-04 05:51:55 UTC

REVIEW: http://review.gluster.org/4921 (synctask: implement barriers around yield, not the other way) posted (#3) for review on master by Anand Avati (avati)

Comment 41 Anand Avati 2013-05-04 07:33:29 UTC

COMMIT: http://review.gluster.org/4921 committed in master by Anand Avati (avati) 
------
commit 83cedcd9be2676e63b1be72ecaf3316a781773cb
Author: Anand Avati <avati>
Date:   Mon Apr 22 04:35:03 2013 -0700

    synctask: implement barriers around yield, not the other way
    
    In the current implementation, barriers are in the core of the
    syncprocessors. Wake()s are treated as syncbarrier wake. This
    is however delicate, as spurious wake()s of the synctask can
    mess up the accounting of the barrier and waking it prematurely.
    
    The fix is to keep yield() and wake() as the basic primitives,
    and implement barriers as an object impelemented on top of these
    primitives. This way, only an explicit barrier_wake() gets
    counted towards the barrier accounting, and spurious wakes
    will be truly safe.
    
    Change-Id: I8087f0f446113e5b2d0853431c0354335ccda076
    BUG: 948686
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/4921
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>

Comment 42 Anand Avati 2013-05-04 07:39:14 UTC

REVIEW: http://review.gluster.org/4922 (Revert "glusterd: Fix spurious wakeups in glusterd syncops") posted (#4) for review on master by Krishnan Parthasarathi (kparthas)

Comment 43 Anand Avati 2013-05-05 04:36:21 UTC

COMMIT: http://review.gluster.org/4922 committed in master by Anand Avati (avati) 
------
commit fc39ee2ea3a22704ebacd0607cf6fd4eae9ec66a
Author: Krishnan Parthasarathi <kparthas>
Date:   Thu May 2 09:11:58 2013 +0530

    Revert "glusterd: Fix spurious wakeups in glusterd syncops"
    
    This reverts commit efa154bb0a4cac34d5a9610ec25d38eebe495f22.
    
    -- Following is Avati's analysis (edited) from gerrit --
    The claim of the patch (being reverted) is that it in some cases cbkfn
    is missed.  This is wrong analysis. cbk_fn is _always_ called. The patch
    treats ret > 0 as a "missed cbk". ret > 0 only means socket submission
    was not complete, and is queued to submit asynchronously when POLLOUT is
    raised.  This is sufficient to guarantee that cbkfn is going to be
    called (either the socket errors or submission succeeds and reply
    eventually arrives).
    
    This commit also removes spurious barrier_wake(s).
    
    call backs are guaranteed to be called even if the transport is
    disconnected. This means, a 'wake' would be called if rpc_clnt_submit is
    called.  Also, we count both successful and failed operations in a
    particular batch of operations for the synctask_barrier_wait.  So,
    calling synctask_barrier_wake on failure of rpc_clnt_submit (say, due to
    network failure) would result in a spurious wake.
    
    Change-Id: I7d508c2a54b74a65b82f097742206bc777afc53a
    BUG: 948686
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/4922
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>

Comment 44 Anand Avati 2013-05-13 05:40:06 UTC

REVIEW: http://review.gluster.org/4985 (syncop: Update synctask state appropriately) posted (#2) for review on master by Krishnan Parthasarathi (kparthas)

Comment 45 Anand Avati 2013-05-13 07:28:40 UTC

REVIEW: http://review.gluster.org/4987 (syncop: Remove task from synclock's waitq before 'wake') posted (#2) for review on master by Krishnan Parthasarathi (kparthas)

Comment 46 Anand Avati 2013-05-16 05:22:14 UTC

COMMIT: http://review.gluster.org/4987 committed in master by Vijay Bellur (vbellur) 
------
commit 4c0b149d8e7c574186a1ccefd9c74b79f8a06267
Author: Krishnan Parthasarathi <kparthas>
Date:   Sat May 11 14:36:38 2013 +0530

    syncop: Remove task from synclock's waitq before 'wake'
    
    Removing task from synclock's waitq after wake could result in
    a subsequent unlock, wake'ing up the already running task.
    This fix makes the removal from waitq and wake 'atomic'.
    
    Change-Id: Ie51ccf9d38f2cee84471097644aab95496f488b1
    BUG: 948686
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/4987
    Reviewed-by: Jeff Darcy <jdarcy>
    Tested-by: Gluster Build System <jenkins.com>

Comment 47 Anand Avati 2013-05-16 11:45:41 UTC

REVIEW: http://review.gluster.org/5021 (glusterd: Refresh glusterd-syncop fixes from master) posted (#1) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 48 Anand Avati 2013-05-17 03:45:14 UTC

COMMIT: http://review.gluster.org/5021 committed in release-3.4 by Vijay Bellur (vbellur) 
------
commit 8505eadb3032132a1b936951687ac643731c29ec
Author: Krishnan Parthasarathi <kparthas>
Date:   Thu May 2 16:13:59 2013 +0530

    glusterd: Refresh glusterd-syncop fixes from master
    
    Following commits were cherry-picked from master,
    044f8ce syncop: Remove task from synclock's waitq before 'wake'
    cb6aeed glusterd: Give up big lock before performing any RPC
    46572fe Revert "glusterd: Fix spurious wakeups in glusterd syncops"
    5021e04 synctask: implement barriers around yield, not the other way
    4843937 glusterd: Syncop callbks should take big lock too
    
    Change-Id: I5ae71ab98f9a336dc9bbf0e7b2ec50a6ed42b0f5
    BUG: 948686
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/4938
    Reviewed-by: Amar Tumballi <amarts>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/5021
    Reviewed-by: Vijay Bellur <vbellur>

Comment 49 Anand Avati 2013-05-21 03:00:28 UTC

REVIEW: http://review.gluster.org/5047 (syncop: Remove task from syncbarrier's waitq before 'wake') posted (#3) for review on master by Krishnan Parthasarathi (kparthas)

Comment 50 Anand Avati 2013-05-21 04:08:55 UTC

COMMIT: http://review.gluster.org/5047 committed in master by Anand Avati (avati) 
------
commit 277fabf577f95b20c61d65b28f8269e6abca6fee
Author: Krishnan Parthasarathi <kparthas>
Date:   Mon May 20 17:17:05 2013 +0530

    syncop: Remove task from syncbarrier's waitq before 'wake'
    
    Removing task from syncbarrier's waitq after wake could result in a
    subsequent syncbarrier_wake, wake'ing up the already running task.  This
    fix makes the removal from waitq and wake 'atomic'
    
    The root cause and the fix are similar in spirit to what was observed
    in synclock's waitq implementation.
    
    Change-Id: I7dd9e6ad5945742bcda20eb5a06a9376bb18528e
    BUG: 948686
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/5047
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>

Comment 51 Anand Avati 2013-05-21 08:49:26 UTC

REVIEW: http://review.gluster.org/5053 (syncop: Update synctask state appropriately) posted (#1) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 52 Anand Avati 2013-05-21 08:49:55 UTC

REVIEW: http://review.gluster.org/5054 (syncop: Remove task from syncbarrier's waitq before 'wake') posted (#1) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 53 Anand Avati 2013-05-21 13:03:50 UTC

COMMIT: http://review.gluster.org/5053 committed in release-3.4 by Vijay Bellur (vbellur) 
------
commit f8e19c6b6f21142aadabd2f30dc3b960a6ec1fa2
Author: Krishnan Parthasarathi <kparthas>
Date:   Sat May 11 14:29:29 2013 +0530

    syncop: Update synctask state appropriately
    
            Backport of http://review.gluster.org/4985
    
    * Earlier, SYNCOP macro, the only consumer of synctask_yield, would set
    the task->state to SYNCTASK_SUSPEND. Today, we have glusterd having its
    own wrapper macros which don't set task's state. There is also the
    syncbarrier and synclock framework, which also participate in a
    synctask's scheduling (and need to keep a task's state up to date). It
    only makes more sense to leave a synctask's state to the synctask
    library, since its an internal affair.
    
    * Need to 'yawn' before 'yield' to avoid re-running tasks to set
      task->woken appropriately.
    
    Change-Id: Ic7a59e6ebcc46f03e53223ca237668d45a3cba40
    BUG: 948686
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/5053
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 54 Anand Avati 2013-05-21 13:04:27 UTC

COMMIT: http://review.gluster.org/5054 committed in release-3.4 by Vijay Bellur (vbellur) 
------
commit 418f763303ec4003f4b805867ff306a2e43bca76
Author: Krishnan Parthasarathi <kparthas>
Date:   Mon May 20 17:17:05 2013 +0530

    syncop: Remove task from syncbarrier's waitq before 'wake'
    
            Backport of http://review.gluster.org/5047
    
    Removing task from syncbarrier's waitq after wake could result in a
    subsequent syncbarrier_wake, wake'ing up the already running task.  This
    fix makes the removal from waitq and wake 'atomic'
    
    The root cause and the fix are similar in spirit to what was observed
    in synclock's waitq implementation.
    
    Change-Id: I7dd9e6ad5945742bcda20eb5a06a9376bb18528e
    BUG: 948686
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/5054
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 55 Anand Avati 2013-05-21 19:09:57 UTC

REVIEW: http://review.gluster.org/5058 (syncop: synctask shouldn't yawn, it could miss a 'wake') posted (#1) for review on master by Krishnan Parthasarathi (kparthas)

Comment 56 Anand Avati 2013-05-21 23:00:26 UTC

COMMIT: http://review.gluster.org/5058 committed in master by Anand Avati (avati) 
------
commit 16b5ec67120e198fb320e13ade9e31d3761b0932
Author: Krishnan Parthasarathi <kparthas>
Date:   Wed May 22 00:18:04 2013 +0530

    syncop: synctask shouldn't yawn, it could miss a 'wake'
    
    Change-Id: I7731fd33ca0c925cc52f8d105275b44fc625a1e2
    BUG: 948686
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/5058
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>

Comment 57 Anand Avati 2013-05-23 03:16:29 UTC

REVIEW: http://review.gluster.org/5071 (syncop: synctask shouldn't yawn, it could miss a 'wake') posted (#1) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 58 Anand Avati 2013-05-23 14:25:19 UTC

COMMIT: http://review.gluster.org/5071 committed in release-3.4 by Vijay Bellur (vbellur) 
------
commit 3d68ed8fbb41be06c222aa7754b36e77edced92a
Author: Krishnan Parthasarathi <kparthas>
Date:   Wed May 22 00:18:04 2013 +0530

    syncop: synctask shouldn't yawn, it could miss a 'wake'
    
            Backport of http://review.gluster.org/5058
    
    Change-Id: I7731fd33ca0c925cc52f8d105275b44fc625a1e2
    BUG: 948686
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/5071
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Jeff Darcy <jdarcy>

Comment 59 Anand Avati 2013-05-24 10:23:02 UTC

REVIEW: http://review.gluster.org/5087 (glusterd-syncop: Fix unlocking and collating errors) posted (#1) for review on master by Kaushal M (kaushal)

Comment 60 Anand Avati 2013-05-26 16:22:08 UTC

REVIEW: http://review.gluster.org/5087 (glusterd-syncop: Fix unlocking and collating errors) posted (#2) for review on master by Kaushal M (kaushal)

Comment 61 Anand Avati 2013-06-04 20:28:08 UTC

COMMIT: http://review.gluster.org/5087 committed in master by Anand Avati (avati) 
------
commit 0c1916f3e1eb81aa81dc2d62e97f46880390838c
Author: Kaushal M <kaushal>
Date:   Thu May 23 12:19:33 2013 +0530

    glusterd-syncop: Fix unlocking and collating errors
    
    * Only those peers which were locked need to be unlocked.
    * Fix location of collating errors in callbacks. The callback functions
      could miss collating errors if there was an rpc error.
    
    Change-Id: Ie27c2f1ec197da4f5077a4d6e032127954ce87cd
    BUG: 948686
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/5087
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Gluster Build System <jenkins.com>

Comment 62 Niels de Vos 2014-07-11 15:48:38 UTC

*** Bug 924296 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.