REVIEW: http://review.gluster.org/8106 (features/changelog: prevent deadlock on thread cancellation) posted (#1) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/8106 (features/changelog: prevent deadlock on thread cancellation) posted (#2) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/8106 (features/changelog: prevent deadlock on thread cancellation) posted (#3) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/8106 (features/changelog: prevent deadlock on thread cancellation) posted (#4) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/8106 (features/changelog: prevent deadlock on thread cancellation) posted (#5) for review on master by Venky Shankar (vshankar)
COMMIT: http://review.gluster.org/8106 committed in master by Vijay Bellur (vbellur)
------
commit 6532a65b56a652622612a6edcd03fff90fbeff0f
Author: Venky Shankar <vshankar>
Date: Wed Jun 18 23:36:48 2014 +0530

    features/changelog: prevent deadlock on thread cancellation

    Helper threads (fsync, rollover) wake up periodically and perform
    their respective operations under a lock (crt->lock). These threads
    are also subject to cancellation under some circumstances, such as
    disabling changelog. This is inherently dangerous when functions
    that are cancellation points for pthread_cancel(3) are used in the
    locked region. Consider this:

        pthread_mutex_lock(&mutex);
        {
                /* ... */
                ret = fsync (fd); <-- cancellation point
                /* ... */
        }
        pthread_mutex_unlock(&mutex);

    A pthread_cancel(3) by another thread just before fsync(2) but
    after pthread_mutex_lock(3) would result in the thread getting
    cancelled when fsync(2) is invoked, thereby never unlocking the
    mutex. Moreover, in the case of the changelog translator, the
    locked region (under crt->lock in changelog-rt.c) is also the code
    path for fop changelog updates. Therefore, unlocking the mutex in a
    thread cleanup handler (pthread_cleanup_pop(3)) might prematurely
    release the mutex during the fop update path.

    This patch fixes such problems existing in the fsync and rollover
    threads. The fix is to enter the locked region with cancellation
    disabled and enable it after the mutex is unlocked. Also, test for
    a cancellation request early on in case none of the functions are
    cancellation points.

    Change-Id: I1795627a12827609c1da659d07fc1457ffa033de
    BUG: 1110917
    Signed-off-by: Venky Shankar <vshankar>
    Reviewed-on: http://review.gluster.org/8106
    Reviewed-by: Kotresh HR <khiremat>
    Reviewed-by: Vijay Bellur <vbellur>
    Tested-by: Gluster Build System <jenkins.com>
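For readers who want to see the pattern the commit message describes, the following is a minimal sketch in C, not the actual changelog translator code. The names `helper_ctx`, `locked_fsync_pass`, `helper_thread`, and `cancel_and_check` are hypothetical and introduced only for illustration; the temporary file stands in for the changelog file descriptor. It disables cancellation before taking the lock, restores the previous state after unlocking, and then calls `pthread_testcancel(3)` so that a pending request is acted upon outside the locked region.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the state guarded by crt->lock. */
struct helper_ctx {
    pthread_mutex_t lock;
    int fd;      /* descriptor to fsync under the lock */
    int passes;  /* completed locked passes (illustration only) */
};

/* One locked pass: cancellation is disabled while the mutex is held. */
int locked_fsync_pass(struct helper_ctx *ctx)
{
    int oldstate, unused, ret;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
    pthread_mutex_lock(&ctx->lock);
    ret = fsync(ctx->fd);  /* cancellation point, but cancellation is off */
    ctx->passes++;
    pthread_mutex_unlock(&ctx->lock);
    pthread_setcancelstate(oldstate, &unused);

    /* Act on any cancellation request that arrived while it was disabled. */
    pthread_testcancel();
    return ret;
}

/* Periodic helper loop, loosely modelled on the fsync/rollover threads. */
void *helper_thread(void *arg)
{
    struct helper_ctx *ctx = arg;
    struct timespec pause = { 0, 1000000 };  /* 1 ms */

    for (;;) {
        locked_fsync_pass(ctx);
        nanosleep(&pause, NULL);
    }
    return NULL;
}

/* Start the helper, cancel it, and check the mutex was not left locked.
 * Returns 0 if the thread was cancelled and the lock is still acquirable. */
int cancel_and_check(void)
{
    struct helper_ctx ctx;
    struct timespec wait = { 0, 20000000 };  /* 20 ms */
    pthread_t tid;
    void *res = NULL;
    FILE *tmp = tmpfile();
    int rc;

    if (tmp == NULL)
        return -1;
    pthread_mutex_init(&ctx.lock, NULL);
    ctx.fd = fileno(tmp);
    ctx.passes = 0;

    if (pthread_create(&tid, NULL, helper_thread, &ctx) != 0) {
        fclose(tmp);
        return -1;
    }
    nanosleep(&wait, NULL);
    pthread_cancel(tid);
    pthread_join(tid, &res);

    rc = pthread_mutex_trylock(&ctx.lock);
    if (rc == 0)
        pthread_mutex_unlock(&ctx.lock);
    pthread_mutex_destroy(&ctx.lock);
    fclose(tmp);
    return (rc == 0 && res == PTHREAD_CANCELED) ? 0 : -1;
}
```

Calling `cancel_and_check()` from a test program exercises the whole cycle. Because the thread can only be cancelled after `pthread_mutex_unlock(3)`, no cleanup handler is needed to release the mutex, which is the property the commit message relies on for the shared fop update path.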
A beta release for GlusterFS 3.6.0 has been made available [1]. Please verify whether the release solves this bug report for you. If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution. [1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html [2] http://supercolony.gluster.org/pipermail/gluster-users/
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report. glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution. [1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html [2] http://supercolony.gluster.org/mailman/listinfo/gluster-users