+++ This bug was initially created as a clone of Bug #1509189 +++

Description of problem:

As mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1508817#c4, there is a chance of hitting a race between gf_timer_registry_destroy(), gf_timer_call_cancel() and gf_timer_proc(), leading to a use-after-free.

As explained by Dan, the flow is as follows:

gf_timer_proc() is called, locks reg, and gets an event. It unlocks reg, and calls the callback.

Now, gf_timer_registry_destroy() is called, and removes reg from ctx, and joins on gf_timer_proc().

Now, gf_timer_call_cancel() is called on the event being processed. It cannot find reg (since it's been removed from ctx), so it frees event.

Now the callback returns into gf_timer_proc(), and it tries to free event, but it's already free, so double free.

--- Additional comment from Worker Ant on 2017-11-03 06:44:11 EDT ---

REVIEW: https://review.gluster.org/18652 (timer: Fix possible race during cleanup) posted (#1) for review on master by soumya k

--- Additional comment from Worker Ant on 2017-11-21 03:56:56 EST ---

COMMIT: https://review.gluster.org/18652 committed in master by "soumya k" <skoduri> with a commit message-

timer: Fix possible race during cleanup

As mentioned in bug1509189, there is a possible race between gf_timer_cancel(), gf_timer_proc() and gf_timer_registry_destroy() leading to use_after_free.

Problem:

1) gf_timer_proc() is called, locks reg, and gets an event. It unlocks reg, and calls the callback.

2) Meanwhile gf_timer_registry_destroy() is called, and removes reg from ctx, and joins on gf_timer_proc().

3) gf_timer_call_cancel() is called on the event being processed. It cannot find reg (since it's been removed from ctx), so it frees event.

4) The callback returns into gf_timer_proc(), and it tries to free event, but it's already free, so double free.

Solution:

The fix is to bail out in gf_timer_cancel() when the registry is not found. The logic behind this is that gf_timer_cancel() is only ever called on an existing event. That means there was a valid registry earlier, when that event was created. The only reason we cannot find that registry now is that it must have been set to NULL when context cleanup started. Since gf_timer_proc() takes care of releasing all the remaining events active on that registry, it seems safe to bail out in gf_timer_cancel().

Change-Id: Ia9b088533141c3bb335eff2fe06b52d1575bb34f
BUG: 1509189
Reported-by: Daniel Gryniewicz <dang>
Signed-off-by: Soumya Koduri <skoduri>

--- Additional comment from Shyamsundar on 2018-03-15 07:19:42 EDT ---

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.

glusterfs-4.0.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html
[2] https://www.gluster.org/pipermail/gluster-users/
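
For readers following the commit message for review 18652, the bail-out it describes can be sketched roughly as below. This is only an illustration under assumptions, not the actual glusterfs code: the sketch_* types, function names and return codes are hypothetical stand-ins, the event list is reduced to a bare doubly linked list, and the lifetime of the registry itself is not modelled. The only point it tries to show is that cancel leaves the event alone when ctx no longer has a registry, so that the timer thread remains the single owner that frees it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the glusterfs timer types. */
struct sketch_event {
    struct sketch_event *next;
    struct sketch_event *prev;
};

struct sketch_registry {
    pthread_mutex_t lock;
    struct sketch_event active; /* sentinel head of the event list */
};

struct sketch_ctx {
    pthread_mutex_t lock;
    struct sketch_registry *timer; /* assumed to be set to NULL once cleanup starts */
};

/* Hypothetical return codes, chosen only for this sketch. */
enum { SKETCH_CANCEL_FREED = 0, SKETCH_CANCEL_SKIPPED = 1 };

void sketch_registry_init(struct sketch_registry *reg)
{
    pthread_mutex_init(&reg->lock, NULL);
    reg->active.next = &reg->active;
    reg->active.prev = &reg->active;
}

/* Allocate an event and link it at the head of the registry's list. */
struct sketch_event *sketch_timer_add(struct sketch_registry *reg)
{
    struct sketch_event *event = calloc(1, sizeof(*event));
    if (event == NULL)
        return NULL;

    pthread_mutex_lock(&reg->lock);
    event->next = reg->active.next;
    event->prev = &reg->active;
    reg->active.next->prev = event;
    reg->active.next = event;
    pthread_mutex_unlock(&reg->lock);
    return event;
}

int sketch_timer_cancel(struct sketch_ctx *ctx, struct sketch_event *event)
{
    struct sketch_registry *reg;

    assert(ctx != NULL && event != NULL);

    pthread_mutex_lock(&ctx->lock);
    reg = ctx->timer;
    pthread_mutex_unlock(&ctx->lock);

    if (reg == NULL) {
        /* Cleanup has started. Do not free the event here: the timer
         * thread is expected to release whatever is still on the list,
         * and freeing it in both places is the double free in the bug. */
        return SKETCH_CANCEL_SKIPPED;
    }

    /* Normal path: unlink the event under the registry lock, then free it. */
    pthread_mutex_lock(&reg->lock);
    event->prev->next = event->next;
    event->next->prev = event->prev;
    pthread_mutex_unlock(&reg->lock);

    free(event);
    return SKETCH_CANCEL_FREED;
}
```

With a live registry, sketch_timer_cancel() unlinks and frees the event. Once ctx->timer has been cleared, it returns without touching the event, which matches the reasoning in the commit message that an existing event implies the registry was valid earlier and can only be missing because cleanup is underway.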
*** Bug 1568373 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607