Bug 1247850 - Glusterfsd crashes because of thread-unsafe code in gf_authenticate
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: 3.7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Niels de Vos
QA Contact:
URL:
Whiteboard:
Depends On: 1247765
Blocks: 1247851
Reported: 2015-07-29 05:10 UTC by Raghavendra G
Modified: 2015-09-09 09:38 UTC

Fixed In Version: glusterfs-3.7.4
Clone Of: 1247765
Environment:
Last Closed: 2015-09-09 09:38:41 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Raghavendra G 2015-07-29 05:10:31 UTC
+++ This bug was initially created as a clone of Bug #1247765 +++

This caused a failure on a regression test for an unrelated patch.

http://build.gluster.org/job/rackspace-regression-2GB-triggered/12889/console

The relevant part of the stack trace looks like this.

#2  0x00007f3f84194d18 in dict_get (this=0x7f3f680023ec
#3  0x00007f3f744a4a45 in gf_auth (input_params=0x7f3f680023ec
#4  0x00007f3f748d9894 in map (this=0x7f3f70031dec
#5  0x00007f3f84196885 in dict_foreach_match (dict=0x7f3f70031dec
#6  0x00007f3f8419675d in dict_foreach (dict=0x7f3f70031dec
#7  0x00007f3f748d99d5 in gf_authenticate

In frame 2, "this" looks like a valid pointer, but the structure it points to seems to contain garbage.  The problem becomes evident when we look at frame 7.

(gdb) p input_params
$3 = (dict_t *) 0x7f3f700665ac
(gdb) p __input_params
$4 = (dict_t *) 0x7f3f680023ec

So the value we're using is from the global __input_params.  This should match the parameter input_params (which looks OK) but that's not the case.  Apparently some other thread came in and stomped the unprotected global variable while we were still using it.  This concurrency issue was actually reported - and fixed! - on September 13, 2013.

http://review.gluster.org/#/c/5846/

Unfortunately, since most people weren't running multiple network threads, they never saw the problem, didn't take it seriously, and bikeshedded the patch to death.  Now that we have multi-threaded epoll, it's going to keep biting everyone until it's fixed.
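To make the failure mode above concrete, here is a minimal single-threaded sketch of the pattern. It is not the GlusterFS code: toy_params_t, toy_global_params, toy_unsafe_lookup and toy_stomp are hypothetical names, and the "other thread" is simulated by a hook called between the store to the global and the later read, which is where a second network thread could run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the request dict; not the real dict_t. */
typedef struct { const char *user; } toy_params_t;

/* Unprotected global used to smuggle the argument to a callback. */
static toy_params_t *toy_global_params;

/* Hook that simulates another thread running at the worst moment. */
typedef void (*toy_hook_fn)(void);

static const char *toy_unsafe_lookup(toy_params_t *input_params,
                                     toy_hook_fn interleave)
{
    assert(input_params != NULL);
    toy_global_params = input_params;   /* step 1: publish via the global */
    if (interleave != NULL)
        interleave();                   /* a second caller may run here */
    return toy_global_params->user;     /* step 2: read the global back */
}

/* A second request that overwrites the global, as a concurrent caller would. */
static toy_params_t toy_other_request = { "bob" };

static void toy_stomp(void)
{
    toy_global_params = &toy_other_request;
}
```

Called with no hook, toy_unsafe_lookup returns the caller's own data. Called with toy_stomp as the hook, it returns the other request's data even though the parameter is unchanged, which is the same mismatch gdb shows between input_params and __input_params. In the real crash the other request's dict may already have been freed, so the read hits garbage instead of a valid value.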

--- Additional comment from Anand Avati on 2015-07-28 16:27:49 EDT ---

REVIEW: http://review.gluster.org/11780 (rpc: fix concurrency bug in gf_authenticate) posted (#1) for review on master by Jeff Darcy (jdarcy)

Comment 1 Anand Avati 2015-07-29 14:14:51 UTC
COMMIT: http://review.gluster.org/11785 committed in release-3.7 by Niels de Vos (ndevos) 
------
commit cc2ebbd7f5a043cb953521bb9d65ddc3235cae43
Author: Niels de Vos <ndevos>
Date:   Wed Jul 29 09:51:45 2015 +0200

    rpc: fix concurrency bug in gf_authenticate
    
    The basic problem is that gf_authenticate abused global variables to
    pass info through dict_foreach.  This is not thread-safe, but it wasn't
    affecting most people until multi-threaded epoll came along.  Now, if
    two threads get into this code at the same time - and there's nothing to
    prevent it - one of them could free one of the dicts involved while the
    other is still using it.
    
    The fix is to pass this same information using a temporary structure
    instead of globals.  This makes the code smaller and more efficient too.
    
    Cherry picked from commit 8f04ec33bc86aa464a5f8b77f9d64e5608cb6f1b:
    > Change-Id: I72cccc440bb40d5f7ff695250dd014762c7efb73
    > BUG: 1247765
    > Signed-off-by: Jeff Darcy <jdarcy>
    > Reviewed-on: http://review.gluster.org/11780
    > Tested-by: NetBSD Build System <jenkins.org>
    > Reviewed-by: Niels de Vos <ndevos>
    > Tested-by: Gluster Build System <jenkins.com>
    > Reviewed-by: Raghavendra G <rgowdapp>
    
    BUG: 1247850
    Change-Id: I151dad436b859c64985421394f3dea572723c2aa
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: http://review.gluster.org/11785
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Jeff Darcy <jdarcy>
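
The approach described in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the committed patch: the toy_* types and functions are hypothetical stand-ins for the dict API, and the sketch assumes the iteration helper forwards an opaque pointer to its callback. The point is only that the state lives in a per-call structure on the caller's stack, so two threads never share it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the GlusterFS dict_t API. */
typedef struct { const char *key; const char *value; } toy_pair_t;
typedef struct { const toy_pair_t *pairs; size_t count; } toy_dict_t;

typedef int (*toy_foreach_fn)(const toy_pair_t *pair, void *data);

/* Calls fn for every pair, forwarding the opaque data pointer. */
static int toy_dict_foreach(const toy_dict_t *dict, toy_foreach_fn fn,
                            void *data)
{
    for (size_t i = 0; i < dict->count; i++) {
        int ret = fn(&dict->pairs[i], data);
        if (ret != 0)
            return ret;
    }
    return 0;
}

static const char *toy_dict_get(const toy_dict_t *dict, const char *key)
{
    for (size_t i = 0; i < dict->count; i++) {
        if (strcmp(dict->pairs[i].key, key) == 0)
            return dict->pairs[i].value;
    }
    return NULL;
}

/* Per-call context that replaces the globals. */
typedef struct {
    const toy_dict_t *input_params;
    int matched;
} toy_auth_ctx_t;

/* Callback: reads everything it needs from the context, never a global. */
static int toy_check_one(const toy_pair_t *pair, void *data)
{
    toy_auth_ctx_t *ctx = data;
    const char *got = toy_dict_get(ctx->input_params, pair->key);

    if (got != NULL && strcmp(got, pair->value) == 0)
        ctx->matched++;
    return 0;
}

/* Returns how many config entries the request satisfies. */
static int toy_authenticate(const toy_dict_t *input_params,
                            const toy_dict_t *config_params)
{
    assert(input_params != NULL && config_params != NULL);
    toy_auth_ctx_t ctx = { .input_params = input_params, .matched = 0 };

    toy_dict_foreach(config_params, toy_check_one, &ctx);
    return ctx.matched;
}
```

Because ctx is a local variable, each caller of toy_authenticate gets its own copy and no locking is needed for it. The dicts themselves would still need whatever protection they already have.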

Comment 2 Kaushal 2015-09-09 09:38:41 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.4, please open a new bug report.

glusterfs-3.7.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12496
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

