Bug 1247765 - Glusterfsd crashes because of thread-unsafe code in gf_authenticate
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: rpc
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: Jeff Darcy
Depends On:
Blocks: 1247850
Reported: 2015-07-28 15:59 EDT by Jeff Darcy
Modified: 2016-06-16 09:27 EDT

See Also:
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1247850 1247851
Environment:
Last Closed: 2016-06-16 09:27:16 EDT
Type: Bug


Description Jeff Darcy 2015-07-28 15:59:35 EDT
This caused a failure on a regression test for an unrelated patch.

http://build.gluster.org/job/rackspace-regression-2GB-triggered/12889/console

The relevant part of the stack trace looks like this.

#2  0x00007f3f84194d18 in dict_get (this=0x7f3f680023ec
#3  0x00007f3f744a4a45 in gf_auth (input_params=0x7f3f680023ec
#4  0x00007f3f748d9894 in map (this=0x7f3f70031dec
#5  0x00007f3f84196885 in dict_foreach_match (dict=0x7f3f70031dec
#6  0x00007f3f8419675d in dict_foreach (dict=0x7f3f70031dec
#7  0x00007f3f748d99d5 in gf_authenticate

In frame 2, "this" looks like a valid pointer, but the structure it points to seems to contain garbage.  The problem becomes evident when we look at frame 7.

(gdb) p input_params
$3 = (dict_t *) 0x7f3f700665ac
(gdb) p __input_params
$4 = (dict_t *) 0x7f3f680023ec

So the value we're using is from the global __input_params.  This should match the parameter input_params (which looks OK) but that's not the case.  Apparently some other thread came in and stomped the unprotected global variable while we were still using it.  This concurrency issue was actually reported - and fixed! - on September 13, 2013.
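For illustration only, here is a minimal sketch of one common way to avoid this kind of race: carry the per-call state in a struct on the caller's stack and hand it to the iteration callback through its data pointer, instead of parking it in a file-scope variable. Every name in it (struct pair, for_each_pair, struct auth_args, count_match, count_key) is hypothetical and stands in for the real dict and auth code; it is not taken from the GlusterFS tree or from either patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dictionary entry; not the GlusterFS dict API. */
struct pair {
    const char *key;
    const char *value;
};

typedef int (*pair_fn)(const struct pair *p, void *data);

/* Call fn on every pair, passing the caller's context pointer through. */
static int for_each_pair(const struct pair *pairs, size_t n,
                         pair_fn fn, void *data)
{
    assert(fn != NULL);
    for (size_t i = 0; i < n; i++) {
        int ret = fn(&pairs[i], data);
        if (ret != 0)
            return ret;   /* stop early if the callback reports an error */
    }
    return 0;
}

/*
 * Per-call context. It lives on the caller's stack, so every thread that
 * enters count_key() gets its own copy and nothing can be overwritten by
 * a concurrent caller. The unsafe pattern would keep wanted_key in a
 * file-scope variable that the callback reads.
 */
struct auth_args {
    const char *wanted_key;   /* plays the role of input_params */
    int matches;              /* result accumulated by the callback */
};

static int count_match(const struct pair *p, void *data)
{
    struct auth_args *args = data;

    if (strcmp(p->key, args->wanted_key) == 0)
        args->matches++;
    return 0;
}

/* Reentrant: no file-scope state is read or written. */
int count_key(const struct pair *pairs, size_t n, const char *wanted_key)
{
    struct auth_args args = { .wanted_key = wanted_key, .matches = 0 };

    for_each_pair(pairs, n, count_match, &args);
    return args.matches;
}
```

Because the callback only touches what it receives through data, two network threads can run count_key() at the same time without seeing each other's parameters. A lock around a global would also work, but it would serialize every caller.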

http://review.gluster.org/#/c/5846/

Unfortunately, since most people weren't running multiple network threads, they never saw the problem, didn't take it seriously, and bikeshedded the patch to death.  Now that we have multi-threaded epoll, it's going to keep biting everyone until it's fixed.
Comment 1 Anand Avati 2015-07-28 16:27:49 EDT
REVIEW: http://review.gluster.org/11780 (rpc: fix concurrency bug in gf_authenticate) posted (#1) for review on master by Jeff Darcy (jdarcy@redhat.com)
Comment 2 Mike McCune 2016-03-28 18:45:31 EDT
This bug was accidentally moved from POST to MODIFIED by an error in automation. Please contact mmccune@redhat.com with any questions.
Comment 3 Niels de Vos 2016-06-16 09:27:16 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
