Bug 1219026 - glusterd crashes when brick option validation fails
Summary: glusterd crashes when brick option validation fails
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ---
Assignee: Jeff Darcy
QA Contact:
URL:
Whiteboard:
: 1220057 (view as bug list)
Depends On: 1211749 1220057
Blocks: glusterfs-3.7.0
TreeView+ depends on / blocked
 
Reported: 2015-05-06 12:34 UTC by Jeff Darcy
Modified: 2015-05-14 17:47 UTC (History)
4 users (show)

Fixed In Version: glusterfs-3.7.0
Doc Type: Bug Fix
Doc Text:
Clone Of: 1211749
Environment:
Last Closed: 2015-05-14 17:29:38 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description Jeff Darcy 2015-05-06 12:34:35 UTC
+++ This bug was initially created as a clone of Bug #1211749 +++

While working on server-side AFR/NSR support, I added the following test to tests/basic/afr/read-subvol-data.t:

   TEST ! set_read_subvol $V0 no-such-xlator

The result was a glusterd crash, like this:

   (gdb) bt
   #0  0x00007f0e5d5920b0 in pthread_spin_lock () from /lib64/libpthread.so.0
   #1  0x00007f0e5e46dd69 in __gf_free (...)
       at mem-pool.c:303
   #2  0x00007f0e59e251b4 in gd_sync_task_begin (...)
       at glusterd-syncop.c:1767
   #3  0x00007f0e59e25260 in glusterd_op_begin_synctask (...)
       at glusterd-syncop.c:1787
   #4  0x00007f0e59d770b2 in __glusterd_handle_set_volume (...)
       at glusterd-handler.c:1871

The __gf_free in question is for op_errstr, explaining the nature of the validation error.  Here's a relevant comment from the patch I'll be posting momentarily, at the point where op_errstr is set, describing the real problem and a fix/workaround.

   * In the validation-error code path, the graph is freed
   * before op_errstr is.  Therefore, if the memory block for
   * op_errstr still contains a reference to a translator within
   * that graph, we'll crash.  Make sure the reference is to a
   * translator that's not going away instead.

--- Additional comment from Anand Avati on 2015-04-14 15:35:44 EDT ---

REVIEW: http://review.gluster.org/10238 (core/glusterd: avoid crash when option validation fails) posted (#1) for review on master by Jeff Darcy (jdarcy)

--- Additional comment from Anand Avati on 2015-04-14 16:05:21 EDT ---

REVIEW: http://review.gluster.org/10238 (core/glusterd: avoid crash when option validation fails) posted (#2) for review on master by Jeff Darcy (jdarcy)

--- Additional comment from Anand Avati on 2015-04-15 12:36:03 EDT ---

REVIEW: http://review.gluster.org/10238 (core/glusterd: avoid crash when option validation fails) posted (#3) for review on master by Jeff Darcy (jdarcy)

--- Additional comment from Anand Avati on 2015-04-28 05:05:51 EDT ---

REVIEW: http://review.gluster.org/10417 (core: use reference counting for mem_acct structures) posted (#1) for review on master by Jeff Darcy (jdarcy)

--- Additional comment from Anand Avati on 2015-04-28 07:44:15 EDT ---

REVIEW: http://review.gluster.org/10417 (core: use reference counting for mem_acct structures) posted (#2) for review on master by Jeff Darcy (jdarcy)

--- Additional comment from Anand Avati on 2015-05-04 10:17:33 EDT ---

REVIEW: http://review.gluster.org/10417 (core: use reference counting for mem_acct structures) posted (#3) for review on master by Jeff Darcy (jdarcy)

Comment 1 Vijay Bellur 2015-05-09 15:49:37 UTC
*** Bug 1220057 has been marked as a duplicate of this bug. ***

Comment 2 Anand Avati 2015-05-09 15:50:24 UTC
REVIEW: http://review.gluster.org/10723 (core: use reference counting for mem_acct structures) posted (#2) for review on release-3.7 by Vijay Bellur (vbellur)

Comment 3 Anand Avati 2015-05-09 21:28:02 UTC
COMMIT: http://review.gluster.org/10723 committed in release-3.7 by Niels de Vos (ndevos) 
------
commit a3af10a801a40fe990ee5db63c6dd6cb97713e4c
Author: Jeff Darcy <jdarcy>
Date:   Tue Apr 28 04:40:00 2015 -0400

    core: use reference counting for mem_acct structures
    
    When freeing memory, our memory-accounting code expects to be able to
    dereference from the (previously) allocated block to its owning
    translator.  However, as we have already found once in option
    validation and twice in logging, that translator might itself have
    been freed and the dereference attempt causes on of our daemons to
    crash with SIGSEGV.  This patch attempts to fix that as follows:
    
     * We no longer embed a struct mem_acct directly in a struct xlator,
       but instead allocate it separately.
    
     * Allocated memory blocks now contain a pointer to the mem_acct
       instead of the xlator.
    
     * The mem_acct structure contains a reference count, manipulated in
       both the normal and translator allocate/free code using atomic
       increments and decrements.
    
     * Because it's now a separate structure, we can defer freeing the
       mem_acct until its reference count reaches zero (either way).
    
     * Some unit tests were disabled, because they embedded their own
       copies of the implementation for what they were supposedly testing.
       Life's too short to spend time fixing tests that seem designed to
       impede progress by requiring a certain implementation as well as
       behavior.
    
    Change-Id: Id929b11387927136f78626901729296b6c0d0fd7
    BUG: 1219026
    Signed-off-by: Jeff Darcy <jdarcy>
    Reviewed-on: http://review.gluster.org/10417
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-by: Niels de Vos <ndevos>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Reviewed-on: http://review.gluster.org/10723
    Tested-by: NetBSD Build System

Comment 4 Niels de Vos 2015-05-14 17:29:38 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 5 Niels de Vos 2015-05-14 17:36:00 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 6 Niels de Vos 2015-05-14 17:38:22 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 7 Niels de Vos 2015-05-14 17:47:22 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user


Note You need to log in before you can comment on or make changes to this bug.