Bugzilla (bugzilla.redhat.com) will be under maintenance for infrastructure upgrades and will not be available on July 31st between 12:30 AM - 05:30 AM UTC. We appreciate your understanding and patience. You can follow status.redhat.com for details.
Bug 1412941 - Regression caused by enabling client-io-threads by default
Summary: Regression caused by enabling client-io-threads by default
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: io-threads
Version: 3.8
Hardware: All
OS: All
unspecified
high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1381830
Blocks: 1387894
TreeView+ depends on / blocked
 
Reported: 2017-01-13 07:50 UTC by Pranith Kumar K
Modified: 2017-02-20 12:34 UTC (History)
3 users (show)

Fixed In Version: glusterfs-3.8.9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1381830
Environment:
Last Closed: 2017-02-20 12:34:24 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments (Terms of Use)

Description Pranith Kumar K 2017-01-13 07:50:46 UTC
+++ This bug was initially created as a clone of Bug #1381830 +++

Description of problem:

As mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1380619#c11,   there is a regression caused to gfapi applications with making client-io-threads option on by default. 
    
iot-worker threads spawned are not cleaned up as part of xlator->fini() and they could end up accessing invalid/freed memory.

we need to fix io-thread->fini() to cleanup those threads before exiting. Since it could be an intricate fix, we could try disabling io-threads by default till then.



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Worker Ant on 2016-10-05 05:35:56 EDT ---

REVIEW: http://review.gluster.org/15616 (Revert "mgmt/glusterd: Enable client-io-threads by default") posted (#2) for review on master by soumya k (skoduri@redhat.com)

--- Additional comment from Worker Ant on 2016-10-09 12:18:41 EDT ---

REVIEW: http://review.gluster.org/15620 (performance/io-threads: Exit all threads on PARENT_DOWN) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

--- Additional comment from Worker Ant on 2016-10-21 07:38:04 EDT ---

REVIEW: http://review.gluster.org/15620 (performance/io-threads: Exit all threads on PARENT_DOWN) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

--- Additional comment from Worker Ant on 2016-10-22 03:59:06 EDT ---

REVIEW: http://review.gluster.org/15620 (performance/io-threads: Exit all threads on PARENT_DOWN) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

--- Additional comment from Worker Ant on 2016-10-23 05:32:10 EDT ---

COMMIT: http://review.gluster.org/15620 committed in master by Raghavendra G (rgowdapp@redhat.com) 
------
commit d7a5ca16911caca03cec1112d4be56a9cda2ee30
Author: Pranith Kumar K <pkarampu@redhat.com>
Date:   Sun Oct 9 21:36:40 2016 +0530

    performance/io-threads: Exit all threads on PARENT_DOWN
    
    Problem:
    When glfs_fini() is called on a volume where client.io-threads is enabled,
    fini() will free up iothread xl's private structure but there would be some
    threads that are sleeping which would wakeup after the timedwait completes
    leading to accessing already free'd memory.
    
    Fix:
    As part of parent-down, exit all sleeping threads.
    
    BUG: 1381830
    Change-Id: I0bb8d90241112c355fb22ee3fbfd7307f475b339
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: http://review.gluster.org/15620
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Raghavendra G <rgowdapp@redhat.com>

Comment 1 Worker Ant 2017-01-13 08:01:57 UTC
REVIEW: http://review.gluster.org/16396 (performance/io-threads: Exit all threads on PARENT_DOWN) posted (#1) for review on release-3.8 by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 2 Worker Ant 2017-01-13 08:02:01 UTC
REVIEW: http://review.gluster.org/16397 (performance/io-threads: Exit threads in fini() as well) posted (#1) for review on release-3.8 by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 3 Worker Ant 2017-01-17 15:58:19 UTC
COMMIT: http://review.gluster.org/16396 committed in release-3.8 by Pranith Kumar Karampuri (pkarampu@redhat.com) 
------
commit 9e3fea1e02b97781af6e6b46d8fd0fb6da47e6c9
Author: Pranith Kumar K <pkarampu@redhat.com>
Date:   Sun Oct 9 21:36:40 2016 +0530

    performance/io-threads: Exit all threads on PARENT_DOWN
    
    Problem:
    When glfs_fini() is called on a volume where client.io-threads is enabled,
    fini() will free up iothread xl's private structure but there would be some
    threads that are sleeping which would wakeup after the timedwait completes
    leading to accessing already free'd memory.
    
    Fix:
    As part of parent-down, exit all sleeping threads.
    
    Please note that the upstream patch differs from this a little bit,
    because least-prio-throttling feature is removed from master, 3.9
    
     >BUG: 1381830
     >Change-Id: I0bb8d90241112c355fb22ee3fbfd7307f475b339
     >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
     >Reviewed-on: http://review.gluster.org/15620
     >Smoke: Gluster Build System <jenkins@build.gluster.org>
     >CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
     >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
     >Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
     >(cherry picked from commit d7a5ca16911caca03cec1112d4be56a9cda2ee30)
    
    BUG: 1412941
    Change-Id: I6341156251279b24ab2323cedf1b9722e42da671
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: http://review.gluster.org/16396
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

Comment 4 Worker Ant 2017-01-17 15:59:48 UTC
REVIEW: http://review.gluster.org/16397 (performance/io-threads: Exit threads in fini() as well) posted (#2) for review on release-3.8 by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 5 Worker Ant 2017-01-18 02:46:38 UTC
COMMIT: http://review.gluster.org/16397 committed in release-3.8 by Pranith Kumar Karampuri (pkarampu@redhat.com) 
------
commit 2cfb7bc2419bbd38eaec070fbd2c874dd748f16b
Author: Pranith Kumar K <pkarampu@redhat.com>
Date:   Fri Nov 18 13:30:08 2016 +0530

    performance/io-threads: Exit threads in fini() as well
    
    Problem:
    io-threads starts the thread in 'init()' but doesn't clean them up
    on 'fini()'. It relies on PARENT_DOWN to exit threads but there can
    be cases where event before PARENT_UP the graph init code can think
    of issuing fini(). This code path is hit when glfs_init() is called
    on a volume that is in 'stopped' state. It leads to a crash in ganesha
    process, because the io-thread tries to access freed memory.
    
    Fix:
    Ideal fix would be to wait for all fops in io-thread list to be completed on
    PARENT_DOWN, and have fini() do cleanup of threads. Because there is no proper
    documentation about how PARENT_DOWN/fini are supposed to be used,
    we are getting different kinds of sequences in different higher level protocols.
    So for now cleaning up in both PARENT_DOWN and fini(). Fuse doesn't call fini()
    gfapi is not calling PARENT_DOWN in some cases, so for now I don't see
    another way out.
    
     >BUG: 1396793
     >Change-Id: I9c9154e7d57198dbaff0f30d3ffc25f6d8088aec
     >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
     >Reviewed-on: http://review.gluster.org/15888
     >Smoke: Gluster Build System <jenkins@build.gluster.org>
     >CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
     >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
     >Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
     >(cherry picked from commit 25817a8c868b6c1b8149117f13e4216a99e453aa)
    
    BUG: 1412941
    Change-Id: I5e36a7d253f2ef8abce507eced1eb7073cff930c
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: http://review.gluster.org/16397
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>

Comment 6 Niels de Vos 2017-02-20 12:34:24 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.9, please open a new bug report.

glusterfs-3.8.9 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2017-February/000066.html
[2] https://www.gluster.org/pipermail/gluster-users/


Note You need to log in before you can comment on or make changes to this bug.