Bug 1242515

Summary: racy condition in nfs/auth-cache feature
Product: [Community] GlusterFS Reporter: Niels de Vos <ndevos>
Component: nfs Assignee: Niels de Vos <ndevos>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 3.7.2 CC: bugs, gluster-bugs
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
URL: http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/11052/focus=11109
Whiteboard:
Fixed In Version: glusterfs-3.7.3 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1226717 Environment:
Last Closed: 2015-07-30 09:50:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1226717    
Bug Blocks: 1233025    

Description Niels de Vos 2015-07-13 13:13:11 UTC
+++ This bug was initially created as a clone of Bug #1226717 +++

Description of problem:
The auth-cache feature contains a function called auth_cache_purge(). This function replaces the auth_cache->cache_dict with a new dictionary that should contain fresh cache entries. The replacement is triggered by the _mnt3_auth_param_refresh_thread().

There is no locking of the actual auth_cache_entry structures, so auth_cache_purge() can free these entries while other threads are still using them.

The problem is rarely noticed, because the auth_cache_entry structures are used only very briefly, so the chance of corruption is small. Our regression tests seem to have hit this issue only once or twice in the last few months.
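The window described above can be illustrated with a deterministic, single-threaded sketch. This is not the GlusterFS code: `struct entry`, `struct cache`, `lookup_unlocked()`, `purge_unlocked()` and `race_demo()` are hypothetical names, and the purge only sets a `freed` flag instead of calling free(), so that the demo stays well-defined while still showing the stale pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real auth_cache structures. */
struct entry {
    int allowed; /* cached authentication result */
    int freed;   /* demo only: set where the real code would free() */
};

struct cache {
    struct entry *slot; /* stands in for the whole cache_dict */
};

/* Returns a borrowed pointer: no lock is held and no reference is taken. */
static struct entry *lookup_unlocked(struct cache *c)
{
    return c->slot;
}

/* Swaps in a fresh entry and releases the old one immediately. */
static void purge_unlocked(struct cache *c, struct entry *fresh)
{
    c->slot->freed = 1; /* the real code would free the entry here */
    c->slot = fresh;
}

/* Replays the interleaving: lookup in thread A, purge in thread B. */
static int race_demo(void)
{
    struct entry old = { 1, 0 };
    struct entry fresh = { 1, 0 };
    struct cache c = { &old };

    struct entry *e = lookup_unlocked(&c); /* thread A fetches the entry */
    purge_unlocked(&c, &fresh);            /* thread B refreshes the cache */

    /* Thread A still holds 'e', which has been released in the meantime. */
    return e->freed;
}
```

In this sketch race_demo() returns 1, meaning the pointer obtained by the lookup refers to an entry that the purge has already released. With a real free() that access would be a use-after-free.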

Version-Release number of selected component (if applicable):
3.7 and mainline

How reproducible:
Extremely difficult to reproduce.

Additional info:
http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/11052/focus=11109

Comment 1 Anand Avati 2015-07-13 13:18:38 UTC
REVIEW: http://review.gluster.org/11645 (nfs: add a gf_lock_t for the auth_cache->cache_dict) posted (#1) for review on release-3.7 by Niels de Vos (ndevos)

Comment 2 Anand Avati 2015-07-13 13:18:40 UTC
REVIEW: http://review.gluster.org/11646 (nfs: refcount each auth_cache_entry and related data_t) posted (#1) for review on release-3.7 by Niels de Vos (ndevos)

Comment 3 Anand Avati 2015-07-13 13:18:42 UTC
REVIEW: http://review.gluster.org/11647 (refcount: correct the documentation) posted (#1) for review on release-3.7 by Niels de Vos (ndevos)

Comment 4 Anand Avati 2015-07-14 10:17:23 UTC
COMMIT: http://review.gluster.org/11647 committed in release-3.7 by Krishnan Parthasarathi (kparthas) 
------
commit dd66dd9d6c249282711d56678bdfe22c2a8d0975
Author: Niels de Vos <ndevos>
Date:   Mon Jul 13 12:16:33 2015 +0200

    refcount: correct the documentation
    
    The only check that _gf_ref_get() needs is "== 0" for detecting a
    failure. The actual return value is not guaranteed to be the number of
    active references (they can change in other threads anyway).
    
    Cherry picked from commit c7f309116d8fa62f6b9fd6ff2902e8ce4bfa192d:
    > BUG: 1163543
    > Change-Id: I8801601eab37046f5a5ee0bce5a62606115ca151
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: http://review.gluster.org/11328
    > Tested-by: NetBSD Build System <jenkins.org>
    > Tested-by: Gluster Build System <jenkins.com>
    > Reviewed-by: Kaleb KEITHLEY <kkeithle>
    
    Change-Id: I8801601eab37046f5a5ee0bce5a62606115ca151
    BUG: 1242515
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: http://review.gluster.org/11647
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
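The "== 0" rule from the commit message above can be sketched with a small reference counter. This is an illustration built on C11 atomics, not the GlusterFS refcount implementation: `ref_t`, `ref_init()`, `ref_get()`, `ref_put()` and `mark_released()` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical reference counter, not the GlusterFS gf_ref_t. */
typedef struct {
    atomic_uint cnt;
    void (*release)(void *data);
    void *data;
} ref_t;

/* The creator of the object holds the first reference. */
static void ref_init(ref_t *ref, void (*release)(void *), void *data)
{
    atomic_init(&ref->cnt, 1);
    ref->release = release;
    ref->data = data;
}

/*
 * Takes a reference. Returns 0 if the counter already dropped to zero
 * (the object is being released), non-zero otherwise. Callers should
 * only test for "== 0"; the value says nothing about how many
 * references exist, since other threads may change the counter.
 */
static unsigned int ref_get(ref_t *ref)
{
    unsigned int cur = atomic_load(&ref->cnt);

    while (cur != 0) {
        /* On failure 'cur' is reloaded with the current counter value. */
        if (atomic_compare_exchange_weak(&ref->cnt, &cur, cur + 1))
            return 1;
    }
    return 0;
}

/* Drops a reference. Returns 1 if this call released the object. */
static int ref_put(ref_t *ref)
{
    if (atomic_fetch_sub(&ref->cnt, 1) == 1) {
        if (ref->release)
            ref->release(ref->data);
        return 1;
    }
    return 0;
}

/* Demo release callback: records that the object was released. */
static void mark_released(void *data)
{
    *(int *)data = 1;
}
```

A caller that gets 0 back from ref_get() must not touch the object, because another thread has already started releasing it.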

Comment 5 Anand Avati 2015-07-18 08:35:20 UTC
COMMIT: http://review.gluster.org/11645 committed in release-3.7 by Niels de Vos (ndevos) 
------
commit 3d6dacd69ca439e338ad59bfab53ce6c72b028d0
Author: Niels de Vos <ndevos>
Date:   Mon Jul 13 12:14:53 2015 +0200

    nfs: add a gf_lock_t for the auth_cache->cache_dict
    
    This is the 1st step towards implementing reference counters for the
    auth_cache_entry structure. Access to the structures should always be
    done atomically, but this can not be guaranteed by a dict.
    
    Cherry picked from commit 67f7562b5cc9e42774d1dc569471f86f61eef040:
    > Change-Id: Ic165221d72f11832177976c989823d861cf12f01
    > BUG: 1226717
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: http://review.gluster.org/11021
    > Tested-by: NetBSD Build System <jenkins.org>
    > Tested-by: Gluster Build System <jenkins.com>
    > Reviewed-by: jiffin tony Thottan <jthottan>
    
    Change-Id: Ic165221d72f11832177976c989823d861cf12f01
    BUG: 1242515
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: http://review.gluster.org/11645
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: jiffin tony Thottan <jthottan>
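The idea of guarding the dictionary with a lock can be sketched as follows. This is a simplified model under stated assumptions, not the patch itself: a `pthread_mutex_t` stands in for `gf_lock_t`, a fixed-size `struct table` stands in for the dict, and the `lc_*` functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS 8

/* Hypothetical stand-in for the dict behind auth_cache->cache_dict. */
struct table {
    char keys[SLOTS][64];
    int allowed[SLOTS];
    int used;
};

struct locked_cache {
    pthread_mutex_t lock; /* assumption: plays the role of the gf_lock_t */
    struct table *tbl;
};

static int lc_init(struct locked_cache *c)
{
    if (pthread_mutex_init(&c->lock, NULL) != 0)
        return -1;
    c->tbl = calloc(1, sizeof(*c->tbl));
    return c->tbl ? 0 : -1;
}

/* Adds a key; keys longer than 63 bytes are truncated in this sketch. */
static int lc_insert(struct locked_cache *c, const char *key, int allowed)
{
    int ret = -1;

    pthread_mutex_lock(&c->lock);
    if (c->tbl->used < SLOTS) {
        int i = c->tbl->used++;
        snprintf(c->tbl->keys[i], sizeof(c->tbl->keys[i]), "%s", key);
        c->tbl->allowed[i] = allowed;
        ret = 0;
    }
    pthread_mutex_unlock(&c->lock);
    return ret;
}

/* Copies the value out while the lock is held; no pointer escapes. */
static int lc_lookup(struct locked_cache *c, const char *key, int *allowed)
{
    int ret = -1;

    pthread_mutex_lock(&c->lock);
    for (int i = 0; i < c->tbl->used; i++) {
        if (strcmp(c->tbl->keys[i], key) == 0) {
            *allowed = c->tbl->allowed[i];
            ret = 0;
            break;
        }
    }
    pthread_mutex_unlock(&c->lock);
    return ret;
}

/* Swaps in an empty table under the lock, then frees the old one. */
static int lc_purge(struct locked_cache *c)
{
    struct table *fresh = calloc(1, sizeof(*fresh));
    struct table *old;

    if (!fresh)
        return -1;
    pthread_mutex_lock(&c->lock);
    old = c->tbl;
    c->tbl = fresh;
    pthread_mutex_unlock(&c->lock);
    free(old);
    return 0;
}
```

Because lc_lookup() copies the result while holding the lock, a concurrent lc_purge() cannot free the table underneath it. A lookup that hands out pointers to entries would still need reference counting on top of the lock.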

Comment 6 Anand Avati 2015-07-18 08:36:18 UTC
COMMIT: http://review.gluster.org/11646 committed in release-3.7 by Niels de Vos (ndevos) 
------
commit 85a7ad784e92f4b0bedb44f7e64bf4e9adfae5ce
Author: Niels de Vos <ndevos>
Date:   Mon Jul 13 12:16:04 2015 +0200

    nfs: refcount each auth_cache_entry and related data_t
    
    This makes sure that all the auth_cache_entry structures are only free'd
    when there is no reference to it anymore. When it is free'd, the
    associated data_t from the auth_cache->cache_dict gets unref'd too.
    
    Upon calling auth_cache_purge(), the auth_cache->cache_dict will free
    each auth_cache_entry in a secure way.
    
    Cherry picked from commit 7b51bd636fc5e5e1ae48a4e7cba48d0d20878d15:
    > Change-Id: If097cc11838e43599040f5414f82b30fc0fd40c6
    > BUG: 1226717
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: http://review.gluster.org/11023
    > Reviewed-by: Xavier Hernandez <xhernandez>
    > Tested-by: Gluster Build System <jenkins.com>
    > Tested-by: NetBSD Build System <jenkins.org>
    
    Change-Id: If097cc11838e43599040f5414f82b30fc0fd40c6
    BUG: 1242515
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: http://review.gluster.org/11646
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Xavier Hernandez <xhernandez>
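The combination of a lock and per-entry reference counts can be sketched as below. This is a minimal model and not the committed change: the cache holds a single hypothetical `slot` instead of a dict, `pthread_mutex_t` and C11 atomics stand in for the GlusterFS primitives, and `entries_freed` exists only so that the demo can observe when an entry is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

static atomic_int entries_freed; /* demo only: counts released entries */

/* Hypothetical stand-in for auth_cache_entry. */
struct entry {
    atomic_int refs;
    int allowed;
};

struct cache {
    pthread_mutex_t lock;
    struct entry *slot; /* stands in for the whole cache_dict */
};

/* Drops one reference; the last holder frees the entry. */
static void entry_unref(struct entry *e)
{
    if (e && atomic_fetch_sub(&e->refs, 1) == 1) {
        free(e);
        atomic_fetch_add(&entries_freed, 1);
    }
}

/* Stores a new entry; the cache itself owns one reference to it. */
static int cache_set(struct cache *c, int allowed)
{
    struct entry *fresh = malloc(sizeof(*fresh));
    struct entry *old;

    if (!fresh)
        return -1;
    atomic_init(&fresh->refs, 1);
    fresh->allowed = allowed;

    pthread_mutex_lock(&c->lock);
    old = c->slot;
    c->slot = fresh;
    pthread_mutex_unlock(&c->lock);
    entry_unref(old);
    return 0;
}

/* Takes a reference under the lock, so a purge cannot free it mid-use. */
static struct entry *cache_get(struct cache *c)
{
    struct entry *e;

    pthread_mutex_lock(&c->lock);
    e = c->slot;
    if (e)
        atomic_fetch_add(&e->refs, 1);
    pthread_mutex_unlock(&c->lock);
    return e;
}

/* Empties the cache and drops only the reference owned by the cache. */
static void cache_purge(struct cache *c)
{
    struct entry *old;

    pthread_mutex_lock(&c->lock);
    old = c->slot;
    c->slot = NULL;
    pthread_mutex_unlock(&c->lock);
    entry_unref(old);
}

/* Replays lookup, purge, use, release; returns entries freed by the demo. */
static int refcount_demo(void)
{
    struct cache c = { PTHREAD_MUTEX_INITIALIZER, NULL };
    struct entry *e;
    int before = atomic_load(&entries_freed);

    if (cache_set(&c, 1) != 0)
        return -1;
    e = cache_get(&c);
    cache_purge(&c);
    assert(atomic_load(&entries_freed) == before); /* still referenced */
    assert(e->allowed == 1);                       /* safe to read */
    entry_unref(e);                                /* last reference */
    return atomic_load(&entries_freed) - before;
}
```

In this sketch the purge no longer frees an entry that a lookup has handed out; the entry is released only when the last user calls entry_unref().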

Comment 7 Kaushal 2015-07-30 09:50:11 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.3, please open a new bug report.

glusterfs-3.7.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
