Bug 1471870 - cthon04 can cause segfault in gNFS/NLM
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.10
Hardware: x86_64
OS: Linux
Priority: medium
Severity: urgent
Assigned To: Niels de Vos
Keywords: Triaged
Depends On: 1467313
Reported: 2017-07-17 10:50 EDT by Niels de Vos
Modified: 2017-08-21 09:40 EDT
CC: 1 user

Fixed In Version: glusterfs-3.10.5
Clone Of: 1467313
Last Closed: 2017-08-21 09:40:58 EDT
Type: Bug


Attachments: None
Description Niels de Vos 2017-07-17 10:50:10 EDT
+++ This bug was initially created as a clone of Bug #1467313 +++

Description of problem:
While running cthon04 tests against Gluster/NFS, the following crash was observed (on RHGS, which backports the gNFS/NLM fixes to glusterfs 3.8.4):

ify?! [Invalid argument]
[2017-06-19 13:08:46.117375] W [socket.c:595:__socket_rwv] 0-NLM-client: readv on 10.70.37.142:34033 failed (No data available)
[2017-06-19 13:08:46.117529] W [socket.c:595:__socket_rwv] 0-NLM-client: readv on 10.70.37.142:34033 failed (No data available)
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2017-06-19 13:08:48
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f6ec83b54b2]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f6ec83befe4]
/lib64/libc.so.6(+0x35270)[0x7f6ec6a1e270]
/lib64/libc.so.6(+0x165921)[0x7f6ec6b4e921]
/usr/lib64/glusterfs/3.8.4/xlator/nfs/server.so(+0x3f9aa)[0x7f6eba13a9aa]
/usr/lib64/glusterfs/3.8.4/xlator/nfs/server.so(+0x42349)[0x7f6eba13d349]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x214)[0x7f6ec817eb54]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f6ec817a9e3]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x51d7)[0x7f6ebcfa71d7]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x9918)[0x7f6ebcfab918]
/lib64/libglusterfs.so.0(+0x849d6)[0x7f6ec840f9d6]
/lib64/libpthread.so.0(+0x7e25)[0x7f6ec7214e25]
/lib64/libc.so.6(clone+0x6d)[0x7f6ec6ae134d]


Version-Release number of selected component (if applicable):
mainline (reported against RHGS with glusterfs-3.8.4 w/ backports)

How reproducible:
Run cthon04 tests against Gluster/NFS. The problem is hit most often when using EC volumes.

Steps to Reproduce:
1. configure a Gluster volume
2. on an NFS client (not part of the TSP)
   1. git clone git://git.linux-nfs.org/projects/steved/cthon04.git
   2. compile the tests and make sure dependencies are installed
   3. run the tests, for example:
      # mount -t nfs -o vers=3 vm015.example.com:/one-brick /mnt/nfsv3
      # ./server -a -p /one-brick -m /mnt/nfsv3 vm015.example.com

Actual results:
Occasional but regular segfaults of Gluster/NFS.

Expected results:
No segfaults (duh!), and the cthon04 tests pass.

Additional info:

--- Additional comment from Worker Ant on 2017-07-04 22:04:34 CEST ---

REVIEW: https://review.gluster.org/17696 (nfs: make nfs3_call_state_t refcounted) posted (#1) for review on master by Niels de Vos (ndevos@redhat.com)

--- Additional comment from Worker Ant on 2017-07-04 22:04:38 CEST ---

REVIEW: https://review.gluster.org/17697 (nfs/nlm: unref fds in nlm_client_free()) posted (#1) for review on master by Niels de Vos (ndevos@redhat.com)

--- Additional comment from Worker Ant on 2017-07-04 22:04:49 CEST ---

REVIEW: https://review.gluster.org/17698 (nfs/nlm: handle reconnect for non-NLM4_LOCK requests) posted (#1) for review on master by Niels de Vos (ndevos@redhat.com)

--- Additional comment from Worker Ant on 2017-07-04 22:04:57 CEST ---

REVIEW: https://review.gluster.org/17699 (nfs/nlm: use refcounting for nfs3_call_state_t) posted (#1) for review on master by Niels de Vos (ndevos@redhat.com)

--- Additional comment from Worker Ant on 2017-07-04 22:05:03 CEST ---

REVIEW: https://review.gluster.org/17700 (nfs/nlm: keep track of the call-state and frame for notifications) posted (#1) for review on master by Niels de Vos (ndevos@redhat.com)

--- Additional comment from Worker Ant on 2017-07-06 14:22:24 CEST ---

COMMIT: https://review.gluster.org/17696 committed in master by Niels de Vos (ndevos@redhat.com) 
------
commit daed52b8ebcac7ef36f11e944f83826f46593867
Author: Niels de Vos <ndevos@redhat.com>
Date:   Fri Jun 23 10:01:27 2017 +0200

    nfs: make nfs3_call_state_t refcounted
    
    There is no refcounting done of the nfs3_call_state_t structure, which
    seems to result in use-after-free problems in the NLM part of
    Gluster/NFS. The structure is initialized with two different functions;
    it is easier to have a single place to do this.
    
    The Gluster/NFS part will not use the refcounting, for now. This is
    being added to make the NLM code more stable. nfs3_call_state_wipe()
    will behave as before for Gluster/NFS, but cleanup is triggered through
    the refcounting now. This prevents major changes to the stable part of
    the NFS-server, and makes it possible to improve the NLM component
    separately.
    
    Change-Id: I2e15bcf12af74e8a46c2727e4a160e9444d29ece
    BUG: 1467313
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-on: https://review.gluster.org/17696
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
    Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
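Editor's sketch (not GlusterFS code): the commit above routes all initialisation through a single function and lets the reference count, rather than the wipe call itself, decide when the structure is released. The C below illustrates that pattern under stated assumptions: call_state_t, call_state_init(), call_state_ref(), call_state_unref() and call_state_wipe() are hypothetical stand-ins, not the real nfs3_call_state_t API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for nfs3_call_state_t; the real structure has many more fields. */
typedef struct call_state {
    atomic_uint refcount;   /* number of current holders */
    int         xid;        /* illustrative payload field */
} call_state_t;

/* Counts releases so the sketch can be checked; not part of any real API. */
static int call_state_frees = 0;

/* The single place that allocates and initialises the structure.
 * The caller receives the first reference. */
static call_state_t *call_state_init(int xid)
{
    call_state_t *cs = calloc(1, sizeof(*cs));
    if (cs == NULL)
        return NULL;
    atomic_init(&cs->refcount, 1);
    cs->xid = xid;
    return cs;
}

static call_state_t *call_state_ref(call_state_t *cs)
{
    atomic_fetch_add(&cs->refcount, 1);
    return cs;
}

/* Drops one reference; whoever drops the last one frees the memory. */
static void call_state_unref(call_state_t *cs)
{
    unsigned int old = atomic_fetch_sub(&cs->refcount, 1);
    assert(old > 0);            /* catches an unref without a matching ref */
    if (old == 1) {
        call_state_frees++;
        free(cs);
    }
}

/* Existing callers keep calling "wipe" as before; it now only drops
 * the caller's reference instead of freeing unconditionally. */
static void call_state_wipe(call_state_t *cs)
{
    call_state_unref(cs);
}
```

With this shape, code that does not take extra references (the Gluster/NFS path, per the commit) sees no behaviour change: one init followed by one wipe still frees the structure.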

--- Additional comment from Worker Ant on 2017-07-06 14:22:38 CEST ---

COMMIT: https://review.gluster.org/17697 committed in master by Niels de Vos (ndevos@redhat.com) 
------
commit e9a482f94e748ea12e73ddd2e275bad9aa314b4c
Author: Niels de Vos <ndevos@redhat.com>
Date:   Fri Jun 30 17:54:34 2017 +0200

    nfs/nlm: unref fds in nlm_client_free()
    
    When a nlm_clnt is getting free'd, the FDs associated with this client
    should be unref'd as well.
    
    Change-Id: Ifa4ea4b7ed45a454413cfc0c820f2516c534a9aa
    BUG: 1467313
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-on: https://review.gluster.org/17697
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
    Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
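A sketch of the ownership rule in this commit, assuming a hypothetical client structure that stores one reference per fd it uses: whatever frees the client must drop those references too. fake_fd_t, nlm_client_sketch_t, client_add_fd() and client_free() are illustrative names only; they are not GlusterFS's fd_t or the real nlm_client_free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted fd (not GlusterFS's fd_t). */
typedef struct fake_fd {
    int refcount;
} fake_fd_t;

static void fake_fd_ref(fake_fd_t *fd) { fd->refcount++; }

static void fake_fd_unref(fake_fd_t *fd)
{
    assert(fd->refcount > 0);   /* catches an unref without a matching ref */
    fd->refcount--;
}

/* Hypothetical NLM client that holds one reference per fd it has used. */
typedef struct nlm_client_sketch {
    fake_fd_t **fds;
    size_t      nfds;
} nlm_client_sketch_t;

/* Remember an fd for this client; returns 0 on success, -1 if out of memory. */
static int client_add_fd(nlm_client_sketch_t *c, fake_fd_t *fd)
{
    fake_fd_t **grown = realloc(c->fds, (c->nfds + 1) * sizeof(*grown));
    if (grown == NULL)
        return -1;
    c->fds = grown;
    fake_fd_ref(fd);            /* the client now owns a reference */
    c->fds[c->nfds++] = fd;
    return 0;
}

/* Freeing the client also returns every fd reference it holds;
 * without this loop those fds would stay referenced forever. */
static void client_free(nlm_client_sketch_t *c)
{
    for (size_t i = 0; i < c->nfds; i++)
        fake_fd_unref(c->fds[i]);
    free(c->fds);
    free(c);
}
```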

--- Additional comment from Worker Ant on 2017-07-07 10:56:34 CEST ---

REVIEW: https://review.gluster.org/17699 (nfs/nlm: use refcounting for nfs3_call_state_t) posted (#2) for review on master by Niels de Vos (ndevos@redhat.com)

--- Additional comment from Worker Ant on 2017-07-07 12:17:06 CEST ---

REVIEW: https://review.gluster.org/17698 (nfs/nlm: handle reconnect for non-NLM4_LOCK requests) posted (#2) for review on master by Niels de Vos (ndevos@redhat.com)

--- Additional comment from Worker Ant on 2017-07-09 11:13:02 CEST ---

COMMIT: https://review.gluster.org/17698 committed in master by Niels de Vos (ndevos@redhat.com) 
------
commit fafe1491ead527ba1024c521013aa90d2ee2b355
Author: Niels de Vos <ndevos@redhat.com>
Date:   Wed Jun 21 16:25:33 2017 +0200

    nfs/nlm: handle reconnect for non-NLM4_LOCK requests
    
    When a reply on an NLM-procedure gets stuck, the NFS-client will resend
    the request. This can happen through a re-connect in case the connection
    was terminated (long delay in the reply on the initial request). Once
    that happens, not all NLM-procedures are handled correctly.
    
    Testing this is difficult and time-consuming. There still may be
    problems with certain operations, but this definitely makes it behave
    much better than before.
    
    The problem occurred due to a bug in EC; change-id I18a782903ba
    addressed the root cause.
    
    Change-Id: I23b385568e27232951fa3fbd7198a0e5d775a8c2
    BUG: 1467313
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-on: https://review.gluster.org/17698
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

--- Additional comment from Worker Ant on 2017-07-09 11:13:47 CEST ---

COMMIT: https://review.gluster.org/17699 committed in master by Niels de Vos (ndevos@redhat.com) 
------
commit 01bfdd4d1759423681d311da33f4ac2346ace445
Author: Niels de Vos <ndevos@redhat.com>
Date:   Mon Jul 3 16:24:53 2017 +0200

    nfs/nlm: use refcounting for nfs3_call_state_t
    
    In order to track down a potential use-after-free of the
    nfs3_call_state_t structure in the NLM component, add reference counting
    where the structure is used. This should prevent premature free'ing of
    the structure.
    
    Change-Id: Ib1f13b0463ab1e012b7b49a623c91f0f3e73e1fb
    BUG: 1467313
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-on: https://review.gluster.org/17699
    Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
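A minimal, single-threaded sketch of the rule this commit applies: any code path that keeps using the call-state after handing it to asynchronous work takes its own reference first and drops it when done. cs_t, submit_async() and run_callbacks() are hypothetical; the pending array merely stands in for whatever mechanism delivers the real completion callbacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical call-state with a plain (non-atomic) reference count. */
typedef struct cs {
    int refcount;
    int xid;                 /* illustrative payload */
} cs_t;

static int cs_frees = 0;     /* counts releases so the sketch can be checked */

static cs_t *cs_new(int xid)
{
    cs_t *cs = calloc(1, sizeof(*cs));
    if (cs != NULL) {
        cs->refcount = 1;    /* reference held by the request path */
        cs->xid = xid;
    }
    return cs;
}

static cs_t *cs_ref(cs_t *cs) { cs->refcount++; return cs; }

static void cs_unref(cs_t *cs)
{
    assert(cs->refcount > 0);
    if (--cs->refcount == 0) {
        cs_frees++;
        free(cs);
    }
}

/* Hypothetical queue of operations whose completion callback runs later. */
#define MAX_PENDING 8
static cs_t *pending[MAX_PENDING];
static size_t npending = 0;

/* Hand the call-state to asynchronous work: the pending operation takes
 * its own reference, so the request path may drop its reference early. */
static void submit_async(cs_t *cs)
{
    assert(npending < MAX_PENDING);
    pending[npending++] = cs_ref(cs);
}

/* Completion: use the call-state, then drop the callback's reference.
 * Returns the number of callbacks that ran. */
static int run_callbacks(void)
{
    int handled = 0;
    for (size_t i = 0; i < npending; i++) {
        cs_t *cs = pending[i];
        assert(cs->refcount > 0);   /* still valid: this callback holds a ref */
        handled++;
        cs_unref(cs);
    }
    npending = 0;
    return handled;
}
```

Without the cs_ref() in submit_async(), the request path's cs_unref() would free the structure and the later callback would read freed memory, which is the kind of use-after-free this commit targets.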

--- Additional comment from Worker Ant on 2017-07-09 11:14:07 CEST ---

COMMIT: https://review.gluster.org/17700 committed in master by Niels de Vos (ndevos@redhat.com) 
------
commit b81997264f079983fa02bd5fa2b3715224942b00
Author: Niels de Vos <ndevos@redhat.com>
Date:   Tue Jul 4 20:11:11 2017 +0200

    nfs/nlm: keep track of the call-state and frame for notifications
    
    When blocking locks are used, a new frame is allocated that is used to
    send the notification to the client once the lock becomes
    available. In all other cases, the frame that contains the request from
    the client will be used for the reply.
    
    Because there was no way to track the different clients with their
    requests (captured in the call-state), the call-state could be free'd
    before the notification was sent to the client. This caused a
    use-after-free of the call-state and could trigger segfaults of the
    Gluster/NFS server or incorrect replies on (un)lock requests.
    
    By introducing a nlm4_notify_args structure, the call-state and frame
    can be tracked better. This prevents the possibility of segfaulting when
    the call-state is used after being free'd.
    
    BUG: 1467313
    Change-Id: I285d2bc552f509e5145653b7a50afcff827cd612
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-on: https://review.gluster.org/17700
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
    Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
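The idea behind nlm4_notify_args can be sketched as a small structure that pins both the call-state and the notification frame with its own references until the notification has been sent. obj_t, the field names and the helper functions below are assumptions for illustration; they do not reproduce the real nlm4_notify_args definition.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for both the call-state and the frame. */
typedef struct obj {
    int refcount;
} obj_t;

static obj_t *obj_ref(obj_t *o) { o->refcount++; return o; }

static void obj_unref(obj_t *o)
{
    assert(o->refcount > 0);
    o->refcount--;
}

/* Sketch of a notify-args structure: everything the delayed notification
 * needs, each member pinned by its own reference. Field names are assumptions. */
typedef struct notify_args {
    obj_t *cs;      /* call-state of the original blocking-lock request */
    obj_t *frame;   /* newly allocated frame used to notify the client */
} notify_args_t;

static notify_args_t *notify_args_new(obj_t *cs, obj_t *frame)
{
    notify_args_t *args = calloc(1, sizeof(*args));
    if (args == NULL)
        return NULL;
    args->cs = obj_ref(cs);
    args->frame = obj_ref(frame);
    return args;
}

/* Called once the notification has been sent (or has failed):
 * release both references and the container itself. */
static void notify_args_free(notify_args_t *args)
{
    obj_unref(args->frame);
    obj_unref(args->cs);
    free(args);
}
```

Even if the original request path drops its call-state reference before the lock becomes available, the notify args still hold one, so the notification never touches a free'd call-state.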
Comment 1 Worker Ant 2017-07-17 10:53:09 EDT
REVIEW: https://review.gluster.org/17792 (nfs: make nfs3_call_state_t refcounted) posted (#1) for review on release-3.10 by Niels de Vos (ndevos@redhat.com)
Comment 2 Worker Ant 2017-07-17 10:53:15 EDT
REVIEW: https://review.gluster.org/17793 (nfs/nlm: unref fds in nlm_client_free()) posted (#1) for review on release-3.10 by Niels de Vos (ndevos@redhat.com)
Comment 3 Worker Ant 2017-07-17 10:53:21 EDT
REVIEW: https://review.gluster.org/17794 (nfs/nlm: handle reconnect for non-NLM4_LOCK requests) posted (#1) for review on release-3.10 by Niels de Vos (ndevos@redhat.com)
Comment 4 Worker Ant 2017-07-17 10:53:26 EDT
REVIEW: https://review.gluster.org/17795 (nfs/nlm: use refcounting for nfs3_call_state_t) posted (#1) for review on release-3.10 by Niels de Vos (ndevos@redhat.com)
Comment 5 Worker Ant 2017-07-17 10:53:31 EDT
REVIEW: https://review.gluster.org/17796 (nfs/nlm: keep track of the call-state and frame for notifications) posted (#1) for review on release-3.10 by Niels de Vos (ndevos@redhat.com)
Comment 6 Worker Ant 2017-07-29 10:15:29 EDT
REVIEW: https://review.gluster.org/17792 (nfs: make nfs3_call_state_t refcounted) posted (#2) for review on release-3.10 by Niels de Vos (ndevos@redhat.com)
Comment 7 Worker Ant 2017-07-29 10:15:33 EDT
REVIEW: https://review.gluster.org/17793 (nfs/nlm: unref fds in nlm_client_free()) posted (#2) for review on release-3.10 by Niels de Vos (ndevos@redhat.com)
Comment 8 Worker Ant 2017-07-29 10:15:48 EDT
REVIEW: https://review.gluster.org/17794 (nfs/nlm: handle reconnect for non-NLM4_LOCK requests) posted (#2) for review on release-3.10 by Niels de Vos (ndevos@redhat.com)
Comment 9 Worker Ant 2017-07-29 10:15:55 EDT
REVIEW: https://review.gluster.org/17795 (nfs/nlm: use refcounting for nfs3_call_state_t) posted (#2) for review on release-3.10 by Niels de Vos (ndevos@redhat.com)
Comment 10 Worker Ant 2017-07-29 10:16:02 EDT
REVIEW: https://review.gluster.org/17796 (nfs/nlm: keep track of the call-state and frame for notifications) posted (#2) for review on release-3.10 by Niels de Vos (ndevos@redhat.com)
Comment 11 Worker Ant 2017-07-29 10:16:10 EDT
REVIEW: https://review.gluster.org/17913 (refcount: typecast function for calling on free) posted (#1) for review on release-3.10 by Niels de Vos (ndevos@redhat.com)
Comment 12 Worker Ant 2017-08-11 06:52:02 EDT
COMMIT: https://review.gluster.org/17792 committed in release-3.10 by Shyamsundar Ranganathan (srangana@redhat.com) 
------
commit 6c2077618d336edea299d946084ebb4edbbfd47e
Author: Niels de Vos <ndevos@redhat.com>
Date:   Mon Jul 17 16:43:30 2017 +0200

    nfs: make nfs3_call_state_t refcounted
    
    There is no refcounting done of the nfs3_call_state_t structure, which
    seems to result in use-after-free problems in the NLM part of
    Gluster/NFS. The structure is initialized with two different functions;
    it is easier to have a single place to do this.
    
    The Gluster/NFS part will not use the refcounting, for now. This is
    being added to make the NLM code more stable. nfs3_call_state_wipe()
    will behave as before for Gluster/NFS, but cleanup is triggered through
    the refcounting now. This prevents major changes to the stable part of
    the NFS-server, and makes it possible to improve the NLM component
    separately.
    
    Cherry picked from commit daed52b8ebcac7ef36f11e944f83826f46593867:
    > Change-Id: I2e15bcf12af74e8a46c2727e4a160e9444d29ece
    > BUG: 1467313
    > Signed-off-by: Niels de Vos <ndevos@redhat.com>
    > Reviewed-on: https://review.gluster.org/17696
    > Smoke: Gluster Build System <jenkins@build.gluster.org>
    > Reviewed-by: Amar Tumballi <amarts@redhat.com>
    > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    > Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
    > Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
    
    Change-Id: I2e15bcf12af74e8a46c2727e4a160e9444d29ece
    BUG: 1471870
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-on: https://review.gluster.org/17792
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
    Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Comment 13 Worker Ant 2017-08-11 06:52:11 EDT
COMMIT: https://review.gluster.org/17793 committed in release-3.10 by Shyamsundar Ranganathan (srangana@redhat.com) 
------
commit 0710984e457786e750847b7119a6e10919973e6a
Author: Niels de Vos <ndevos@redhat.com>
Date:   Mon Jul 17 16:43:43 2017 +0200

    nfs/nlm: unref fds in nlm_client_free()
    
    When a nlm_clnt is getting free'd, the FDs associated with this client
    should be unref'd as well.
    
    Cherry picked from commit e9a482f94e748ea12e73ddd2e275bad9aa314b4c:
    > Change-Id: Ifa4ea4b7ed45a454413cfc0c820f2516c534a9aa
    > BUG: 1467313
    > Signed-off-by: Niels de Vos <ndevos@redhat.com>
    > Reviewed-on: https://review.gluster.org/17697
    > Smoke: Gluster Build System <jenkins@build.gluster.org>
    > Reviewed-by: Amar Tumballi <amarts@redhat.com>
    > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    > Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
    > Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
    
    Change-Id: Ifa4ea4b7ed45a454413cfc0c820f2516c534a9aa
    BUG: 1471870
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-on: https://review.gluster.org/17793
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
Comment 14 Worker Ant 2017-08-11 06:52:20 EDT
COMMIT: https://review.gluster.org/17794 committed in release-3.10 by Shyamsundar Ranganathan (srangana@redhat.com) 
------
commit 4a1bdc7581cdf599122b7e20595c5d5662b51ce8
Author: Niels de Vos <ndevos@redhat.com>
Date:   Mon Jul 17 16:44:38 2017 +0200

    nfs/nlm: handle reconnect for non-NLM4_LOCK requests
    
    When a reply on an NLM-procedure gets stuck, the NFS-client will resend
    the request. This can happen through a re-connect in case the connection
    was terminated (long delay in the reply on the initial request). Once
    that happens, not all NLM-procedures are handled correctly.
    
    Testing this is difficult and time-consuming. There still may be
    problems with certain operations, but this definitely makes it behave
    much better than before.
    
    The problem occurred due to a bug in EC; change-id I18a782903ba
    addressed the root cause.
    
    Cherry picked from commit fafe1491ead527ba1024c521013aa90d2ee2b355:
    > Change-Id: I23b385568e27232951fa3fbd7198a0e5d775a8c2
    > BUG: 1467313
    > Signed-off-by: Niels de Vos <ndevos@redhat.com>
    > Reviewed-on: https://review.gluster.org/17698
    > Smoke: Gluster Build System <jenkins@build.gluster.org>
    > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    
    Change-Id: I23b385568e27232951fa3fbd7198a0e5d775a8c2
    BUG: 1471870
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-on: https://review.gluster.org/17794
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
    Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Comment 15 Worker Ant 2017-08-11 06:52:50 EDT
REVIEW: https://review.gluster.org/17795 (nfs/nlm: use refcounting for nfs3_call_state_t) posted (#3) for review on release-3.10 by Shyamsundar Ranganathan (srangana@redhat.com)
Comment 16 Worker Ant 2017-08-11 07:22:46 EDT
COMMIT: https://review.gluster.org/17913 committed in release-3.10 by Shyamsundar Ranganathan (srangana@redhat.com) 
------
commit dc413d4126d02be71a014786e17e7b605443e887
Author: Niels de Vos <ndevos@redhat.com>
Date:   Sat Jul 29 14:16:07 2017 +0200

    refcount: typecast function for calling on free
    
    All of the functions called to free the refcounted structure are doing a
    typecast from (void*) to their own type that is being free'd. This
    really is not needed and the refcount interface is made a little simpler
    without the requirement of typecasting.
    
    With this small improvement in the API, all callers are updated too.
    
    Cherry picked from commit f2ca301bd741e3e3f076cd3f72fcd377bcef2a1a:
    > Change-Id: I32473b6d1799f62861d4b2d78ea30c09e6c80ab1
    > BUG: 1416889
    > Signed-off-by: Niels de Vos <ndevos@redhat.com>
    > Reviewed-on: https://review.gluster.org/16471
    > Smoke: Gluster Build System <jenkins@build.gluster.org>
    > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    > Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
    > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    > Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
    
    Backport note: This patch makes it easier to backport changes that use
                   gf_refcount_t. There is no functional change.
    
    Change-Id: I32473b6d1799f62861d4b2d78ea30c09e6c80ab1
    BUG: 1471870
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-on: https://review.gluster.org/17913
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Comment 17 Worker Ant 2017-08-11 07:23:02 EDT
REVIEW: https://review.gluster.org/17795 (nfs/nlm: use refcounting for nfs3_call_state_t) posted (#4) for review on release-3.10 by Shyamsundar Ranganathan (srangana@redhat.com)
Comment 18 Worker Ant 2017-08-11 07:48:42 EDT
COMMIT: https://review.gluster.org/17795 committed in release-3.10 by Shyamsundar Ranganathan (srangana@redhat.com) 
------
commit dd8400bfba8a55d59a393af127de2bdc907f4447
Author: Niels de Vos <ndevos@redhat.com>
Date:   Mon Jul 17 16:45:22 2017 +0200

    nfs/nlm: use refcounting for nfs3_call_state_t
    
    In order to track down a potential use-after-free of the
    nfs3_call_state_t structure in the NLM component, add reference counting
    where the structure is used. This should prevent premature free'ing of
    the structure.
    
    Cherry picked from commit 01bfdd4d1759423681d311da33f4ac2346ace445:
    > Change-Id: Ib1f13b0463ab1e012b7b49a623c91f0f3e73e1fb
    > BUG: 1467313
    > Signed-off-by: Niels de Vos <ndevos@redhat.com>
    > Reviewed-on: https://review.gluster.org/17699
    > Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
    > Smoke: Gluster Build System <jenkins@build.gluster.org>
    > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    
    Change-Id: Ib1f13b0463ab1e012b7b49a623c91f0f3e73e1fb
    BUG: 1471870
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-on: https://review.gluster.org/17795
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Comment 19 Worker Ant 2017-08-11 07:49:25 EDT
REVIEW: https://review.gluster.org/17796 (nfs/nlm: keep track of the call-state and frame for notifications) posted (#3) for review on release-3.10 by Shyamsundar Ranganathan (srangana@redhat.com)
Comment 20 Worker Ant 2017-08-11 07:59:33 EDT
COMMIT: https://review.gluster.org/17796 committed in release-3.10 by Shyamsundar Ranganathan (srangana@redhat.com) 
------
commit bfc241ab7d0fbb2c9202c8f88a2d543cb4605f80
Author: Niels de Vos <ndevos@redhat.com>
Date:   Mon Jul 17 16:45:47 2017 +0200

    nfs/nlm: keep track of the call-state and frame for notifications
    
    When blocking locks are used, a new frame is allocated that is used to
    send the notification to the client once the lock becomes
    available. In all other cases, the frame that contains the request from
    the client will be used for the reply.
    
    Because there was no way to track the different clients with their
    requests (captured in the call-state), the call-state could be free'd
    before the notification was sent to the client. This caused a
    use-after-free of the call-state and could trigger segfaults of the
    Gluster/NFS server or incorrect replies on (un)lock requests.
    
    By introducing a nlm4_notify_args structure, the call-state and frame
    can be tracked better. This prevents the possibility of segfaulting when
    the call-state is used after being free'd.
    
    Cherry picked from commit b81997264f079983fa02bd5fa2b3715224942b00:
    > BUG: 1467313
    > Change-Id: I285d2bc552f509e5145653b7a50afcff827cd612
    > Signed-off-by: Niels de Vos <ndevos@redhat.com>
    > Reviewed-on: https://review.gluster.org/17700
    > Smoke: Gluster Build System <jenkins@build.gluster.org>
    > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    > Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
    > Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
    
    Change-Id: I285d2bc552f509e5145653b7a50afcff827cd612
    BUG: 1471870
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-on: https://review.gluster.org/17796
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
Comment 21 Shyamsundar 2017-08-21 09:40:58 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.5, please open a new bug report.

glusterfs-3.10.5 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-August/000079.html
[2] https://www.gluster.org/pipermail/gluster-users/
