Bug 1450380 - GNFS crashed while taking lock on a file from 2 different clients having same volume mounted from 2 different servers
Summary: GNFS crashed while taking lock on a file from 2 different clients having same...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.8
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: unspecified
Target Milestone: ---
Assignee: Niels de Vos
QA Contact:
URL:
Whiteboard:
Depends On: 1381970 glusterfs-3.8.13
Blocks:
 
Reported: 2017-05-12 11:35 UTC by Niels de Vos
Modified: 2017-06-29 09:54 UTC (History)
1 user

Fixed In Version: glusterfs-3.8.13
Clone Of:
Environment:
Last Closed: 2017-06-29 09:54:50 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments

Description Niels de Vos 2017-05-12 11:35:30 UTC
Description of problem:
Mount a volume from 2 different servers on 2 different clients.
Create a file.
Take a lock on the same file from the 2 clients.
In that case the GNFS server crashes.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a disperse volume 2 x (4 + 2) and enable MDCache and GNFS on it
2. Mount the volume from two different servers on 2 different clients
3. Create a 512-byte file from 1 client on the mount point
4. Take a lock from client 1. The lock is acquired.
5. Try taking the lock from client 2. The lock is blocked (as it is already
held by client 1).
6. Release the lock from client 1. Take the lock from client 2.
7. Again try taking the lock from client 1.

Actual results:
The lock is granted to client 1, which should not happen.
This issue is reported in bug https://bugzilla.redhat.com/show_bug.cgi?id=1411338.
The GNFS server crashed.

Expected results:
GNFS should handle taking a lock from 2 different clients on the same volume mounted from 2 different servers.

Additional info:

--- Additional comment from Niels de Vos on 2017-01-10 13:30 CET ---

While working on the attached test-script I managed to get a coredump too. This happened while manually executing the commands I wanted to put in the script. The script is now running and has already completed 100+ iterations without any crashes...

Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/'.
Program terminated with signal 11, Segmentation fault.
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
164             movdqu  (%rdi), %xmm1
(gdb) bt
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
#1  0x00007fafa65986f2 in nlm_set_rpc_clnt (rpc_clnt=0x7faf8c005200, caller_name=0x0) at nlm4.c:345
#2  0x00007fafa659b1d5 in nlm_rpcclnt_notify (rpc_clnt=0x7faf8c005200, mydata=0x7faf9f66b06c, fn=<optimized out>, data=<optimized out>) at nlm4.c:930
#3  0x00007fafb48a0a84 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7faf8c005230, event=<optimized out>, data=0x7faf8c00cd70) at rpc-clnt.c:994
#4  0x00007fafb489c973 in rpc_transport_notify (this=this@entry=0x7faf8c00cd70, event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x7faf8c00cd70) at rpc-transport.c:541
#5  0x00007fafa9391c67 in socket_connect_finish (this=0x7faf8c00cd70) at socket.c:2343
#6  0x00007fafa9396315 in socket_event_handler (fd=<optimized out>, idx=10, data=0x7faf8c00cd70, poll_in=0, poll_out=4, poll_err=0) at socket.c:2386
#7  0x00007fafb4b2ece0 in event_dispatch_epoll_handler (event=0x7faf9e568e80, event_pool=0x7fafb545e6e0) at event-epoll.c:571
#8  event_dispatch_epoll_worker (data=0x7fafa0033d50) at event-epoll.c:674
#9  0x00007fafb3937df5 in start_thread (arg=0x7faf9e569700) at pthread_create.c:308
#10 0x00007fafb327e1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

(gdb) f 2
#2  0x00007fafa659b1d5 in nlm_rpcclnt_notify (rpc_clnt=0x7faf8c005200, mydata=0x7faf9f66b06c, fn=<optimized out>, data=<optimized out>) at nlm4.c:930
930                     ret = nlm_set_rpc_clnt (rpc_clnt, caller_name);
(gdb) l
925             cs = mydata;
926             caller_name = cs->args.nlm4_lockargs.alock.caller_name;
927
928             switch (fn) {
929             case RPC_CLNT_CONNECT:
930                     ret = nlm_set_rpc_clnt (rpc_clnt, caller_name);
931                     if (ret == -1) {
932                             gf_msg (GF_NLM, GF_LOG_ERROR, 0,
933                                     NFS_MSG_RPC_CLNT_ERROR, "Failed to set "
934                                     "rpc clnt");
(gdb) p cs->args.nlm4_lockargs                                                
$1 = {
  cookie = {
    nlm4_netobj_len = 0, 
    nlm4_netobj_val = 0x0
  }, 
  block = 0, 
  exclusive = 0, 
  alock = {
    caller_name = 0x0, 
    fh = {
      nlm4_netobj_len = 0, 
      nlm4_netobj_val = 0x0
    }, 
    oh = {
      nlm4_netobj_len = 0, 
      nlm4_netobj_val = 0x0
    }, 
    svid = 0, 
    l_offset = 0, 
    l_len = 0
  }, 
  reclaim = 0, 
  state = 0
}


It seems that the nlm4_lockargs are empty... No idea how that can happen, will investigate a little more.

Comment 1 Worker Ant 2017-05-12 12:25:59 UTC
REVIEW: https://review.gluster.org/17274 (nfs/nlm: unref rpc-client after nlm4svc_send_granted()) posted (#2) for review on release-3.8 by Niels de Vos (ndevos)

Comment 2 Worker Ant 2017-05-15 08:00:15 UTC
REVIEW: https://review.gluster.org/17276 (nfs/nlm: log the caller_name if nlm_client_t can be found) posted (#2) for review on release-3.8 by Niels de Vos (ndevos)

Comment 3 Worker Ant 2017-05-15 08:01:20 UTC
REVIEW: https://review.gluster.org/17275 (nfs/nlm: ignore notify when there is no matching rpc request) posted (#2) for review on release-3.8 by Niels de Vos (ndevos)

Comment 4 Worker Ant 2017-05-15 08:01:55 UTC
REVIEW: https://review.gluster.org/17278 (nfs/nlm: remove lock request from the list after cancel) posted (#2) for review on release-3.8 by Niels de Vos (ndevos)

Comment 5 Worker Ant 2017-05-15 08:02:18 UTC
REVIEW: https://review.gluster.org/17277 (nfs/nlm: free the nlm_client upon RPC_DISCONNECT) posted (#2) for review on release-3.8 by Niels de Vos (ndevos)

Comment 6 Worker Ant 2017-06-16 09:56:43 UTC
COMMIT: https://review.gluster.org/17274 committed in release-3.8 by Kaleb KEITHLEY (kkeithle) 
------
commit f11ef6869d7fbe6ba91297ed814593b909da9c88
Author: Niels de Vos <ndevos>
Date:   Fri Jan 13 16:05:02 2017 +0100

    nfs/nlm: unref rpc-client after nlm4svc_send_granted()
    
    nlm4svc_send_granted() uses the rpc_clnt by getting it from the
    call-state structure. It is safer to unref the rpc_clnt after the
    function is done with it.
    
    Cherry picked from commit 52c28c0c04722a9ffaa7c39c49ffebdf0a5c75e1:
    > Change-Id: I7cb7c4297801463d21259c58b50d7df7c57aec5e
    > BUG: 1381970
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: https://review.gluster.org/17187
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: soumya k <skoduri>
    > Reviewed-by: Jeff Darcy <jeff.us>
    
    Change-Id: I7cb7c4297801463d21259c58b50d7df7c57aec5e
    BUG: 1450380
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/17274
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>

Comment 7 Worker Ant 2017-06-16 09:56:56 UTC
COMMIT: https://review.gluster.org/17275 committed in release-3.8 by Kaleb KEITHLEY (kkeithle) 
------
commit 9701e19fb6103b5c3d5ad38a1995c3f8f184e3a6
Author: Niels de Vos <ndevos>
Date:   Fri Jan 13 14:02:45 2017 +0100

    nfs/nlm: ignore notify when there is no matching rpc request
    
    In certain (unclear) occasions it seems to happen that there are
    notifications sent to the Gluster/NFS NLM service, but no call-state can
    be found. Instead of segfaulting, log an error but keep on running.
    
    Cherry picked from commit e997d752ba08f80b1b00d2c0035874befafe5200:
    > Change-Id: I0f186e56e46a86ca40314d230c1cc7719c61f0b5
    > BUG: 1381970
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: https://review.gluster.org/17185
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: soumya k <skoduri>
    > Reviewed-by: jiffin tony Thottan <jthottan>
    > Reviewed-by: Jeff Darcy <jeff.us>
    
    Change-Id: I0f186e56e46a86ca40314d230c1cc7719c61f0b5
    BUG: 1450380
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/17275
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>

Comment 8 Worker Ant 2017-06-16 09:57:13 UTC
COMMIT: https://review.gluster.org/17276 committed in release-3.8 by Kaleb KEITHLEY (kkeithle) 
------
commit 6aaa4cd9a0eb75d791a47d88dd43fbea90285245
Author: Niels de Vos <ndevos>
Date:   Fri Jan 13 14:46:17 2017 +0100

    nfs/nlm: log the caller_name if nlm_client_t can be found
    
    In order to help tracking possible misbehaving clients down, log the
    'caller_name' (hostname of the NFS client) that does not have a matching
    nlm_client_t structure.
    
    Cherry picked from commit 9bfb74a39954a7e63bfd762c816efc7e64b9df65:
    > Change-Id: Ib514a78d1809719a3d0274acc31ee632727d746d
    > BUG: 1381970
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: https://review.gluster.org/17186
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: soumya k <skoduri>
    > Reviewed-by: Jeff Darcy <jeff.us>
    
    Change-Id: Ib514a78d1809719a3d0274acc31ee632727d746d
    BUG: 1450380
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/17276
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 9 Worker Ant 2017-06-16 09:57:29 UTC
COMMIT: https://review.gluster.org/17277 committed in release-3.8 by Kaleb KEITHLEY (kkeithle) 
------
commit cfa46e1774178d0af7cabc010d397d62fc0501a6
Author: Niels de Vos <ndevos>
Date:   Fri Jan 20 14:15:31 2017 +0100

    nfs/nlm: free the nlm_client upon RPC_DISCONNECT
    
    When an NLM client disconnects, it should be removed from the list and
    free'd.
    
    > Cherry picked from commit 6897ba5c51b29c05b270c447adb1a34cb8e61911:
    > Change-Id: Ib427c896bfcdc547a3aee42a652578ffd076e2ad
    > BUG: 1381970
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: https://review.gluster.org/17189
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > Reviewed-by: Kaleb KEITHLEY <kkeithle>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: jiffin tony Thottan <jthottan>
    
    Change-Id: Ib427c896bfcdc547a3aee42a652578ffd076e2ad
    BUG: 1450380
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/17277
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>

Comment 10 Worker Ant 2017-06-16 09:57:51 UTC
COMMIT: https://review.gluster.org/17278 committed in release-3.8 by Kaleb KEITHLEY (kkeithle) 
------
commit 96ff4eab53814c483d8cf7b2dd4026b0f6576436
Author: Niels de Vos <ndevos>
Date:   Fri Jan 13 13:02:23 2017 +0100

    nfs/nlm: remove lock request from the list after cancel
    
    Once an NLM client cancels a lock request, it should be removed from the
    list. The list can also be cleaned of unneeded entries once the client
    does not have any outstanding lock/share requests/granted.
    
    Cherry picked from commit 71cb7f3eb4fb706aab7f83906592942a2ff2e924:
    > Change-Id: I2f2b666b627dcb52cddc6d5b95856e420b2b2e26
    > BUG: 1381970
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: https://review.gluster.org/17188
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > Reviewed-by: Kaleb KEITHLEY <kkeithle>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: jiffin tony Thottan <jthottan>
    
    Change-Id: I2f2b666b627dcb52cddc6d5b95856e420b2b2e26
    BUG: 1450380
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/17278
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>

Comment 11 Niels de Vos 2017-06-29 09:54:50 UTC
This bug is being closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.13, please open a new bug report.

glusterfs-3.8.13 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2017-June/000075.html
[2] https://www.gluster.org/pipermail/gluster-users/

