Description of problem:
Mount a volume from two different servers on two different clients. Create a file. Take a lock on the same file from both clients. In that case the GNFS server crashes.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a 2 x (4 + 2) disperse volume and enable MDCache and GNFS on it.
2. Mount the volume from two different servers on two different clients.
3. Create a 512-byte file from client 1 on the mount point.
4. Take a lock from client 1. The lock is acquired.
5. Try taking the lock from client 2. The lock is blocked (it is already held by client 1).
6. Release the lock from client 1. Take the lock from client 2.
7. Try taking the lock from client 1 again.

Actual results:
The lock is granted to client 1, which should not happen. That issue is reported in bug https://bugzilla.redhat.com/show_bug.cgi?id=1411338
The GNFS server crashed.

Expected results:
GNFS should handle locks taken from two different clients on the same volume mounted from two different servers.

Additional info:

--- Additional comment from Niels de Vos on 2017-01-10 13:30 CET ---

While working on the attached test-script I managed to get a coredump too. This happened while manually executing the commands I wanted to put in the script. Now the script is running and has already completed 100+ iterations, still without any crashes...

Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/'.
Program terminated with signal 11, Segmentation fault.
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
164             movdqu  (%rdi), %xmm1
(gdb) bt
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
#1  0x00007fafa65986f2 in nlm_set_rpc_clnt (rpc_clnt=0x7faf8c005200, caller_name=0x0) at nlm4.c:345
#2  0x00007fafa659b1d5 in nlm_rpcclnt_notify (rpc_clnt=0x7faf8c005200, mydata=0x7faf9f66b06c, fn=<optimized out>, data=<optimized out>) at nlm4.c:930
#3  0x00007fafb48a0a84 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7faf8c005230, event=<optimized out>, data=0x7faf8c00cd70) at rpc-clnt.c:994
#4  0x00007fafb489c973 in rpc_transport_notify (this=this@entry=0x7faf8c00cd70, event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x7faf8c00cd70) at rpc-transport.c:541
#5  0x00007fafa9391c67 in socket_connect_finish (this=0x7faf8c00cd70) at socket.c:2343
#6  0x00007fafa9396315 in socket_event_handler (fd=<optimized out>, idx=10, data=0x7faf8c00cd70, poll_in=0, poll_out=4, poll_err=0) at socket.c:2386
#7  0x00007fafb4b2ece0 in event_dispatch_epoll_handler (event=0x7faf9e568e80, event_pool=0x7fafb545e6e0) at event-epoll.c:571
#8  event_dispatch_epoll_worker (data=0x7fafa0033d50) at event-epoll.c:674
#9  0x00007fafb3937df5 in start_thread (arg=0x7faf9e569700) at pthread_create.c:308
#10 0x00007fafb327e1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) f 2
#2  0x00007fafa659b1d5 in nlm_rpcclnt_notify (rpc_clnt=0x7faf8c005200, mydata=0x7faf9f66b06c, fn=<optimized out>, data=<optimized out>) at nlm4.c:930
930                     ret = nlm_set_rpc_clnt (rpc_clnt, caller_name);
(gdb) l
925             cs = mydata;
926             caller_name = cs->args.nlm4_lockargs.alock.caller_name;
927
928             switch (fn) {
929             case RPC_CLNT_CONNECT:
930                     ret = nlm_set_rpc_clnt (rpc_clnt, caller_name);
931                     if (ret == -1) {
932                             gf_msg (GF_NLM, GF_LOG_ERROR, 0,
933                                     NFS_MSG_RPC_CLNT_ERROR, "Failed to set "
934                                     "rpc clnt");
(gdb) p cs->args.nlm4_lockargs
$1 = {
  cookie = {
    nlm4_netobj_len = 0,
    nlm4_netobj_val = 0x0
  },
  block = 0,
  exclusive = 0,
  alock = {
    caller_name = 0x0,
    fh = {
      nlm4_netobj_len = 0,
      nlm4_netobj_val = 0x0
    },
    oh = {
      nlm4_netobj_len = 0,
      nlm4_netobj_val = 0x0
    },
    svid = 0,
    l_offset = 0,
    l_len = 0
  },
  reclaim = 0,
  state = 0
}

It seems that the nlm4_lockargs are empty, so caller_name is NULL when it reaches strcmp() in nlm_set_rpc_clnt()... No idea how that can happen, will investigate a little more.
REVIEW: https://review.gluster.org/17274 (nfs/nlm: unref rpc-client after nlm4svc_send_granted()) posted (#2) for review on release-3.8 by Niels de Vos (ndevos)
REVIEW: https://review.gluster.org/17276 (nfs/nlm: log the caller_name if nlm_client_t can be found) posted (#2) for review on release-3.8 by Niels de Vos (ndevos)
REVIEW: https://review.gluster.org/17275 (nfs/nlm: ignore notify when there is no matching rpc request) posted (#2) for review on release-3.8 by Niels de Vos (ndevos)
REVIEW: https://review.gluster.org/17278 (nfs/nlm: remove lock request from the list after cancel) posted (#2) for review on release-3.8 by Niels de Vos (ndevos)
REVIEW: https://review.gluster.org/17277 (nfs/nlm: free the nlm_client upon RPC_DISCONNECT) posted (#2) for review on release-3.8 by Niels de Vos (ndevos)
COMMIT: https://review.gluster.org/17274 committed in release-3.8 by Kaleb KEITHLEY (kkeithle)
------
commit f11ef6869d7fbe6ba91297ed814593b909da9c88
Author: Niels de Vos <ndevos>
Date:   Fri Jan 13 16:05:02 2017 +0100

    nfs/nlm: unref rpc-client after nlm4svc_send_granted()

    nlm4svc_send_granted() uses the rpc_clnt by getting it from the
    call-state structure. It is safer to unref the rpc_clnt after the
    function is done with it.

    Cherry picked from commit 52c28c0c04722a9ffaa7c39c49ffebdf0a5c75e1:
    > Change-Id: I7cb7c4297801463d21259c58b50d7df7c57aec5e
    > BUG: 1381970
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: https://review.gluster.org/17187
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: soumya k <skoduri>
    > Reviewed-by: Jeff Darcy <jeff.us>

    Change-Id: I7cb7c4297801463d21259c58b50d7df7c57aec5e
    BUG: 1450380
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/17274
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
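For readers unfamiliar with the pattern this commit describes, here is a minimal sketch of "drop the reference only after the last use". It is not the actual patch: the demo_* types and functions are hypothetical stand-ins for rpc_clnt and the call-state, and a `freed` flag replaces a real free() so the ordering can be checked safely.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted RPC client (not rpc_clnt). */
struct demo_clnt {
    int refcount;
    int freed;          /* set when the last reference is dropped (demo only) */
    const char *peer;
};

/* Hypothetical stand-in for the NLM call-state that carries the client. */
struct demo_call_state {
    struct demo_clnt *clnt;
};

static struct demo_clnt *demo_clnt_ref(struct demo_clnt *c)
{
    c->refcount++;
    return c;
}

static void demo_clnt_unref(struct demo_clnt *c)
{
    if (--c->refcount == 0)
        c->freed = 1;   /* real code would release the object here */
}

/* Uses the client taken from the call-state, so it needs a live reference. */
static int demo_send_granted(struct demo_call_state *cs)
{
    if (cs->clnt == NULL || cs->clnt->freed)
        return -1;      /* would be a use-after-free in real code */
    return 0;
}

/* Safe ordering: finish using the client first, unref afterwards. */
static int demo_grant(struct demo_call_state *cs)
{
    int ret = demo_send_granted(cs);

    demo_clnt_unref(cs->clnt);
    cs->clnt = NULL;    /* the call-state no longer owns a reference */
    return ret;
}
```

With the opposite order (unref before demo_send_granted()), the callee could see an already released client whenever the call-state held the last reference.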
COMMIT: https://review.gluster.org/17275 committed in release-3.8 by Kaleb KEITHLEY (kkeithle)
------
commit 9701e19fb6103b5c3d5ad38a1995c3f8f184e3a6
Author: Niels de Vos <ndevos>
Date:   Fri Jan 13 14:02:45 2017 +0100

    nfs/nlm: ignore notify when there is no matching rpc request

    In certain (unclear) occasions it seems to happen that there are
    notifications sent to the Gluster/NFS NLM service, but no call-state can
    be found. Instead of segfaulting, log an error but keep on running.

    Cherry picked from commit e997d752ba08f80b1b00d2c0035874befafe5200:
    > Change-Id: I0f186e56e46a86ca40314d230c1cc7719c61f0b5
    > BUG: 1381970
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: https://review.gluster.org/17185
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: soumya k <skoduri>
    > Reviewed-by: jiffin tony Thottan <jthottan>
    > Reviewed-by: Jeff Darcy <jeff.us>

    Change-Id: I0f186e56e46a86ca40314d230c1cc7719c61f0b5
    BUG: 1450380
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/17275
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
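The guard this commit describes can be sketched roughly as below. This is an illustration under assumptions, not the code from the patch: demo_nlm_client and demo_set_rpc_clnt() are hypothetical names, and a flat array stands in for the real client list. The point is only that a NULL caller_name is rejected before it can reach strcmp().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an entry in the NLM client list. */
struct demo_nlm_client {
    const char *caller_name;
    int has_rpc_clnt;
};

/*
 * Marks the client named caller_name as having an rpc client attached.
 * Returns 0 on success, -1 when caller_name is missing or nothing matches.
 * The caller is expected to log the -1 case and keep running.
 */
static int demo_set_rpc_clnt(struct demo_nlm_client *clients, size_t n,
                             const char *caller_name)
{
    if (caller_name == NULL)
        return -1;      /* empty lock args: ignore instead of crashing */

    for (size_t i = 0; i < n; i++) {
        if (clients[i].caller_name != NULL &&
            strcmp(clients[i].caller_name, caller_name) == 0) {
            clients[i].has_rpc_clnt = 1;
            return 0;
        }
    }
    return -1;
}
```

Passing NULL to strcmp() is undefined behaviour in C, so the check has to happen before the comparison rather than relying on the return value.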
COMMIT: https://review.gluster.org/17276 committed in release-3.8 by Kaleb KEITHLEY (kkeithle)
------
commit 6aaa4cd9a0eb75d791a47d88dd43fbea90285245
Author: Niels de Vos <ndevos>
Date:   Fri Jan 13 14:46:17 2017 +0100

    nfs/nlm: log the caller_name if nlm_client_t can be found

    In order to help tracking possible misbehaving clients down, log the
    'caller_name' (hostname of the NFS client) that does not have a matching
    nlm_client_t structure.

    Cherry picked from commit 9bfb74a39954a7e63bfd762c816efc7e64b9df65:
    > Change-Id: Ib514a78d1809719a3d0274acc31ee632727d746d
    > BUG: 1381970
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: https://review.gluster.org/17186
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: soumya k <skoduri>
    > Reviewed-by: Jeff Darcy <jeff.us>

    Change-Id: Ib514a78d1809719a3d0274acc31ee632727d746d
    BUG: 1450380
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/17276
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    CentOS-regression: Gluster Build System <jenkins.org>
COMMIT: https://review.gluster.org/17277 committed in release-3.8 by Kaleb KEITHLEY (kkeithle)
------
commit cfa46e1774178d0af7cabc010d397d62fc0501a6
Author: Niels de Vos <ndevos>
Date:   Fri Jan 20 14:15:31 2017 +0100

    nfs/nlm: free the nlm_client upon RPC_DISCONNECT

    When an NLM client disconnects, it should be removed from the list and
    free'd.

    > Cherry picked from commit 6897ba5c51b29c05b270c447adb1a34cb8e61911:
    > Change-Id: Ib427c896bfcdc547a3aee42a652578ffd076e2ad
    > BUG: 1381970
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: https://review.gluster.org/17189
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > Reviewed-by: Kaleb KEITHLEY <kkeithle>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: jiffin tony Thottan <jthottan>

    Change-Id: Ib427c896bfcdc547a3aee42a652578ffd076e2ad
    BUG: 1450380
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/17277
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
COMMIT: https://review.gluster.org/17278 committed in release-3.8 by Kaleb KEITHLEY (kkeithle)
------
commit 96ff4eab53814c483d8cf7b2dd4026b0f6576436
Author: Niels de Vos <ndevos>
Date:   Fri Jan 13 13:02:23 2017 +0100

    nfs/nlm: remove lock request from the list after cancel

    Once an NLM client cancels a lock request, it should be removed from the
    list. The list can also be cleaned of unneeded entries once the client
    does not have any outstanding lock/share requests/granted.

    Cherry picked from commit 71cb7f3eb4fb706aab7f83906592942a2ff2e924:
    > Change-Id: I2f2b666b627dcb52cddc6d5b95856e420b2b2e26
    > BUG: 1381970
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: https://review.gluster.org/17188
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > Reviewed-by: Kaleb KEITHLEY <kkeithle>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: jiffin tony Thottan <jthottan>

    Change-Id: I2f2b666b627dcb52cddc6d5b95856e420b2b2e26
    BUG: 1450380
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/17278
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
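As a rough illustration of "remove the lock request from the list after cancel", the sketch below unlinks a matching entry from a singly linked list. It assumes a much simpler structure than the real NLM bookkeeping: demo_lock_req, its fields, and demo_cancel() are hypothetical, and matching on svid plus offset is an assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for one outstanding lock request of a client. */
struct demo_lock_req {
    int svid;                       /* lock owner id on the client */
    unsigned long offset;           /* start of the locked range */
    struct demo_lock_req *next;
};

/*
 * Unlinks the first request matching svid and offset and returns it,
 * or returns NULL when nothing matches. The caller frees the entry.
 */
static struct demo_lock_req *demo_cancel(struct demo_lock_req **head,
                                         int svid, unsigned long offset)
{
    for (struct demo_lock_req **pp = head; *pp != NULL; pp = &(*pp)->next) {
        struct demo_lock_req *req = *pp;

        if (req->svid == svid && req->offset == offset) {
            *pp = req->next;        /* bypass the cancelled entry */
            req->next = NULL;
            return req;
        }
    }
    return NULL;
}

/* Number of requests still outstanding; 0 means the list can be dropped. */
static size_t demo_count(const struct demo_lock_req *head)
{
    size_t n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Once demo_count() reaches zero for a client, nothing is outstanding for it any more, which corresponds to the cleanup condition mentioned in the commit message.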
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.13, please open a new bug report.

glusterfs-3.8.13 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2017-June/000075.html
[2] https://www.gluster.org/pipermail/gluster-users/