Bug 763804 (GLUSTER-2072)

Summary: NFS server crash in __nfs_rpcsvc_program_actor
Product: [Community] GlusterFS Reporter: Vikas Gorur <vikas>
Component: nfs Assignee: Shehjar Tikoo <shehjart>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: medium    
Version: mainline CC: gluster-bugs, saurabh
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: RTP Mount Type: nfs
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:

Description Vikas Gorur 2010-11-09 17:47:30 EST
(gdb) bt
#0  0x00007f5ecbe05c6c in __nfs_rpcsvc_program_actor (req=0x7f5ec9dad024, 
    prg=0x40818f10) at ../../../../xlators/nfs/lib/src/rpcsvc.c:1277
#1  0x00007f5ecbe076e2 in nfs_rpcsvc_request_create (conn=0x64c8b0)
    at ../../../../xlators/nfs/lib/src/rpcsvc.c:1919
#2  0x00007f5ecbe077f6 in nfs_rpcsvc_handle_rpc_call (conn=0x64c8b0)
    at ../../../../xlators/nfs/lib/src/rpcsvc.c:1962
#3  0x00007f5ecbe087bf in nfs_rpcsvc_record_update_state (conn=0x64c8b0, 
    dataread=0) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2461
#4  0x00007f5ecbe08935 in nfs_rpcsvc_conn_data_poll_in (conn=0x64c8b0)
    at ../../../../xlators/nfs/lib/src/rpcsvc.c:2504
#5  0x00007f5ecbe08dc9 in nfs_rpcsvc_conn_data_handler (fd=11, idx=1, 
    data=0x64c8b0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../xlators/nfs/lib/src/rpcsvc.c:2633
#6  0x00007f5ece7edd6a in event_dispatch_epoll_handler (event_pool=0x63a4e0, 
    events=0x64c9b0, i=0) at event.c:812
#7  0x00007f5ece7edf59 in event_dispatch_epoll (event_pool=0x63a4e0)
    at event.c:876
#8  0x00007f5ece7ee2b5 in event_dispatch (event_pool=0x63a4e0) at event.c:984
#9  0x00007f5ecbe0306e in nfs_rpcsvc_stage_proc (arg=0x62cc90)
    at ../../../../xlators/nfs/lib/src/rpcsvc.c:64
#10 0x00007f5ecdf593f7 in start_thread () from /lib/libpthread.so.0
#11 0x00007f5ecdcc8bbd in clone () from /lib/libc.so.6
#12 0x0000000000000000 in ?? ()

(gdb) fr 0
#0  0x00007f5ecbe05c6c in __nfs_rpcsvc_program_actor (req=0x7f5ec9dad024, 
    prg=0x40818f10) at ../../../../xlators/nfs/lib/src/rpcsvc.c:1277
1277                    if (req->prognum != program->prognum)

(gdb) p program
$3 = (rpcsvc_program_t *) 0x0

Before this happened, I had manually tried to deregister the RPC programs (with rpcinfo -d) because the NFS server wasn't starting.
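
The dereference in frame 0 can be illustrated with a minimal sketch. It is not the code in rpcsvc.c and not the eventual patch. The type and function names (sketch_program, sketch_request, sketch_find_program) are hypothetical, and the idea that a program slot can be NULL after a failed registration is an assumption made only for illustration. The sketch shows a program search that checks the pointer before comparing prognum:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the rpcsvc types.
 * These are NOT the GlusterFS definitions. */
struct sketch_program {
    int prognum;
    int progver;
};

struct sketch_request {
    int prognum;
    int progver;
};

/*
 * Search a table of registered programs for the one matching the request.
 * A NULL slot models (as an assumption) a program whose registration did
 * not complete. Returns the matching program, or NULL if none is found.
 */
static const struct sketch_program *
sketch_find_program(const struct sketch_program *const *table, size_t count,
                    const struct sketch_request *req)
{
    if (table == NULL || req == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        const struct sketch_program *program = table[i];

        /* Without this check, program->prognum below dereferences NULL,
         * which is the pattern seen at rpcsvc.c:1277 in the backtrace. */
        if (program == NULL)
            continue;

        if (req->prognum != program->prognum)
            continue;
        if (req->progver != program->progver)
            continue;

        return program;
    }

    return NULL;
}
```

With this shape, a request for a program that is no longer usable gets a NULL result that the caller can turn into an RPC error, instead of crashing the server.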
Comment 1 Shehjar Tikoo 2010-11-09 20:49:41 EST
Is this mainline or 3.1?
Comment 2 Vikas Gorur 2010-11-09 20:55:32 EST
This was observed with 3.1.1 QA5.
Comment 3 Shehjar Tikoo 2010-11-09 21:09:20 EST
What are the volume options in nfs?

Which program did you unregister? nfs or mountd?

Do you have a log at trace level?

After unregistering, what did you do that caused the crash? I think you may have tried to mount with port numbers specified as mount options. Correct?
Comment 4 Shehjar Tikoo 2010-11-09 21:10:24 EST
(In reply to comment #3)

> After unregistering, what did you do that caused the crash? I think you
> may have tried to mount with port numbers specified as mount options. Correct?

Like this?

 mount -o mountport=38467,port=38467 localhost:/posix /mnt
Comment 5 Vikas Gorur 2010-11-09 21:16:38 EST
(In reply to comment #3)
> What are the volume options in nfs?

Defaults as generated by gluster volume create.
 
> Which program did you unregister? nfs or mountd?

Both.

> Do you have a log at trace level?

No.
 
> After unregistering, what did you do that caused the crash? I think you
> may have tried to mount with port numbers specified as mount options. Correct?

The crash was triggered by just running showmount -e localhost. The mount command used was:

# mount -t nfs -o mountproto=tcp bal-2:/qa5 /mnt/gfs-nfs/
Comment 6 Vikas Gorur 2010-11-09 21:18:22 EST
> The crash was triggered by just doing showmount -e localhost. Mount command
> used was:
> 
> # mount -t nfs -o mountproto=tcp bal-2:/qa5 /mnt/gfs-nfs/

To clarify, the crash happened as soon as I ran showmount, even before I tried to mount. The command-line above is just the mount command being used in this setup.
Comment 7 Shehjar Tikoo 2010-11-09 22:04:42 EST
Vikas, can you try with the following patch. I think I saw the same crash as
yours and it might be fixed with this patch.

http://dev.gluster.com/~shehjart/0001-rpcsvc-Fix-crash-in-program-search-after-portmap-reg.patch
Comment 8 Anand Avati 2010-11-10 20:07:36 EST
PATCH: http://patches.gluster.com/patch/5671 in master (rpcsvc: Fix crash in program search after portmap registration failure)
Comment 9 Shehjar Tikoo 2010-11-10 20:53:27 EST
Vikas, if you haven't been able to test yet, please try with qa6. This patch is needed for a crash which I believe is the same one you observed. Let me know so I can close this bug. Thanks.
Comment 10 Shehjar Tikoo 2010-11-10 21:40:52 EST
Upgrading to blocker. Though the patch is committed, I'd like Vikas to confirm before closing the bug.
Comment 11 Anand Avati 2010-11-13 07:02:46 EST
PATCH: http://patches.gluster.com/patch/5672 in master (nfsrpc: Change log levels for RPC program search messages)
Comment 12 Saurabh 2011-02-24 03:32:16 EST
Tried to verify this bug using these steps:

Mount as mentioned by Vikas, then run showmount.


10.1.12.107:/dist1 on /mnt/nfs-test type nfs (rw,mountproto=tcp,addr=10.1.12.107)



gluster@ubuntu1:/$ showmount -e 10.1.12.107
Export list for 10.1.12.107:
/dist1 *


If this is not enough, please let me know what further test plan is needed.
Comment 13 Saurabh 2011-08-25 02:19:34 EDT
Verified on git head:


commit 9848ac8bf7a6854e9d4dee2dcb53621c67b33d6e
Author: Raghavendra Bhat <raghavendrabhat@gluster.com>
Date:   Wed Aug 24 12:49:48 2011 +0530

    features/locks: avoid using reqlock to prevent race
    
    Change-Id: Id8613f9641f748f996062342878070ba8fb27339
    BUG: 2473
    Reviewed-on: http://review.gluster.com/312
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>