Bug 808390 (nfs)

Summary: [glusterfs-3.3.0qa32]: nfs server crashed trying to access NULL client object
Product: [Community] GlusterFS
Reporter: Amit Chauhan <amit.gluster>
Component: nfs
Assignee: Vinayaga Raman <vraman>
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Version: pre-release
CC: amit.gluster, gluster-bugs, rwheeler, saujain, vbellur
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: nfs
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2012-04-02 04:58:03 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---

Description Amit Chauhan 2012-03-30 06:24:14 EDT
Description of problem:
The NFS server crashed in nsm_monitor() while trying to destroy a NULL client
object. The backtrace follows.

Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /etc/gluster'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f308f39b136 in nsm_monitor (host=0x47be8b0 "gqas007.sbu.lab.eng.bos.redhat.com")
    at ../../../../../xlators/nfs/server/src/nlm4.c:510
510             clnt_destroy(clnt);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  0x00007f308f39b136 in nsm_monitor (host=0x47be8b0 "gqas007.sbu.lab.eng.bos.redhat.com")
    at ../../../../../xlators/nfs/server/src/nlm4.c:510
#1  0x00007f308f39c6ab in nlm4_establish_callback (csarg=0x7f308d9fd480) at ../../../../../xlators/nfs/server/src/nlm4.c:870
#2  0x0000003c388077f1 in start_thread () from /lib64/libpthread.so.0
#3  0x0000003c380e592d in clone () from /lib64/libc.so.6
(gdb) f 0
#0  0x00007f308f39b136 in nsm_monitor (host=0x47be8b0 "gqas007.sbu.lab.eng.bos.redhat.com")
    at ../../../../../xlators/nfs/server/src/nlm4.c:510
510             clnt_destroy(clnt);
(gdb) p clnt
$1 = (CLIENT *) 0x0
(gdb) l nsm_monitor
460             STACK_DESTROY (frame->root);
461             return 0;
462     }
463
464     int nsm_monitor(char *host)
465     {
466             CLIENT *clnt = NULL;
467             enum clnt_stat ret;
468             struct mon nsm_mon;
469             struct sm_stat_res res;
(gdb) 
470             struct timeval tout = { 5, 0 };
471             int retstat = -1;
472
473             nsm_mon.mon_id.mon_name = gf_strdup(host);
474             nsm_mon.mon_id.my_id.my_name = gf_strdup("localhost");
475             nsm_mon.mon_id.my_id.my_prog = NLMCBK_PROGRAM;
476             nsm_mon.mon_id.my_id.my_vers = NLMCBK_V1;
477             nsm_mon.mon_id.my_id.my_proc = NLMCBK_SM_NOTIFY;
478             /* nothing to put in the private data */
479     #define SM_PROG 100024
(gdb) 
480     #define SM_VERS 1
481     #define SM_MON 2
482
483             /* create a connection to nsm on the localhost */
484             clnt = clnt_create("localhost", SM_PROG, SM_VERS, "tcp");
485             if(!clnt)
486             {
487                     gf_log (GF_NLM, GF_LOG_ERROR, "Clnt_create()");
488                     goto out;
489             }
(gdb) 
490
491             ret = clnt_call(clnt, SM_MON,
492                             (xdrproc_t) xdr_mon, (caddr_t) & nsm_mon,
493                             (xdrproc_t) xdr_sm_stat_res, (caddr_t) & res, tout);
494             if(ret != RPC_SUCCESS)
495             {
496                     gf_log (GF_NLM, GF_LOG_ERROR, "clnt_call(): %s",
497                             clnt_sperrno(ret));
498                     goto out;
499             }
(gdb) 
500             if(res.res_stat != STAT_SUCC)
501             {
502                     gf_log (GF_NLM, GF_LOG_ERROR, "clnt_call(): %s",
503                             clnt_sperrno(ret));
504                     goto out;
505             }
506             retstat = 0;
507     out:
508             GF_FREE(nsm_mon.mon_id.mon_name);
509             GF_FREE(nsm_mon.mon_id.my_id.my_name);
(gdb) 
510             clnt_destroy(clnt);
511             return retstat;
512     }


nsm_monitor() creates the client object with clnt_create() (line 484). If that
call fails, clnt stays NULL and control jumps to the out label, where
clnt_destroy(clnt) at line 510 is called without checking for NULL. The
destroy call dereferences the handle, which is what raises the segmentation
fault.
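
A minimal sketch of the missing guard, assuming the fix is simply to skip the destroy call when client creation returned NULL. It does not use the real Sun RPC library: fake_client, fake_clnt_create, fake_clnt_destroy and monitor_sketch are hypothetical stand-ins so that the control flow of nsm_monitor() can be compiled on its own, and this is not necessarily the change that was merged upstream.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the RPC client handle. It is destroyed
 * through a function pointer stored in the object itself, so passing
 * NULL to the destroy macro below would dereference a NULL pointer. */
struct fake_client {
        void (*destroy)(struct fake_client *self);
};

static int destroy_calls = 0;   /* counts how many handles were destroyed */

static void fake_destroy_impl(struct fake_client *self)
{
        destroy_calls++;
        free(self);
}

/* Stand-in for clnt_create(): returns NULL when should_fail is non-zero. */
static struct fake_client *fake_clnt_create(int should_fail)
{
        struct fake_client *c;

        if (should_fail)
                return NULL;
        c = malloc(sizeof(*c));
        if (c)
                c->destroy = fake_destroy_impl;
        return c;
}

/* Stand-in for clnt_destroy(): dereferences the handle unconditionally. */
#define fake_clnt_destroy(c) ((*(c)->destroy)(c))

/* Mirrors the goto-out flow of nsm_monitor(), with the NULL guard added. */
static int monitor_sketch(int create_should_fail)
{
        struct fake_client *clnt = NULL;
        int retstat = -1;

        clnt = fake_clnt_create(create_should_fail);
        if (!clnt) {
                fprintf(stderr, "client creation failed\n");
                goto out;
        }
        retstat = 0;
out:
        if (clnt)       /* the check that is missing at nlm4.c:510 */
                fake_clnt_destroy(clnt);
        return retstat;
}
```

Calling monitor_sketch(1) takes the failure path and returns -1 without touching the NULL handle. monitor_sketch(0) returns 0 and destroys the handle exactly once.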

Version-Release number of selected component (if applicable):
glusterfs 3.3.0qa32

Actual results:

The NFS server crashed with a segmentation fault (signal 11).

Expected results:

The NFS server should log the clnt_create() failure and keep running.

Additional info:


[2012-03-29 02:55:19.380171] I [afr-common.c:1329:afr_launch_self_heal] 0-mirror-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: /linux-2.6.31.1/arch/x86/kernel/apic/.tmp_apic.ver, reason: lookup detected pending operations
[2012-03-29 02:55:19.381447] I [afr-self-heal-common.c:1823:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-mirror-replicate-0: Non blocking entrylks failed.
[2012-03-29 02:55:19.381476] I [afr-self-heal-common.c:917:afr_sh_missing_entries_done] 0-mirror-replicate-0: split brain found, aborting selfheal of /linux-2.6.31.1/arch/x86/kernel/apic/.tmp_apic.ver
[2012-03-29 02:55:19.381494] E [afr-self-heal-common.c:2038:afr_self_heal_completion_cbk] 0-mirror-replicate-0: background  meta-data data entry missing-entry gfid self-heal failed on /linux-2.6.31.1/arch/x86/kernel/apic/.tmp_apic.ver
[2012-03-29 02:57:32.730902] E [nlm4.c:487:nsm_monitor] 0-nfs-NLM: Clnt_create()
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-03-29 02:57:32
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0qa32
/lib64/libc.so.6[0x3c38032900]
/usr/local/lib/glusterfs/3.3.0qa32/xlator/nfs/server.so(nsm_monitor+0x229)[0x7f308f39b136]
/usr/local/lib/glusterfs/3.3.0qa32/xlator/nfs/server.so(nlm4_establish_callback+0x8a)[0x7f308f39c6ab]
/lib64/libpthread.so.0[0x3c388077f1]
/lib64/libc.so.6(clone+0x6d)[0x3c380e592d]


gluster volume info

Volume Name: mirror
Type: Replicate
Volume ID: e6423147-ee12-453f-bcf6-2fb09a9087c5
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.16.156.9:/export/mirror
Brick2: 10.16.156.12:/export/mirror
Brick3: 10.16.156.15:/export/mirror
Options Reconfigured:
performance.flush-behind: off
performance.stat-prefetch: off
performance.client-io-threads: on

Comment 1 Vijay Bellur 2012-04-02 04:58:03 EDT

*** This bug has been marked as a duplicate of bug 808341 ***

Comment 2 Anand Avati 2012-04-02 05:42:10 EDT

CHANGE: http://review.gluster.com/3061 (nlm: print the reason of failure when clnt_create fails to create the client object) merged in master by Vijay Bellur (vijay@gluster.com)

Comment 3 Vijay Bellur 2012-04-02 06:08:47 EDT

*** Bug 797060 has been marked as a duplicate of this bug. ***