Bug 808341 - [glusterfs-3.3.0qa32]: nfs server crashed trying to access NULL client object
Summary: [glusterfs-3.3.0qa32]: nfs server crashed trying to access NULL client object
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Raghavendra Bhat
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 817967
Reported: 2012-03-30 07:57 UTC by Raghavendra Bhat
Modified: 2015-12-01 16:45 UTC

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:12:58 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: glusterfs-3.3.0qa40
Embargoed:


Description Raghavendra Bhat 2012-03-30 07:57:07 UTC
Description of problem:
The NFS server crashed in nsm_monitor() while calling clnt_destroy() on a NULL client object.
Backtrace:

Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /etc/gluster'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f308f39b136 in nsm_monitor (host=0x47be8b0 "gqas007.sbu.lab.eng.bos.redhat.com")
    at ../../../../../xlators/nfs/server/src/nlm4.c:510
510             clnt_destroy(clnt);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  0x00007f308f39b136 in nsm_monitor (host=0x47be8b0 "gqas007.sbu.lab.eng.bos.redhat.com")
    at ../../../../../xlators/nfs/server/src/nlm4.c:510
#1  0x00007f308f39c6ab in nlm4_establish_callback (csarg=0x7f308d9fd480) at ../../../../../xlators/nfs/server/src/nlm4.c:870
#2  0x0000003c388077f1 in start_thread () from /lib64/libpthread.so.0
#3  0x0000003c380e592d in clone () from /lib64/libc.so.6
(gdb) f 0
#0  0x00007f308f39b136 in nsm_monitor (host=0x47be8b0 "gqas007.sbu.lab.eng.bos.redhat.com")
    at ../../../../../xlators/nfs/server/src/nlm4.c:510
510             clnt_destroy(clnt);
(gdb) p clnt
$1 = (CLIENT *) 0x0
(gdb) l nsm_monitor
460             STACK_DESTROY (frame->root);
461             return 0;
462     }
463
464     int nsm_monitor(char *host)
465     {
466             CLIENT *clnt = NULL;
467             enum clnt_stat ret;
468             struct mon nsm_mon;
469             struct sm_stat_res res;
(gdb) 
470             struct timeval tout = { 5, 0 };
471             int retstat = -1;
472
473             nsm_mon.mon_id.mon_name = gf_strdup(host);
474             nsm_mon.mon_id.my_id.my_name = gf_strdup("localhost");
475             nsm_mon.mon_id.my_id.my_prog = NLMCBK_PROGRAM;
476             nsm_mon.mon_id.my_id.my_vers = NLMCBK_V1;
477             nsm_mon.mon_id.my_id.my_proc = NLMCBK_SM_NOTIFY;
478             /* nothing to put in the private data */
479     #define SM_PROG 100024
(gdb) 
480     #define SM_VERS 1
481     #define SM_MON 2
482
483             /* create a connection to nsm on the localhost */
484             clnt = clnt_create("localhost", SM_PROG, SM_VERS, "tcp");
485             if(!clnt)
486             {
487                     gf_log (GF_NLM, GF_LOG_ERROR, "Clnt_create()");
488                     goto out;
489             }
(gdb) 
490
491             ret = clnt_call(clnt, SM_MON,
492                             (xdrproc_t) xdr_mon, (caddr_t) & nsm_mon,
493                             (xdrproc_t) xdr_sm_stat_res, (caddr_t) & res, tout);
494             if(ret != RPC_SUCCESS)
495             {
496                     gf_log (GF_NLM, GF_LOG_ERROR, "clnt_call(): %s",
497                             clnt_sperrno(ret));
498                     goto out;
499             }
(gdb) 
500             if(res.res_stat != STAT_SUCC)
501             {
502                     gf_log (GF_NLM, GF_LOG_ERROR, "clnt_call(): %s",
503                             clnt_sperrno(ret));
504                     goto out;
505             }
506             retstat = 0;
507     out:
508             GF_FREE(nsm_mon.mon_id.mon_name);
509             GF_FREE(nsm_mon.mon_id.my_id.my_name);
(gdb) 
510             clnt_destroy(clnt);
511             return retstat;
512     }


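In the above function, the client object is created with clnt_create() at line 484. If clnt_create() fails and returns NULL, the code jumps to the "out" label, where clnt_destroy(clnt) at line 510 is called without first checking whether clnt is NULL. That call on the NULL pointer causes the segmentation fault. The log below confirms this path: the "Clnt_create()" error from nlm4.c:487 is logged immediately before the crash.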
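The failure path and a guarded cleanup can be sketched as follows. This is only an illustration of the pattern, not the merged patch: fake_client, fake_clnt_create(), fake_clnt_destroy(), destroy_calls and monitor_sketch() are hypothetical stand-ins so that the example builds without the RPC headers. The assumption modelled here is that clnt_destroy() dereferences the handle it is given, so it must never receive NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the RPC CLIENT handle; not the real type. */
typedef struct fake_client {
        int connected;
} fake_client;

/* Counts how many times the destroy stub ran (for illustration only). */
static int destroy_calls = 0;

/* Stand-in for clnt_create(): returns NULL when the service is unreachable. */
static fake_client *fake_clnt_create(int reachable)
{
        fake_client *clnt = NULL;

        if (!reachable)
                return NULL;
        clnt = malloc(sizeof(*clnt));
        if (clnt)
                clnt->connected = 1;
        return clnt;
}

/* Stand-in for clnt_destroy(): the caller must not pass NULL. */
static void fake_clnt_destroy(fake_client *clnt)
{
        assert(clnt != NULL);
        destroy_calls++;
        free(clnt);
}

/* Mirrors the control flow of nsm_monitor(): 0 on success, -1 on failure. */
static int monitor_sketch(int reachable)
{
        fake_client *clnt = NULL;
        int retstat = -1;

        clnt = fake_clnt_create(reachable);
        if (!clnt)
                goto out;       /* same early exit as nlm4.c:488 */

        retstat = 0;
out:
        /* Guard the cleanup: only destroy a handle that was created. */
        if (clnt)
                fake_clnt_destroy(clnt);
        return retstat;
}
```

With the guard in place, monitor_sketch(0) returns -1 without touching the destroy stub, while monitor_sketch(1) returns 0 and destroys the handle exactly once.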

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:

The NFS server crashed with a segmentation fault (signal 11).

Expected results:


Additional info:


[2012-03-29 02:55:19.380171] I [afr-common.c:1329:afr_launch_self_heal] 0-mirror-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: /linux-2.6.31.1/arch/x86/kernel/apic/.tmp_apic.ver, reason: lookup detected pending operations
[2012-03-29 02:55:19.381447] I [afr-self-heal-common.c:1823:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-mirror-replicate-0: Non blocking entrylks failed.
[2012-03-29 02:55:19.381476] I [afr-self-heal-common.c:917:afr_sh_missing_entries_done] 0-mirror-replicate-0: split brain found, aborting selfheal of /linux-2.6.31.1/arch/x86/kernel/apic/.tmp_apic.ver
[2012-03-29 02:55:19.381494] E [afr-self-heal-common.c:2038:afr_self_heal_completion_cbk] 0-mirror-replicate-0: background  meta-data data entry missing-entry gfid self-heal failed on /linux-2.6.31.1/arch/x86/kernel/apic/.tmp_apic.ver
[2012-03-29 02:57:32.730902] E [nlm4.c:487:nsm_monitor] 0-nfs-NLM: Clnt_create()
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-03-29 02:57:32
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0qa32
/lib64/libc.so.6[0x3c38032900]
/usr/local/lib/glusterfs/3.3.0qa32/xlator/nfs/server.so(nsm_monitor+0x229)[0x7f308f39b136]
/usr/local/lib/glusterfs/3.3.0qa32/xlator/nfs/server.so(nlm4_establish_callback+0x8a)[0x7f308f39c6ab]
/lib64/libpthread.so.0[0x3c388077f1]
/lib64/libc.so.6(clone+0x6d)[0x3c380e592d]


gluster volume info
 
Volume Name: mirror
Type: Replicate
Volume ID: e6423147-ee12-453f-bcf6-2fb09a9087c5
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.16.156.9:/export/mirror
Brick2: 10.16.156.12:/export/mirror
Brick3: 10.16.156.15:/export/mirror
Options Reconfigured:
performance.flush-behind: off
performance.stat-prefetch: off
performance.client-io-threads: on

Comment 1 Anand Avati 2012-03-30 22:50:50 UTC
CHANGE: http://review.gluster.com/3046 (nlm: do not destroy the NULL client object) merged in master by Anand Avati (avati)

Comment 2 Vijay Bellur 2012-04-02 08:58:04 UTC
*** Bug 808390 has been marked as a duplicate of this bug. ***

Comment 3 Raghavendra Bhat 2012-05-09 09:27:57 UTC
Verified with glusterfs-3.3.0qa40: the crash is no longer seen, because the client object is now checked for NULL before it is accessed.
