Description of problem:
The NFS server crashed in nsm_monitor() while trying to destroy a NULL client object. This is the backtrace:

Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /etc/gluster'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f308f39b136 in nsm_monitor (host=0x47be8b0 "gqas007.sbu.lab.eng.bos.redhat.com") at ../../../../../xlators/nfs/server/src/nlm4.c:510
510             clnt_destroy(clnt);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  0x00007f308f39b136 in nsm_monitor (host=0x47be8b0 "gqas007.sbu.lab.eng.bos.redhat.com") at ../../../../../xlators/nfs/server/src/nlm4.c:510
#1  0x00007f308f39c6ab in nlm4_establish_callback (csarg=0x7f308d9fd480) at ../../../../../xlators/nfs/server/src/nlm4.c:870
#2  0x0000003c388077f1 in start_thread () from /lib64/libpthread.so.0
#3  0x0000003c380e592d in clone () from /lib64/libc.so.6
(gdb) f 0
#0  0x00007f308f39b136 in nsm_monitor (host=0x47be8b0 "gqas007.sbu.lab.eng.bos.redhat.com") at ../../../../../xlators/nfs/server/src/nlm4.c:510
510             clnt_destroy(clnt);
(gdb) p clnt
$1 = (CLIENT *) 0x0
(gdb) l nsm_monitor
460             STACK_DESTROY (frame->root);
461             return 0;
462     }
463
464     int nsm_monitor(char *host)
465     {
466             CLIENT *clnt = NULL;
467             enum clnt_stat ret;
468             struct mon nsm_mon;
469             struct sm_stat_res res;
(gdb)
470             struct timeval tout = { 5, 0 };
471             int retstat = -1;
472
473             nsm_mon.mon_id.mon_name = gf_strdup(host);
474             nsm_mon.mon_id.my_id.my_name = gf_strdup("localhost");
475             nsm_mon.mon_id.my_id.my_prog = NLMCBK_PROGRAM;
476             nsm_mon.mon_id.my_id.my_vers = NLMCBK_V1;
477             nsm_mon.mon_id.my_id.my_proc = NLMCBK_SM_NOTIFY;
478             /* nothing to put in the private data */
479     #define SM_PROG 100024
(gdb)
480     #define SM_VERS 1
481     #define SM_MON 2
482
483             /* create a connection to nsm on the localhost */
484             clnt = clnt_create("localhost", SM_PROG, SM_VERS, "tcp");
485             if(!clnt)
486             {
487                     gf_log (GF_NLM, GF_LOG_ERROR, "Clnt_create()");
488                     goto out;
489             }
(gdb)
490
491             ret = clnt_call(clnt, SM_MON,
492                     (xdrproc_t) xdr_mon, (caddr_t) & nsm_mon,
493                     (xdrproc_t) xdr_sm_stat_res, (caddr_t) & res, tout);
494             if(ret != RPC_SUCCESS)
495             {
496                     gf_log (GF_NLM, GF_LOG_ERROR, "clnt_call(): %s",
497                             clnt_sperrno(ret));
498                     goto out;
499             }
(gdb)
500             if(res.res_stat != STAT_SUCC)
501             {
502                     gf_log (GF_NLM, GF_LOG_ERROR, "clnt_call(): %s",
503                             clnt_sperrno(ret));
504                     goto out;
505             }
506             retstat = 0;
507     out:
508             GF_FREE(nsm_mon.mon_id.mon_name);
509             GF_FREE(nsm_mon.mon_id.my_id.my_name);
(gdb)
510             clnt_destroy(clnt);
511             return retstat;
512     }

In the function above, the client object is created with clnt_create(). If clnt_create() fails, the code jumps to the out label, which calls clnt_destroy(clnt) without first checking whether clnt is NULL. clnt_destroy() dereferences the handle, so a failed clnt_create() always ends in a segmentation fault.

Version-Release number of selected component (if applicable):
glusterfs 3.3.0qa32

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
The NFS server crashed.

Expected results:
The NFS server should not crash when clnt_create() fails.

Additional info:
[2012-03-29 02:55:19.380171] I [afr-common.c:1329:afr_launch_self_heal] 0-mirror-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: /linux-2.6.31.1/arch/x86/kernel/apic/.tmp_apic.ver, reason: lookup detected pending operations
[2012-03-29 02:55:19.381447] I [afr-self-heal-common.c:1823:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-mirror-replicate-0: Non blocking entrylks failed.
[2012-03-29 02:55:19.381476] I [afr-self-heal-common.c:917:afr_sh_missing_entries_done] 0-mirror-replicate-0: split brain found, aborting selfheal of /linux-2.6.31.1/arch/x86/kernel/apic/.tmp_apic.ver
[2012-03-29 02:55:19.381494] E [afr-self-heal-common.c:2038:afr_self_heal_completion_cbk] 0-mirror-replicate-0: background  meta-data data entry missing-entry gfid self-heal failed on /linux-2.6.31.1/arch/x86/kernel/apic/.tmp_apic.ver
[2012-03-29 02:57:32.730902] E [nlm4.c:487:nsm_monitor] 0-nfs-NLM: Clnt_create()
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-03-29 02:57:32
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0qa32
/lib64/libc.so.6[0x3c38032900]
/usr/local/lib/glusterfs/3.3.0qa32/xlator/nfs/server.so(nsm_monitor+0x229)[0x7f308f39b136]
/usr/local/lib/glusterfs/3.3.0qa32/xlator/nfs/server.so(nlm4_establish_callback+0x8a)[0x7f308f39c6ab]
/lib64/libpthread.so.0[0x3c388077f1]
/lib64/libc.so.6(clone+0x6d)[0x3c380e592d]

gluster volume info

Volume Name: mirror
Type: Replicate
Volume ID: e6423147-ee12-453f-bcf6-2fb09a9087c5
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.16.156.9:/export/mirror
Brick2: 10.16.156.12:/export/mirror
Brick3: 10.16.156.15:/export/mirror
Options Reconfigured:
performance.flush-behind: off
performance.stat-prefetch: off
performance.client-io-threads: on
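A minimal sketch of the NULL-guarded cleanup that the description of nsm_monitor() calls for. It deliberately does not link against the real Sun RPC library: fake_client_t, fake_clnt_create(), fake_clnt_destroy(), destroy_calls and monitor_sketch() are hypothetical stand-ins (the real CLIENT type, clnt_create() and clnt_destroy() come from <rpc/clnt.h>), so the control flow can be compiled and exercised without a running NSM service. The only thing it illustrates is that the out label has to check the handle before destroying it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the Sun RPC CLIENT handle. */
typedef struct fake_client {
    int connected;
} fake_client_t;

/* Counts how many times the destroy function ran, so the cleanup
 * behaviour can be checked from outside. */
static int destroy_calls = 0;

/* Hypothetical stand-in for clnt_create(): returns NULL on failure,
 * as clnt_create() does when it cannot reach the service. */
static fake_client_t *fake_clnt_create(int should_fail)
{
    if (should_fail)
        return NULL;
    fake_client_t *c = malloc(sizeof(*c));
    if (c)
        c->connected = 1;
    return c;
}

/* Like clnt_destroy(), this dereferences its argument, so it must
 * never be called with NULL. */
static void fake_clnt_destroy(fake_client_t *c)
{
    c->connected = 0;
    free(c);
    destroy_calls++;
}

/* Same shape as nsm_monitor(): every failure path jumps to `out`,
 * and `out` destroys the client only if it was actually created. */
static int monitor_sketch(int create_fails)
{
    fake_client_t *clnt = NULL;
    int retstat = -1;

    clnt = fake_clnt_create(create_fails);
    if (!clnt) {
        fprintf(stderr, "create failed\n");
        goto out;
    }

    /* ... the SM_MON call and its error checks would go here ... */
    retstat = 0;
out:
    if (clnt)   /* the NULL check missing from nsm_monitor() */
        fake_clnt_destroy(clnt);
    return retstat;
}
```

Calling monitor_sketch(1) takes the failure path and returns -1 without ever calling the destroy function; monitor_sketch(0) creates the handle, returns 0 and destroys it exactly once. Guarding the single `out` label keeps the existing goto-based structure of the function intact, rather than adding a second cleanup label.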
*** This bug has been marked as a duplicate of bug 808341 ***
CHANGE: http://review.gluster.com/3061 (nlm: print the reason of failure when clnt_create fails to create the client object) merged in master by Vijay Bellur (vijay)
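The change above makes the clnt_create() failure message state why the connection could not be made; in the log above it only reads "Clnt_create()". In the Sun RPC API, clnt_create() records the failure reason in rpc_createerr and clnt_spcreateerror() formats it as a string, but whether review 3061 uses that function is not stated here. The sketch below only models the idea without the RPC library: sketch_createerr, sketch_reason() and format_create_failure() are hypothetical stand-ins, and the reason strings are illustrative, not the library's exact wording.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for Sun RPC's rpc_createerr: a global that
 * a failed create call fills in with the reason for the failure. */
enum sketch_create_stat {
    SKETCH_OK,
    SKETCH_PROGNOTREGISTERED,
    SKETCH_SYSTEMERROR
};

static enum sketch_create_stat sketch_createerr = SKETCH_OK;

/* Hypothetical stand-in for clnt_spcreateerror(): maps the recorded
 * status to a readable reason (illustrative strings only). */
static const char *sketch_reason(enum sketch_create_stat st)
{
    switch (st) {
    case SKETCH_OK:                return "Success";
    case SKETCH_PROGNOTREGISTERED: return "Program not registered";
    case SKETCH_SYSTEMERROR:       return "System error";
    }
    return "Unknown error";
}

/* Builds the log text that would replace the bare "Clnt_create()"
 * message: the original prefix followed by the failure reason. */
static const char *format_create_failure(char *buf, size_t len)
{
    snprintf(buf, len, "Clnt_create(): %s",
             sketch_reason(sketch_createerr));
    return buf;
}
```

For example, if the NSM program (100024) is not registered on localhost, the message would carry that reason instead of only the function name. In nsm_monitor() itself, the equivalent would be passing such a string to the existing gf_log() call in the `if(!clnt)` branch.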
*** Bug 797060 has been marked as a duplicate of this bug. ***