Bug 772542

Summary: nfs server crashed due to stack overflow
Product: [Community] GlusterFS
Component: replicate
Version: 3.2.5
Status: CLOSED UPSTREAM
Severity: high
Priority: high
Reporter: Raghavendra Bhat <rabhat>
Assignee: Vinayaga Raman <vraman>
QA Contact: Raghavendra Bhat <rabhat>
CC: gluster-bugs, rwheeler, vbellur, vinaraya
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-05-23 09:42:19 UTC

Description Raghavendra Bhat 2012-01-09 06:38:45 UTC
Description of problem:
Setup: 2x2 distributed-replicate volume with quota and profiling enabled and a quota limit of 50GB. Four fuse clients and one nfs client were mounted; the fuse clients ran sanity tests in a loop, while the nfs client ran "find | xargs stat" on the mount point in a while-true loop. The nfs server that the nfs client had mounted from crashed due to a stack overflow.

This is the backtrace of the core.

Core was generated by `/usr/local/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003972643d23 in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) info thr
  4 Thread 0x7f1fbce0d700 (LWP 9217)  0x000000397271176d in xdr_callmsg_internal () from /lib64/libc.so.6
  3 Thread 0x7f1fba598700 (LWP 9218)  0x00000039726aab9d in nanosleep () from /lib64/libc.so.6
  2 Thread 0x7f1fbee48700 (LWP 9216)  0x0000003972e0f245 in sigwait () from /lib64/libpthread.so.0
* 1 Thread 0x7f1fbfc48700 (LWP 9215)  0x0000003972643d23 in vfprintf () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003972643d23 in vfprintf () from /lib64/libc.so.6
#1  0x000000397266e9c2 in vsnprintf () from /lib64/libc.so.6
#2  0x000000397264e9d3 in snprintf () from /lib64/libc.so.6
#3  0x00007f1fc00a9697 in _gf_log (domain=0x13e8050 "mirror-replicate-1", 
    file=0x7f1fbdff7838 "../../../../../xlators/cluster/afr/src/afr.h", function=0x7f1fbdff7c83 "AFR_LOCAL_INIT", line=893, 
    level=GF_LOG_INFO, fmt=0x7f1fbdff7820 "no subvolumes up") at ../../../libglusterfs/src/logging.c:528
#4  0x00007f1fbdf9d24d in AFR_LOCAL_INIT (local=0xd84278e0, priv=0x1439680) at ../../../../../xlators/cluster/afr/src/afr.h:893
#5  0x00007f1fbdfa0423 in afr_do_readdir (frame=0xbc86ec20, this=0x13e8870, fd=0x7f1fbb677194, size=65536, offset=0, whichop=40)
    at ../../../../../xlators/cluster/afr/src/afr-dir-read.c:662
#6  0x00007f1fbdfa0e41 in afr_readdirp (frame=0xbc86ec20, this=0x13e8870, fd=0x7f1fbb677194, size=65536, offset=0)
    at ../../../../../xlators/cluster/afr/src/afr-dir-read.c:748
#7  0x00007f1fbdd783b7 in dht_readdirp_cbk (frame=0xc9858b20, cookie=0xbb1eaa20, this=0x13e94b0, op_ret=-1, op_errno=107, 
    orig_entries=0x7fff89424e50) at ../../../../../xlators/cluster/dht/src/dht-common.c:3125
#8  0x00007f1fbdfa00fb in afr_readdirp_cbk (frame=0xbb1eaa20, cookie=0x1, this=0x13e7610, op_ret=-1, op_errno=107, entries=0x7fff89424e50)
    at ../../../../../xlators/cluster/afr/src/afr-dir-read.c:633
#9  0x00007f1fbe227b45 in client3_1_readdirp_cbk (req=0x7f1fbac0fde0, iov=0x0, count=0, myframe=0xd07e8c00)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1939
#10 0x00007f1fbfe7f8ea in rpc_clnt_submit (rpc=0x143e160, prog=0x7f1fbe445260, procnum=40, cbkfn=0x7f1fbe22784f <client3_1_readdirp_cbk>, 
    proghdr=0x7fff89425110, proghdrcount=1, progpayload=0x0, progpayloadcount=0, iobref=0x33b30810, frame=0xd07e8c00, rsphdr=0x7fff894251d0, 
    rsphdr_count=1, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:1450
#11 0x00007f1fbe214c02 in client_submit_request (this=0x13e49e0, req=0x7fff894252e0, frame=0xd07e8c00, prog=0x7f1fbe445260, procnum=40, 
    cbk=0x7f1fbe22784f <client3_1_readdirp_cbk>, iobref=0x0, sfunc=0x7f1fbfc604a9 <xdr_from_readdirp_req>, rsphdr=0x7fff894251d0, 
    rsphdr_count=1, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at ../../../../../xlators/protocol/client/src/client.c:124
#12 0x00007f1fbe234a93 in client3_1_readdirp (frame=0xd07e8c00, this=0x13e49e0, data=0x7fff894253d0)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:5305
#13 0x00007f1fbe21c057 in client_readdirp (frame=0xd07e8c00, this=0x13e49e0, fd=0x7f1fbb677194, size=65536, off=0)
    at ../../../../../xlators/protocol/client/src/client.c:1695
#14 0x00007f1fbdfa0bdd in afr_do_readdir (frame=0xbb1eaa20, this=0x13e7610, fd=0x7f1fbb677194, size=65536, offset=0, whichop=40)
    at ../../../../../xlators/cluster/afr/src/afr-dir-read.c:720
#15 0x00007f1fbdfa0e41 in afr_readdirp (frame=0xbb1eaa20, this=0x13e7610, fd=0x7f1fbb677194, size=65536, offset=0)
    at ../../../../../xlators/cluster/afr/src/afr-dir-read.c:748
#16 0x00007f1fbdd793d9 in dht_do_readdir (frame=0xc9858b20, this=0x13e94b0, fd=0x7f1fbb677194, size=65536, yoff=0, whichop=40)
    at ../../../../../xlators/cluster/dht/src/dht-common.c:3276
#17 0x00007f1fbdd796bc in dht_readdirp (frame=0xc9858b20, this=0x13e94b0, fd=0x7f1fbb677194, size=65536, yoff=0)
    at ../../../../../xlators/cluster/dht/src/dht-common.c:3320
#18 0x00007f1fc00b7da3 in default_readdirp (frame=0xa777f1a0, this=0x13ea780, fd=0x7f1fbb677194, size=65536, off=0)
    at ../../../libglusterfs/src/defaults.c:1122
#19 0x00007f1fc00b7da3 in default_readdirp (frame=0x563deb90, this=0x13ebba0, fd=0x7f1fbb677194, size=65536, off=0)
    at ../../../libglusterfs/src/defaults.c:1122
#20 0x00007f1fc00b7da3 in default_readdirp (frame=0xc6df3030, this=0x13ece40, fd=0x7f1fbb677194, size=65536, off=0)
    at ../../../libglusterfs/src/defaults.c:1122
#21 0x00007f1fc00b7da3 in default_readdirp (frame=0xc97d4970, this=0x13ee050, fd=0x7f1fbb677194, size=65536, off=0)
    at ../../../libglusterfs/src/defaults.c:1122
#22 0x00007f1fc00b7da3 in default_readdirp (frame=0x888defd0, this=0x13ef220, fd=0x7f1fbb677194, size=65536, off=0)
---Type <return> to continue, or q <return> to quit---
(gdb) f 67125
#67125 0x00007f1fbafe4af9 in socket_event_poll_in (this=0x143e310) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1647
1647                    ret = rpc_transport_notify (this, RPC_TRANSPORT_MSG_RECEIVED,
(gdb) f 67130
#67130 0x000000000040700c in main (argc=7, argv=0x7fff8a022e18) at ../../../glusterfsd/src/glusterfsd.c:1509
1509            ret = event_dispatch (ctx->event_pool);
(gdb) 
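The crash itself is a SIGSEGV inside vfprintf while _gf_log formats the "no subvolumes up" message (frames #0-#4), at a stack roughly 67,000 frames deep: frame #67125 is still inside socket_event_poll_in and frame #67130 is main, so the entire pile of frames was built up while handling a single socket event. The visible frames show readdirp being re-driven from its own completion path while all subvolumes are down (op_errno 107, ENOTCONN): the submit path fails immediately and invokes client3_1_readdirp_cbk from within rpc_clnt_submit on the same stack (frames #9-#11), which then winds through the afr/dht callbacks back into another readdirp (frames #5-#8) instead of unwinding. Each failed attempt therefore adds another full call chain until the stack is exhausted. The sketch below (hypothetical names, not GlusterFS source) isolates that synchronous-retry pattern; the bound on attempts is added only so the sketch terminates.

/*
 * Minimal sketch (hypothetical names, not GlusterFS source) of the
 * synchronous-retry pattern the backtrace is consistent with: the request
 * fails immediately, its callback runs on the caller's own stack, and the
 * callback winds into another attempt instead of unwinding.
 */
#include <errno.h>
#include <stdio.h>

static unsigned long attempts;

static void readdir_cbk(int op_ret, int op_errno);

/* With all subvolumes down, the submit path fails at once and invokes the
 * callback directly (compare frames #9-#11 above). */
static void do_readdir(void)
{
    readdir_cbk(-1, ENOTCONN);
}

static void readdir_cbk(int op_ret, int op_errno)
{
    (void) op_errno;

    if (op_ret < 0) {
        if (++attempts > 10000)   /* bound added so the sketch terminates; */
            return;               /* the crashed process had no such bound */
        do_readdir();             /* retry from inside the callback: every */
        return;                   /* failure adds a full call chain to the */
    }                             /* stack until it overflows              */

    /* success path: consume entries and unwind normally */
}

int main(void)
{
    do_readdir();
    printf("stopped after %lu nested attempts\n", attempts);
    return 0;
}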




Comment 1 Anand Avati 2012-03-19 22:29:43 UTC
CHANGE: http://review.gluster.com/2793 (nfs: If entry is not found return ENOENT instead of entering an infinite loop.) merged in release-3.2 by Anand Avati (avati)
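
In terms of the sketch in the description, this change makes the failing lookup report ENOENT to the caller instead of re-driving the request, so each attempt unwinds. A minimal hypothetical rendering (not the actual patch at review.gluster.com/2793):

#include <errno.h>

/* Hypothetical fixed callback, mirroring the earlier sketch: surface the
 * error instead of retrying synchronously, so the stack unwinds. */
static int readdir_cbk_fixed(int op_ret, int op_errno)
{
    (void) op_errno;

    if (op_ret < 0)
        return -ENOENT;   /* entry not found: report it upward, do not loop */

    /* success path: consume entries */
    return 0;
}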

Comment 2 Vijay Bellur 2012-03-28 15:07:19 UTC
Krishna, do we need anything more for this bug?

Comment 3 Krishna Srinivas 2012-03-29 03:19:55 UTC
Vijay, No.