Bug 804645

Summary: [glusterfs-3.3.0qa29]: nfs server crashed since the frame pointer was NULL
Product: [Community] GlusterFS
Component: replicate
Version: mainline
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Raghavendra Bhat <rabhat>
Assignee: Pranith Kumar K <pkarampu>
CC: gluster-bugs
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 17:11:56 UTC
Bug Blocks: 817967

Description Raghavendra Bhat 2012-03-19 14:00:48 UTC
Description of problem:
A 3-way replicated volume with 6 FUSE and 6 NFS clients was running heavy I/O. The NFS server crashed with the backtrace below.

GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/sbin/glusterfs...done.
[New Thread 6260]
[New Thread 6261]
[New Thread 6263]
[New Thread 6264]
[New Thread 6262]
[New Thread 6265]
Reading symbols from /usr/local/lib/libglusterfs.so.0...done.
Loaded symbols for /usr/local/lib/libglusterfs.so.0
Reading symbols from /usr/local/lib/libgfrpc.so.0...done.
Loaded symbols for /usr/local/lib/libgfrpc.so.0
Reading symbols from /usr/local/lib/libgfxdr.so.0...done.
Loaded symbols for /usr/local/lib/libgfxdr.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa29/rpc-transport/socket.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa29/rpc-transport/socket.so
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa29/xlator/protocol/client.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa29/xlator/protocol/client.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa29/xlator/cluster/replicate.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa29/xlator/cluster/replicate.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa29/xlator/debug/io-stats.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa29/xlator/debug/io-stats.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa29/xlator/nfs/server.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa29/xlator/nfs/server.so
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /etc/gluster'.
Program terminated with signal 6, Aborted.
#0  0x0000003c38032885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  0x0000003c38032885 in raise () from /lib64/libc.so.6
#1  0x0000003c38034065 in abort () from /lib64/libc.so.6
#2  0x0000003c3802b9fe in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003c3802bac0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007faac14cce57 in afr_sh_data_erase_pending_cbk (frame=0x7faac485ed5c, cookie=0x1, this=0x2255960, op_ret=0, op_errno=22, 
    xattr=0x221fb20) at ../../../../../xlators/cluster/afr/src/afr-self-heal-data.c:373
#5  0x00007faac173b8e6 in client3_1_fxattrop_cbk (req=0x7faaba1fd7b8, iov=0x7faaba1fd7f8, count=1, myframe=0x7faac4a59e5c)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1453
#6  0x00007faac59fa9fc in rpc_clnt_handle_reply (clnt=0x22b3620, pollin=0x23eae30) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:797
#7  0x00007faac59fad99 in rpc_clnt_notify (trans=0x22c31b0, mydata=0x22b3650, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x23eae30)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:916
#8  0x00007faac59f6e7c in rpc_transport_notify (this=0x22c31b0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x23eae30)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#9  0x00007faac2580270 in socket_event_poll_in (this=0x22c31b0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1686
#10 0x00007faac25807f4 in socket_event_handler (fd=21, idx=12, data=0x22c31b0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1801
#11 0x00007faac5c5100c in event_dispatch_epoll_handler (event_pool=0x221ec50, events=0x224cdf0, i=0)
    at ../../../libglusterfs/src/event.c:794
#12 0x00007faac5c5122f in event_dispatch_epoll (event_pool=0x221ec50) at ../../../libglusterfs/src/event.c:856
#13 0x00007faac5c515ba in event_dispatch (event_pool=0x221ec50) at ../../../libglusterfs/src/event.c:956
#14 0x0000000000408057 in main (argc=11, argv=0x7fffb39538e8) at ../../../glusterfsd/src/glusterfsd.c:1647
(gdb) f 4
#4  0x00007faac14cce57 in afr_sh_data_erase_pending_cbk (frame=0x7faac485ed5c, cookie=0x1, this=0x2255960, op_ret=0, op_errno=22, 
    xattr=0x221fb20) at ../../../../../xlators/cluster/afr/src/afr-self-heal-data.c:373
373	                GF_ASSERT (sh->old_loop_frame);
(gdb) l
368	                sh = &local->self_heal;
369	                if (!IA_ISREG (sh->type)) {
370	                        afr_sh_data_finish (frame, this);
371	                        goto out;
372	                }
373	                GF_ASSERT (sh->old_loop_frame);
374	                afr_sh_data_lock (frame, this, 0, 0,
375	                                  afr_post_sh_big_lock_success,
376	                                  afr_post_sh_big_lock_failure);
377	        }
(gdb) info thr
  6 Thread 0x7faabb34b700 (LWP 6265)  0x0000003c380dc273 in poll () from /lib64/libc.so.6
  5 Thread 0x7faac3b8c700 (LWP 6262)  0x0000003c3880b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x7faac2360700 (LWP 6264)  0x0000003c3880eccd in nanosleep () from /lib64/libpthread.so.0
  3 Thread 0x7faac318b700 (LWP 6263)  0x0000003c3880b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7faac458d700 (LWP 6261)  0x0000003c3880f245 in sigwait () from /lib64/libpthread.so.0
* 1 Thread 0x7faac57c3700 (LWP 6260)  0x0000003c38032885 in raise () from /lib64/libc.so.6
(gdb) t 6
[Switching to thread 6 (Thread 0x7faabb34b700 (LWP 6265))]#0  0x0000003c380dc273 in poll () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003c380dc273 in poll () from /lib64/libc.so.6
#1  0x0000003c38112d60 in svc_run () from /lib64/libc.so.6
#2  0x00007faac10634c7 in nsm_thread (argv=0x0) at ../../../../../xlators/nfs/server/src/nlmcbk_svc.c:118
#3  0x0000003c388077f1 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003c380e592d in clone () from /lib64/libc.so.6
(gdb) quit
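
Frames #0 to #3 of the backtrace (raise, abort, __assert_fail) are the usual signature of a failed assert(): glibc reports the failed expression and then calls abort(), which ends the process with SIGABRT (signal 6). The following is a minimal sketch of that mechanism only, not GlusterFS code. The struct and function names are hypothetical, and only the field named in the assertion at line 373 is modelled.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the self-heal state; only the field named in the
 * failing assertion is modelled. */
struct sh_state {
        void *old_loop_frame;
};

/* Evaluate assert (sh->old_loop_frame) in a child process and report how the
 * child ended: the terminating signal number if it was killed by a signal,
 * 0 if it exited normally, -1 if fork() or waitpid() failed. */
int
assert_in_child (const struct sh_state *sh)
{
        int   status = 0;
        pid_t pid    = fork ();

        if (pid < 0)
                return -1;

        if (pid == 0) {
                /* A NULL pointer makes assert() fail and call abort(). */
                assert (sh->old_loop_frame);
                _exit (0);
        }

        if (waitpid (pid, &status, 0) < 0)
                return -1;

        if (WIFSIGNALED (status))
                return WTERMSIG (status);

        return 0;
}
```

Calling assert_in_child() with old_loop_frame set to NULL should return SIGABRT, which matches the "Program terminated with signal 6, Aborted" line in the core. Running the assertion in a child keeps the calling process alive so the outcome can be inspected.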

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa29

Actual results:
The NFS server aborted (signal 6) on the assertion GF_ASSERT (sh->old_loop_frame) at afr-self-heal-data.c:373, because sh->old_loop_frame was NULL.

Expected results:

sh->old_loop_frame should not be NULL at this point, and the NFS server should not crash.

Additional info:

Comment 1 Anand Avati 2012-03-29 11:53:00 UTC
CHANGE: http://review.gluster.com/3031 (cluster/afr: handle fstat failure in data-self-heal) merged in master by Vijay Bellur (vijay)
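
For readers without access to the review, the general idea suggested by the change title can be sketched as follows: when an earlier step fails, wind the operation down in its callback so that later steps never run with a pointer that was not set. This is an illustration under assumptions, not the merged patch. All names here (heal_ctx, heal_finish, heal_stat_cbk, heal_next_step) are hypothetical, and the report itself does not spell out how the fstat failure led to the NULL old_loop_frame.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced self-heal context (not the real afr types). */
struct heal_ctx {
        void *loop_frame; /* set only when the stat-like step succeeds */
        int   op_errno;   /* first error seen, 0 if none */
        int   finished;   /* non-zero once the heal has been wound down */
};

/* Wind the heal down exactly once, remembering why. */
static void
heal_finish (struct heal_ctx *ctx, int op_errno)
{
        if (ctx->finished)
                return;
        ctx->op_errno = op_errno;
        ctx->finished = 1;
}

/* Callback of the stat-like step: a failure ends the heal here instead of
 * letting later steps run with loop_frame still unset. */
int
heal_stat_cbk (struct heal_ctx *ctx, int op_ret, int op_errno, void *frame)
{
        if (op_ret < 0) {
                heal_finish (ctx, op_errno);
                return -1;
        }
        ctx->loop_frame = frame;
        return 0;
}

/* Later step that needs loop_frame: check and bail out rather than assert. */
int
heal_next_step (struct heal_ctx *ctx)
{
        if (ctx->finished)
                return -1;
        if (ctx->loop_frame == NULL) {
                heal_finish (ctx, EINVAL);
                return -1;
        }
        return 0; /* safe to continue with ctx->loop_frame */
}
```

In this sketch the failure is recorded once in heal_finish(), and heal_next_step() returns an error instead of aborting the whole process, which is the behaviour one would want from a long-running server.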

Comment 2 Raghavendra Bhat 2012-04-05 09:37:54 UTC
Checked with glusterfs-3.3.0qa33. Bug is not seen anymore.