Bug 773225

Summary: [glusterfs-3.3.0qa19]: nfs server crashed since gfid was NULL
Product: [Community] GlusterFS Reporter: Raghavendra Bhat <rabhat>
Component: replicateAssignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: mainlineCC: gluster-bugs, rfortier, vbhat, vlad.craciunoiu
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 17:44:51 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 817967    
Attachments:
Description Flags
fs-perf-test which was being executed at the time of the crash
none
multi threaded program which was being executed on the fuse mount at the time of crash none

Description Raghavendra Bhat 2012-01-11 10:11:59 UTC
Created attachment 552060 [details]
fs-perf-test which was being executed at the time of the crash

Description of problem:
nfs server crashed in client3_1_open due to assert since it cannot find the gfid for the file to be opened.

setup: 2x2 distributed replicate with quota and profile enabled, client io-threads on.

volume set operation (flush-behind off and on) was being performed in a loop with  some sleep time between each set. 1 fuse client and 1 nfs client. 
fuse client executing a multi-threaded application (which spawns multiple threads, with each thread opening the same file and then performing different fops in their respective fds), and nfs client executing the fs-perf-test.

Brought one of the bricks down, slept for some time and then woke the brick up.

This is the backtrace of the core.

Core was generated by `/usr/local/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/'.
Program terminated with signal 6, Aborted.
#0  0x00000034c2232905 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) #0  0x00000034c2232905 in raise () from /lib64/libc.so.6
#1  0x00000034c22340e5 in abort () from /lib64/libc.so.6
#2  0x00000034c222b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x00000034c222ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f36bcd6f3e8 in client3_1_open (frame=0x7f36be6b6e04, this=0x1eeb5c0, data=0x7fffd5aff0c0)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:3477
#5  0x00007f36bcd597af in client_open (frame=0x7f36be6b6e04, this=0x1eeb5c0, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/protocol/client/src/client.c:743
#6  0x00007f36bcaf9ca8 in afr_open (frame=0x7f36be6aa87c, this=0x1eed700, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/cluster/afr/src/afr-open.c:245
#7  0x00007f36bc8b99e5 in dht_open (frame=0x7f36be6b0f48, this=0x1eee330, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/cluster/dht/src/dht-inode-read.c:127
#8  0x00007f36bfa82663 in default_open (frame=0x7f36be6ad8dc, this=0x1eef600, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../libglusterfs/src/defaults.c:906
#9  0x00007f36bc45e7d9 in wb_open (frame=0x7f36be6b30e0, this=0x1ef0a40, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/performance/write-behind/src/write-behind.c:1413
#10 0x00007f36bc24bf14 in ra_open (frame=0x7f36be6b2574, this=0x1ef1e00, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/performance/read-ahead/src/read-ahead.c:205
#11 0x00007f36bc03bcca in ioc_open (frame=0x7f36be6b0330, this=0x1ef3020, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/performance/io-cache/src/io-cache.c:822
#12 0x00007f36bbe2140a in qr_open (frame=0x7f36be6b3390, this=0x1ef41d0, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/performance/quick-read/src/quick-read.c:774
#13 0x00007f36bbc1190b in io_stats_open (frame=0x7f36be6b0e9c, this=0x1ef5490, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/debug/io-stats/src/io-stats.c:2002
#14 0x00007f36bb9c65b3 in nfs_fop_open (nfsx=0x1ef6980, xl=0x1ef5490, nfu=0x7fffd5aff860, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, 
    wbflags=2, cbk=0x7f36bb9c819d <nfs_inode_open_cbk>, local=0x7f36bb98cb48) at ../../../../../xlators/nfs/server/src/nfs-fops.c:1202
#15 0x00007f36bb9c8415 in nfs_inode_open (nfsx=0x1ef6980, xl=0x1ef5490, nfu=0x7fffd5aff860, loc=0x7f36b9ff8784, flags=2, wbflags=2, 
    cbk=0x7f36bb9ea57e <nfs3_file_open_cbk>, local=0x7f36b9ff835c) at ../../../../../xlators/nfs/server/src/nfs-inodes.c:236
#16 0x00007f36bb9c9cd7 in nfs_open (nfsx=0x1ef6980, xl=0x1ef5490, nfu=0x7fffd5aff860, pathloc=0x7f36b9ff8784, flags=2, 
    cbk=0x7f36bb9ea57e <nfs3_file_open_cbk>, local=0x7f36b9ff835c) at ../../../../../xlators/nfs/server/src/nfs-generics.c:190
#17 0x00007f36bb9eac6a in __nfs3_file_open_and_resume (cs=0x7f36b9ff835c) at ../../../../../xlators/nfs/server/src/nfs3-helpers.c:2098
#18 0x00007f36bb9eae98 in nfs3_file_open_and_resume (cs=0x7f36b9ff835c, resume=0x7f36bb9d71b3 <nfs3_write_resume>)
    at ../../../../../xlators/nfs/server/src/nfs3-helpers.c:2147
#19 0x00007f36bb9d7501 in nfs3_write_open_resume (carg=0x7f36b9ff835c) at ../../../../../xlators/nfs/server/src/nfs3.c:2162
#20 0x00007f36bb9ec43f in nfs3_fh_resolve_entry_lookup_cbk (frame=0x7f36be433724, cookie=0x1ef5490, this=0x1ef6980, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, xattr=0x0, postparent=0x1f2d8b0) at ../../../../../xlators/nfs/server/src/nfs3-helpers.c:2515
#21 0x00007f36bb9bfc01 in nfs_fop_lookup_cbk (frame=0x7f36be433724, cookie=0x1ef5490, this=0x1ef6980, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, xattr=0x0, postparent=0x1f2d8b0) at ../../../../../xlators/nfs/server/src/nfs-fops.c:316
#22 0x00007f36bbc0b822 in io_stats_lookup_cbk (frame=0x7f36be6acec8, cookie=0x7f36be6aab2c, this=0x1ef5490, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, xattr=0x0, postparent=0x1f2d8b0) at ../../../../../xlators/debug/io-stats/src/io-stats.c:1468
#23 0x00007f36bbe1fbd9 in qr_lookup_cbk (frame=0x7f36be6aab2c, cookie=0x7f36be6aa1c4, this=0x1ef41d0, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, dict=0x0, postparent=0x1f2d8b0)
    at ../../../../../xlators/performance/quick-read/src/quick-read.c:439
#24 0x00007f36bc03971a in ioc_lookup_cbk (frame=0x7f36be6aa1c4, cookie=0x7f36be6a9e68, this=0x1ef3020, op_ret=0, op_errno=0, 
inode=0x7f36bac32c08, stbuf=0x1f2d840, dict=0x0, postparent=0x1f2d8b0) at ../../../../../xlators/performance/io-cache/src/io-cache.c:266
#25 0x00007f36bfa77fd4 in default_lookup_cbk (frame=0x7f36be6a9e68, cookie=0x7f36be6b7008, this=0x1ef1e00, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, dict=0x0, postparent=0x1f2d8b0) at ../../../libglusterfs/src/defaults.c:46
#26 0x00007f36bfa77fd4 in default_lookup_cbk (frame=0x7f36be6b7008, cookie=0x7f36be6b4dc4, this=0x1ef0a40, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, dict=0x0, postparent=0x1f2d8b0) at ../../../libglusterfs/src/defaults.c:46
#27 0x00007f36bc66e054 in quota_lookup_cbk (frame=0x7f36be6b4dc4, cookie=0x7f36be6a9908, this=0x1eef600, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, dict=0x0, postparent=0x1f2d8b0) at ../../../../../xlators/features/quota/src/quota.c:662
#28 0x00007f36bc89f014 in dht_lookup_cbk (frame=0x7f36be6a9908, cookie=0x7f36be6b674c, this=0x1eee330, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, stbuf=0x1f2d840, xattr=0x0, postparent=0x1f2d8b0) at ../../../../../xlators/cluster/dht/src/dht-common.c:1026
#29 0x00007f36bcb2c5f4 in afr_lookup_done (frame=0x7f36be6b674c, this=0x1eed700) at ../../../../../xlators/cluster/afr/src/afr-common.c:1750
#30 0x00007f36bcb2ccef in afr_lookup_cbk (frame=0x7f36be6b674c, cookie=0x1, this=0x1eed700, op_ret=0, op_errno=22, inode=0x7f36bac32c08, 
    buf=0x7fffd5b003e0, xattr=0x1fbeae0, postparent=0x7fffd5b00370) at ../../../../../xlators/cluster/afr/src/afr-common.c:1913
#31 0x00007f36bcd6a7b9 in client3_1_lookup_cbk (req=0x7f36b93aae28, iov=0x7f36b93aae68, count=1, myframe=0x7f36be6b18b0)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:2254
#32 0x00007f36bf8527a0 in rpc_clnt_handle_reply (clnt=0x1f06cc0, pollin=0x1f0c8a0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:789
#33 0x00007f36bf852b27 in rpc_clnt_notify (trans=0x1f06ff0, mydata=0x1f06cf0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1f0c8a0)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:908
#34 0x00007f36bf84ed04 in rpc_transport_notify (this=0x1f06ff0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1f0c8a0)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#35 0x00007f36ba04423d in socket_event_poll_in (this=0x1f06ff0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1675
#36 0x00007f36ba0447c1 in socket_event_handler (fd=14, idx=6, data=0x1f06ff0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1790
#37 0x00007f36bfaa3808 in event_dispatch_epoll_handler (event_pool=0x1ee2b80, events=0x1f0c9b0, i=0) at ../../../libglusterfs/src/event.c:794
#38 0x00007f36bfaa3a2b in event_dispatch_epoll (event_pool=0x1ee2b80) at ../../../libglusterfs/src/event.c:856
#39 0x00007f36bfaa3db6 in event_dispatch (event_pool=0x1ee2b80) at ../../../libglusterfs/src/event.c:956
#40 0x0000000000407abe in main (argc=7, argv=0x7fffd5b00ac8) at ../../../glusterfsd/src/glusterfsd.c:1601
(gdb) f 4
#4  0x00007f36bcd6f3e8 in client3_1_open (frame=0x7f36be6b6e04, this=0x1eeb5c0, data=0x7fffd5aff0c0)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:3477
3477            GF_ASSERT_AND_GOTO_WITH_ERROR (this->name,
(gdb) l
3472            if (!uuid_is_null (args->loc->inode->gfid))
3473                    memcpy (req.gfid,  args->loc->inode->gfid, 16);
3474            else
3475                    memcpy (req.gfid, args->loc->gfid, 16);
3476
3477            GF_ASSERT_AND_GOTO_WITH_ERROR (this->name,
3478                                           !uuid_is_null (*((uuid_t*)req.gfid)),
3479                                           unwind, op_errno, EINVAL);
3480            req.flags = gf_flags_from_flags (args->flags);
3481            req.wbflags = args->wbflags;
(gdb) p *args->loc
$1 = {path = 0x1f16450 "/playground/sync_field/file-390", name = 0x1f16467 "file-390", inode = 0x7f36bac32c08, parent = 0x7f36bac2e864, 
  gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}
(gdb) p *args->loc->inode
$2 = {table = 0x1ef97b0, gfid = '\000' <repeats 15 times>, lock = 1, nlookup = 0, ref = 10, ia_type = IA_INVAL, fd_list = {
    next = 0x7f36bac32c38, prev = 0x7f36bac32c38}, dentry_list = {next = 0x7f36bac32c48, prev = 0x7f36bac32c48}, hash = {
    next = 0x7f36bac32c58, prev = 0x7f36bac32c58}, list = {next = 0x7f36bac32bd4, prev = 0x7f36bac32cfc}, _ctx = 0x1fdca40}
(gdb)  p local->cont.lookup.buf[0]
Structure has no component named operator[].
(gdb) p local->inodelk_count 
$4 = 1
(gdb) p local->cont.lookup.buf[0]
Structure has no component named operator[].
(gdb) p local->cont.lookup.bufs
$5 = (struct iatt *) 0x1faf130
(gdb) p local->cont.lookup.bufs[0]
$6 = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', 
    sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', 
      exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, 
  ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, 
  ia_ctime_nsec = 0}
(gdb) p local->cont.lookup.bufs[1]
$7 = {ia_ino = 10081082189615010605, ia_gfid = "\356\262\342\310\352xO~\213\347\062\326\357\205\023-", ia_dev = 2065, ia_type = IA_IFREG, 
  ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 0 '\000'}, group = {
      read = 1 '\001', write = 0 '\000', exec = 0 '\000'}, other = {read = 1 '\001', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 1, 
  ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 729088, ia_blksize = 4096, ia_blocks = 1424, ia_atime = 1326272607, 
  ia_atime_nsec = 97893047, ia_mtime = 1326272714, ia_mtime_nsec = 858904076, ia_ctime = 1326272714, ia_ctime_nsec = 858904076}
(gdb) 





Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Raghavendra Bhat 2012-01-11 10:22:55 UTC
Created attachment 552064 [details]
multi threaded program which was being executed on the fuse mount at the time of crash

Comment 2 Mike Foley 2012-01-16 20:41:49 UTC
*** Bug 773237 has been marked as a duplicate of this bug. ***

Comment 3 Pranith Kumar K 2012-01-27 11:53:41 UTC
*** Bug 783250 has been marked as a duplicate of this bug. ***

Comment 4 Pranith Kumar K 2012-01-27 11:55:20 UTC
*** Bug 772552 has been marked as a duplicate of this bug. ***

Comment 5 Pranith Kumar K 2012-03-04 06:44:16 UTC
*** Bug 798608 has been marked as a duplicate of this bug. ***

Comment 6 Anand Avati 2012-03-06 06:55:22 UTC
CHANGE: http://review.gluster.com/2858 (cluster/afr: handle node failures in lookup) merged in master by Vijay Bellur (vijay)

Comment 7 Amar Tumballi 2012-03-12 09:46:21 UTC
please update these bugs w.r.to 3.3.0qa27, need to work on it as per target milestone set.

Comment 8 Pranith Kumar K 2012-03-14 07:06:05 UTC
*** Bug 791089 has been marked as a duplicate of this bug. ***

Comment 9 Raghavendra Bhat 2012-04-05 10:32:00 UTC
Checked with glusterfs-3.3.0qa33. This crash is not seen anymore.