Bug 773225 - [glusterfs-3.3.0qa19]: nfs server crashed since gfid was NULL
Summary: [glusterfs-3.3.0qa19]: nfs server crashed since gfid was NULL
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
: 772552 783250 791089 798608 (view as bug list)
Depends On:
Blocks: 817967
TreeView+ depends on / blocked
 
Reported: 2012-01-11 10:11 UTC by Raghavendra Bhat
Modified: 2015-12-01 16:45 UTC (History)
4 users (show)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:44:51 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)
fs-perf-test which was being executed at the time of the crash (2.76 KB, text/x-csrc)
2012-01-11 10:11 UTC, Raghavendra Bhat
no flags Details
multi threaded program which was being executed on the fuse mount at the time of crash (3.70 KB, application/x-bzip)
2012-01-11 10:22 UTC, Raghavendra Bhat
no flags Details

Description Raghavendra Bhat 2012-01-11 10:11:59 UTC
Created attachment 552060 [details]
fs-perf-test which was being executed at the time of the crash

Description of problem:
nfs server crashed in client3_1_open due to assert since it cannot find the gfid for the file to be opened.

setup: 2x2 distributed replicate with quota and profile enabled, client io-threads on.

volume set operation (flush-behind off and on) was being performed in a loop with  some sleep time between each set. 1 fuse client and 1 nfs client. 
fuse client executing a multi-threaded application (which spawns multiple threads, with each thread opening the same file and then performing different fops in their respective fds), and nfs client executing the fs-perf-test.

Brought one of the bricks down, slept for some time and then woke the brick up.

This is the backtrace of the core.

Core was generated by `/usr/local/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/'.
Program terminated with signal 6, Aborted.
#0  0x00000034c2232905 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) #0  0x00000034c2232905 in raise () from /lib64/libc.so.6
#1  0x00000034c22340e5 in abort () from /lib64/libc.so.6
#2  0x00000034c222b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x00000034c222ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f36bcd6f3e8 in client3_1_open (frame=0x7f36be6b6e04, this=0x1eeb5c0, data=0x7fffd5aff0c0)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:3477
#5  0x00007f36bcd597af in client_open (frame=0x7f36be6b6e04, this=0x1eeb5c0, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/protocol/client/src/client.c:743
#6  0x00007f36bcaf9ca8 in afr_open (frame=0x7f36be6aa87c, this=0x1eed700, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/cluster/afr/src/afr-open.c:245
#7  0x00007f36bc8b99e5 in dht_open (frame=0x7f36be6b0f48, this=0x1eee330, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/cluster/dht/src/dht-inode-read.c:127
#8  0x00007f36bfa82663 in default_open (frame=0x7f36be6ad8dc, this=0x1eef600, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../libglusterfs/src/defaults.c:906
#9  0x00007f36bc45e7d9 in wb_open (frame=0x7f36be6b30e0, this=0x1ef0a40, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/performance/write-behind/src/write-behind.c:1413
#10 0x00007f36bc24bf14 in ra_open (frame=0x7f36be6b2574, this=0x1ef1e00, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/performance/read-ahead/src/read-ahead.c:205
#11 0x00007f36bc03bcca in ioc_open (frame=0x7f36be6b0330, this=0x1ef3020, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/performance/io-cache/src/io-cache.c:822
#12 0x00007f36bbe2140a in qr_open (frame=0x7f36be6b3390, this=0x1ef41d0, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/performance/quick-read/src/quick-read.c:774
#13 0x00007f36bbc1190b in io_stats_open (frame=0x7f36be6b0e9c, this=0x1ef5490, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, wbflags=2)
    at ../../../../../xlators/debug/io-stats/src/io-stats.c:2002
#14 0x00007f36bb9c65b3 in nfs_fop_open (nfsx=0x1ef6980, xl=0x1ef5490, nfu=0x7fffd5aff860, loc=0x7f36b9ff8784, flags=2, fd=0x7f36ba251550, 
    wbflags=2, cbk=0x7f36bb9c819d <nfs_inode_open_cbk>, local=0x7f36bb98cb48) at ../../../../../xlators/nfs/server/src/nfs-fops.c:1202
#15 0x00007f36bb9c8415 in nfs_inode_open (nfsx=0x1ef6980, xl=0x1ef5490, nfu=0x7fffd5aff860, loc=0x7f36b9ff8784, flags=2, wbflags=2, 
    cbk=0x7f36bb9ea57e <nfs3_file_open_cbk>, local=0x7f36b9ff835c) at ../../../../../xlators/nfs/server/src/nfs-inodes.c:236
#16 0x00007f36bb9c9cd7 in nfs_open (nfsx=0x1ef6980, xl=0x1ef5490, nfu=0x7fffd5aff860, pathloc=0x7f36b9ff8784, flags=2, 
    cbk=0x7f36bb9ea57e <nfs3_file_open_cbk>, local=0x7f36b9ff835c) at ../../../../../xlators/nfs/server/src/nfs-generics.c:190
#17 0x00007f36bb9eac6a in __nfs3_file_open_and_resume (cs=0x7f36b9ff835c) at ../../../../../xlators/nfs/server/src/nfs3-helpers.c:2098
#18 0x00007f36bb9eae98 in nfs3_file_open_and_resume (cs=0x7f36b9ff835c, resume=0x7f36bb9d71b3 <nfs3_write_resume>)
    at ../../../../../xlators/nfs/server/src/nfs3-helpers.c:2147
#19 0x00007f36bb9d7501 in nfs3_write_open_resume (carg=0x7f36b9ff835c) at ../../../../../xlators/nfs/server/src/nfs3.c:2162
#20 0x00007f36bb9ec43f in nfs3_fh_resolve_entry_lookup_cbk (frame=0x7f36be433724, cookie=0x1ef5490, this=0x1ef6980, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, xattr=0x0, postparent=0x1f2d8b0) at ../../../../../xlators/nfs/server/src/nfs3-helpers.c:2515
#21 0x00007f36bb9bfc01 in nfs_fop_lookup_cbk (frame=0x7f36be433724, cookie=0x1ef5490, this=0x1ef6980, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, xattr=0x0, postparent=0x1f2d8b0) at ../../../../../xlators/nfs/server/src/nfs-fops.c:316
#22 0x00007f36bbc0b822 in io_stats_lookup_cbk (frame=0x7f36be6acec8, cookie=0x7f36be6aab2c, this=0x1ef5490, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, xattr=0x0, postparent=0x1f2d8b0) at ../../../../../xlators/debug/io-stats/src/io-stats.c:1468
#23 0x00007f36bbe1fbd9 in qr_lookup_cbk (frame=0x7f36be6aab2c, cookie=0x7f36be6aa1c4, this=0x1ef41d0, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, dict=0x0, postparent=0x1f2d8b0)
    at ../../../../../xlators/performance/quick-read/src/quick-read.c:439
#24 0x00007f36bc03971a in ioc_lookup_cbk (frame=0x7f36be6aa1c4, cookie=0x7f36be6a9e68, this=0x1ef3020, op_ret=0, op_errno=0, 
inode=0x7f36bac32c08, stbuf=0x1f2d840, dict=0x0, postparent=0x1f2d8b0) at ../../../../../xlators/performance/io-cache/src/io-cache.c:266
#25 0x00007f36bfa77fd4 in default_lookup_cbk (frame=0x7f36be6a9e68, cookie=0x7f36be6b7008, this=0x1ef1e00, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, dict=0x0, postparent=0x1f2d8b0) at ../../../libglusterfs/src/defaults.c:46
#26 0x00007f36bfa77fd4 in default_lookup_cbk (frame=0x7f36be6b7008, cookie=0x7f36be6b4dc4, this=0x1ef0a40, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, dict=0x0, postparent=0x1f2d8b0) at ../../../libglusterfs/src/defaults.c:46
#27 0x00007f36bc66e054 in quota_lookup_cbk (frame=0x7f36be6b4dc4, cookie=0x7f36be6a9908, this=0x1eef600, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, buf=0x1f2d840, dict=0x0, postparent=0x1f2d8b0) at ../../../../../xlators/features/quota/src/quota.c:662
#28 0x00007f36bc89f014 in dht_lookup_cbk (frame=0x7f36be6a9908, cookie=0x7f36be6b674c, this=0x1eee330, op_ret=0, op_errno=0, 
    inode=0x7f36bac32c08, stbuf=0x1f2d840, xattr=0x0, postparent=0x1f2d8b0) at ../../../../../xlators/cluster/dht/src/dht-common.c:1026
#29 0x00007f36bcb2c5f4 in afr_lookup_done (frame=0x7f36be6b674c, this=0x1eed700) at ../../../../../xlators/cluster/afr/src/afr-common.c:1750
#30 0x00007f36bcb2ccef in afr_lookup_cbk (frame=0x7f36be6b674c, cookie=0x1, this=0x1eed700, op_ret=0, op_errno=22, inode=0x7f36bac32c08, 
    buf=0x7fffd5b003e0, xattr=0x1fbeae0, postparent=0x7fffd5b00370) at ../../../../../xlators/cluster/afr/src/afr-common.c:1913
#31 0x00007f36bcd6a7b9 in client3_1_lookup_cbk (req=0x7f36b93aae28, iov=0x7f36b93aae68, count=1, myframe=0x7f36be6b18b0)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:2254
#32 0x00007f36bf8527a0 in rpc_clnt_handle_reply (clnt=0x1f06cc0, pollin=0x1f0c8a0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:789
#33 0x00007f36bf852b27 in rpc_clnt_notify (trans=0x1f06ff0, mydata=0x1f06cf0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1f0c8a0)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:908
#34 0x00007f36bf84ed04 in rpc_transport_notify (this=0x1f06ff0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1f0c8a0)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#35 0x00007f36ba04423d in socket_event_poll_in (this=0x1f06ff0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1675
#36 0x00007f36ba0447c1 in socket_event_handler (fd=14, idx=6, data=0x1f06ff0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1790
#37 0x00007f36bfaa3808 in event_dispatch_epoll_handler (event_pool=0x1ee2b80, events=0x1f0c9b0, i=0) at ../../../libglusterfs/src/event.c:794
#38 0x00007f36bfaa3a2b in event_dispatch_epoll (event_pool=0x1ee2b80) at ../../../libglusterfs/src/event.c:856
#39 0x00007f36bfaa3db6 in event_dispatch (event_pool=0x1ee2b80) at ../../../libglusterfs/src/event.c:956
#40 0x0000000000407abe in main (argc=7, argv=0x7fffd5b00ac8) at ../../../glusterfsd/src/glusterfsd.c:1601
(gdb) f 4
#4  0x00007f36bcd6f3e8 in client3_1_open (frame=0x7f36be6b6e04, this=0x1eeb5c0, data=0x7fffd5aff0c0)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:3477
3477            GF_ASSERT_AND_GOTO_WITH_ERROR (this->name,
(gdb) l
3472            if (!uuid_is_null (args->loc->inode->gfid))
3473                    memcpy (req.gfid,  args->loc->inode->gfid, 16);
3474            else
3475                    memcpy (req.gfid, args->loc->gfid, 16);
3476
3477            GF_ASSERT_AND_GOTO_WITH_ERROR (this->name,
3478                                           !uuid_is_null (*((uuid_t*)req.gfid)),
3479                                           unwind, op_errno, EINVAL);
3480            req.flags = gf_flags_from_flags (args->flags);
3481            req.wbflags = args->wbflags;
(gdb) p *args->loc
$1 = {path = 0x1f16450 "/playground/sync_field/file-390", name = 0x1f16467 "file-390", inode = 0x7f36bac32c08, parent = 0x7f36bac2e864, 
  gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}
(gdb) p *args->loc->inode
$2 = {table = 0x1ef97b0, gfid = '\000' <repeats 15 times>, lock = 1, nlookup = 0, ref = 10, ia_type = IA_INVAL, fd_list = {
    next = 0x7f36bac32c38, prev = 0x7f36bac32c38}, dentry_list = {next = 0x7f36bac32c48, prev = 0x7f36bac32c48}, hash = {
    next = 0x7f36bac32c58, prev = 0x7f36bac32c58}, list = {next = 0x7f36bac32bd4, prev = 0x7f36bac32cfc}, _ctx = 0x1fdca40}
(gdb)  p local->cont.lookup.buf[0]
Structure has no component named operator[].
(gdb) p local->inodelk_count 
$4 = 1
(gdb) p local->cont.lookup.buf[0]
Structure has no component named operator[].
(gdb) p local->cont.lookup.bufs
$5 = (struct iatt *) 0x1faf130
(gdb) p local->cont.lookup.bufs[0]
$6 = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', 
    sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', 
      exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, 
  ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, 
  ia_ctime_nsec = 0}
(gdb) p local->cont.lookup.bufs[1]
$7 = {ia_ino = 10081082189615010605, ia_gfid = "\356\262\342\310\352xO~\213\347\062\326\357\205\023-", ia_dev = 2065, ia_type = IA_IFREG, 
  ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 0 '\000'}, group = {
      read = 1 '\001', write = 0 '\000', exec = 0 '\000'}, other = {read = 1 '\001', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 1, 
  ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 729088, ia_blksize = 4096, ia_blocks = 1424, ia_atime = 1326272607, 
  ia_atime_nsec = 97893047, ia_mtime = 1326272714, ia_mtime_nsec = 858904076, ia_ctime = 1326272714, ia_ctime_nsec = 858904076}
(gdb) 





Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Raghavendra Bhat 2012-01-11 10:22:55 UTC
Created attachment 552064 [details]
multi threaded program which was being executed on the fuse mount at the time of crash

Comment 2 Mike Foley 2012-01-16 20:41:49 UTC
*** Bug 773237 has been marked as a duplicate of this bug. ***

Comment 3 Pranith Kumar K 2012-01-27 11:53:41 UTC
*** Bug 783250 has been marked as a duplicate of this bug. ***

Comment 4 Pranith Kumar K 2012-01-27 11:55:20 UTC
*** Bug 772552 has been marked as a duplicate of this bug. ***

Comment 5 Pranith Kumar K 2012-03-04 06:44:16 UTC
*** Bug 798608 has been marked as a duplicate of this bug. ***

Comment 6 Anand Avati 2012-03-06 06:55:22 UTC
CHANGE: http://review.gluster.com/2858 (cluster/afr: handle node failures in lookup) merged in master by Vijay Bellur (vijay)

Comment 7 Amar Tumballi 2012-03-12 09:46:21 UTC
please update these bugs w.r.to 3.3.0qa27, need to work on it as per target milestone set.

Comment 8 Pranith Kumar K 2012-03-14 07:06:05 UTC
*** Bug 791089 has been marked as a duplicate of this bug. ***

Comment 9 Raghavendra Bhat 2012-04-05 10:32:00 UTC
Checked with glusterfs-3.3.0qa33. This crash is not seen anymore.


Note You need to log in before you can comment on or make changes to this bug.