Bug 765479 (GLUSTER-3747) - [glusterfs-3.2.5qa2]: glusterfs client crashed because of gfid being NULL
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3747
Product: GlusterFS
Classification: Community
Component: replicate
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-10-21 07:53 UTC by Raghavendra Bhat
Modified: 2011-10-28 12:48 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Raghavendra Bhat 2011-10-21 07:53:03 UTC
Created a volume, started it, and mounted it. Stopped the volume, put some data directly into one of the backends, and then started the volume again. Ran find . | xargs stat on the mount point.

Stopped the volume again, erased the backend data, put data into one of the backends, and started the volume. Running find . | xargs stat on the mount point a second time crashed the glusterfs client.

This is the backtrace.

Program terminated with signal 6, Aborted.
#0  0x00007fa81e290d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	../nptl/sysdeps/unix/sysv/linux/raise.c: Transport endpoint is not connected.
	in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0  0x00007fa81e290d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fa81e294ab6 in abort () at abort.c:92
#2  0x00007fa81e2897c5 in __assert_fail (assertion=0x7fa81b78719d ", aborting self-heal", file=<value optimized out>, line=1778, 
    function=<value optimized out>) at assert.c:81
#3  0x00007fa81b757cfa in afr_sh_common_lookup (frame=0x7fa81d5020cc, this=0x1b0e7d0, loc=0x1b32418, 
    lookup_done=0x7fa81b756293 <afr_sh_missing_entries_lookup_done>, gfid=0x1b33ef0 "", flags=3)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1778
#4  0x00007fa81b756dbb in afr_sh_purge_stale_entries_done (frame=0x7fa81d5020cc, this=0x1b0e7d0)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1429
#5  0x00007fa81b75701b in afr_sh_purge_entry_common (frame=0x7fa81d5020cc, this=0x1b0e7d0, 
    purge_condition=0x7fa81b756ec5 <afr_sh_purge_stale_entry_condition>)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1506
#6  0x00007fa81b75733d in afr_sh_purge_stale_entry (frame=0x7fa81d5020cc, this=0x1b0e7d0)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1569
#7  0x00007fa81b75774b in afr_sh_children_lookup_done (frame=0x7fa81d5020cc, this=0x1b0e7d0, op_ret=0, op_errno=61)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1646
#8  0x00007fa81b756852 in afr_sh_common_lookup_cbk (frame=0x7fa81d5020cc, cookie=0x0, this=0x1b0e7d0, op_ret=0, op_errno=61, 
    inode=0x7fa81a3225c0, buf=0x7fffde5bef00, xattr=0x1b2f640, postparent=0x7fffde5bee90)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1326
#9  0x00007fa81b9b65c4 in client3_1_lookup_cbk (req=0x7fa81a608710, iov=0x7fa81a608750, count=1, myframe=0x7fa81d76bc70)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:2233
#10 0x00007fa81ec400de in rpc_clnt_handle_reply (clnt=0x1b21ce0, pollin=0x1b2d7a0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:741
#11 0x00007fa81ec4041a in rpc_clnt_notify (trans=0x1b21f00, mydata=0x1b21d10, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1b2d7a0)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:854
#12 0x00007fa81ec3c7d7 in rpc_transport_notify (this=0x1b21f00, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1b2d7a0)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:919
#13 0x00007fa81c5eacc0 in socket_event_poll_in (this=0x1b21f00) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1647
#14 0x00007fa81c5eb234 in socket_event_handler (fd=7, idx=1, data=0x1b21f00, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1762
#15 0x00007fa81ee96b61 in event_dispatch_epoll_handler (event_pool=0x1b01bc0, events=0x1b06770, i=0) at ../../../libglusterfs/src/event.c:794
#16 0x00007fa81ee96d7b in event_dispatch_epoll (event_pool=0x1b01bc0) at ../../../libglusterfs/src/event.c:856
#17 0x00007fa81ee970ed in event_dispatch (event_pool=0x1b01bc0) at ../../../libglusterfs/src/event.c:956
#18 0x00000000004072d0 in main (argc=6, argv=0x7fffde5bf458) at ../../../glusterfsd/src/glusterfsd.c:1509
(gdb) f 3
#3  0x00007fa81b757cfa in afr_sh_common_lookup (frame=0x7fa81d5020cc, this=0x1b0e7d0, loc=0x1b32418, 
    lookup_done=0x7fa81b756293 <afr_sh_missing_entries_lookup_done>, gfid=0x1b33ef0 "", flags=3)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1778
1778	                        GF_ASSERT (!uuid_is_null (gfid));
(gdb) p gfid
$1 = (unsigned char *) 0x1b33ef0 ""
(gdb) f 8
#8  0x00007fa81b756852 in afr_sh_common_lookup_cbk (frame=0x7fa81d5020cc, cookie=0x0, this=0x1b0e7d0, op_ret=0, op_errno=61, 
    inode=0x7fa81a3225c0, buf=0x7fffde5bef00, xattr=0x1b2f640, postparent=0x7fffde5bee90)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1326
1326	        sh->lookup_done (frame, this, op_ret, op_errno);
(gdb) p local.cont.lookup
$5 = {gfid_req = '\000' <repeats 15 times>, inode = 0x7fa81a3225c0, buf = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, 
    ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', 
        exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', 
        exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, 
    ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, postparent = {ia_ino = 0, 
    ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', 
      owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {
        read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, 
    ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, 
  ino = 0, gen = 0, parent_ino = 0, xattrs = 0x0, xattr = 0x1b2fea0, postparents = 0x0, bufs = 0x0, read_child = 0, child_success = 0x0, 
  sources = 0x0}


gfid is NULL in the lookup call only.


In the case of a fresh lookup, fuse generates a new uuid in state->gfid and copies it into the dict when sending the call. For revalidates it does not generate a uuid, so the dict will not contain any gfid; in that case the gfid can be obtained from inode->gfid.

When the backend was deleted the second time and files were added, the client process still had the inodes for those files in its inode table, so fuse treated the lookup as a revalidate and did not send the gfid in the dict. But afr, instead of looking at inode->gfid, used the dict value, which is NULL. Thus, for revalidates inode->gfid should be used. If inode->gfid is NULL, then the dict must contain the gfid.
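
The selection rule described above can be sketched in C as follows. This is only an illustration of the reasoning, not the merged patch: the type gfid_t, the struct sketch_inode, and the functions gfid_is_null and pick_lookup_gfid are hypothetical stand-ins and are not GlusterFS identifiers. The sketch assumes a gfid is a 16-byte value and that "NULL" means all bytes are zero, which is what the uuid_is_null assertion in frame 3 checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the real GlusterFS types. */
typedef unsigned char gfid_t[16];

struct sketch_inode {
    gfid_t gfid; /* all zeroes until the inode has been assigned a gfid */
};

/* True when all 16 bytes are zero (the condition the assert trips on). */
static bool gfid_is_null(const gfid_t g)
{
    static const gfid_t zero = {0};
    return memcmp(g, zero, sizeof(gfid_t)) == 0;
}

/*
 * Pick the gfid to use for a self-heal lookup.
 *
 * Revalidate: the inode already carries a gfid, so use it.
 * Fresh lookup: the inode gfid is still null, so fall back to the
 * gfid that the caller supplied in the request dict (gfid_req).
 *
 * Returns 0 and fills `out` on success, -1 if neither source has a gfid.
 */
static int pick_lookup_gfid(const struct sketch_inode *inode,
                            const unsigned char *gfid_req, gfid_t out)
{
    assert(out != NULL);

    if (inode != NULL && !gfid_is_null(inode->gfid)) {
        memcpy(out, inode->gfid, sizeof(gfid_t));
        return 0;
    }
    if (gfid_req != NULL && !gfid_is_null(gfid_req)) {
        memcpy(out, gfid_req, sizeof(gfid_t));
        return 0;
    }
    return -1; /* caller should fail the lookup instead of asserting */
}
```

In this sketch the crash scenario corresponds to a call where gfid_req is all zeroes but the inode already has a gfid: the first branch returns the inode's gfid instead of passing a null gfid on to the lookup.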

Comment 1 Anand Avati 2011-10-28 08:50:24 UTC
CHANGE: http://review.gluster.com/633 (Change-Id: I1904aa63d9365ebda3e979449454ac08db85d93d) merged in release-3.2 by Vijay Bellur (vijay)

Comment 2 Anand Avati 2011-10-28 08:52:28 UTC
CHANGE: http://review.gluster.com/634 (Afr needs to send the xattr_req without gfid so instead of modifying) merged in release-3.2 by Vijay Bellur (vijay)

Comment 3 Anand Avati 2011-10-28 08:54:11 UTC
CHANGE: http://review.gluster.com/635 (Change-Id: I895574dd6fa411784eb5282c799ccf3ff7c65625) merged in release-3.2 by Vijay Bellur (vijay)

Comment 4 Anand Avati 2011-10-28 09:00:57 UTC
CHANGE: http://review.gluster.com/636 (Change-Id: I1904aa63d9365ebda3e979449454ac08db85d93d) merged in master by Vijay Bellur (vijay)

Comment 5 Anand Avati 2011-10-28 09:01:14 UTC
CHANGE: http://review.gluster.com/637 (Change-Id: Iddf5b59d3534c517dcd3c0d7b819e3768f6e915a) merged in master by Vijay Bellur (vijay)

Comment 6 Anand Avati 2011-10-28 09:01:28 UTC
CHANGE: http://review.gluster.com/638 (Change-Id: I600120252445c06d9cc3e7aa24022c2559b6abe2) merged in master by Vijay Bellur (vijay)

