Bug 761760 (GLUSTER-28) - Deleting a backend export directory in an AFR setup can cause a segfault while trying to self heal
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-28
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Vikas Gorur
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-06-18 10:05 UTC by Pavan Vilas Sondur
Modified: 2009-09-08 10:30 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTP
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Vikas Gorur 2009-06-18 07:09:59 UTC
Changed component to "protocol" because it is client-protocol that accesses loc->parent->ino without a NULL check.

Comment 1 Raghavendra G 2009-06-18 07:18:05 UTC
While it's true that there should be a NULL check in client-protocol, the actual bug is that afr sends mkdir on a path whose parent is not present, which is wrong. Maybe another bug should be filed for this.

(In reply to comment #1)
> Changed component to "protocol" because it is client-protocol which is
> accessing loc->parent->ino without doing a NULL check.

Comment 2 Pavan Vilas Sondur 2009-06-18 10:05:47 UTC
If the backend export directory is deleted, self-heal tries to mkdir it and, while doing so, dereferences loc->parent, which is NULL, and segfaults.

#0  0x00002b28d3fae972 in client_mkdir (frame=0x1975e7a0, this=0x19753930, loc=0x1975d4d8, mode=16877)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:1022
1022            if (loc->parent->ino && ret < 0) {
(gdb) bt
#0  0x00002b28d3fae972 in client_mkdir (frame=0x1975e7a0, this=0x19753930, loc=0x1975d4d8, mode=16877)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:1022
#1  0x00002b28d41f14b8 in sh_missing_entries_mkdir (frame=0x1975cac0, this=0x19753d40) at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:949
#2  0x00002b28d41f1cb5 in sh_missing_entries_create (frame=0x1975cac0, this=0x19753d40)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1115
#3  0x00002b28d41f1f21 in sh_missing_entries_lookup_cbk (frame=0x1975cac0, cookie=0x1, this=0x19753d40, op_ret=-1, op_errno=2, inode=0x1975a440, 
    buf=0x7fffd75d35f0, xattr=0x0) at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1173
#4  0x00002b28d3fba59f in client_lookup_cbk (frame=0x1975eb70, hdr=0x1975ec60, hdrlen=112, iobuf=0x0)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:4783
#5  0x00002b28d3fbd02c in protocol_client_interpret (this=0x19753930, trans=0x19759b30, hdr_p=0x1975ec60 "", hdrlen=112, iobuf=0x0)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:5880
#6  0x00002b28d3fbdc93 in protocol_client_pollin (this=0x19753930, trans=0x19759b30) at ../../../../../xlators/protocol/client/src/client-protocol.c:6171
#7  0x00002b28d3fbde27 in notify (this=0x19753930, event=2, data=0x19759b30) at ../../../../../xlators/protocol/client/src/client-protocol.c:6215
#8  0x00002aaaaaaaebce in socket_event_poll_in (this=0x19759b30) at ../../../../transport/socket/src/socket.c:713
#9  0x00002aaaaaaaeecc in socket_event_handler (fd=10, idx=3, data=0x19759b30, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../transport/socket/src/socket.c:813
#10 0x00002b28d3508209 in event_dispatch_epoll_handler (event_pool=0x1974e6b0, events=0x1975bb00, i=0) at ../../../libglusterfs/src/event.c:804
#11 0x00002b28d35083de in event_dispatch_epoll (event_pool=0x1974e6b0) at ../../../libglusterfs/src/event.c:867
#12 0x00002b28d35086f4 in event_dispatch (event_pool=0x1974e6b0) at ../../../libglusterfs/src/event.c:975
#13 0x00000000004051a0 in main (argc=6, argv=0x7fffd75d41a8) at ../../../glusterfsd/src/glusterfsd.c:1154

(gdb) f 0
#0  0x00002b28d3fae972 in client_mkdir (frame=0x1975e7a0, this=0x19753930, loc=0x1975d4d8, mode=16877)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:1022
1022            if (loc->parent->ino && ret < 0) {
(gdb) l
1017            frame->local = local;
1018
1019            pathlen = STRLEN_0(loc->path);
1020            baselen = STRLEN_0(loc->name);
1021            ret = inode_ctx_get (loc->parent, this, &par);
1022            if (loc->parent->ino && ret < 0) {            <<<<<<
1023                    gf_log (this->name, GF_LOG_DEBUG,
1024                            "MKDIR %"PRId64"/%s (%s): failed to get remote inode "
1025                            "number for parent", 
1026                            loc->parent->ino, loc->name, loc->path);
(gdb) p ret
$1 = -1
(gdb) p loc->parent
$2 = (inode_t *) 0x0
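For illustration, a minimal sketch of the missing NULL check follows. The inode_t and loc_t types are hypothetical, cut-down stand-ins for the real GlusterFS structures, mkdir_parent_usable() is a made-up helper name, and returning -EINVAL is an assumption. The only point it makes is that loc->parent is tested before loc->parent->ino is read, which is the dereference gdb shows crashing at client-protocol.c:1022.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-ins for GlusterFS inode_t and loc_t;
 * only the fields used below are modelled. */
typedef struct inode {
        uint64_t ino;
} inode_t;

typedef struct loc {
        const char *path;
        const char *name;
        inode_t    *parent;
} loc_t;

/* Hypothetical helper: returns 0 if the parent inode can be used to build
 * a MKDIR request, or -EINVAL (assumed error code) if it cannot.
 * ctx_ret stands for the return value of inode_ctx_get() on the parent. */
static int
mkdir_parent_usable (const loc_t *loc, int ctx_ret)
{
        /* Test the pointer first, so loc->parent->ino is never read
         * through NULL (the crash in frame #0). */
        if (loc == NULL || loc->parent == NULL)
                return -EINVAL;

        /* Same condition as line 1022, now only reached with a
         * non-NULL parent. */
        if (loc->parent->ino && ctx_ret < 0)
                return -EINVAL;

        return 0;
}
```

With loc->parent == NULL, the state printed by `p loc->parent` in the session above, the helper reports an error instead of faulting.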

Comment 3 Basavanagowda Kanur 2009-07-23 06:35:59 UTC
(In reply to comment #1)
> Changed component to "protocol" because it is client-protocol which is
> accessing loc->parent->ino without doing a NULL check.

The exact cause of this bug is that afr tries to re-create "/" (the root directory) on one of the subvolumes. Since "/" has no parent directory, the loc passed to mkdir has loc->parent set to NULL.

mkdir() does two things: it allocates an inode, and it adds an entry in the parent directory pointing to that inode.

It is the responsibility of the initiator of the mkdir() call (afr in this case) to ensure that loc->parent is a valid inode.

Fix afr to handle this case gracefully; I think afr should abort self-heal.

See the protocol/server and mount/fuse mkdir() implementations, for instance: they abort the mkdir operation if the parent inode cannot be determined.
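A rough sketch of the caller-side check suggested in this comment, assuming a cut-down, hypothetical loc_t: before winding mkdir to a subvolume, afr looks at loc->parent and aborts self-heal if it is unknown. sh_missing_entry_action(), the sh_action_t enum and the EINVAL error code are illustrative names only, not the actual afr code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the GlusterFS types. */
typedef struct inode { unsigned long ino; } inode_t;
typedef struct loc   { const char *path; inode_t *parent; } loc_t;

/* Hypothetical outcome of one missing-entry self-heal step. */
typedef enum { SH_SEND_MKDIR, SH_ABORT } sh_action_t;

/* The initiator validates loc->parent before sending mkdir down; with no
 * known parent it aborts self-heal instead of handing client-protocol a
 * loc_t it cannot use. */
static sh_action_t
sh_missing_entry_action (const loc_t *loc, int *op_errno)
{
        if (loc->parent == NULL) {
                *op_errno = EINVAL;   /* assumed error code */
                return SH_ABORT;
        }

        *op_errno = 0;
        return SH_SEND_MKDIR;
}
```

This mirrors the behaviour described for protocol/server and mount/fuse: the operation is refused at the point where the parent is found to be missing, rather than failing further down the stack.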

The same rule (the caller must ensure a valid parent inode) also applies to creat(), mknod(), symlink(), link(), rename(), unlink(), and rmdir().
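These operations do not all take a single location, so the set of parent inodes to validate differs between them. The sketch below is hypothetical: it assumes a simplified loc_t, and the entry_op_t enum and entry_op_parents_valid() are made up for illustration. It models only which directories each operation modifies: link adds an entry under the destination's parent, while rename removes an entry from one parent and adds one under another, so both parents must be known.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the GlusterFS types. */
typedef struct inode { unsigned long ino; } inode_t;
typedef struct loc   { const char *path; inode_t *parent; } loc_t;

/* Hypothetical enumeration of the entry operations named above. */
typedef enum {
        OP_MKDIR, OP_CREAT, OP_MKNOD, OP_SYMLINK,
        OP_LINK, OP_RENAME, OP_UNLINK, OP_RMDIR
} entry_op_t;

/* Returns true when every parent directory the operation modifies is known.
 * 'loc' is the only loc for mkdir/creat/mknod/symlink/unlink/rmdir and the
 * source loc for link/rename; 'newloc' is the destination for link/rename
 * and is ignored for the other operations. */
static bool
entry_op_parents_valid (entry_op_t op, const loc_t *loc, const loc_t *newloc)
{
        switch (op) {
        case OP_LINK:
                /* Only the destination directory gains an entry. */
                return newloc != NULL && newloc->parent != NULL;
        case OP_RENAME:
                /* Source directory loses an entry, destination gains one. */
                return loc->parent != NULL &&
                       newloc != NULL && newloc->parent != NULL;
        default:
                /* Single-directory operations. */
                return loc->parent != NULL;
        }
}
```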

Comment 4 Anand Avati 2009-09-08 07:23:17 UTC
PATCH: http://patches.gluster.com/patch/1238 in master (cluster/afr: Do not try to self-heal "/")
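The patch title says afr no longer tries to self-heal "/"; its actual contents are not reproduced here. The following is a hypothetical sketch of that decision, assuming (from frame #3 of the backtrace, op_ret=-1, op_errno=2) that missing-entry self-heal is triggered when a child reports ENOENT. sh_should_create_missing_entry() is an invented name.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical predicate: should missing-entry self-heal try to re-create
 * 'path' on a child whose lookup returned (op_ret, op_errno)? */
static bool
sh_should_create_missing_entry (const char *path, int op_ret, int op_errno)
{
        /* Only an entry a child reported as missing is a candidate. */
        if (op_ret != -1 || op_errno != ENOENT)
                return false;

        /* Never try to re-create "/": it has no parent inode, so mkdir
         * would be sent with loc->parent == NULL. */
        if (path == NULL || strcmp (path, "/") == 0)
                return false;

        return true;
}
```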

