| Summary: | Deleting a backend export directory in an AFR setup can cause a segfault while trying to self heal | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Pavan Vilas Sondur <pavan> |
| Component: | replicate | Assignee: | Vikas Gorur <vikas> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | low | Docs Contact: | |
| Priority: | low | ||
| Version: | mainline | CC: | gluster-bugs, raghavendra, vikas |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | --- | |
| Regression: | RTP | Mount Type: | --- |
Description
Vikas Gorur
2009-06-18 07:09:59 UTC
While it's true that there should be a NULL check in client-protocol, the actual bug is that afr sends mkdir on a path whose parent is not present, which is wrong. Maybe another bug should be filed on this.

(In reply to comment #1)
> Changed component to "protocol" because it is client-protocol which is
> accessing loc->parent->ino without doing a NULL check.

If the backend export directory is deleted, self-heal tries to mkdir and, while doing so, accesses loc->parent, which is NULL, and segfaults.
#0 0x00002b28d3fae972 in client_mkdir (frame=0x1975e7a0, this=0x19753930, loc=0x1975d4d8, mode=16877)
at ../../../../../xlators/protocol/client/src/client-protocol.c:1022
1022 if (loc->parent->ino && ret < 0) {
(gdb) bt
#0 0x00002b28d3fae972 in client_mkdir (frame=0x1975e7a0, this=0x19753930, loc=0x1975d4d8, mode=16877)
at ../../../../../xlators/protocol/client/src/client-protocol.c:1022
#1 0x00002b28d41f14b8 in sh_missing_entries_mkdir (frame=0x1975cac0, this=0x19753d40) at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:949
#2 0x00002b28d41f1cb5 in sh_missing_entries_create (frame=0x1975cac0, this=0x19753d40)
at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1115
#3 0x00002b28d41f1f21 in sh_missing_entries_lookup_cbk (frame=0x1975cac0, cookie=0x1, this=0x19753d40, op_ret=-1, op_errno=2, inode=0x1975a440,
buf=0x7fffd75d35f0, xattr=0x0) at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1173
#4 0x00002b28d3fba59f in client_lookup_cbk (frame=0x1975eb70, hdr=0x1975ec60, hdrlen=112, iobuf=0x0)
at ../../../../../xlators/protocol/client/src/client-protocol.c:4783
#5 0x00002b28d3fbd02c in protocol_client_interpret (this=0x19753930, trans=0x19759b30, hdr_p=0x1975ec60 "", hdrlen=112, iobuf=0x0)
at ../../../../../xlators/protocol/client/src/client-protocol.c:5880
#6 0x00002b28d3fbdc93 in protocol_client_pollin (this=0x19753930, trans=0x19759b30) at ../../../../../xlators/protocol/client/src/client-protocol.c:6171
#7 0x00002b28d3fbde27 in notify (this=0x19753930, event=2, data=0x19759b30) at ../../../../../xlators/protocol/client/src/client-protocol.c:6215
#8 0x00002aaaaaaaebce in socket_event_poll_in (this=0x19759b30) at ../../../../transport/socket/src/socket.c:713
#9 0x00002aaaaaaaeecc in socket_event_handler (fd=10, idx=3, data=0x19759b30, poll_in=1, poll_out=0, poll_err=0)
at ../../../../transport/socket/src/socket.c:813
#10 0x00002b28d3508209 in event_dispatch_epoll_handler (event_pool=0x1974e6b0, events=0x1975bb00, i=0) at ../../../libglusterfs/src/event.c:804
#11 0x00002b28d35083de in event_dispatch_epoll (event_pool=0x1974e6b0) at ../../../libglusterfs/src/event.c:867
#12 0x00002b28d35086f4 in event_dispatch (event_pool=0x1974e6b0) at ../../../libglusterfs/src/event.c:975
#13 0x00000000004051a0 in main (argc=6, argv=0x7fffd75d41a8) at ../../../glusterfsd/src/glusterfsd.c:1154
(gdb) f 0
#0 0x00002b28d3fae972 in client_mkdir (frame=0x1975e7a0, this=0x19753930, loc=0x1975d4d8, mode=16877)
at ../../../../../xlators/protocol/client/src/client-protocol.c:1022
1022 if (loc->parent->ino && ret < 0) {
(gdb) l
1017 frame->local = local;
1018
1019 pathlen = STRLEN_0(loc->path);
1020 baselen = STRLEN_0(loc->name);
1021 ret = inode_ctx_get (loc->parent, this, &par);
1022 if (loc->parent->ino && ret < 0) { <<<<<<
1023 gf_log (this->name, GF_LOG_DEBUG,
1024 "MKDIR %"PRId64"/%s (%s): failed to get remote inode "
1025 "number for parent",
1026 loc->parent->ino, loc->name, loc->path);
(gdb) p ret
$1 = -1
(gdb) p loc->parent
$2 = (inode_t *) 0x0
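
The session above shows `loc->parent` being dereferenced while it is NULL. As an illustration only, a defensive check at the callee could look roughly like the sketch below. The types are simplified stand-ins modelling only the fields used here, and `check_mkdir_loc` is a hypothetical helper name, not a function from the GlusterFS source and not the fix that was committed.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the GlusterFS types; only the fields
 * needed for this sketch are modelled. */
typedef struct inode {
        uint64_t ino;
} inode_t;

typedef struct loc {
        const char *path;
        const char *name;
        inode_t    *parent;
} loc_t;

/* Hypothetical helper: validate loc before any field of loc->parent
 * is read. Returns 0 if loc is usable for mkdir, -EINVAL otherwise. */
static int
check_mkdir_loc (const loc_t *loc)
{
        if (loc == NULL || loc->path == NULL)
                return -EINVAL;

        /* This is the state seen in the core dump: parent == NULL. */
        if (loc->parent == NULL)
                return -EINVAL;

        return 0;
}
```

With a check like this placed before the access at line 1022, the call would fail with an error instead of crashing. It would not address why the caller passed a loc without a parent.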
(In reply to comment #1)
> Changed component to "protocol" because it is client-protocol which is
> accessing loc->parent->ino without doing a NULL check.

The exact cause of this bug is that afr is trying to re-create "/" (the root directory) on one of the subvolumes. mkdir() does two things: it allocates an inode and adds an entry to the parent directory pointing to the created inode. It is the responsibility of the initiator of the mkdir() call (afr in this case) to ensure that loc->parent is a valid inode.

Fix afr to handle this case gracefully. I think afr should abort self-heal. See the protocol/server and mount/fuse mkdir() implementations, for instance: they abort the mkdir operation if the parent inode cannot be determined. The same rule applies to creat(), mknod(), symlink(), link(), rename(), unlink(), and rmdir().

PATCH: http://patches.gluster.com/patch/1238 in master (cluster/afr: Do not try to self-heal "/")
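
The caller-side rule described in this comment can be sketched as a small predicate that self-heal would consult before issuing mkdir for a missing entry. This is a minimal sketch under stated assumptions: the types are simplified stand-ins, and `sh_may_create_entry` is a hypothetical name. It is not the code from the patch linked above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the GlusterFS types; only the fields
 * needed for this sketch are modelled. */
typedef struct inode {
        unsigned long ino;
} inode_t;

typedef struct loc {
        const char *path;
        inode_t    *parent;
} loc_t;

/* Hypothetical predicate: returns true when the caller may issue an
 * entry-creating call (mkdir, mknod, ...) for loc, false when the
 * operation should be aborted. */
static bool
sh_may_create_entry (const loc_t *loc)
{
        if (loc == NULL || loc->path == NULL)
                return false;

        /* "/" has no parent directory to add an entry to, so it can
         * never be re-created this way. */
        if (strcmp (loc->path, "/") == 0)
                return false;

        /* Any other path still needs a resolved parent inode. */
        return loc->parent != NULL;
}
```

When the predicate returns false, the caller would abort self-heal for that entry rather than pass an incomplete loc down to the client translator.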