Bug 1602866
Summary: | dht: Crash seen in thread dht_dir_attr_heal | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Nithya Balachandran <nbalacha> |
Component: | distribute | Assignee: | Nithya Balachandran <nbalacha> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | mainline | CC: | bugs, khiremat, rhs-bugs, sankarshan, spalai, storage-qa-internal, tdesala |
Target Milestone: | --- | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-5.0 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | 1601331 | Environment: | |
Last Closed: | 2018-10-23 15:14:52 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1601331 | ||
Bug Blocks: |
Description
Nithya Balachandran
2018-07-18 16:37:06 UTC
RCA: The process crashed because both loc->gfid and loc->inode->gfid are null. The dht_dir_attr_heal was triggered in dht_lookup_dir_cbk (). This is a fresh named lookup so loc->gfid is null and loc->inode->gfid is null because loc->inode has not been linked. (gdb) bt #0 0x00007f260e71b765 in raise () from ./lib64/libc.so.6 #1 0x00007f260e71d36a in abort () from ./lib64/libc.so.6 #2 0x00007f260e713f97 in __assert_fail_base () from ./lib64/libc.so.6 #3 0x00007f260e714042 in __assert_fail () from ./lib64/libc.so.6 #4 0x00007f2602149ec2 in client_pre_inodelk (this=0x7f25fc00ef20, req=0x7f25e4217670, loc=0x7f25e400a298, cmd=6, flock=0x7f25e400a4b8, volume=0x7f25fc0132b0 "slave-replicate-1", xdata=0x0) at client-common.c:841 #5 0x00007f2602138b24 in client3_3_inodelk (frame=0x7f25e4015290, this=0x7f25fc00ef20, data=0x7f25e4217760) at client-rpc-fops.c:5307 #6 0x00007f260210d9d9 in client_inodelk (frame=0x7f25e4015290, this=0x7f25fc00ef20, volume=0x7f25fc0132b0 "slave-replicate-1", loc=0x7f25e400a298, cmd=6, lock=0x7f25e400a4b8, xdata=0x0) at client.c:1679 #7 0x00007f2601ea4444 in afr_nonblocking_inodelk (frame=0x7f25e400f680, this=0x7f25fc015230) at afr-lk-common.c:1093 #8 0x00007f2601e9d149 in afr_lock (frame=0x7f25e400f680, this=0x7f25fc015230) at afr-transaction.c:1652 #9 0x00007f2601e9eb84 in afr_transaction_start (local=0x7f25e4009e60, this=0x7f25fc015230) at afr-transaction.c:2333 #10 0x00007f2601e9eec0 in afr_transaction (frame=0x7f25e400f680, this=0x7f25fc015230, type=AFR_METADATA_TRANSACTION) at afr-transaction.c:2402 #11 0x00007f2601e875d7 in afr_setattr (frame=0x7f25e400ece0, this=0x7f25fc015230, loc=0x7f25e4008e58, buf=0x7f25e4008f58, valid=7, xdata=0x0) at afr-inode-write.c:895 #12 0x00007f261011681d in syncop_setattr (subvol=0x7f25fc015230, loc=0x7f25e4008e58, iatt=0x7f25e4008f58, valid=7, preop=0x0, postop=0x0, xdata_in=0x0, xdata_out=0x0) at syncop.c:1811 #13 0x00007f2601bc0448 in dht_dir_attr_heal (data=0x7f25e4007c60) at dht-selfheal.c:2497 #14 0x00007f261010f894 in synctask_wrap () at syncop.c:375 #15 0x00007f260e72fb60 in ?? () from ./lib64/libc.so.6 #16 0x0000000000000000 in ?? () (gdb) f 13 #13 0x00007f2601bc0448 in dht_dir_attr_heal (data=0x7f25e4007c60) at dht-selfheal.c:2497 2497 ret = syncop_setattr (subvol, &local->loc, &local->mds_stbuf, (gdb) p local->loc $119 = {path = 0x7f25e4016d50 "/.gfid/00000000-0000-0000-0000-", '0' <repeats 11 times>, "1/rsnapshot_symlinkbug", name = 0x7f25e4016d7c "rsnapshot_symlinkbug", inode = 0x7f25ec030d30, parent = 0x7f25fc078870, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>, "\001"} (gdb) p local->loc->gfid $120 = '\000' <repeats 15 times> (gdb) p *local->loc->inode $121 = {table = 0x7f25fc078770, gfid = '\000' <repeats 15 times>, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, nlookup = 0, fd_count = 0, active_fd_count = 0, ref = 3, ia_type = IA_INVAL, fd_list = {next = 0x7f25ec030d88, prev = 0x7f25ec030d88}, dentry_list = {next = 0x7f25ec030d98, prev = 0x7f25ec030d98}, hash = {next = 0x7f25ec030da8, prev = 0x7f25ec030da8}, list = {next = 0x7f25ec03aa98, prev = 0x7f25fc0787d0}, _ctx = 0x7f25ec032580} In dht_lookup_dir_cbk, the function unwinds immediately after calling dht_dir_attr_heal. ret = synctask_new (this->ctx->env, dht_dir_attr_heal, dht_dir_attr_heal_done, copy, copy); if (ret) { gf_msg (this->name, GF_LOG_ERROR, ENOMEM, DHT_MSG_DIR_ATTR_HEAL_FAILED, "Synctask creation failed to heal attr " "for path %s gfid %s ", local->loc.path, local->gfid); DHT_STACK_DESTROY (copy); } } } skip_attr_heal: DHT_STRIP_PHASE1_FLAGS (&local->stbuf); dht_set_fixed_dir_stat (&local->postparent); /* Delete mds xattr at the time of STACK UNWIND */ if (local->xattr) GF_REMOVE_INTERNAL_XATTR (conf->mds_xattr_key, local->xattr); DHT_STACK_UNWIND (lookup, frame, local->op_ret, local->op_errno, local->inode, &local->stbuf, local->xattr, &local->postparent); } Most of the time, the unwind reaches the fuse layer and the inode is linked before the dht_attr_dir_heal call stack hits client_pre_inodelk. Once the inode is linked, loc->inode->gfid is non null and client_pre_inodelk proceeds successfully. If the inode has not been linked at this point, it asserts. I was able to hit this by: 1. setting local->need_attrheal to 1 2. putting a sleep(10) before the line DHT_STRIP_PHASE1_FLAGS (&local->stbuf) 3. performing a fresh lookup on an existing directory REVIEW: https://review.gluster.org/20530 (cluster/dht: Set loc->gfid before healing attr) posted (#1) for review on master by N Balachandran COMMIT: https://review.gluster.org/20530 committed in master by "N Balachandran" <nbalacha> with a commit message- cluster/dht: Set loc->gfid before healing attr AFR takes inodelks when setting attrs. The loc->gfid and loc->inode->gfid were both null when dht_dir_attr_heal was called during a fresh lookup of an existing directory. As the gfid is null, client_pre_inodelk asserts in the gfid check. We now set the loc->gfid before calling dht_dir_attr_heal. Change-Id: I457f5a73fd301d97a03ca032587e73d4803298ac fixes: bz#1602866 Signed-off-by: N Balachandran <nbalacha> This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.0, please open a new bug report. glusterfs-5.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html [2] https://www.gluster.org/pipermail/gluster-users/ |