Bug 1296134
Summary: | Rebalance crashed after detach tier. | ||||||
---|---|---|---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Bhaskarakiran <byarlaga> | ||||
Component: | tier | Assignee: | Bug Updates Notification Mailing List <rhs-bugs> | ||||
Status: | CLOSED ERRATA | QA Contact: | Bhaskarakiran <byarlaga> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | rhgs-3.1 | CC: | asrivast, dlambrig, mzywusko, nbalacha, rcyriac, rhs-bugs, sankarshan, storage-qa-internal | ||||
Target Milestone: | --- | Keywords: | Regression, ZStream | ||||
Target Release: | RHGS 3.1.2 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | glusterfs-3.7.5-16 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
Clones: | 1296611 | Environment: | |||||
Last Closed: | 2016-03-01 06:06:58 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1296611, 1297309 | ||||||
Attachments: |
Description
Bhaskarakiran
2016-01-06 12:12:28 UTC
The rebalance log file output:

pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
...
[2016-01-06 10:14:14.816071] W [MSGID: 114031] [client-rpc-fops.c:2325:client3_3_setattr_cbk] 0-disperse_vol1-client-13: remote operation failed [No such file or directory]
frame : type(0) op(0)
...
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-01-06 10:14:14
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.5
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7f7a849d2002]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f7a849ee48d]
/lib64/libc.so.6(+0x35670)[0x7f7a830c0670]
/lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x7f7a8383f210]
---------

Analysis:

From the core:
(gdb) bt
#0  pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
#1  0x00007f7a84a02e07 in fd_unref (fd=0x7f7a5c005b18) at fd.c:559
#2  0x00007f7a84a1d90e in syncop_close (fd=fd@entry=0x7f7a5c005b18) at syncop.c:2021
#3  0x00007f7a769fdd49 in dht_migrate_file (this=0x7f7a7001e9b0, loc=<optimized out>, from=0x7f7a7001da90, to=0x7f7a7001c5a0, flag=<optimized out>) at dht-rebalance.c:1644
#4  0x00007f7a84a13e02 in synctask_wrap (old_task=<optimized out>) at syncop.c:380
#5  0x00007f7a830d2110 in ?? () from /lib64/libc.so.6
#6  0x0000000000000000 in ?? ()

loc is optimized out, but tmp_loc is not.

(gdb) p tmp_loc
$5 = {path = 0x7f7a2000acb0 "/dirs/dir.2/testfile.919", name = 0x0, inode = 0x7f7a6e814e5c, parent = 0x0, gfid = "\315\373\232\366\212\341OŁ :&R\273\\\330>", pargfid = '\000' <repeats 15 times>}
(gdb)

From the rebalance log file:

[2016-01-06 10:14:14.801309] E [MSGID: 109023] [dht-rebalance.c:598:__dht_rebalance_create_dst_file] 0-disperse_vol1-tier-dht: /dirs/dir.2/testfile.919: file does not existson disperse_vol1-cold-dht (No such file or directory)

Examining the code, dst_fd is unrefed twice. The first unref happens in __dht_rebalance_create_dst_file:

if (dst_fd)
        *dst_fd = fd;
...
if (-ret == ENOENT) {
        gf_msg (this->name, GF_LOG_ERROR, 0,
                DHT_MSG_MIGRATE_FILE_FAILED,
                "%s: file does not exists"
                "on %s (%s)", loc->path, to->name, strerror (-ret));
        ret = -1;
        fd_unref (fd);
        goto out;
}

The second happens in dht_migrate_file () -> syncop_close (dst_fd).

The core dump does not show an invalid inode, but I was able to reproduce the crash by setting ret to -ENOENT in gdb.

Patch posted upstream.

Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/65199/

Verified this on 3.7.5-17 and didn't see the crash. Marking this as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html
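The double release described in the analysis above can be modelled outside glusterfs. The following is a hypothetical, self-contained C sketch, not the upstream or downstream patch: `demo_fd_t`, `demo_fd_unref`, `create_dst_file` and `migrate_file` are invented stand-ins for `fd_t`, `fd_unref`, `__dht_rebalance_create_dst_file` and `dht_migrate_file`. To keep the demo free of undefined behaviour, an unref after the count has reached zero is counted in `bad_unrefs` instead of touching freed memory. The "fixed" variant illustrates one possible ownership rule (hand the fd to the caller only on success), which is an assumption and not necessarily the change that was merged.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted glusterfs fd_t (not the real type). */
typedef struct {
    int refcount;   /* live references */
    int bad_unrefs; /* unrefs after refcount hit zero: use-after-free in real code */
} demo_fd_t;

static void demo_fd_init(demo_fd_t *fd)
{
    fd->refcount = 1;   /* like fd_create(): the creator holds one reference */
    fd->bad_unrefs = 0;
}

static void demo_fd_unref(demo_fd_t *fd)
{
    if (fd->refcount == 0) {
        fd->bad_unrefs++;   /* real code would dereference freed memory here */
        return;
    }
    fd->refcount--;
}

/* Hypothetical analogue of __dht_rebalance_create_dst_file().
 * 'err' simulates the errno of the failed destination lookup; 'buggy' selects
 * whether the fd is published to the caller before (buggy) or after (fixed)
 * the error checks. */
static int create_dst_file(demo_fd_t *storage, int err, int buggy,
                           demo_fd_t **dst_fd)
{
    assert(storage != NULL && dst_fd != NULL);
    demo_fd_init(storage);
    *dst_fd = buggy ? storage : NULL;   /* buggy: caller now shares the only ref */

    if (err == ENOENT) {
        demo_fd_unref(storage);         /* error path releases the reference */
        return -1;
    }
    *dst_fd = storage;                  /* success: ownership passes to caller */
    return 0;
}

/* Hypothetical analogue of dht_migrate_file(): releases dst_fd in cleanup. */
static int migrate_file(demo_fd_t *storage, int err, int buggy)
{
    demo_fd_t *dst_fd = NULL;
    int ret = create_dst_file(storage, err, buggy, &dst_fd);

    if (dst_fd)
        demo_fd_unref(dst_fd);          /* stands in for syncop_close (dst_fd) */
    return ret;
}
```

With `migrate_file(&fd, ENOENT, 1)` the counter `fd.bad_unrefs` ends at 1, the counterpart of the second `fd_unref` that faulted in `pthread_spin_lock`; with `buggy` set to 0 it stays at 0 on both the ENOENT and the success path. An equally valid ownership rule is to keep publishing the fd early and drop the `fd_unref` from the error path, leaving the release to the caller's cleanup.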