Created attachment 897058 [details]
Log and core files

Description of problem:
Start the rebalance process after removing an existing brick and adding a new brick. After around 30 minutes the rebalance process crashes, and the rebalance status is shown as `failed'.

[root@g60ds-2 ~]# gluster volume rebalance sixtydrive status
        Node  Rebalanced-files         size      scanned     failures      skipped        status  run time in secs
   ---------       -----------  -----------  -----------  -----------  -----------  ------------    --------------
   localhost                 0       0Bytes       111468            0            0        failed           2153.00
 172.17.69.1                 0       0Bytes        97354            0        20149        failed           2162.00
volume rebalance: sixtydrive: success:

Version-Release number of selected component (if applicable):
[root@g60ds-2 ~]# gluster --version
glusterfs 3.6.0.3 built on May 17 2014 10:49:46

How reproducible:
Always.

Steps to Reproduce:
1. Create a huge amount of data
2. Remove brick (Migrates data)
3. Add brick and rebalance.

Actual results:
glusterfs rebalance process crashes.

Additional info:
Back trace:

(gdb) bt
#0  0x00007f6e0ef0222f in dht_layout_entry_cmp_volname (layout=0x7f6e04023ec0, i=0, j=<value optimized out>) at dht-layout.c:434
#1  0x00007f6e0ef0228d in dht_layout_sort_volname (layout=0x7f6e04023ec0) at dht-layout.c:506
#2  0x00007f6e0ef0b48b in dht_fix_layout_of_directory (frame=0x7f6e1c90c5ec, loc=0x7f6e0df97800, layout=0x14006d0) at dht-selfheal.c:776
#3  0x00007f6e0ef0cd59 in dht_fix_directory_layout (frame=<value optimized out>, dir_cbk=<value optimized out>, layout=0x14006d0) at dht-selfheal.c:915
#4  0x00007f6e0ef1ed82 in dht_setxattr (frame=0x7f6e1c90c5ec, this=0x13dded0, loc=0x7f6e0b386000, xattr=0x7f6e1c3060b4, flags=0, xdata=0x0) at dht-common.c:2621
#5  0x00007f6e1db10761 in syncop_setxattr (subvol=0x13dded0, loc=0x7f6e0b386000, dict=0x7f6e1c3060b4, flags=0) at syncop.c:1314
#6  0x00007f6e0ef07ad1 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386220, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1575
#7  0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386440, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#8  0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386660, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#9  0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386880, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#10 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386aa0, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#11 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386cc0, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#12 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386ee0, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#13 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387100, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#14 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387320, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#15 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387540, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#16 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387760, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#17 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387980, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#18 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387ba0, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#19 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387dc0, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#20 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387f60, fix_layout=0x7f6e1c3060b4, migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#21 0x00007f6e0ef08086 in gf_defrag_start_crawl (data=0x13dded0) at dht-rebalance.c:1705
#22 0x00007f6e1db0a5d2 in synctask_wrap (old_task=<value optimized out>) at syncop.c:333
#23 0x0000003558a43bf0 in ?? () from /lib64/libc-2.12.so
#24 0x0000000000000000 in ?? ()
(gdb)

=================

Attached log file and core file.
Created attachment 897061 [details]
Complete backtrace
https://code.engineering.redhat.com/gerrit/#/c/26972/ & https://code.engineering.redhat.com/gerrit/#/c/27062/
Verified on: glusterfs 3.6.0.22
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html