Description of problem:
=======================
While remove-brick was in progress, I started removing the entire dataset on the mount point using rm -rf from multiple terminals. The rebalance logs fill up with many "lookup failed" error messages. Each message shows only the file name and the text "lookup failed". Additional information should be logged along with the message so that the cause of the lookup failure is easy to find.

[2017-01-13 09:33:07.090970] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: vmlinux.lds.S lookup failed
[2017-01-13 09:33:08.568525] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: __ashrdi3.S lookup failed
[2017-01-13 09:33:08.571814] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: __ashldi3.S lookup failed
[2017-01-13 09:33:08.586351] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: ashrdi3.c lookup failed
[2017-01-13 09:33:08.590265] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: __lshrdi3.S lookup failed
[2017-01-13 09:33:08.599489] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: checksum.c lookup failed
[2017-01-13 09:33:08.601561] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: __ucmpdi2.S lookup failed
[2017-01-13 09:33:08.612718] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: delay.c lookup failed
[2017-01-13 09:33:08.614396] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: bitops.c lookup failed
[2017-01-13 09:33:08.618801] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: internal.h lookup failed
[2017-01-13 09:33:08.620468] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: do_csum.S lookup failed
[2017-01-13 09:33:08.624202] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: lshrdi3.c lookup failed
[2017-01-13 09:33:08.626305] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: memcpy.S lookup failed
[2017-01-13 09:33:08.631211] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: memset.S lookup failed
[2017-01-13 09:33:08.636532] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: usercopy.c lookup failed

Version-Release number of selected component (if applicable):
3.8.4-11.el7rhgs.x86_64

How reproducible:
always

Steps to Reproduce:
==================
1) Create a distributed-replicate volume and start it.
2) FUSE mount the volume.
3) Under the mount point, create two subdirectories, say /mnt/terminal{1..2}.
4) Start a Linux kernel untar in both subdirectories, that is, /mnt/terminal1 and /mnt/terminal2.
5) Wait a few minutes and, while the untar is in progress, add a couple of bricks to the volume.
6) Immediately remove the bricks added in step 5 (this starts a rebalance).
7) Wait a few minutes and, while the untar is still in progress, run rm -rf * in each terminal directory. Check the rebalance logs.

Actual results:
===============
Lookup failed errors are seen in the rebalance logs during rm -rf.

Expected results:
=================
There should not be any lookup failed errors in rebalance logs.

Additional info:
================
These lookup failures do not impact the remove-brick rebalance.
On all the nodes, the remove-brick rebalance completed successfully.
I hit this on add-brick + rm on the Scale setup as well.
Verified this BZ on glusterfs version 3.12.2-7.el7rhgs.x86_64. Lookup failed errors are now logged with the error message:

[MSGID: 109023] [dht-rebalance.c:2618:gf_defrag_migrate_single_file] 0-distrepx3-dht: Migrate file failed: /linux-4.9.27/Documentation/devicetree/bindings/phy/keystone-usb-phy.txt lookup failed [No such file or directory]
[MSGID: 109023] [dht-rebalance.c:2618:gf_defrag_migrate_single_file] 0-distrepx3-dht: Migrate file failed: /a84-40 lookup failed [No such file or directory]
[MSGID: 109023] [dht-rebalance.c:2618:gf_defrag_migrate_single_file] 0-distrepx3-dht: Migrate file failed: /a72-40 lookup failed [No such file or directory]

Moving this BZ to Verified.
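
For anyone triaging a large rebalance log, the bracketed reason added by the fix makes the failures easy to group. Below is a minimal sketch, not part of GlusterFS, that assumes the messages look exactly like the lines quoted in this report. The names LOOKUP_FAILED, parse_lookup_failures and summarize are hypothetical helpers made up for illustration. Lines in the older format, which carry no reason, are counted as "unknown".

```python
import re
from collections import Counter

# Assumed message shapes, taken from the log lines quoted in this report:
#   "... Migrate file failed: <name> lookup failed"
#   "... Migrate file failed: <path> lookup failed [<reason>]"
LOOKUP_FAILED = re.compile(
    r"Migrate file failed: (?P<path>.+?) lookup failed"
    r"(?: \[(?P<reason>[^\]]+)\])?\s*$"
)


def parse_lookup_failures(lines):
    """Return (path, reason) pairs; reason is None for the older format."""
    failures = []
    for line in lines:
        match = LOOKUP_FAILED.search(line)
        if match:
            failures.append((match.group("path"), match.group("reason")))
    return failures


def summarize(lines):
    """Count lookup failures per reason string."""
    return Counter(
        reason or "unknown" for _, reason in parse_lookup_failures(lines)
    )


sample = [
    "[MSGID: 109023] [dht-rebalance.c:2618:gf_defrag_migrate_single_file] "
    "0-distrepx3-dht: Migrate file failed: /a84-40 lookup failed "
    "[No such file or directory]",
    "[2017-01-13 09:33:07.090970] E [MSGID: 109023] "
    "[dht-rebalance.c:2200:gf_defrag_migrate_single_file] "
    "0-distrep-dht: Migrate file failed: vmlinux.lds.S lookup failed",
]

for reason, count in summarize(sample).items():
    print(f"{count} x {reason}")
```

To run it against a real log, pass the open file object to summarize() instead of the sample list. A large count of "No such file or directory" during a concurrent rm -rf would match the behaviour described in this bug.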
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607