Description of problem:
Rebalance operation during detach tier operation fails with the error messages seen below, failing the detach tier operation.

# gluster v tier test-vol detach status
Node                               Rebalanced-files  size    scanned  failures  skipped  status     run time in h:m:s
---------                          -----------       ------  -------  --------  -------  ---------  --------------
localhost                          100               1.6MB   100      0         0        completed  0:0:5
dhcp46-103.lab.eng.blr.redhat.com  0                 0Bytes  100      1         0        failed     0:0:3
10.70.47.128                       0                 0Bytes  0        1         0        failed     0:0:3
10.70.47.171                       0                 0Bytes  66       15        0        failed     0:0:2

[2016-05-04 11:00:31.828317] E [MSGID: 109023] [dht-rebalance.c:1267:dht_migrate_file] 0-test-vol-tier-dht: Migrate file failed:/file-29: lookup failed on test-vol-hot-dht (No such file or directory)
[2016-05-04 11:00:31.839178] I [dht-rebalance.c:2500:gf_defrag_process_dir] 0-test-vol-tier-dht: migrate data called on /.trashcan/internal_op
[2016-05-04 11:00:31.847277] E [MSGID: 109023] [dht-rebalance.c:1267:dht_migrate_file] 0-test-vol-tier-dht: Migrate file failed:/file-86: lookup failed on test-vol-hot-dht (No such file or directory)
[2016-05-04 11:00:31.848861] I [dht-rebalance.c:2711:gf_defrag_process_dir] 0-test-vol-tier-dht: Migration operation on dir /.trashcan/internal_op took 0.01 secs
[2016-05-04 11:00:31.868437] W [dht-rebalance.c:3343:gf_tier_clear_fix_layout] 0-test-vol-tier-dht: Failed removing tier fix layout xattr from /
[2016-05-04 11:00:31.868734] I [dht-rebalance.c:2109:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 3
[2016-05-04 11:00:31.869003] I [dht-rebalance.c:2109:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 4
[2016-05-04 11:00:31.921020] E [MSGID: 109023] [dht-rebalance.c:1267:dht_migrate_file] 0-test-vol-tier-dht: Migrate file failed:/file-92: lookup failed on test-vol-hot-dht (No such file or directory)
[2016-05-04 11:00:31.926488] E [MSGID: 109023] [dht-rebalance.c:1267:dht_migrate_file] 0-test-vol-tier-dht: Migrate file failed:/file-57: lookup failed on test-vol-hot-dht (No such file or directory)

Version-Release number of selected component (if applicable):
glusterfs-server-3.7.9-2.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Create an EC volume
2. Create 100 files from the fuse mount
3. Attach a 2x2 hot tier
4. Constantly write to these 100 files so they are promoted
5. While promotions are in progress, perform a detach tier
6. Check the detach tier status

Actual results:
Rebalance operation during detach tier fails

Expected results:
Detach tier operation succeeds

Additional info:
sosreports shall be attached shortly.
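The detach status output in the description can be checked mechanically when reproducing this. The following is a minimal sketch, assuming the eight whitespace-separated columns shown in the report; the function name failed_nodes is hypothetical and is not part of any gluster tooling.

```python
# Rows copied from the "detach status" output in this report.
STATUS_OUTPUT = """\
localhost 100 1.6MB 100 0 0 completed 0:0:5
dhcp46-103.lab.eng.blr.redhat.com 0 0Bytes 100 1 0 failed 0:0:3
10.70.47.128 0 0Bytes 0 1 0 failed 0:0:3
10.70.47.171 0 0Bytes 66 15 0 failed 0:0:2
"""


def failed_nodes(status_text):
    """Return (node, failures) for every row whose status column is 'failed'.

    Assumes the column order from the report:
    node, rebalanced-files, size, scanned, failures, skipped, status, run time.
    """
    result = []
    for line in status_text.splitlines():
        fields = line.split()
        # Data rows have exactly 8 fields with a numeric failures column;
        # the header and the dashed separator row are skipped by this check.
        if len(fields) != 8 or not fields[4].isdigit():
            continue
        node, failures, status = fields[0], int(fields[4]), fields[6]
        if status == "failed":
            result.append((node, failures))
    return result


if __name__ == "__main__":
    for node, failures in failed_nodes(STATUS_OUTPUT):
        print(f"{node}: {failures} failure(s)")
```

A non-empty result means the detach tier rebalance did not complete on every node, which is the failure condition reported here.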
Lookup failure is not the only failure here. Detach tier will not complete when the following warning is logged:

[2016-05-04 11:00:31.868437] W [dht-rebalance.c:3343:gf_tier_clear_fix_layout] 0-test-vol-tier-dht: Failed removing tier fix layout xattr from /

I have sent an upstream patch for this: http://review.gluster.org/#/c/14147/
https://bugzilla.redhat.com/show_bug.cgi?id=1332136 is the bug to track it.

The following is a different issue:

[2016-05-04 11:00:31.926488] E [MSGID: 109023] [dht-rebalance.c:1267:dht_migrate_file] 0-test-vol-tier-dht: Migrate file failed:/file-57: lookup failed on test-vol-hot-dht (No such file or directory)
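To tell the two failures apart in a rebalance log, the messages quoted in this comment can be matched separately. This is a rough sketch only: the regular expressions are built from the two log lines quoted here, and the names classify, MIGRATE_FAILED and FIX_LAYOUT_FAILED are hypothetical.

```python
import re

# The two log lines quoted in this comment.
LOG_LINES = [
    "[2016-05-04 11:00:31.868437] W [dht-rebalance.c:3343:gf_tier_clear_fix_layout] "
    "0-test-vol-tier-dht: Failed removing tier fix layout xattr from /",
    "[2016-05-04 11:00:31.926488] E [MSGID: 109023] [dht-rebalance.c:1267:dht_migrate_file] "
    "0-test-vol-tier-dht: Migrate file failed:/file-57: lookup failed on test-vol-hot-dht "
    "(No such file or directory)",
]

# Patterns derived only from the message text above (an assumption about its stability).
MIGRATE_FAILED = re.compile(
    r"dht_migrate_file\].*Migrate file failed:(\S+): lookup failed on (\S+)"
)
FIX_LAYOUT_FAILED = re.compile(
    r"gf_tier_clear_fix_layout\].*Failed removing tier fix layout xattr from (\S+)"
)


def classify(lines):
    """Group log lines into the two failure classes discussed in this bug."""
    summary = {"lookup_failed": [], "fix_layout_xattr": []}
    for line in lines:
        match = MIGRATE_FAILED.search(line)
        if match:
            # Record the path of the file whose migration failed.
            summary["lookup_failed"].append(match.group(1))
            continue
        match = FIX_LAYOUT_FAILED.search(line)
        if match:
            # Record the directory whose fix-layout xattr could not be removed.
            summary["fix_layout_xattr"].append(match.group(1))
    return summary


if __name__ == "__main__":
    print(classify(LOG_LINES))
```

Any entry under fix_layout_xattr corresponds to the failure that blocks detach tier; entries under lookup_failed belong to the separate lookup issue.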
sosreports are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1332957/
*** Bug 1331628 has been marked as a duplicate of this bug. ***
*** Bug 1333804 has been marked as a duplicate of this bug. ***
As we have two issues here [detach tier failure & lookup failure] and as per comment#13, lookup failure won't cause a detach tier to fail, a new bug will be raised to track lookup failure. This bug will be used to track the detach tier failure issue. Bug summary will be updated accordingly.
The issue reported in this bug is no longer seen in build glusterfs-3.7.9-5. Detach tier completes successfully and migrates all files to the cold tier. Moving the bug to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240