Bug 1573227

Summary: [Remove-brick] Files are not migrated when they are renamed during a remove-brick operation
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Prasad Desala <tdesala>
Component: distribute
Assignee: Susant Kumar Palai <spalai>
Status: CLOSED WONTFIX
QA Contact: Sayalee <saraut>
Severity: high
Priority: unspecified
Version: rhgs-3.4
CC: nchilaka, rhs-bugs, sabose, saraut, sheggodu, spalai, storage-qa-internal, tdesala, ubansal
Keywords: ZStream
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-01-06 09:19:40 UTC
Type: Bug

Description Prasad Desala 2018-04-30 14:24:03 UTC
Description of problem:
=======================
On a distribute volume, many files were not migrated from the decommissioned bricks.

Version-Release number of selected component (if applicable):
3.12.2-8.el7rhgs.x86_64

How reproducible:
=================
2/2

Steps to Reproduce:
====================
1) Create a 4 brick distribute volume and start it.
2) FUSE mount it on multiple clients.
3) Create the test data:
* From one client, start creating files and directories on the mount point:
python /home/file_dir_ops.py create_deep_dirs_with_files -d 5 -l 5 -f 50 /mnt/dist
* From another client, create files in the root (/) of the mount point:
for i in {1..5000};do cat /etc/redhat-release > new_cat_$i;done
4) Once step-3 is completed, start renaming all files and directories on the mount point
for i in `ls`; do mv $i $i+1;done
5) Once rename completes, remove a brick and wait till remove-brick completes.
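
A minimal sketch of a more defensive variant of the rename loop in step 4, given here only as an illustration and not as the command that was actually run. It uses a shell glob instead of parsing `ls` output and quotes every name. The function name rename_all is hypothetical.

```bash
#!/usr/bin/env bash
# rename_all DIR
# Rename every entry directly under DIR to "<name>+1", as in step 4.
# The glob is expanded once before the loop starts, so entries that
# have already been renamed are not picked up a second time.
rename_all() {
    local dir=$1 entry
    for entry in "$dir"/*; do
        # Skip the unexpanded pattern when DIR is empty.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        mv -- "$entry" "$entry+1" || return 1
    done
}
```

The function returns only after the last mv has finished, so on a single client something like `rename_all /mnt/dist && gluster volume remove-brick <VOLNAME> <BRICK> start` would start the remove-brick strictly after the renames. With several clients renaming at once, each client's loop has to finish first.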

Actual results:
===============
No file migration failures are reported, but many files were not migrated from the decommissioned bricks.

Expected results:
=================
All files from the decommissioned bricks should get migrated successfully.
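
One way to check the expected result is to list the data files that are still present in the backend directory of a decommissioned brick once remove-brick reports completion. The sketch below assumes the usual brick layout: an internal .glusterfs directory at the brick root, and DHT link-to files that appear as zero-byte files with mode 1000 (---------T). The function name list_unmigrated is hypothetical.

```bash
#!/usr/bin/env bash
# list_unmigrated BRICK_DIR
# Print regular files left in a brick's backend directory.
# Assumptions: BRICK_DIR/.glusterfs holds internal metadata and is skipped;
# zero-byte files with mode 1000 are treated as DHT link-to files, not data.
list_unmigrated() {
    local brick=$1
    find "$brick" -path "$brick/.glusterfs" -prune -o \
        -type f ! \( -perm 1000 -size 0 \) -print
}
```

Run on the server that hosts the removed brick, an empty result would indicate that every data file was migrated. Any path printed is a candidate for the problem described above.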

Comment 4 Nithya Balachandran 2018-05-02 06:02:25 UTC
(In reply to Prasad Desala from comment #0)

It looks like the remove-brick was performed _before_ the renames completed. The rebalance logs show lookup failures for the older (pre-rename) names:


[2018-04-30 14:13:40.901213] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-dist-dht: Migrate file failed: /new_cat_28 lookup failed [No such file or directory]
[2018-04-30 14:13:43.883442] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-dist-dht: Migrate file failed: /new_cat_239 lookup failed [No such file or directory]
[2018-04-30 14:13:43.899613] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-dist-dht: Migrate file failed: /new_cat_240 lookup failed [No such file or directory]
[2018-04-30 14:13:43.900365] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-dist-dht: Migrate file failed: /new_cat_241 lookup failed [No such file or directory]
[2018-04-30 14:13:43.907045] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-dist-dht: Migrate file failed: /new_cat_244 lookup failed [No such file or directory]
[2018-04-30 14:13:43.908748] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-dist-dht: Migrate file failed: /new_cat_245 lookup failed [No such file or directory]
[2018-04-30 14:13:43.915455] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-dist-dht: Migrate file failed: /new_cat_246 lookup failed [No such file or directory]
[2018-04-30 14:13:43.916884] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-dist-dht: Migrate file failed: /new_cat_254 lookup failed [No such file or directory]
[2018-04-30 14:13:43.931054] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-dist-dht: Migrate file failed: /new_cat_255 lookup failed [No such file or directory]
[2018-04-30 14:13:43.931458] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-dist-dht: Migrate file failed: /new_cat_258 lookup failed [No such file or directory]
[2018-04-30 14:13:43.937588] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-dist-dht: Migrate file failed: /new_cat_260 lookup failed [No such file or directory]



Are you sure the renames had completed before the remove-brick?
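
A rough way to answer that question from the logs, sketched under stated assumptions: pull the paths out of the "Migrate file failed ... lookup failed" messages and check whether each one exists on the mount under its old name or under the renamed `+1` name. The function names are hypothetical, and the check only handles a rename of the last path component, which fits the root-level files in the excerpt above but not files under renamed directories.

```bash
#!/usr/bin/env bash
# failed_lookup_paths LOG
# Print the path from every "Migrate file failed: <path> lookup failed" line.
failed_lookup_paths() {
    sed -n 's/.*Migrate file failed: \(.*\) lookup failed.*/\1/p' "$1"
}

# classify_paths LOG MOUNT
# For each failed path, report whether the old name still exists under
# MOUNT ("present"), only the "+1" name exists ("renamed"), or neither.
classify_paths() {
    local log=$1 mnt=$2 path
    failed_lookup_paths "$log" | while IFS= read -r path; do
        if [ -e "$mnt$path" ]; then
            echo "present $path"
        elif [ -e "$mnt$path+1" ]; then
            echo "renamed $path"
        else
            echo "missing $path"
        fi
    done
}
```

A large number of "renamed" entries would be consistent with the rebalance process having listed files under their old names while the renames were still in progress. The log location is an assumption here; on many installations the rebalance log is /var/log/glusterfs/<VOLNAME>-rebalance.log.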