Bug 1470967 - [GSS] geo-replication failed due to ENTRY failures on slave volume
Summary: [GSS] geo-replication failed due to ENTRY failures on slave volume
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
: RHGS 3.4.0
Assignee: Kotresh HR
QA Contact: Rochelle
Whiteboard: rebase
Depends On:
Blocks: 1503135
Reported: 2017-07-14 07:24 UTC by Abhishek Kumar
Modified: 2018-09-12 07:02 UTC (History)
13 users

Fixed In Version: glusterfs-3.12.2-1
Doc Type: Bug Fix
Doc Text:
Geo-replication expects the gfid to be the same on both the master and the slave. However, geo-replication failed to sync an entry to the slave when a file already existed there with a different gfid, and fixing the gfid conflict previously required manual intervention. With this fix, gfid mismatch failures are handled automatically by verifying the entries against the master, which is the source of truth.
Clone Of:
Last Closed: 2018-09-04 06:34:19 UTC
Target Upstream Version:

Attachments (Terms of Use)
rsync error log (7.29 MB, application/x-gzip)
2017-07-14 12:19 UTC, Abhishek Kumar
sync error (70.50 KB, application/x-gzip)
2017-07-14 12:27 UTC, Abhishek Kumar

System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2607 None None None 2018-09-04 06:36:23 UTC

Description Abhishek Kumar 2017-07-14 07:24:36 UTC
Description of problem:

geo-replication failed due to ENTRY failures on slave volume

Version-Release number of selected component (if applicable):


How reproducible:

Customer Environment
Steps to Reproduce:
1. Stop the geo-rep session
2. Rename the directory on the master volume
3. Create a new directory on the master volume with the previous name
4. Start the geo-rep session again
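The master-side portion of steps 2 and 3 can be sketched on a plain local filesystem (a rough illustration only: no gluster volume is involved, the directory names are hypothetical, and inode numbers stand in for gfids):

```shell
#!/bin/sh
# Sketch of the master-side rename/recreate that triggers the gfid mismatch.
# Inode numbers are used as a local stand-in for gluster gfids.
set -e
cd "$(mktemp -d)"

mkdir DIR1                     # directory exists with some gfid (call it g1)
old_inode=$(stat -c %i DIR1)

mv DIR1 DIR1.1                 # rename: the inode (gfid) stays the same
mkdir DIR1                     # recreate: the same name, but a new inode (g2)
new_inode=$(stat -c %i DIR1)

# The name DIR1 now maps to a different inode than before the rename.
# This is the gfid mismatch the slave must reconcile when geo-rep resumes.
[ "$old_inode" != "$new_inode" ] && echo "DIR1 now has a new inode (gfid analog)"
```

When the geo-rep session restarts, the slave still associates the name DIR1 with the old gfid, which is what produced the ENTRY failures before the fix.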

Actual results:

Geo-replication sync stopped and the logs report ENTRY failures

Expected results:

Geo-replication sync should handle both the new directory and the renamed directory

Additional info:

Comment 4 Abhishek Kumar 2017-07-14 12:19:53 UTC
Created attachment 1298294 [details]
rsync error log

Comment 5 Abhishek Kumar 2017-07-14 12:27:49 UTC
Created attachment 1298297 [details]
sync error

Comment 43 Rahul Hinduja 2017-09-21 19:02:57 UTC
The following is a summary of the qualification done on the build mentioned in comment 35:

Hybrid Rename Scenarios:
=> If a file (f1) is renamed to (f2) in the hybrid crawl, the slave ends up with both files (the original f1 and the renamed f2) as hardlinks to each other with the same gfid. Any subsequent create of a file f1 with a different gfid corrects the slave.

However, if a directory is created with the name f1, the slave is not corrected; f1 remains a file there. This requires manual effort to clean up.

=> If a directory is renamed in the hybrid crawl, the renamed directory does not appear on the slave. However, all the data written to the renamed directory on the master lands in the original directory on the slave (this is as designed and a known behavior). If the directory is recreated on the master with a different gfid, the slave gets the correction too.

Customer Workload:

The customer workload involves the following pattern:

A. Create file f1 => it is synced to the slave
B. Hardlink f1 to f2 => the hardlink is synced to the slave
C. Delete f1 => f1 stays on the slave
D. Rename f2 to f3 => f2 stays on the slave, f1 stays from step C, and f3 is synced

In the above case, the slave consumes more inodes than the master because the stale entries remain as hardlinks. There is no data loss, only the inode penalty.
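The master-side workload A-D can be replayed on a plain local filesystem as a rough sketch (no gluster involved; the file names match the steps above, and `stat` shows the hardlink count):

```shell
#!/bin/sh
# Local replay of the master-side workload A-D from the customer pattern.
set -e
cd "$(mktemp -d)"

echo data > f1                 # A. create f1
ln f1 f2                       # B. hardlink f1 -> f2 (same inode, link count 2)
rm f1                          # C. delete f1 (f2 keeps the inode alive)
mv f2 f3                       # D. rename f2 -> f3

# On the master only f3 remains. On the affected slave, f1 and f2
# lingered as extra hardlinks to the same gfid until the fix.
[ ! -e f1 ] && [ ! -e f2 ] && [ -f f3 ]
echo "master now holds only f3 (link count $(stat -c %h f3))"
```

On the master the final link count of f3 is back to 1; the inode penalty described above arises because the slave kept the intermediate names as additional links.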

Workload mentioned in comment 10 works:

1. CREATE DIR1  (gfid = g1)
2. RENAME DIR1 DIR1.1  (gfid = g1)
3. CREATE DIR1  (gfid = g2)

Additional testing carried on the builds:

=> Different fops (create, chmod, chown, chgrp, hardlink, symlink, rename, truncate) during the hybrid and changelog crawls.
=> Brick scenarios: add-brick, remove-brick, and brick-kill.
=> Upgrade from 3.2.0 to the hotfix build.
=> Creating a directory or a file on the slave and then having it synced via the master (negative case, to simulate another gfid-mismatch scenario).

The above covers the planned testing for the hotfix. However, please note the following:

1. The hotfix has been qualified only for the very specific scenarios mentioned above.
2. Ambiguity remains if the file or directory is not recreated with the same name after a rename during the hybrid crawl.
3. Only limited regression test coverage was carried out.

Please set the right expectations with the customer for this hotfix build.

===== Short Summary as part of recently agreed process ==========

QE has qualified the hotfix build mentioned in comment 35 against the rename cases during the hybrid crawl. The create, rename, create sequence for the same file/directory works with the build. A sanity check was also carried out on the hotfix to ensure build stability, along with validation of the upgrade path.


Comment 71 errata-xmlrpc 2018-09-04 06:34:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

