Bug 1470967 - [GSS] geo-replication failed due to ENTRY failures on slave volume [NEEDINFO]
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Hardware: Unspecified OS: Unspecified
Priority: unspecified Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assigned To: Kotresh HR
Depends On:
Blocks: 1503135
Reported: 2017-07-14 03:24 EDT by Abhishek Kumar
Modified: 2018-05-23 04:03 EDT

See Also:
Fixed In Version: glusterfs-3.12.2-1
Doc Type: If docs needed, set a value
Doc Text:
Problem: Geo-rep expects the gfid of an entry to be the same on both master and slave. Geo-rep fails to sync an entry to the slave when a file with the same name but a different gfid already exists there.
Cause: Changelogs of each brick are processed in parallel. This can race: a file already deleted on the master may still be present on the slave, so syncing a new file created on the master with the same name fails.
Fix: Entry failures caused by a gfid mismatch are handled gracefully by verifying the entry on the master.
Story Points: ---
Clone Of:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
bkunal: needinfo? (rallan)

Attachments
rsync error log (7.29 MB, application/x-gzip)
2017-07-14 08:19 EDT, Abhishek Kumar
sync error (70.50 KB, application/x-gzip)
2017-07-14 08:27 EDT, Abhishek Kumar

Description Abhishek Kumar 2017-07-14 03:24:36 EDT
Description of problem:

geo-replication failed due to ENTRY failures on slave volume

Version-Release number of selected component (if applicable):


How reproducible:

Customer Environment
Steps to Reproduce:
1. Stop the geo-rep session
2. Rename the directory on master volume
3. Create a new directory on master volume with previous name
4. Start the geo-rep session again
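
The effect of these steps on the two namespaces can be sketched with a toy model. This is only an illustration, not geo-rep code: each volume is modelled as a plain dict from name to gfid, and the labels `g1`/`g2` and the helper `find_gfid_mismatches` are hypothetical names chosen for the example.

```python
def find_gfid_mismatches(master, slave):
    """Return (name, master_gfid, slave_gfid) for every name that exists
    on both sides but maps to a different gfid."""
    return [
        (name, gfid, slave[name])
        for name, gfid in sorted(master.items())
        if name in slave and slave[name] != gfid
    ]


# State while the session was still running: DIR1 is synced to the slave.
master = {"DIR1": "g1"}
slave = dict(master)

# Steps 1-3: the session is stopped, so the slave sees none of this.
master["DIR1.1"] = master.pop("DIR1")  # rename keeps the gfid (g1)
master["DIR1"] = "g2"                  # the new DIR1 gets a fresh gfid

# Step 4: after the restart, the name DIR1 maps to different gfids.
mismatches = find_gfid_mismatches(master, slave)
print(mismatches)  # [('DIR1', 'g2', 'g1')]
```

In this model the slave still holds DIR1 with the old gfid, which is the kind of conflict the ENTRY failures point to.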

Actual results:

Geo-replication sync stopped, and the logs report ENTRY failures.

Expected results:

Geo-replication sync should handle both the new directory and the renamed directory.

Additional info:
Comment 4 Abhishek Kumar 2017-07-14 08:19 EDT
Created attachment 1298294
rsync error log
Comment 5 Abhishek Kumar 2017-07-14 08:27 EDT
Created attachment 1298297
sync error
Comment 43 Rahul Hinduja 2017-09-21 15:02:57 EDT
The following is a summary of the qualification done on the build mentioned in comment 35:

Hybrid Rename Scenarios:
=> If a file (f1) is renamed to (f2) during the hybrid crawl, both files (the original f1 and the renamed f2) appear at the slave as hardlinks to each other with the same gfid. Any subsequent creation of a file f1 with a different gfid corrects the slave.

However, if a directory named f1 is created instead, the slave is not corrected. At the slave, f1 remains a file, and manual cleanup is required.

=> If a directory is renamed during the hybrid crawl, the renamed directory does not appear at the slave. However, all data written to the renamed directory at the master goes into the original directory at the slave (this is as designed and known). If the directory is recreated at the master with a different gfid, the slave is corrected as well.

Customer Workload:

The customer workload involves the following pattern:

A. Create a file f1 => It gets synced to the slave
B. Hardlink file f1 to f2 => The hardlink syncs to the slave
C. Delete the file f1 => f1 stays at the slave
D. Rename file f2 to f3 => f2 stays at the slave, f1 stays from C, f3 is synced from D.

In the above case, the slave consumes more inodes because these are all hardlinks. There is no data loss, only the inode penalty.
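
A small sketch of pattern A-D makes the leftover names at the slave explicit. It only replays the outcomes listed above on two dicts (name to gfid); the gfid label `g1` and the helper `stale_names` are assumptions made for the illustration, not part of geo-rep.

```python
def stale_names(master, slave):
    """Names that still exist on the slave but no longer on the master."""
    return sorted(set(slave) - set(master))


master, slave = {}, {}

# A. Create f1: synced to the slave.
master["f1"] = "g1"
slave["f1"] = "g1"

# B. Hardlink f1 to f2: the hardlink is synced (same gfid).
master["f2"] = master["f1"]
slave["f2"] = "g1"

# C. Delete f1 on the master: f1 stays at the slave.
del master["f1"]

# D. Rename f2 to f3 on the master: f3 is synced, f2 stays at the slave.
master["f3"] = master.pop("f2")
slave["f3"] = "g1"

print(stale_names(master, slave))  # ['f1', 'f2']
```

All three names on the slave carry the same gfid, so the data is intact, but f1 and f2 remain as extra entries.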

Workload mentioned in comment 10 works:

1. CREATE DIR1  (gfid = g1)
2. RENAME DIR1 DIR1.1  (gfid = g1)
3. CREATE DIR1  (gfid = g2)
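
One way to picture why this workload now syncs is the following simplified model. It is not the gsyncd implementation: volumes are dicts from name to gfid, `resolve_and_sync` is a hypothetical helper, and the rename-versus-delete decision is an assumption about what "verifying on the master" could amount to.

```python
def resolve_and_sync(master, slave, name):
    """Sync one entry by name, resolving a gfid mismatch via the master.

    Returns a short label describing the action taken on the slave.
    """
    want = master[name]
    have = slave.get(name)
    action = "in-sync" if have == want else "created"
    if have is not None and have != want:
        # gfid mismatch: check whether the slave's gfid still exists on master.
        new_name = next((n for n, g in master.items() if g == have), None)
        if new_name is None:
            del slave[name]                    # gone on master: stale entry
            action = "replaced-stale"
        else:
            slave[new_name] = slave.pop(name)  # still alive under a new name
            action = "renamed-then-created"
    slave[name] = want
    return action


master = {"DIR1.1": "g1", "DIR1": "g2"}  # state after steps 1-3
slave = {"DIR1": "g1"}                   # slave only saw step 1

print(resolve_and_sync(master, slave, "DIR1"))  # renamed-then-created
print(slave == master)                          # True
```

The lookup on the master is what separates a rename (keep the old entry under its new name) from a delete (drop the stale entry) before the new entry is created.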

Additional testing carried out on the builds:

=> Different fops (create, chmod, chown, chgrp, hardlink, symlink, rename, truncate) during hybrid and changelog crawl.
=> Brick scenarios: add-brick, remove-brick, and brick kill scenarios.
=> Upgrade from 3.2.0 to the hotfix build.
=> Creating a directory or a file at the slave and then syncing an entry with the same name from the master {Negative case, to simulate another scenario of gfid mismatch}

The above testing covers what was planned for the hotfix. However, please note the following:

1. The hotfix has been qualified only for the very specific scenarios mentioned above.
2. Ambiguity remains if the file or directory is not recreated with the same name after a rename during the hybrid crawl.
3. Only limited regression testing has been carried out.

Please set the right expectations with the customer for this hotfix build.

===== Short Summary as part of recently agreed process ==========

QE has qualified the hotfix build mentioned in comment 35 against the rename cases during hybrid crawl. Create, rename, then create of the same file/directory works with the build. A sanity check was also carried out on the hotfix to ensure the stability of the build, along with validation of the upgrade path.

