Description of problem:
If hardlinks to files are created while a geo-rep session is stopped and the session is then started again, the first xsync crawl syncs the hardlinks as separate files, not as hardlinks. Consequently, total disk usage on the slave will be greater than on the master.

Version-Release number of selected component (if applicable):
3.4.0.12rhs.beta4-1.el6rhs.x86_64

How reproducible:
Observed once.

Steps to Reproduce:
1. Create and start a geo-rep relationship between master (DIST-REP) and slave.
2. Create files using the command
   ./crefi.py -n 10 --multi -b 10 -d 10 --random --max=500K --min=10 <MNT_PNT>
3. Let it sync to the slave.
4. Stop the geo-rep session.
5. Create hardlinks to all the files using the command
   ./crefi.py -n 10 --multi -b 10 -d 10 --random --max=500K --min=10 --fop=hardlink <MNT_PNT>
6. Start the geo-rep session.
7. Check whether syncing has completed by comparing the number of files on master and slave.

Actual results:
Hardlinks to files are not synced as actual hardlinks by the first xsync crawl.

Expected results:
Hardlinks should be synced as actual hardlinks, not as separate files.

Additional info:
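Comparing only the number of files (step 7) does not reveal this bug, because a hardlink synced as a separate file still counts as one directory entry. A minimal sketch of a check that also counts distinct inodes is given here; the helper name hardlink_summary is hypothetical and is not part of gsyncd or crefi. It assumes the tree being inspected reports the same inode number for all hardlinks of a file.

```python
import os
import stat


def hardlink_summary(root):
    """Count regular-file entries and distinct inodes under root.

    Hypothetical helper for illustration only. If hardlinks were synced
    as real hardlinks, 'inodes' is smaller than 'entries'; if they were
    synced as separate files, the two numbers are equal.
    """
    entries = 0
    inodes = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat so that symlinks are not followed
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):
                entries += 1
                inodes.add((st.st_dev, st.st_ino))
    return {"entries": entries, "inodes": len(inodes)}
```

Running it on the master and on the slave and comparing both numbers would show the mismatch described above: equal "entries" on both sides, but a larger "inodes" count on the slave.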
First of all, the hybrid crawl does not handle hardlinks (which you mention in the "Expected Result" section). The issue is actually that the entry creation for the hardlinks should result in a NOP, as the gfid already exists on the slave. So, keeping this bug in open state.
The 'hybrid crawl' mechanism we use doesn't capture hardlinks yet, and hence we don't have any option now. The earlier implementation behaved similarly (hardlinks used to be created as separate files).
Venky, can you review this patch?

diff --git a/geo-replication/syncdaemon/master.py b/geo-replication/syncdaemon/master.py
index f18a60e..5ed6796 100644
--- a/geo-replication/syncdaemon/master.py
+++ b/geo-replication/syncdaemon/master.py
@@ -885,7 +885,12 @@ class GMasterXsyncMixin(GMasterChangelogMixin):
                 self.write_entry_change("E", [gfid, 'MKDIR', escape(os.path.join(pargfid, bname))])
                 self.crawl(e, xtr)
             elif stat.S_ISREG(mo):
-                self.write_entry_change("E", [gfid, 'CREATE', escape(os.path.join(pargfid, bname))])
+                # if a file has a hardlink, create a Changelog entry as 'LINK' so the slave
+                # side will decide if to create the new entry, or to create link.
+                if st.st_nlink == 1:
+                    self.write_entry_change("E", [gfid, 'CREATE', escape(os.path.join(pargfid, bname))])
+                else:
+                    self.write_entry_change("E", [gfid, 'LINK', escape(os.path.join(pargfid, bname))])
                 self.write_entry_change("D", [gfid])
             elif stat.S_ISLNK(mo):
                 self.write_entry_change("E", [gfid, 'SYMLINK', escape(os.path.join(pargfid, bname))])

With this, we may just solve it anyways.
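To make the rule in the patch easier to try outside gsyncd, the type and link-count test can be pulled into a standalone function. This is only a sketch: entry_op_for is a hypothetical name, not a function in master.py, and it returns the operation name instead of writing a changelog entry.

```python
import os
import stat


def entry_op_for(st):
    """Return the entry operation name for an os.lstat() result.

    Hypothetical stand-alone version of the branch in the patch:
    a regular file with more than one link is reported as 'LINK',
    otherwise as 'CREATE'. Returns None for other file types.
    """
    mo = st.st_mode
    if stat.S_ISDIR(mo):
        return 'MKDIR'
    if stat.S_ISREG(mo):
        return 'CREATE' if st.st_nlink == 1 else 'LINK'
    if stat.S_ISLNK(mo):
        return 'SYMLINK'
    return None
```

Note that once a file has a hardlink, every name of that file has st_nlink > 1, so the crawl would emit 'LINK' for the original name as well as for the new one.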
(In reply to Amar Tumballi from comment #4)
[snip]
> With this, we may just solve it anyways.

This could work if the original file was already synced to the slave and was not modified while gsyncd was not running.

For a freshly created file and its hardlink, nlink would be > 1, so there would be a 'LINK' entry with a gfid that does not yet exist on the slave.

If the entry was in sync, was then modified, and had a hardlink created to it before the first crawl, there would be two 'LINK' entries: one would probably be OK (the actual hardlink), but what about the other one?
(In reply to Venky Shankar from comment #5)
> This could work if the original file was already synced to the slave and was
> not modified when gsyncd was not running.

This works for non-existent files on the slave too now.

> For freshly created files and its hardlink, nlink would be > 1,
> therefore having a 'LINK' entry with a gfid that does not yet exist on the
> slave.
>
> If the entry was in sync and was modified and had a hardlink created to it
> before the first crawl, then there would be two 'LINK' entries: one would
> probably be OK (the actual hardlink) but what about the other one?

We do an 'lstat()' on the gfid on the slave side and then decide whether to do 'MKNOD' (i.e., a fresh create) or 'LINK'. So sending two LINKs instead of one MKNOD/CREATE and one LINK is fine.

Also, this case is no different from the following set of operations in changelog mode (if the operations end up in the same CHANGELOG file):

bash# cd /mount/point; touch a; ln a b;

-----

https://code.engineering.redhat.com/gerrit/#/c/12110
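The slave-side decision described in this comment can be illustrated with a toy model on a local filesystem. This is a sketch under assumptions, not the gsyncd code: apply_link_entry is a hypothetical name, and gfid_path is a stand-in for however the slave looks up a file by its gfid (here simply a path that acts as the file's handle).

```python
import errno
import os


def apply_link_entry(gfid_path, entry_path):
    """Apply a 'LINK' entry and return the operation actually performed.

    Toy model only: gfid_path is an assumed per-gfid handle path.
    If the handle does not exist yet, the file is created first
    (reported as 'MKNOD'); otherwise only a hardlink is added ('LINK').
    """
    try:
        os.lstat(gfid_path)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise
        # gfid unknown on this side: fresh create, then name it
        fd = os.open(gfid_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
        os.link(gfid_path, entry_path)
        return 'MKNOD'
    # gfid already present: the new name becomes a hardlink to it
    os.link(gfid_path, entry_path)
    return 'LINK'
```

With this model, two consecutive 'LINK' entries for the same gfid (the "touch a; ln a b" case) end up as one fresh create followed by one hardlink, and both names share an inode.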
This bug is very much related to bug 1001498 and hence should be treated as blocker.
Verified that with the build (given in the Fixed In Version field) the steps in the description work.
Verified on glusterfs-3.4.0.34rhs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1769.html