Bug 1000948
| Field | Value |
|---|---|
| Summary | Dist-geo-rep: Crawling + processing for 14 million pre-existing files take very long time |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | geo-replication |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Version | 2.1 |
| Hardware | x86_64 |
| OS | Linux |
| Reporter | Neependra Khare <nkhare> |
| Assignee | Venky Shankar <vshankar> |
| QA Contact | Neependra Khare <nkhare> |
| Docs Contact | |
| CC | aavati, amarts, asriram, bengland, csaba, dshaks, kcleveng, kparthas, psriniva, racpatel, rhs-bugs, sdharane, vagarwal, vbhat, vkoppad |
| Target Milestone | --- |
| Target Release | --- |
| Keywords | ZStream |
| Whiteboard | |
| Fixed In Version | glusterfs-3.4.0.39rhs |
| Doc Type | Bug Fix |
| Doc Text | Previously, when a geo-replication session was started on a master volume that already contained tens of millions of files, it took a very long time before updates became visible on the slave mount point. With this update, the issue is fixed. |
| Story Points | --- |
| Clone Of | |
| Clones | 1024465 (view as bug list) |
| Environment | |
| Last Closed | 2013-11-27 15:33:05 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 957769, 1024465 |
Description
Neependra Khare
2013-08-26 07:48:17 UTC
- Sosreport from one of the master nodes is available at: http://perf1.perf.lab.eng.bos.redhat.com/nkhare/bugzilla/1000948/sosreport-gprfs033.1000948-20130826035112-c79c.tar.xz
- strace output:
  1. `strace -s 500 -f -p <pid>` output: http://perf1.perf.lab.eng.bos.redhat.com/nkhare/bugzilla/1000948/gsync.strace
  2. `strace -s 500 -fxvt -p <pid>` output: http://perf1.perf.lab.eng.bos.redhat.com/nkhare/bugzilla/1000948/gsync1.strace
  3. XSYNC changelog file: http://perf1.perf.lab.eng.bos.redhat.com/nkhare/bugzilla/1000948/XSYNC-CHANGELOG.1377249839.tar.gz

Kaleb Keithley suggested a different approach. If you really have a lot of data to move to a remote site for the initial geo-rep sync, maybe we shouldn't be shipping it over the WAN. Some storage vendors, such as EMC, physically transport the data. This sounds bizarre and old-fashioned, but when you have to move terabytes of data it can actually be faster and cheaper in some cases. This doesn't invalidate the suggestions above for enhancing the product, but the point is that there are physical limits to how much data you can transport over the WAN in an initial sync.

I think he was suggesting that you take a pair of servers intended for the remote site and ship them to the master site, attach them to the same network as the master (with much higher throughput because of that), make them a geo-rep slave, do the initial sync, then detach the two slave servers from the master, ship them to the remote site, reattach them, and restart geo-rep. You can then add in the remaining slave nodes, if any, and run a rebalance on the slave volume.

There are a lot of little steps missing in this, but I think it's feasible and might be a more practical solution in cases where there really is a lot of data in a volume before we decide to geo-replicate it.

Can anyone see a reason why this wouldn't work? For example, in network configuration -- are Gluster volumes bound to a particular set of IP addresses that aren't portable?
Or can you re-locate a Gluster volume to a different set of IP addresses without destroying it?

The patch for this bug, available in glusterfs-3.4.0.35rhs, does pipeline the sync and crawl, but crawl and sync are each still single-threaded. I've asked Neependra to test this patch and am working on parallelizing both the crawl (and generation of xsync changelogs) and the syncing. Over the next few days, I'll be updating this bug on the improvements and the current state of the patches.

Are the crawler changes (incorporating xsync-like crawling) being tracked in this same bug?

I have tested with the smaller dataset and saw that data transfer starts as soon as geo-replication starts, rather than waiting for the entire initial crawl to finish.

Improvements done as part of this bug:

* Batched processing of the xsync (or initial) crawl data set: working as per comment #7.
* Changes to the way changelog journaling is done, so we don't need to perform any stat() or getxattr() on the mountpoint (they are done directly on the brick).
* Removed the code that performed the extra stat() on the slave mount before entry creation.

Making the whole crawler code multi-threaded (parallel crawling) is not tracked as part of this bug (bug 1029799 is filed for this enhancement).

(In reply to Ben England from comment #4)
> There are a lot of little steps missing in this, but I think it's feasible
> and might be a more practical solution in cases where there really is a lot
> of data in a volume before we decide to geo-replicate it.
>
> Can anyone see a reason why this wouldn't work? For example, in network
> configuration -- are Gluster volumes bound to particular set of IP addresses
> that aren't portable? Or can you re-locate a Gluster volume to a different
> set of IP addresses without destroying it?

Ben,

We certainly tested this, and had it as a use case. Bug 1005155 is filed for it, and the steps are documented.
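The pipelining and batching described in the comments above can be illustrated with a minimal sketch: a crawler thread walks the master filesystem and emits batches of entries (standing in for XSYNC changelogs) onto a bounded queue, while a sync thread consumes batches concurrently, so syncing starts before the crawl finishes. The function and variable names here are illustrative only; this is not the actual gsyncd implementation.

```python
import os
import queue
import threading

BATCH_SIZE = 8192   # comment 10 reports XSYNC-CHANGELOGs of roughly 8K entries
SENTINEL = None     # marks the end of the crawl

def crawl(root, out_q):
    """Walk `root`, batching file paths; emit each full batch immediately."""
    batch = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            batch.append(os.path.join(dirpath, name))
            if len(batch) >= BATCH_SIZE:
                out_q.put(batch)   # sync can begin before the crawl completes
                batch = []
    if batch:
        out_q.put(batch)
    out_q.put(SENTINEL)

def sync_worker(in_q, synced):
    """Consume batches and 'sync' them (here: just record the paths)."""
    while True:
        batch = in_q.get()
        if batch is SENTINEL:
            break
        synced.extend(batch)       # stand-in for shipping the batch to the slave

def pipelined_sync(root):
    """Run crawl and sync concurrently, pipelined through a bounded queue."""
    q = queue.Queue(maxsize=4)     # bounded queue keeps memory use flat
    synced = []
    worker = threading.Thread(target=sync_worker, args=(q, synced))
    worker.start()
    crawl(root, q)
    worker.join()
    return synced
```

Note that, as in the 3.4.0.35rhs patch, the crawl and the sync in this sketch are each still single-threaded; only the two stages overlap.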
-Amar

Verified on the build glusterfs-3.4.0.43rhs.

With this build, after geo-replication is started with pre-populated data, it performs an xsync crawl and creates XSYNC-CHANGELOGs with ~8K entries each. These XSYNC-CHANGELOGs are processed and the files are synced to the slave; we no longer have to wait for the whole filesystem to be crawled before seeing files synced to the slave. Since this bug relates only to batched syncing, I am moving it to VERIFIED, though this mode of syncing is not as fast as changelog-based syncing; that is tracked in bug 1029799, as mentioned by Amar in comment 9.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html
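As background to the journaling change noted in comment 9 (performing stat()/getxattr() directly on the brick rather than through the mountpoint), the idea can be sketched as resolving a file's GFID to its hard link under the brick's `.glusterfs/` directory and doing the metadata lookup there, avoiding a round trip through the client mount. The `.glusterfs/xx/yy/<gfid>` layout matches how GlusterFS bricks store GFID links, but this helper is a simplified, hypothetical illustration, not gsyncd code.

```python
import os

def gfid_to_brick_path(brick_root, gfid):
    """Map a GFID (hex string, e.g. '3f2504e0...') to its hard link under
    <brick>/.glusterfs/<first two hex chars>/<next two>/<gfid>."""
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

def stat_on_brick(brick_root, gfid):
    """stat() the file via the brick path, skipping the mountpoint entirely."""
    return os.stat(gfid_to_brick_path(brick_root, gfid))
```

Because the lookup touches only local disk, it avoids the per-file network round trips that made the original crawl so slow at tens of millions of files.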