Red Hat Bugzilla – Bug 981456
RFE: Please create an "initial offline bulk load" tool for data, for GlusterFS
Last modified: 2015-10-22 11:46:38 EDT
Description of problem:
For new adopters of GlusterFS with a large existing data set, the
initial time to load their data into Gluster can take days.
We should be able to improve this significantly by creating a
specialised "bulk data load" tool for Gluster.
So far, people have been able to use rsync to copy data to the
individual bricks in order to achieve something similar. But that
doesn't work with striped or distributed volumes, where each host only
has one part of the total data.
This tool should support all Gluster volume types, including both
striped and distributed volumes, and set the extended attributes
correctly as it goes.
To support striped and distributed volumes, it should send the
appropriate file data to each host, as the gluster* processes
would expect to find it.
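As a rough illustration of the brick-side piece, a file placed directly on a brick needs a trusted.gfid xattr (the raw 16 bytes of a UUID) before the glusterfs* processes will accept it. The helper name below is hypothetical, not Gluster code:

```python
import os
import uuid

# Hypothetical helper (name is illustrative, not part of Gluster):
# a file copied straight onto a brick needs a "trusted.gfid" xattr,
# whose value is the raw 16 bytes of a UUID.
def assign_gfid(path, gfid=None):
    gfid = gfid or uuid.uuid4()
    # Setting xattrs in the "trusted." namespace requires root.
    os.setxattr(path, "trusted.gfid", gfid.bytes)
    return gfid

# The value format is just the 16 raw UUID bytes:
print(len(uuid.uuid4().bytes))  # 16
```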
The tool may need to run while glusterd and glusterfs* are offline, so
no conflict occurs during operation.
The thinking behind this RFE comes from awareness of similar tools for
SQL databases. With a SQL database, if a person loads a large data set
using the normal transaction processing (one transaction/commit per INSERT
statement, all triggers fired each time), the data load can take ages
(also days). So, most SQL databases have the ability to do bulk loading,
which disables the transaction features (eg. one commit at start and end,
triggers deferred until the end of bulk loading). Each SQL database project
/ vendor has their own way of doing it, but the high level principle is
the same.
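The analogy can be shown with a tiny SQLite sketch (SQLite standing in for any SQL database here): wrap the whole load in one transaction instead of committing per INSERT.

```python
import sqlite3

# Illustration of the bulk-load principle: one commit around the
# whole load instead of one commit per INSERT statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")

rows = [(i, "name-%d" % i) for i in range(1000)]

# Bulk path: a single transaction covering all rows.
with conn:
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 1000
```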
Version-Release number of selected component (if applicable):
Upstream git master, as of Thursday 4th July 2013.
Actual results:
Initial loading of data can take days.
Expected results:
Initial loading of data should not be significantly longer than what
an rsync would achieve.
We should save a significant amount of time this way, by cutting out the
stat() calls (and similar) that would otherwise occur between hosts during
normal Gluster operation.
The only way I could see this being done safely is offline, and the only way I could see it being done efficiently is by maximizing local I/O on the source host(s). Message traffic is the thing that makes normal loading slow - not just stat() etc. but extra operations around writes, and lots of small messages to deal with lots of small files, and so on. Here's a very brief sketch of what such a tool, running on a source host, would have to do.
* Parse the volfile.
* For each file, do the basic DHT elastic-hashing calculation (*not* an actual DHT lookup which might generate multiple messages) to figure out where the file "should" go.
* Do the same to calculate locations through AFR and stripe.
* Add the relevant file contents, plus newly generated GFIDs, to a series of archive files (e.g. tar/cpio), one per brick.
* Ship the archive files in bulk to the bricks.
* On each brick, "execute" the archive by creating and writing the individual files, including creation of necessary xattrs other than GFID.
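The placement step above can be computed entirely locally. Here's a toy sketch of the principle: crc32 stands in for Gluster's real DHT hash (a Davies-Meyer variant), and the hash ranges are made up; the real ranges come from the per-directory trusted.glusterfs.dht layout xattrs.

```python
import zlib

# Toy consistent-hash placement: hash the file name into a 32-bit
# space and find which brick's range contains it. No lookup
# messages are needed - this is the "compute locally" idea.
# NOTE: crc32 is a stand-in; Gluster's DHT uses a different hash.
def pick_brick(name, ranges):
    h = zlib.crc32(name.encode()) & 0xFFFFFFFF
    for brick, (start, end) in ranges.items():
        if start <= h <= end:
            return brick
    raise ValueError("hash ranges do not cover %08x" % h)

# Four hypothetical bricks, each owning a quarter of the hash space.
step = 0x100000000 // 4
ranges = {"brick%d" % i: (i * step, (i + 1) * step - 1) for i in range(4)}

brick = pick_brick("hello.txt", ranges)
print(brick)
```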
This is only the tip of the iceberg. Many other issues need to be considered and dealt with, such as the need to ensure that other copies of a file do *not* exist on DHT subvolumes other than the one we're populating as part of the bulk load. The stub/white-out files needed for this, plus the possibility of sparse files on striped volumes, probably means that none of the existing archive-file formats are sufficient and we'll need to create our own. :(
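That said, for the regular-file part, PAX-format tar can already carry xattrs as extended records (the SCHILY.xattr.* convention used by GNU tar), which a per-brick archive could build on. A rough sketch - the GFID is hex-encoded here for clarity, though GNU tar stores raw bytes:

```python
import io
import tarfile
import uuid

# Sketch: a per-brick archive where the pre-computed GFID rides
# along as a PAX extended header (GNU tar's SCHILY.xattr.*
# convention), so the brick-side "execute" step can restore it as
# a trusted.gfid xattr. Whiteout/stub entries for DHT would still
# need something beyond plain tar.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    data = b"file contents"
    info = tarfile.TarInfo(name="dir/file.txt")
    info.size = len(data)
    info.pax_headers = {
        "SCHILY.xattr.trusted.gfid": uuid.uuid4().hex  # hex for clarity
    }
    tar.addfile(info, io.BytesIO(data))

# Read it back and check the xattr record survived the round trip.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    member = tar.getmember("dir/file.txt")
    print("SCHILY.xattr.trusted.gfid" in member.pax_headers)
```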
My biggest concern is verifying the correctness of the result. The bulk-load tool would have to be very closely locked to a specific version of the regular I/O code, because any tiny change to that I/O code could make the bulk-load result "incorrect" in subtle ways that could potentially lead to data loss. The QA and support risks need to be very carefully considered, and more than ordinary efforts made to mitigate them, before we could even consider supporting such a bulk-load tool itself.
Good thoughts. :)
I wonder if adjusting the Gluster I/O code so it can also be called by
other utils would help there (eg. making that code into a shared library?).
Then, if we ship such a bulk load tool as part of every Gluster release,
it would automatically be using the correct (matching) I/O code.
Feature requests make the most sense against the 'mainline' release; there is no ETA for an implementation, and requests might get forgotten when filed against a particular version.
Because of the large number of bugs filed against it, the 'mainline' version is ambiguous and about to be removed as a choice.
If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.