Description of problem:
-----------------------
With sharding enabled on a replica 3 volume, when an image file is copied from the local filesystem (say /home/vm1.img) to the fuse-mounted sharded replica 3 volume, the checksum of the source file and the copied file no longer match.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHGS 3.2.0 ( glusterfs-3.8.4-18.el7rhgs )

How reproducible:
-----------------
Always

Steps to Reproduce:
--------------------
1. Enable sharding on the replica 3 volume and fuse mount the volume
2. Create a VM image file locally on the machine ( /home/vm1.img ) and install an OS in it
3. Calculate the sha256sum of the image file
4. Copy the file to the fuse-mounted gluster volume
5. Calculate the checksum of the copied image file
(A command-level sketch of these steps is given below.)

Actual results:
---------------
The sha256sum of the source file and the copied file do not match.

Expected results:
-----------------
The sha256sum of the source file and the copied file should be the same.
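A minimal command-level sketch of the reproduction, assuming a volume named "repvol", bricks under server{1,2,3}:/rhgs/brick1 and a mount point of /mnt/repvol (these names are illustrative, not taken from the report):

  # create the replica 3 volume, enable sharding and fuse mount it
  gluster volume create repvol replica 3 server1:/rhgs/brick1 server2:/rhgs/brick1 server3:/rhgs/brick1
  gluster volume set repvol features.shard on
  gluster volume start repvol
  mount -t glusterfs server1:/repvol /mnt/repvol

  # checksum the local image, copy it onto the volume and checksum the copy
  sha256sum /home/vm1.img
  cp /home/vm1.img /mnt/repvol/vm1.img
  sha256sum /mnt/repvol/vm1.img    # differed from the source checksum before the fix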
The issue is a size mismatch between the src and dst file after `cp`, where the dst file is on the gluster mount and sharded, leading to the checksum mismatch. The bug is in shard's aggregated-size accounting and is exposed when there are parallel writes and an extending truncate on the file. `cp` does a truncate on the dst file before writing to it, and the parallelism comes from write-behind flushing its cached writes while the extending truncate is in flight.

Note that the data integrity of the VM image is *not* affected by this bug; what is affected is the size of the file. To confirm this, I truncated the extra bytes off the dst file to make its size the same as that of the src file and computed the checksum again. In this case the checksums did match. I asked Satheesaran to verify the same and he confirmed it works. Tools like md5sum and sha256sum fetch the file size and read till the end of that size, so in the dst file the excess portion is filled with zeroes and the checksum is calculated over that region too. A sketch of this size-only check is given below.

FWIW, the same checksum test exists in the upstream master regression test suite - https://github.com/gluster/glusterfs/blob/master/tests/bugs/shard/bug-1272986.t. The reason it passes there consistently is that the script performs the copy through `dd` as opposed to `cp`.
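A sketch of that size-only verification, assuming the copy lives at /mnt/repvol/vm1.img (the mount point is illustrative):

  # compare sizes: the dst file is larger than the src when the bug is hit
  stat -c %s /home/vm1.img
  stat -c %s /mnt/repvol/vm1.img

  # trim the dst file back to the src size and re-checksum; the sums then match,
  # showing the excess region is just trailing zeroes and the data itself is intact
  truncate -s "$(stat -c %s /home/vm1.img)" /mnt/repvol/vm1.img
  sha256sum /home/vm1.img /mnt/repvol/vm1.img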
upstream patch : https://review.gluster.org/#/c/17184/
Tested with glusterfs-3.8.4-35.el7rhgs with the following steps:

1. Created a raw image and got its sha256sum value
2. Copied the same image from the local filesystem on to the fuse-mounted filesystem
3. Recalculated the sha256sum of the file

The sha256sum values calculated at steps 1 and 3 remain the same.
Laura, I'm not sure the edited doc text correctly captures the issue and the fix (unless I'm reading too much into every single detail). Let me provide some inline comments in any case:

"The checksum of a file changed when sharding was enabled."
>> So this happened after a file was copied from a local file system to a sharded gluster volume.

"This occurred because the file's shards were not correctly truncated, which meant that the sharded files had a greater aggregate size than the original file, which affected the checksum."
>> This wasn't related to shards not getting truncated. Shard does its own accounting of the aggregated file size because the file is now split into multiple pieces, and it is shard's responsibility to present an aggregated view of the file (including its size) to the application. The aggregated size is maintained in the form of an extended attribute on the base (zeroth) shard. This accounting logic had a bug that was hit whenever the application sent a truncate file operation on a sharded file while writes were in progress. The incorrect accounting caused the sharding translator to assign a higher aggregated-size extended attribute value to the destination copy, causing its checksum not to match the src file's on the local disk.

"Files are now truncated correctly and the checksum of a sharded file matches its checksum before sharding."
>> The bug in shard's truncate fop is now fixed, and as a result the aggregated size is accounted correctly even when there are parallel writes and truncates.

Feel free to ping me if you need any more clarifications.

-Krutika
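For anyone who wants to see the accounting Krutika describes, a hedged sketch of inspecting it from a brick (the brick path is illustrative; this reads the xattrs directly on the backend and is for inspection only):

  # on a brick hosting the base (zeroth) shard of the file
  getfattr -d -m . -e hex /rhgs/brick1/vm1.img

  # trusted.glusterfs.shard.block-size holds the shard block size, and
  # trusted.glusterfs.shard.file-size holds shard's aggregated size accounting
  # for the file; with this bug, the aggregated size recorded on the destination
  # copy ended up larger than the source file's actual size.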
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774