Bug 1447959
Summary: | Mismatch in checksum of the image file after copying to a new image file | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | SATHEESARAN <sasundar> | |
Component: | sharding | Assignee: | Krutika Dhananjay <kdhananj> | |
Status: | CLOSED ERRATA | QA Contact: | SATHEESARAN <sasundar> | |
Severity: | medium | Docs Contact: | ||
Priority: | unspecified | |||
Version: | rhgs-3.2 | CC: | amukherj, kdhananj, rhinduja, rhs-bugs, storage-qa-internal | |
Target Milestone: | --- | |||
Target Release: | RHGS 3.3.0 | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | glusterfs-3.8.4-26 | Doc Type: | Bug Fix | |
Doc Text: | The checksum of a file could change when it was copied from a local file system to a volume with sharding enabled. If write and truncate operations were in progress simultaneously, the aggregated size was calculated incorrectly, resulting in a changed checksum. Aggregated file size is now calculated correctly in this circumstance. | | | |
Story Points: | --- | |
Clone Of: | ||||
: | 1448299 (view as bug list) | Environment: | ||
Last Closed: | 2017-09-21 04:41:45 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1448299 | |||
Bug Blocks: | 1411323, 1417151, 1485863 |
Description
SATHEESARAN
2017-05-04 10:17:02 UTC
The issue is a size mismatch between the src and dst files after `cp`, where the dst file is on a sharded gluster mount, which leads to a checksum mismatch. The bug is in shard's aggregated size accounting and is exposed when there are parallel writes and an extending truncate on the file; `cp` truncates the dst file before writing to it. The parallelism arises when write-behind flushes cached writes while an extending truncate is in flight.

Note that the data integrity of the VM image is *not* affected by this bug; only the reported size of the file is. To confirm this, I truncated the extra bytes off the dst file to make its size the same as the src file's and computed the checksum again; in this case the checksums did match. I asked Satheesaran also to verify the same, and he confirmed it works. Basically, md5sum, sha256sum, etc. fetch the file size and read until the end of that size, so in the dst file the excess region is filled with zeroes and the checksum is calculated over this region too.

FWIW, the same checksum test exists in the upstream master regression test suite: https://github.com/gluster/glusterfs/blob/master/tests/bugs/shard/bug-1272986.t. The reason it passes there consistently is that the script performs the copy through `dd` as opposed to `cp`.

upstream patch : https://review.gluster.org/#/c/17184/

Tested with glusterfs-3.8.4-35.el7rhgs with the following steps:

1. Created a raw image and recorded its sha256sum value.
2. Copied the same image from the local filesystem onto the FUSE-mounted filesystem.
3. Recalculated the sha256sum of the copied file.

The sha256sum values calculated at steps 1 and 3 remained the same.

Laura, I'm not sure the edited doc text correctly captures the issue and the fix (unless I'm reading too much into every single detail). Let me provide some inline comments in any case:

"The checksum of a file changed when sharding was enabled."

>> So this happened after a file was copied from a local file system to a sharded gluster volume.
"This occurred because the file's shards were not correctly truncated, which meant that the sharded files had a greater aggregate size than the original file, which affected the checksum."

>> This wasn't related to shards not getting truncated. Shard does its own accounting of the aggregated file size, because the file is split into multiple pieces and it is shard's responsibility to present an aggregated view of the file (including its size) to the application. The aggregated size is maintained as an extended attribute on the base (zeroth) shard. This accounting logic had a bug that was triggered whenever the application sent a truncate file operation on a sharded file while writes were in progress. The incorrect accounting caused the sharding translator to assign a higher aggregated-size extended attribute value to the destination copy, so its checksum did not match that of the src file on the local disk.

"Files are now truncated correctly and the checksum of a sharded file matches its checksum before sharding."

>> The bug in the shard truncate fop is now fixed, and as a result the aggregated size is now accounted correctly even when there are parallel writes and truncates.

Feel free to ping me if you need any more clarifications.

-Krutika

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
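The zero-filled-tail behaviour described above can be demonstrated on any local Linux filesystem, without gluster involved at all. The sketch below (a hypothetical reproduction using GNU `truncate`, `stat`, and `sha256sum` on temporary files; the extending truncate merely simulates the overstated aggregated size) shows that checksum tools hash the zero padding too, and that trimming the copy back to the source size makes the checksums match again:

```shell
set -e
src=$(mktemp)
dst=$(mktemp)

printf 'vm image payload' > "$src"
cp "$src" "$dst"

# Simulate the overstated aggregated size: extend dst, padding with zeroes.
truncate -s +4096 "$dst"
sha256sum "$src" "$dst"   # sums differ: dst's zero-filled tail is hashed too

# Trim dst back to src's size, as described in the verification above.
truncate -s "$(stat -c %s "$src")" "$dst"
sha256sum "$src" "$dst"   # sums now match

rm -f "$src" "$dst"
```

This mirrors the confirmation step from the comment: once the excess bytes are truncated off, the file contents are identical, showing that only the recorded size, not the data, was wrong.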