Description of problem:
Casting a 64-bit offset to a void * during self-heal causes corruption.

Version-Release number of selected component (if applicable):
release-3.2

Additional info:
--snip--
Hello,

Michael and I ran a battery of tests today and closed out the two issues identified below (of March 11).

FYI, re the "background-self-heal-only" patch: it has now been tested to our satisfaction and works as described/intended.

http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-background-only.patch

FYI, re the 2GB replicate error:

>>> 2) Of the files that were replicated, not all were
>>> corrupted (capped at 2G -- note that we
>>> confirmed that this was the first 2G of the
>>> source file contents).
>>>
>>> So is there a known replicate issue with files
>>> greater than 2GB?

We have confirmed this issue, and the referenced patch appears to correct the problem. We were able to get one particular file to fail reliably at 2GB under GlusterFS 3.2.6, and then, after applying this patch, to transfer it and many other >2GB files correctly.

The error stems from putting the off_t (64-bit) offset value into a void * cookie, typecast as long (here a signed 32-bit value, as the logged offset implies), and then restoring it into an off_t again. An offset of exactly 2GB (2147483648) does not fit in a signed 32-bit value, so it wraps to -2147483648, which is then sign-extended on the way back into the 64-bit off_t. The tip-off was a recurring offset of 18446744071562067968 in the logs: that is 2^64 - 2^31, i.e. -2147483648 printed as an unsigned 64-bit number. The effect is described well here:

http://stackoverflow.com/questions/5628484/unexpected-behavior-from-unsigned-int64

We can't explain why this issue was intermittent, and we're not sure whether rw_sh->offset is the correct 64-bit offset to use. However, that offset matched the cookie value in every pre-failure state we tested. Please advise if there is a better (more correct) off_t offset to use.

http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-2GB.patch

Thanks for your help,
--snip--
We took the patch for the 32-bit offset truncation. The background-self-heal-only option was not taken in.
Keeping this ASSIGNED until the specific patches in Gerrit are linked here.
http://review.gluster.org/3972 addresses this in 3.2, but I don't see anything for 3.3/master.
The bug was present only in 3.2. The code was heavily reworked for granular self-heal in 3.3, which eliminated this bug there.
Fixed in 3.2.7