Bug 809982 - truncation of offset in self-heal
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.2.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Assigned To: Pranith Kumar K
Reported: 2012-04-04 15:29 EDT by Anand Avati
Modified: 2015-09-01 19:05 EDT
CC List: 5 users

Fixed In Version: 3.2.7
Doc Type: Bug Fix
Last Closed: 2012-12-11 00:25:20 EST
Type: Bug


Attachments: None
Description Anand Avati 2012-04-04 15:29:33 EDT
Description of problem: Casting a 64-bit offset (off_t) to a void * during self-heal causes data corruption.


Version-Release number of selected component (if applicable): release-3.2


Additional info:

--snip--


Hello,


 Michael and I ran a battery of tests today and
 closed out the two issues identified below (from
 March 11).


FYI RE the "background-self-heal-only" patch:

 It has now been tested to our satisfaction and
 works as described/intended.


http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-background-only.patch



FYI RE the 2GB replicate error:

 >>>    2) Of the files that were replicated, not all were
 >>>          corrupted (capped at 2G -- note that we
 >>>          confirmed that this was the first 2G of the
 >>>          source file contents).
 >>>
 >>> So is there a known replicate issue with files
 >>> greater than 2GB?

 We have confirmed this issue, and the referenced
 patch appears to correct the problem. We were
 able to get one particular file to fail reliably at
 2GB under GlusterFS 3.2.6 and then, after applying
 this patch, to transfer it and many other >2GB
 files correctly.

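 The error stems from storing the off_t (64-bit)
 offset value in a void * cookie, typecast as long
 (a signed type, only 32 bits wide on 32-bit builds),
 and then restoring it into an off_t. The tip-off
 was a recurring offset of 18446744071562067968
 seen in the logs. That value is 2^64 - 2^31
 (0xFFFFFFFF80000000): a 2 GiB offset (0x80000000)
 truncated to a signed 32-bit value (-2147483648),
 sign-extended back to 64 bits and printed as
 unsigned. The effect is well described here: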

http://stackoverflow.com/questions/5628484/unexpected-behavior-from-unsigned-int64

 We can't explain why this issue was intermittent,
 and we're not sure whether rw_sh->offset is the
 correct 64-bit offset to use. However, that offset
 appeared to match the cookie value in all tested
 pre-failure states. Please advise if there is a
 better (more correct) off_t offset to use.


http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-2GB.patch



Thanks for your help,

--snip--
Comment 1 Pranith Kumar K 2012-06-11 06:49:30 EDT
We took the patch for the 32-bit truncation of the offset. The background-self-heal-only option was not taken in.
Comment 2 Jeff Darcy 2012-10-31 09:30:04 EDT
ASSIGNED until the specific patches in Gerrit appear here.
Comment 3 Jeff Darcy 2012-10-31 16:56:24 EDT
http://review.gluster.org/3972 addresses this in 3.2, but I don't see anything for 3.3/master.
Comment 4 Pranith Kumar K 2012-11-01 03:30:21 EDT
The bug was present only in 3.2. The code was heavily reworked for granular self-heal in 3.3, which removed this bug there.
Comment 5 Vijay Bellur 2012-12-11 00:25:20 EST
Fixed in 3.2.7
