Bug 809982 - truncation of offset in self-heal
Summary: truncation of offset in self-heal
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.2.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-04-04 19:29 UTC by Anand Avati
Modified: 2015-09-01 23:05 UTC (History)

Fixed In Version: 3.2.7
Clone Of:
Environment:
Last Closed: 2012-12-11 05:25:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Anand Avati 2012-04-04 19:29:33 UTC
Description of problem: Casting a 64-bit offset to a void * during self-heal truncates the offset where pointers are 32 bits wide, which corrupts replicated files larger than 2GB.


Version-Release number of selected component (if applicable): release-3.2


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

--snip--


Hello,


 Michael and I ran a battery of tests today and
closed out the two issues identified below (of March
11).


FYI RE the "background-self-heal-only" patch;

 It has been tested now to our satisfaction and
 works as described/intended.


http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-background-only.patch



FYI RE the 2GB replicate error;

 >>>    2) Of the files that were replicated, not all were
 >>>          corrupted (capped at 2G -- note that we
 >>>          confirmed that this was the first 2G of the
 >>>          source file contents).
 >>>
 >>> So is there a known replicate issue with files
 >>> greater than 2GB?

 We have confirmed this issue and the referenced
 patch appears to correct the problem.  We were
 able to get one particular file to reliably fail at 2GB
 under GlusterFS 3.2.6, and then correctly
 transfer it and many other >2GB files, after
 applying this patch.

 The error stems from putting the off_t (64-bit)
 offset value into a void * cookie value typecast
 as long (signed, 32-bit here) and then restoring it
 into an off_t again.  The tip-off was a recurring
 offset of 18446744071562067968 seen in the logs,
 which is 2^64 - 2^31: a 2GB offset truncated to 32
 bits and sign-extended back to 64 bits.  The effect
 is described well here:

http://stackoverflow.com/questions/5628484/unexpected-behavior-from-unsigned-int64
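
 The round trip described above can be reproduced in
 isolation. The following is a minimal sketch, not
 GlusterFS code: the function name is hypothetical, and
 the 32-bit signed slot is modelled explicitly with
 fixed-width types so the result is the same on a
 64-bit host, where long and void * are normally wide
 enough to hide the problem.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the GlusterFS sources): simulate storing
 * a 64-bit file offset in a 32-bit signed slot, as happens when the
 * offset is cast to long / void * on a 32-bit build, and then widening
 * it back to 64 bits.
 */
static int64_t roundtrip_through_32bit_cookie(int64_t offset)
{
    /* Keep only the low 32 bits; this conversion is well defined. */
    uint32_t low = (uint32_t)offset;

    /*
     * Reinterpret those 32 bits as a signed value and sign-extend.
     * Written out arithmetically so that no implementation-defined
     * conversion is involved.
     */
    if (low & UINT32_C(0x80000000))
        return (int64_t)low - INT64_C(4294967296);

    return (int64_t)low;
}
```

 Under these assumptions, an offset of 2147483648 (2GB)
 comes back as -2147483648, which prints as
 18446744071562067968 when formatted as an unsigned
 64-bit integer. Offsets below 2GB pass through
 unchanged, so only transfers that reach the 2GB mark
 are affected.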

 We can't explain why this issue was intermittent,
 and we're not sure if the rw_sh->offset is the
 correct 64bit offset to use.  However that offset
 appeared to match the cookie value in all tested
 pre-failure states.  Please advise if there is a
 better (more correct) off_t offset to use.
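
 The general remedy of reading the offset from a
 structure instead of from the cookie's own bits can be
 sketched as follows. This is an illustration only: the
 struct and function names are hypothetical stand-ins,
 and it is an assumption that the real patch follows
 this shape beyond its use of rw_sh->offset.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-operation self-heal state. */
struct heal_state {
    int64_t offset; /* full 64-bit file offset, never narrowed */
};

/*
 * Hypothetical completion callback. The cookie is only a pointer to the
 * state, so its width does not limit the offset it can refer to.
 */
static int64_t heal_read_done(void *cookie)
{
    const struct heal_state *state = cookie;
    return state->offset;
}

/* Hypothetical submit path: record the offset in the state and pass a
 * pointer to that state as the cookie. */
static int64_t heal_submit(int64_t offset)
{
    struct heal_state state = { .offset = offset };
    return heal_read_done(&state);
}
```

 With this arrangement the callback sees the same
 offset that was submitted, including values at or
 above 2GB, regardless of pointer width.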


http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-2GB.patch



Thanks for your help,

--snip--

Comment 1 Pranith Kumar K 2012-06-11 10:49:30 UTC
We took the patch for the offset truncation on 32-bit. The background-self-heal-only option was not taken in.

Comment 2 Jeff Darcy 2012-10-31 13:30:04 UTC
ASSIGNED until the specific patches in Gerrit appear here.

Comment 3 Jeff Darcy 2012-10-31 20:56:24 UTC
http://review.gluster.org/3972 addresses this in 3.2, but I don't see anything for 3.3/master.

Comment 4 Pranith Kumar K 2012-11-01 07:30:21 UTC
The bug was present only in 3.2. The self-heal code was heavily reworked for granular self-heal in 3.3, which removed this bug there.

Comment 5 Vijay Bellur 2012-12-11 05:25:20 UTC
Fixed in 3.2.7

