Bug 1187547 - self-heal-algorithm with option "full" doesn't heal sparse files correctly
Summary: self-heal-algorithm with option "full" doesn't heal sparse files correctly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1166020 1190633
Blocks: 1167012 1179563 glusterfs-3.6.3
 
Reported: 2015-01-30 12:08 UTC by Ravishankar N
Modified: 2016-02-04 15:20 UTC
CC List: 5 users

Fixed In Version: glusterfs-v3.6.3
Doc Type: Bug Fix
Doc Text:
Clone Of: 1166020
Environment:
Last Closed: 2016-02-04 15:20:35 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Ravishankar N 2015-01-30 12:08:07 UTC
+++ This bug was initially created as a clone of Bug #1166020 +++

Description of problem:
Here is Lindsay Mathieson's email on gluster-users with the description of the problems she faced.

On 11/18/2014 05:35 PM, Lindsay Mathieson wrote:
>
> I have a VM image which is a sparse file - 512GB allocated, but only 32GB used.
>
> root@vnb:~# ls -lh /mnt/gluster-brick1/datastore/images/100
> total 31G
> -rw------- 2 root root 513G Nov 18 19:57 vm-100-disk-1.qcow2
>
> I switched to full sync and rebooted.
>
> A heal was started on the image and it seemed to be just transferring the full file from node vnb to vng; iftop showed bandwidth at 500 Mb/s.
>
> Eventually the cumulative transfer got to 140GB, which seemed odd as the real file size was 31G. I logged onto the second node (vng) and the *real* file size was up to 191GB.
>
> It looks like the heal is not handling sparse files; rather, it is transferring empty bytes to make up the allocated size. That's a serious problem for the common habit of over-committing your disk space with VM images, not to mention the inefficiency.
Ah! This problem doesn't exist in diff self-heal, because the checksums of the two files match in the sparse regions. In full self-heal, shd simply reads from the source file and writes to the sink file. What we can change there: if the source file is sparse and the data read from a region is all zeros (reads return all zeros in sparse regions), then read the same range from the stale file and check whether it is also all zeros. If both are zeros, skip the write. I also checked that if the sparse file is created while the other brick is down, the heal does preserve the holes (i.e. sparse regions). This problem only appears when the file already exists at its full size on both bricks and a full self-heal is performed, as happened here.
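The skip-write logic described above can be sketched as follows. This is a minimal, hypothetical single-process Python sketch, not the actual shd implementation (which is C code healing bricks over RPC); the 128 KiB chunk size is an assumption, not shd's real heal block size.

```python
import os

CHUNK = 128 * 1024  # hypothetical heal block size

def full_heal(source_path, sink_path):
    """Copy source over sink, skipping ranges where both already hold zeros."""
    zeros = b"\x00" * CHUNK
    with open(source_path, "rb") as src, open(sink_path, "r+b") as snk:
        offset = 0
        while True:
            data = src.read(CHUNK)
            if not data:
                break
            if data == zeros[: len(data)]:
                # Source range reads back as zeros (likely a hole). Only skip
                # the write if the sink range is also all zeros; otherwise
                # stale data would survive in the sink.
                snk.seek(offset)
                sink_data = snk.read(len(data))
                if sink_data == zeros[: len(sink_data)]:
                    offset += len(data)
                    continue
            snk.seek(offset)
            snk.write(data)
            offset += len(data)
        # Extend the sink to the source's size if trailing ranges were
        # skipped; truncate() creates a hole rather than writing zeros.
        snk.truncate(offset)
```

Skipping the write (instead of punching a hole) is what keeps the sink sparse: ranges that were never written stay unallocated on disk.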

Thanks for your valuable input. So you have found two issues; I will raise two bugs, one for each. I can CC you on the Bugzilla reports so that you can see the updates once they are fixed. Do you want to be CCed?

Pranith
>
> thanks,
>
> -- 
>
> Lindsay
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users






Version-Release number of selected component (if applicable):
Reported on 3.5.2, but the issue exists in all current releases.

How reproducible:
always

Steps to Reproduce:
1. Create a plain or distributed-replicate volume.
2. Create a sparse VM image on the volume.
3. Set cluster.data-self-heal-algorithm to 'full' on the volume.
4. Bring a brick down and modify data in the VM.
5. Bring the brick back up so that self-heal kicks in.
6. Observe that the heal writes zeros into the sparse regions of the file on the healed brick: the file loses its sparseness and consumes its full allocated size, defeating over-commitment for sparse VM images.
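To verify step 6, compare a file's apparent size with the space it actually occupies on disk. A small Python sketch, relying on the POSIX guarantee that st_blocks is counted in 512-byte units regardless of the filesystem's block size:

```python
import os

def allocated_bytes(path):
    # st_blocks counts 512-byte units (POSIX), independent of fs block size.
    return os.stat(path).st_blocks * 512

def is_sparse(path):
    # A file is sparse when it occupies fewer bytes on disk than its
    # apparent size (st_size); a healed file that lost its holes will
    # report allocated_bytes roughly equal to (or above) st_size.
    st = os.stat(path)
    return st.st_blocks * 512 < st.st_size
```

Running this against the file on each brick before and after the heal shows whether the sparse regions survived.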

--- Additional comment from Anand Avati on 2015-01-23 00:52:55 EST ---

REVIEW: http://review.gluster.org/9480 (afr: Don't write to sparse regions of sink.) posted (#1) for review on master by Ravishankar N (ravishankar@redhat.com)

--- Additional comment from Anand Avati on 2015-01-28 07:03:52 EST ---

REVIEW: http://review.gluster.org/9480 (afr: Don't write to sparse regions of sink.) posted (#2) for review on master by Ravishankar N (ravishankar@redhat.com)

--- Additional comment from Anand Avati on 2015-01-29 08:05:33 EST ---

REVIEW: http://review.gluster.org/9480 (afr: Don't write to sparse regions of sink.) posted (#3) for review on master by Ravishankar N (ravishankar@redhat.com)

--- Additional comment from Anand Avati on 2015-01-30 02:01:10 EST ---

REVIEW: http://review.gluster.org/9480 (afr: Don't write to sparse regions of sink.) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

--- Additional comment from Anand Avati on 2015-01-30 07:02:49 EST ---

COMMIT: http://review.gluster.org/9480 committed in master by Pranith Kumar Karampuri (pkarampu@redhat.com) 
------
commit 0f84f8e8048367737a2dd6ddf0c57403e757441d
Author: Ravishankar N <ravishankar@redhat.com>
Date:   Fri Jan 23 11:12:54 2015 +0530

    afr: Don't write to sparse regions of sink.
    
    Problem:
    When data-self-heal-algorithm is set to 'full', shd just reads from
    source and writes to sink. If source file happened to be sparse (VM
    workloads), we end up actually writing 0s to the corresponding regions
    of the sink causing it to lose its sparseness.
    
    Fix:
    If the source file is sparse, and the data read from source and sink are
    both zeros for that range, skip writing that range to the sink.
    
    Change-Id: I787b06a553803247f43a40c00139cb483a22f9ca
    BUG: 1166020
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-on: http://review.gluster.org/9480
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>

Comment 1 Anand Avati 2015-01-30 12:13:11 UTC
REVIEW: http://review.gluster.org/9515 (afr: Don't write to sparse regions of sink.) posted (#1) for review on release-3.6 by Ravishankar N (ravishankar@redhat.com)

Comment 2 Anand Avati 2015-02-03 13:25:10 UTC
COMMIT: http://review.gluster.org/9515 committed in release-3.6 by Raghavendra Bhat (raghavendra@redhat.com) 
------
commit f397d7edb85c1e4b78c4cac176dc8a0afe8cf9a8
Author: Ravishankar N <ravishankar@redhat.com>
Date:   Fri Jan 23 11:12:54 2015 +0530

    afr: Don't write to sparse regions of sink.
    
    Backport of http://review.gluster.org/9480
    
    Problem:
    When data-self-heal-algorithm is set to 'full', shd just reads from
    source and writes to sink. If source file happened to be sparse (VM
    workloads), we end up actually writing 0s to the corresponding regions
    of the sink causing it to lose its sparseness.
    
    Fix:
    If the source file is sparse, and the data read from source and sink are
    both zeros for that range, skip writing that range to the sink.
    
    Change-Id: Id23d953fe2c8c64cde5ce3530b52ef91a7583891
    BUG: 1187547
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-on: http://review.gluster.org/9515
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>

Comment 3 Pranith Kumar K 2015-02-19 06:29:02 UTC
Lindsay,
      Would you like to share your experience with this build, so that we can see if there is anything more we need to address before closing this issue for good?

Pranith.

Comment 4 Kaushal 2016-02-04 15:20:35 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed in glusterfs-v3.6.3, please open a new bug report.

glusterfs-v3.6.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2015-April/021669.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

