Bug 765482 (GLUSTER-3750)

Summary: Having bricks with different sizes can truncate files.
Product: [Community] GlusterFS
Component: replicate
Version: 3.2.4
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Jeff Shaw <jeff.shaw>
Assignee: Brian Foster <bfoster>
CC: gluster-bugs, jdarcy, jeff.shaw, rfortier, vbellur, vijay
Keywords: Triaged
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Cloned As: 853690 (view as bug list)
Bug Blocks: 853690
Last Closed: 2013-07-24 17:19:11 UTC

Attachments: Mounted gluster volume log.

Description Jeff Shaw 2011-10-21 14:32:54 UTC
This is a data corruption bug for Gluster on CentOS 6.

If one or more bricks in a replication group are smaller than the others, a file written to the replication group can overflow the smaller bricks. If the file is later read from one of the smaller bricks, the truncated copy is returned with no read error reported to the user. Because some of the bytes are unavailable for reading, an error should occur and the copy should fail. Worse, the truncated copy is reported as having the length the file would have had if it were not corrupt.
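
To make the failure mode concrete, here is a minimal single-machine reproduction sketch using loopback bricks. The volume name "testvol", the /bricks and /tmp paths, and the sizes are my own choices, not from the original report, which used two separate servers:

truncate -s 512M /tmp/small.img      # undersized brick image
truncate -s 2G   /tmp/large.img      # brick image big enough for the file
mkfs.xfs -f /tmp/small.img
mkfs.xfs -f /tmp/large.img
mkdir -p /bricks/small /bricks/large
mount -o loop /tmp/small.img /bricks/small
mount -o loop /tmp/large.img /bricks/large
mkdir -p /bricks/small/brick /bricks/large/brick

# "force" is needed on newer releases to allow both replicas on one host.
gluster volume create testvol replica 2 \
    $(hostname):/bricks/small/brick $(hostname):/bricks/large/brick force
gluster volume start testvol
mkdir -p /mnt/test
mount -t glusterfs $(hostname):/testvol /mnt/test

# Write a file larger than the small brick; the copy on the small brick
# runs out of space and is silently truncated.
dd if=/dev/urandom of=/mnt/test/big.bin bs=1M count=1024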

Here is a real-world example of the commands I ran, with the mount point's log attached. I have a file on two file servers under /brick0; those bricks back the volume that is mounted on gluster0-gw0 as /mnt/test. I've already copied a file to the gluster volume that is too big for gluster0-member0:/brick0.

[root@gluster0-member0 ~]# df -h /brick0
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_gluster0member0-lv_brick0
                      245M  245M     0 100% /brick0
[root@gluster0-member0 ~]# ls -lh /brick0
total 239M
-rwxr--r-- 1 jeff.shaw domain users 332M Sep  9 16:46 debian-live-508-amd64-rescue.iso

[root@gluster0-member1 ~]# df -h /brick0
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_gluster0member1-lv_brick0
                      485M  346M  114M  76% /brick0
[root@gluster0-member1 ~]# ls -lh /brick0
total 332M
-rwxr--r-- 1 15004 15007 332M Sep  9 16:46 debian-live-508-amd64-rescue.iso

[root@gluster0-member1 ~]# umount /brick0

[root@gluster0-gw0 ~]# ls -lh /mnt/test
total 332M
-rwxr--r-- 1 jeff.shaw domain users 332M Sep  9 16:46 debian-live-508-amd64-rescue.iso
[root@gluster0-gw0 ~]# cp /mnt/test/debian-live-508-amd64-rescue.iso .
[root@gluster0-gw0 ~]# ls -lh .
-rwxr--r--  1 root root 332M Oct 21 09:55 debian-live-508-amd64-rescue.iso

Considering that I unmounted the only brick that stored the entire contents of debian-live-508-amd64-rescue.iso, I don't see how this copy can possibly succeed. The gluster file system should fail to read the file.
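
One way to see the silent truncation directly is to compare the size the mount reports with the number of bytes a sequential read actually returns (a small sketch of mine, run on the client):

stat -c %s /mnt/test/debian-live-508-amd64-rescue.iso   # size the volume advertises
wc -c < /mnt/test/debian-live-508-amd64-rescue.iso      # bytes a full read returns
# With this bug the two numbers match even though the serving brick never
# stored all of the data, so cp sees no short read and "succeeds".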

[root@gluster0-gw0 ~]# md5sum debian-live-508-amd64-rescue.iso
33ff3a930892fcd8df3bebb244a1e99d  debian-live-508-amd64-rescue.iso

[root@gluster0-member1 ~]# mount /brick0
[root@gluster0-member1 ~]# md5sum /brick0/debian-live-508-amd64-rescue.iso
512d97b6da025da413f730a5be7231ef  /brick0/debian-live-508-amd64-rescue.iso

Now that I've remounted gluster0-member1:/brick0, which holds the only good copy of the file, I would hope that the replication translator (or whatever handles this) reads only from that copy. After running md5sum on the file a few times, it appears to be doing what I expect.
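
One way to check whether the replicate translator has actually flagged the bad copy is to inspect the AFR changelog xattrs on each brick (a diagnostic sketch; the volume name and client index in the xattr, which follows the trusted.afr.<volume>-client-<N> pattern, will differ per setup):

# Run on each brick server, against the brick's copy of the file.
getfattr -d -m . -e hex /brick0/debian-live-508-amd64-rescue.iso
# Non-zero trusted.afr.* pending counters on the good brick indicate that
# replicate knows the other copy is out of date and due for self-heal.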

I used a known good copy of the file to verify that the correct md5sum is 512d97b6da025da413f730a5be7231ef.
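
A quick way to audit which bricks hold a corrupt copy is to compare each brick's checksum against the known-good sum (a sketch of mine; it assumes ssh access to the brick servers named above):

GOOD=512d97b6da025da413f730a5be7231ef
for h in gluster0-member0 gluster0-member1; do
    sum=$(ssh "$h" md5sum /brick0/debian-live-508-amd64-rescue.iso | awk '{print $1}')
    if [ "$sum" = "$GOOD" ]; then echo "$h: OK"; else echo "$h: CORRUPT ($sum)"; fi
done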

# gluster --version
glusterfs 3.2.4 built on Sep 30 2011 07:17:57

...

# uname -a
Linux gluster0-group0-brick1 2.6.32-71.el6.x86_64 #1 SMP Fri May 20 03:51:51 BST 2011 x86_64 x86_64 x86_64 GNU/Linux

Comment 1 Jeff Darcy 2012-10-31 14:03:13 UTC
http://review.gluster.org/4144 was posted against cloned bug 853690.