Bug 800326 - Data corruption in stripe translator
Status: CLOSED WONTFIX
Product: GlusterFS
Classification: Community
Component: stripe
Version: 3.2.5
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: shishir gowda
Reported: 2012-03-06 05:26 EST by Alexander Bersenev
Modified: 2013-12-08 20:29 EST
Doc Type: Bug Fix
Last Closed: 2012-03-07 06:36:42 EST

Attachments:
The vol file (1.42 KB, application/octet-stream), 2012-03-06 05:26 EST, Alexander Bersenev
The output of dd (19.33 KB, application/octet-stream), 2012-03-06 05:29 EST, Alexander Bersenev
Dump of log file. (4.13 KB, text/plain), 2012-03-07 02:53 EST, Alexander Bersenev
Commands executed (2.23 KB, text/plain), 2012-03-07 03:40 EST, Alexander Bersenev
Patch to increase min stripe size to 16384 (1.15 KB, patch), 2012-03-09 00:37 EST, Alexander Bersenev

Description Alexander Bersenev 2012-03-06 05:26:44 EST
Created attachment 567903 [details]
The vol file

Description of problem:
The stripe translator does not order data properly on reads and sometimes returns corrupted data.

Version-Release number of selected component (if applicable):
Tested on:
1. glusterfs 3.2.5 built on Nov 15 2011 08:43:14 (RHEL6)
2. glusterfs 3git built on Feb 13 2012 14:33:20 (Gentoo)

How reproducible:
Always.

Steps to Reproduce:
1. Create a new volume with stripe=4 and block-size=4096 with all bricks on one node. Turn off all caching and prefetch translators. My vol file is attached.
2. Mount it in some directory, for example in /gluster/fs/.
3. Create the test file in mounted filesystem:
perl -e 'print $_ x 4096 for(0..9,'a'..'z')' > s4096
4. Try to read it with large blocksize:
dd if=s4096 bs=1000000 2>/dev/null | hexdump -C
  
Actual results:
The data is corrupted. The same read command gives different results:
# dd if=s4096 bs=1000000 2>/dev/null | md5sum 
2146abf3b6cbc7e90a92aa55839da659  -
# dd if=s4096 bs=1000000 2>/dev/null | md5sum 
f85c27c65320ad13906bf09710feaa7a  -
# dd if=s4096 bs=1000000 2>/dev/null | md5sum 
3861b7a934d2d2bb77fdb6a9575d54de  -

If the read block size is small, the data is not corrupted:
# dd if=s4096 bs=100 2>/dev/null | md5sum 
015a232752a53b9195fd86562907bcea  -
# dd if=s4096 bs=100 2>/dev/null | md5sum 
015a232752a53b9195fd86562907bcea  -
# dd if=s4096 bs=100 2>/dev/null | md5sum 
015a232752a53b9195fd86562907bcea  -

Expected results:
The data is not corrupted.
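
The manual check above can be scripted. The following is a minimal Python sketch, not part of the original report: it rebuilds the expected test pattern in memory (the same bytes the perl one-liner writes, 36 characters repeated 4096 times each), reads the file several times with a large block size, and reports whether each pass matches the pattern. The function names and the default of three attempts are illustrative assumptions.

```python
import hashlib
import string

# Same 36 characters the perl one-liner iterates over: 0..9, then a..z.
PATTERN_CHARS = string.digits + string.ascii_lowercase


def expected_payload(repeat=4096):
    """Bytes the test file should contain: each character repeated `repeat` times."""
    return "".join(ch * repeat for ch in PATTERN_CHARS).encode("ascii")


def write_payload(path, repeat=4096):
    """Create the test file (equivalent to the perl one-liner in step 3)."""
    with open(path, "wb") as f:
        f.write(expected_payload(repeat))


def read_digest(path, block_size):
    """MD5 of the file, read unbuffered in requests of up to `block_size` bytes."""
    digest = hashlib.md5()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def check_reads(path, block_size=1_000_000, attempts=3, repeat=4096):
    """One boolean per pass: True if that pass returned exactly the expected bytes."""
    want = hashlib.md5(expected_payload(repeat)).hexdigest()
    return [read_digest(path, block_size) == want for _ in range(attempts)]
```

On a healthy mount, check_reads("/gluster/fs/s4096") should return only True values; any False means that pass returned bytes that differ from the pattern.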

Additional info:
Comment 1 Alexander Bersenev 2012-03-06 05:29:59 EST
Created attachment 567906 [details]
The output of dd
Comment 2 shylesh 2012-03-06 07:11:12 EST
Backend :xfs
OS: RHEL 6.1


1. created a stripe volume with count 4
 
Volume Name: stripe4
Type: Stripe
Volume ID: 9a43501a-d782-4f8a-9147-4dcf66a123b7
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: RHEL6.1:/export/sdb/stripe41
Brick2: RHEL6.1:/export/sdb/stripe42
Brick3: RHEL6.1:/export/sdb/stripe43
Brick4: RHEL6.1:/export/sdb/stripe44
Options Reconfigured:
diagnostics.count-fop-hits: off
diagnostics.latency-measurement: off
performance.stat-prefetch: off
cluster.stripe-block-size: 4MB


2. Mounted the volume and created a file 
  perl -e 'print $_ x 4096 for(0..9,'a'..'z')' > s4096



 
[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum 
015a232752a53b9195fd86562907bcea  -
[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum 
015a232752a53b9195fd86562907bcea  -
[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum 
015a232752a53b9195fd86562907bcea  -
[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum 
015a232752a53b9195fd86562907bcea  -
[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum 
015a232752a53b9195fd86562907bcea  -
[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum 
015a232752a53b9195fd86562907bcea  -
[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum 
015a232752a53b9195fd86562907bcea  -



We are not able to reproduce this issue. Could you please provide the logs?
Comment 3 Alexander Bersenev 2012-03-06 07:19:20 EST
Please retry with block-size=4096 bytes, not kbytes.
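
The block size matters here because the test file is only 36 x 4096 = 147456 bytes. The sketch below is a rough illustration, assuming a simple round-robin layout in which stripe block i of a file lands on brick i mod stripe-count. That layout, and all function names, are assumptions made for illustration and are not taken from the GlusterFS source.

```python
FILE_SIZE = 36 * 4096  # size of s4096 in bytes


def stripe_blocks_spanned(file_size, stripe_block_size):
    """Number of stripe blocks a file of `file_size` bytes occupies (ceiling division)."""
    return -(-file_size // stripe_block_size)


def brick_for_offset(offset, stripe_block_size, stripe_count):
    """Brick index holding the byte at `offset`, under the assumed round-robin layout."""
    return (offset // stripe_block_size) % stripe_count


def bricks_touched(file_size, stripe_block_size, stripe_count):
    """Sorted brick indices that hold at least one stripe block of the file."""
    blocks = stripe_blocks_spanned(file_size, stripe_block_size)
    return sorted({i % stripe_count for i in range(blocks)})
```

Under this model, a 4MB stripe block keeps the whole file in a single stripe block on one brick, while a 4096-byte stripe block spreads it over 36 blocks across all four bricks.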
Comment 4 shishir gowda 2012-03-07 02:30:03 EST
We are looking into the bug.

In the meantime, can you please disable the io-cache and read-ahead xlators?
This fixes the problem for us.
Comment 5 Alexander Bersenev 2012-03-07 02:53:33 EST
Created attachment 568177 [details]
Dump of log file.

I've turned off all translators (the log is in the attached file log.txt), but the problem does not disappear.
Comment 6 Alexander Bersenev 2012-03-07 03:40:40 EST
Created attachment 568196 [details]
Commands executed

Here is a dump of executed commands beginning from volume creation.
Comment 7 shishir gowda 2012-03-07 05:37:15 EST
Hi Alexander,

We found the issue to be related to iobufs.

We can send a maximum of GF_IOBREF_IOBUF_COUNT (16) iobufs in a response (a read, in this case). Each request tends to be 128k.

With stripe-block-size set to 4k, a 128k read needs 128k / 4k = 32 bufs, so we end up losing 16 of them.

Stripe was designed to handle 128k or higher block sizes.

I will change the stripe-block-size min limit to 16k to prevent this scenario.

If needed I can give you a patch which hikes this buf count to 32.
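
The arithmetic behind this explanation can be sketched as follows. This is a simplified model, assuming an aligned request that needs exactly one buf per stripe block it covers; the constant values are the ones quoted in this comment, and the function names are hypothetical.

```python
GF_IOBREF_IOBUF_COUNT = 16   # per-response buf limit quoted in this comment
REQUEST_SIZE = 128 * 1024    # typical read request size quoted in this comment


def bufs_needed(stripe_block_size, request_size=REQUEST_SIZE):
    """Bufs required for one request, assuming one buf per stripe block (ceiling division)."""
    return -(-request_size // stripe_block_size)


def bufs_lost(stripe_block_size, request_size=REQUEST_SIZE,
              limit=GF_IOBREF_IOBUF_COUNT):
    """Bufs that exceed the per-response limit in this simplified model."""
    return max(0, bufs_needed(stripe_block_size, request_size) - limit)
```

In this model a 4k stripe block needs 32 bufs and loses 16, while the proposed 16k minimum needs 8 bufs and stays under the limit.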
Comment 8 Alexander Bersenev 2012-03-07 06:10:35 EST
With fuse's hardcoded max_read = 131072, this seems to be an acceptable solution. I don't see data corruption anymore.

I think this bug can be closed.
Comment 9 shishir gowda 2012-03-07 06:36:42 EST
As stripe is designed to work with larger block sizes, we will not fix this bug.
Comment 10 Alexander Bersenev 2012-03-09 00:37:56 EST
Created attachment 568818 [details]
Patch to increase min stripe size to 16384

Here is a patch to change the stripe-block-size min limit to 16k. 

As stripe is designed to work with larger block sizes and _doesn't work_ with lower values, please fix this bug by raising the minimum block size limit.
