Created attachment 567903 [details] The vol file

Description of problem:
The stripe translator does not order the data properly and sometimes corrupts it.

Version-Release number of selected component (if applicable):
Tested on:
1. glusterfs 3.2.5 built on Nov 15 2011 08:43:14 (RHEL6)
2. glusterfs 3git built on Feb 13 2012 14:33:20 (Gentoo)

How reproducible:
Always.

Steps to Reproduce:
1. Create a new volume with stripe=4 and block-size=4096, with all bricks on one node. Turn off all caching and prefetch translators. My vol file is attached.
2. Mount it in some directory, for example /gluster/fs/.
3. Create the test file in the mounted filesystem:
   perl -e 'print $_ x 4096 for(0..9,'a'..'z')' > s4096
4. Try to read it with a large block size:
   dd if=s4096 bs=1000000 2>/dev/null | hexdump -C

Actual results:
The data is corrupted. The same read command gives different results on each run:
# dd if=s4096 bs=1000000 2>/dev/null | md5sum
2146abf3b6cbc7e90a92aa55839da659 -
# dd if=s4096 bs=1000000 2>/dev/null | md5sum
f85c27c65320ad13906bf09710feaa7a -
# dd if=s4096 bs=1000000 2>/dev/null | md5sum
3861b7a934d2d2bb77fdb6a9575d54de -

With a small block size, the data is not corrupted:
# dd if=s4096 bs=100 2>/dev/null | md5sum
015a232752a53b9195fd86562907bcea -
# dd if=s4096 bs=100 2>/dev/null | md5sum
015a232752a53b9195fd86562907bcea -
# dd if=s4096 bs=100 2>/dev/null | md5sum
015a232752a53b9195fd86562907bcea -

Expected results:
The data is not corrupted.

Additional info:
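Because the test file is a fixed pattern (each of the 36 characters 0-9 and a-z repeated 4096 times), a read can also be checked block by block rather than only by checksum. The following is a minimal sketch, not part of the original report; the helper names (expected_pattern, misplaced_blocks, read_with_block_size) are hypothetical and only illustrate one way to see which 4096-byte blocks came back wrong or out of order.

```python
import string

BLOCK = 4096  # stripe block size used in the report, in bytes
SYMBOLS = string.digits + string.ascii_lowercase  # '0'..'9' then 'a'..'z'


def expected_pattern():
    """Return the bytes the perl one-liner writes: each symbol repeated BLOCK times."""
    return b"".join(ch.encode("ascii") * BLOCK for ch in SYMBOLS)


def misplaced_blocks(data):
    """Return indices of BLOCK-sized blocks that differ from the expected pattern."""
    want = expected_pattern()
    n = max(len(data), len(want))
    return [
        off // BLOCK
        for off in range(0, n, BLOCK)
        if data[off:off + BLOCK] != want[off:off + BLOCK]
    ]


def read_with_block_size(path, bs):
    """Read a whole file with unbuffered read() calls of at most bs bytes (like dd bs=...)."""
    chunks = []
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(bs)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

On the mounted volume, something like misplaced_blocks(read_with_block_size("s4096", 1000000)) would be expected to return an empty list for a correct read and a list of affected block indices otherwise.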
Created attachment 567906 [details] The output of dd
Backend: xfs
OS: RHEL 6.1

1. Created a stripe volume with count 4:

Volume Name: stripe4
Type: Stripe
Volume ID: 9a43501a-d782-4f8a-9147-4dcf66a123b7
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: RHEL6.1:/export/sdb/stripe41
Brick2: RHEL6.1:/export/sdb/stripe42
Brick3: RHEL6.1:/export/sdb/stripe43
Brick4: RHEL6.1:/export/sdb/stripe44
Options Reconfigured:
diagnostics.count-fop-hits: off
diagnostics.latency-measurement: off
performance.stat-prefetch: off
cluster.stripe-block-size: 4MB

2. Mounted the volume and created a file:

perl -e 'print $_ x 4096 for(0..9,'a'..'z')' > s4096

[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
015a232752a53b9195fd86562907bcea -
[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
015a232752a53b9195fd86562907bcea -
[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
015a232752a53b9195fd86562907bcea -
[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
015a232752a53b9195fd86562907bcea -
[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
015a232752a53b9195fd86562907bcea -
[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
015a232752a53b9195fd86562907bcea -
[root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
015a232752a53b9195fd86562907bcea -

We are not able to reproduce this issue. Could you please provide the logs?
Please, retry with blocksize=4096 bytes, not kbytes.
We are looking into the bug. In the meantime, can you please disable the io-cache and read-ahead xlators? This fixes the problem for us.
Created attachment 568177 [details] Dump of log file.

I've turned off all translators (the log is in the attached file log.txt), but the problem does not disappear.
Created attachment 568196 [details] Commands executed

Here is a dump of the executed commands, beginning from volume creation.
Hi Alexander,

We found the issue to be related to iobufs. We can send a maximum of GF_IOBREF_IOBUF_COUNT (16) iobufs in a response (a read response in this case). Each read request tends to be 128k. With stripe-block-size set to 4k, one request needs 32 bufs, so we end up losing 16 of them. Stripe was designed to handle block sizes of 128k or higher. I will change the stripe-block-size minimum limit to 16k to prevent this scenario. If needed, I can give you a patch which raises this buf count to 32.
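The arithmetic in the comment above can be written out as a short sketch. It assumes, for illustration only, that a read request aligned to the stripe block size needs one iobuf per stripe block it covers; the function names are hypothetical and are not GlusterFS APIs. The constants 16 and 128k are the values quoted in the comment.

```python
GF_IOBREF_IOBUF_COUNT = 16       # maximum iobufs per response, as quoted above
READ_REQUEST_SIZE = 128 * 1024   # typical read request size (131072 bytes)


def iobufs_needed(stripe_block_size, request_size=READ_REQUEST_SIZE):
    """Number of stripe blocks (and, by assumption, iobufs) one aligned request covers."""
    return -(-request_size // stripe_block_size)  # ceiling division


def fits_in_response(stripe_block_size, request_size=READ_REQUEST_SIZE):
    """True if all iobufs for one request fit within the per-response limit."""
    return iobufs_needed(stripe_block_size, request_size) <= GF_IOBREF_IOBUF_COUNT


for size in (4096, 16384, 128 * 1024):
    print(size, iobufs_needed(size), fits_in_response(size))
```

Under these assumptions a 4k block size needs 32 iobufs per 128k request, twice the limit, while the proposed 16k minimum needs 8.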
With FUSE's hardcoded max_read = 131072, this seems to be an acceptable solution. I don't see data corruption anymore. I think this bug can be closed.
As stripe is designed to work with larger block sizes, we will not fix this bug.
Created attachment 568818 [details] Patch to increase min stripe size to 16384

Here is a patch to change the stripe-block-size minimum limit to 16k. As stripe is designed to work with larger block sizes and _doesn't work_ with lower values, please fix this bug by raising the minimum block size limit.