Bug 800326
Summary: | Data corruption in stripe translator | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Alexander Bersenev <bay>
Component: | stripe | Assignee: | shishir gowda <sgowda>
Status: | CLOSED WONTFIX | Severity: | high
Priority: | unspecified | Version: | 3.2.5
CC: | gluster-bugs, nsathyan, shmohan | Doc Type: | Bug Fix
Hardware: | x86_64 | OS: | Linux
Last Closed: | 2012-03-07 11:36:42 UTC | |

Created attachment 567906 [details]
The output of dd
Backend: xfs
OS: RHEL 6.1

1. Created a stripe volume with count 4:

    Volume Name: stripe4
    Type: Stripe
    Volume ID: 9a43501a-d782-4f8a-9147-4dcf66a123b7
    Status: Started
    Number of Bricks: 1 x 4 = 4
    Transport-type: tcp
    Bricks:
    Brick1: RHEL6.1:/export/sdb/stripe41
    Brick2: RHEL6.1:/export/sdb/stripe42
    Brick3: RHEL6.1:/export/sdb/stripe43
    Brick4: RHEL6.1:/export/sdb/stripe44
    Options Reconfigured:
    diagnostics.count-fop-hits: off
    diagnostics.latency-measurement: off
    performance.stat-prefetch: off
    cluster.stripe-block-size: 4MB

2. Mounted the volume and created a file:

    perl -e 'print $_ x 4096 for(0..9,'a'..'z')' > s4096
    [root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
    015a232752a53b9195fd86562907bcea -
    [root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
    015a232752a53b9195fd86562907bcea -
    [root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
    015a232752a53b9195fd86562907bcea -
    [root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
    015a232752a53b9195fd86562907bcea -
    [root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
    015a232752a53b9195fd86562907bcea -
    [root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
    015a232752a53b9195fd86562907bcea -
    [root@RHEL6 mnt]# dd if=s4096 bs=1000000 2>/dev/null | md5sum
    015a232752a53b9195fd86562907bcea -

We are not able to reproduce this issue; could you please provide the logs?

Please retry with blocksize=4096 bytes, not kbytes.

We are looking into the bug. In the meantime, can you please disable the io-cache and read-ahead xlators? This fixes the problem for us.

Created attachment 568177 [details]
Dump of log file.
I've turned off all translators (the log is in the attached file log.txt), but the problem does not disappear.
Created attachment 568196 [details]
Commands executed
Here is a dump of executed commands beginning from volume creation.
Hi Alexander,

We found the issue to be related to iobufs. We can send a maximum of GF_IOBREF_IOBUF_COUNT (16) iobufs in a response (a read response in this case). Each request tends to be 128k. With stripe-block-size set to 4k, we would need 32 bufs, so we end up losing 16 of them. Stripe was designed to handle block sizes of 128k or higher. I will change the stripe-block-size minimum to 16k to prevent this scenario. If needed, I can give you a patch that raises this buf count to 32.

With fuse's hardcoded max_read = 131072, it seems to be an acceptable solution. I don't see data corruption anymore. I think this bug can be closed.

As stripe is designed to work with larger block sizes, we will not fix this bug.

Created attachment 568818 [details]
Patch to increase min stripe size to 16384
Here is a patch to change the stripe-block-size min limit to 16k.
As stripe is designed to work with larger block sizes and _doesn't work_ with lower values, please fix this bug by raising the minimum block size limit.
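
The buffer arithmetic in the developer's comment above can be checked with a short sketch. It assumes a simplified model in which every stripe block touched by one read request consumes one iobuf; the constant names mirror the values quoted in the comment (16 iobufs per response, 131072-byte reads), and the helper functions are hypothetical, not GlusterFS code.

```python
import math

# Values quoted in the comments above. The names mirror the comment,
# but this is an illustration, not GlusterFS source code.
GF_IOBREF_IOBUF_COUNT = 16   # maximum iobufs that fit in one response
FUSE_MAX_READ = 131072       # bytes per read request (128k)


def iobufs_needed(stripe_block_size: int, request_size: int = FUSE_MAX_READ) -> int:
    """Assumed model: one iobuf per stripe block covered by an aligned read."""
    return math.ceil(request_size / stripe_block_size)


def fits_in_response(stripe_block_size: int) -> bool:
    """True when a full-size read stays within the per-response iobuf limit."""
    return iobufs_needed(stripe_block_size) <= GF_IOBREF_IOBUF_COUNT


if __name__ == "__main__":
    for size in (4096, 16384, 131072):
        print(size, iobufs_needed(size), fits_in_response(size))
```

Under this model a 4096-byte stripe block needs 32 iobufs per 128k read, twice the limit, while the proposed 16384-byte minimum needs only 8.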
Created attachment 567903 [details]
The vol file

Description of problem:
The stripe translator does not properly order the data and sometimes corrupts it.

Version-Release number of selected component (if applicable):
Tested on:
1. glusterfs 3.2.5 built on Nov 15 2011 08:43:14 (RHEL6)
2. glusterfs 3git built on Feb 13 2012 14:33:20 (Gentoo)

How reproducible:
Always.

Steps to Reproduce:
1. Create a new volume with stripe=4 and block-size=4096 with all bricks on one node. Turn off all caching and prefetch translators. My vol file is attached.
2. Mount it in some directory, for example /gluster/fs/.
3. Create the test file in the mounted filesystem:
    perl -e 'print $_ x 4096 for(0..9,'a'..'z')' > s4096
4. Try to read it with a large block size:
    dd if=s4096 bs=1000000 2>/dev/null | hexdump -C

Actual results:
The data is corrupted. The same read command gives different results each time:

    # dd if=s4096 bs=1000000 2>/dev/null | md5sum
    2146abf3b6cbc7e90a92aa55839da659 -
    # dd if=s4096 bs=1000000 2>/dev/null | md5sum
    f85c27c65320ad13906bf09710feaa7a -
    # dd if=s4096 bs=1000000 2>/dev/null | md5sum
    3861b7a934d2d2bb77fdb6a9575d54de -

If the read block size is small, the data is not corrupted:

    # dd if=s4096 bs=100 2>/dev/null | md5sum
    015a232752a53b9195fd86562907bcea -
    # dd if=s4096 bs=100 2>/dev/null | md5sum
    015a232752a53b9195fd86562907bcea -
    # dd if=s4096 bs=100 2>/dev/null | md5sum
    015a232752a53b9195fd86562907bcea -

Expected results:
The data is not corrupted.
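
The reproduction steps above can also be scripted. The following is a minimal sketch, not part of the original report: it rebuilds the same 36 x 4096-byte pattern as the perl one-liner, summarizes a read as one character per 4096-byte block (so misplaced or mixed blocks stand out), and hashes repeated reads at a chosen read size. All function names are hypothetical, and the path passed to `read_digests` is assumed to be the test file on the mounted volume.

```python
import hashlib
import string

BLOCK = 4096
SYMBOLS = string.digits + string.ascii_lowercase  # "0".."9" then "a".."z"


def make_pattern(block: int = BLOCK) -> bytes:
    """Same content as the perl one-liner: each symbol repeated `block` times."""
    return b"".join(ch.encode() * block for ch in SYMBOLS)


def block_symbols(data: bytes, block: int = BLOCK) -> str:
    """Return one character per block; '?' marks a short or non-uniform block."""
    out = []
    for start in range(0, len(data), block):
        chunk = data[start:start + block]
        uniform = len(chunk) == block and chunk == chunk[:1] * block
        out.append(chr(chunk[0]) if uniform else "?")
    return "".join(out)


def read_digests(path: str, bs: int, runs: int = 3) -> list:
    """Read the file `runs` times, requesting `bs` bytes per read; return MD5 hex digests."""
    digests = []
    for _ in range(runs):
        md5 = hashlib.md5()
        # buffering=0 so each f.read(bs) maps to a single read request of up to bs bytes
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(bs)
                if not chunk:
                    break
                md5.update(chunk)
        digests.append(md5.hexdigest())
    return digests
```

On a healthy mount, `len(set(read_digests("s4096", 1000000)))` would be expected to equal 1, and `block_symbols` applied to the file contents would be expected to equal `SYMBOLS`; differing digests across runs correspond to the corruption shown in the report.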