Bug 763609 - (GLUSTER-1877) data corruption while running arequal.
Product: GlusterFS
Classification: Community
Component: rdma
Hardware: All Linux
Priority: low
Severity: high
Assigned To: Raghavendra G
Depends On:
Reported: 2010-10-09 01:23 EDT by Raghavendra G
Modified: 2015-12-01 11:45 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed:
Type: ---
Regression: RTP
Mount Type: fuse
Documentation: DNR
Verified Versions:
Category: ---
oVirt Team: ---

Attachments
diff of hexdump of git-repo-config (392.50 KB, application/octet-stream)
2010-10-08 22:25 EDT, Raghavendra G

Description Raghavendra G 2010-10-08 22:25:35 EDT
Created attachment 342
Comment 1 Raghavendra G 2010-10-09 01:23:55 EDT
Test: arequal.sh /usr /mnt/distribute/usr
Bug: checksum of regular files was different.
Configuration: found on both distribute and single point-to-point setups, with all performance translators loaded.
Consistency in reproducing the issue: The bug is not consistently reproducible.
Is the issue found on sockets: Not tested; the test was not run with sockets as the transport.

One binary, git-repo-config, was found to be corrupted. In the listings below, git-repo-config is the copy from the glusterfs mount point.

raghu@booradley:~/work/user-issues/bugs/rdma-data-corruption$ ls -lh git-repo-config local.git-repo-config 
-rwxr-xr-x 1 raghu users 3.6M 2010-10-09 08:40 git-repo-config*
-rwxr-xr-x 1 raghu users 3.6M 2010-10-09 08:36 local.git-repo-config*

raghu@booradley:~/work/user-issues/bugs/rdma-data-corruption$ md5sum git-repo-config local.git-repo-config 
6f2d845bc5c6e9f9f57a19c46fc9757a  git-repo-config
e44ec37902b419eb7e599e5a268da18b  local.git-repo-config

A diff of the hexdumps of these two files showed that a contiguous chunk of 131056 bytes (16 bytes less than the iobuf size) was zeroed out in the corrupted file. I've attached the diff.
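
For anyone who wants to locate such a zeroed region without reading a full hexdump diff, the following is a minimal sketch and not what was used for the attachment. The function name zeroed_span is made up for illustration. It relies on cmp -l, which lists each differing byte with its 1-based offset and the octal values from both files. If the iobuf size is 131072 bytes (128 KiB), which is what "16 bytes less than the iobuf size" implies, the third number printed should be 131072 - 16 = 131056 for a corruption like the one described.

```shell
# zeroed_span GOOD BAD   (hypothetical helper, for illustration only)
# Prints four numbers: the 0-based offset of the first differing byte, the
# 0-based offset of the last differing byte, the number of differing bytes,
# and how many of those bytes are zero in BAD.
# Prints nothing when the files are identical.
zeroed_span() {
    # cmp -l columns: 1-based offset, octal byte in GOOD, octal byte in BAD
    cmp -l "$1" "$2" | awk '
        NR == 1  { first = $1 }
                 { last = $1; n++ }
        $3 == 0  { zeros++ }
        END      { if (n) print first - 1, last - 1, n, zeros + 0 }'
}
```

Usage would be along the lines of: zeroed_span local.git-repo-config git-repo-config. Bytes that already happen to be zero in the good file do not show up as differences, so the differing-byte count can be slightly smaller than the true size of the zeroed chunk.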

No rdma errors were found in either the client or the server logs.
Comment 2 Raghavendra G 2010-10-27 21:14:04 EDT
The bug is easily reproducible with the following shell script:

prev="empty"
while true; do
    # copy the file onto the mount and checksum the copy
    cp -f /usr/lib/locale/locale-archive $GLUSTER_MOUNT
    sum=`md5sum $GLUSTER_MOUNT/locale-archive`
    if [ "$prev" != "empty" -a "$prev" != "$sum" ]; then
        echo "mismatch prev=$prev sum=$sum"
    fi
    # remember this checksum for the next iteration, then start fresh
    prev="$sum"
    rm -f $GLUSTER_MOUNT/locale-archive
done

locale-archive is a file of size around 50MB.
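
A variant of the same check, offered only as a sketch: instead of comparing each copy with the previous one, it compares each copy with the checksum of the source file, stops on the first mismatch, and leaves the bad copy in place for inspection. The function name check_copy and the iteration limit are assumptions introduced here and are not part of the original reproducer.

```shell
# check_copy SRC DEST_DIR [MAX_ITER]   (hypothetical helper, for illustration only)
# Repeatedly copies SRC into DEST_DIR and compares the MD5 of the copy with
# the MD5 of SRC.
# Returns 0 on the first mismatch (the bad copy is left in DEST_DIR),
# 1 if MAX_ITER copies (default 100) all matched, 2 if cp itself failed.
check_copy() {
    src=$1
    dest=$2/$(basename "$1")
    max=${3:-100}
    want=$(md5sum < "$src" | cut -d' ' -f1)
    i=1
    while [ "$i" -le "$max" ]; do
        cp -f "$src" "$dest" || return 2
        got=$(md5sum < "$dest" | cut -d' ' -f1)
        if [ "$got" != "$want" ]; then
            echo "mismatch on iteration $i: want=$want got=$got"
            return 0
        fi
        rm -f "$dest"
        i=$((i + 1))
    done
    return 1
}
```

It could be invoked as: check_copy /usr/lib/locale/locale-archive "$GLUSTER_MOUNT" 50. Comparing against the source rather than the previous copy also catches the case where the very first copy is already corrupted.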

As of now, the minimum configuration required to reproduce this bug is distributed replicate with write-behind as the only performance translator.
Comment 3 Raghavendra G 2010-10-27 21:51:19 EDT
Minimum configuration required to reproduce the bug is a two node replicate setup with write-behind as the only performance translator on rdma transport.
Comment 4 Anand Avati 2010-10-29 03:42:26 EDT
PATCH: http://patches.gluster.com/patch/5600 in master (rpc-transport: fix race-condition between rdma-read completion and updating the count of number of vectors to be passed to rpc.)
Comment 5 Anand Avati 2010-11-07 20:15:06 EST
PATCH: http://patches.gluster.com/patch/5609 in master (rpc-transport/rdma: increment post->ctx.count in a loop doint rdma_read.)
