Bug 762788 (GLUSTER-1056) - Data corruption on client side
Summary: Data corruption on client side
Keywords:
Status: CLOSED DUPLICATE of bug 762547
Alias: GLUSTER-1056
Product: GlusterFS
Classification: Community
Component: core
Version: 3.0.2
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: shishir gowda
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-07-08 07:27 UTC by Sachidananda Urs
Modified: 2015-12-01 16:45 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTP
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Client vol file (5.11 KB, application/octet-stream)
2010-07-08 04:27 UTC, Sachidananda Urs

Description Sachidananda Urs 2010-07-08 04:27:51 UTC
Created attachment 249
new spec file

Comment 1 Chida 2010-07-08 05:11:55 UTC
Here are more details,

The corruption seems to happen under load. The workload is to run 200 or more lame encoding processes on the mount point. The processes run normally at first and then seem to freeze after a while. After the freeze, the files appear corrupted. The files are MP3, JPEG and XML files ranging from a few KB to 9 MB in size, and there are millions of them.

When files are locked or a process is frozen, a reboot seems to fix the corruption.

Files created or modified during heavy load seem to be corrupted. The XML files hold metadata for the lame encoder, and when an XML file is corrupted, lame segfaults or dumps core.

Another machine with the same gluster mount shows no corruption. That server is not under load.
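Since a corrupted XML metadata file makes lame segfault, one way to narrow this down might be to check each XML file for well-formedness before handing it to lame, on both the loaded and the unloaded client. A minimal sketch in Python, assuming the metadata files are ordinary XML documents; the function name xml_is_well_formed is hypothetical and not part of lame or GlusterFS:

```python
import sys
import xml.etree.ElementTree as ET


def xml_is_well_formed(path: str) -> bool:
    """Return True if the file parses as XML, False if it is empty, truncated or garbled."""
    try:
        ET.parse(path)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # Each command-line argument is an XML file to check (paths are up to the caller).
    for path in sys.argv[1:]:
        status = "ok" if xml_is_well_formed(path) else "CORRUPT"
        print(f"{status}\t{path}")
```

If the same file is reported CORRUPT on the loaded client but ok on the unloaded one, that would point at the client side rather than the data on the servers.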

Comment 2 Shehjar Tikoo 2010-07-08 05:19:30 UTC
Chida, what is the application used for encoding? I want to incorporate this into my tests.

Comment 3 Chida 2010-07-08 05:31:16 UTC
(In reply to comment #2)
> Chida, what is the application used for encoding? I want to incorporate this
> into my tests.

This is the application, http://lame.sourceforge.net/

You can launch 200+ instances of the encoding process in parallel. While this is running, create hundreds of small text files with some content. Then check whether the text files and/or MP3s are intact. Other tools can verify the integrity of MP3 files, such as id3, http://checkmate.gissen.nl/, or even a checksum.

I will try to get a more accurate test case.
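The integrity check described in this comment could be sketched as a small Python script that writes many small text files with known content to the mount point, fsyncs them, then re-reads them and compares SHA-256 checksums. This is a hypothetical helper, not part of GlusterFS. It does not run lame itself, so it is assumed to be started while the 200+ lame processes are already running. The file names (probe_NNNNN.txt), the file count and the thread count are arbitrary choices.

```python
import hashlib
import os
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_file(directory: str, index: int) -> tuple[str, str]:
    """Write one small text file with deterministic content; return (path, sha256)."""
    path = os.path.join(directory, f"probe_{index:05d}.txt")
    data = (f"probe file {index}\n" * 50).encode("ascii")
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data to the filesystem before verifying
    return path, hashlib.sha256(data).hexdigest()


def verify(expected: dict[str, str]) -> list[str]:
    """Re-read every file and return the paths whose checksum no longer matches."""
    bad = []
    for path, digest in expected.items():
        with open(path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() != digest:
                bad.append(path)
    return bad


def run_probe(directory: str, count: int = 500, workers: int = 16) -> list[str]:
    """Write `count` files from `workers` threads, then verify them all."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        expected = dict(pool.map(lambda i: write_file(directory, i), range(count)))
    return verify(expected)


def report(corrupted: list[str]) -> None:
    print(f"{len(corrupted)} corrupted file(s)")
    for path in corrupted:
        print("  " + path)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Directory on the GlusterFS mount point, passed on the command line.
        target = sys.argv[1]
        os.makedirs(target, exist_ok=True)
        report(run_probe(target))
    else:
        # No argument: dry run in a local temporary directory.
        with tempfile.TemporaryDirectory() as tmp:
            report(run_probe(tmp))
```

For example, it might be run as `python probe.py /mnt/glusterfs/probe` (script name and mount path are hypothetical). Writing from several threads adds some concurrent small-file load on top of the lame processes; the lame workload itself is what reportedly triggers the problem.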

Comment 4 Sachidananda Urs 2010-07-08 07:27:12 UTC
Files appear as zero bytes on the mount point. When the files are closed and reopened, `\n' characters are seen. After a reboot, everything seems okay.

Attaching client vol file.

Comment 5 Chida 2010-07-09 03:27:46 UTC
Here are more details:

It happens when lame is encoding MP3 to MP3 in many directories. The sources and destinations are scattered relatively randomly across many directories.

At some point under heavy load, access to certain files on the mount point becomes tainted. Some files read back as a NUL for every byte, or appear empty, even though ls -la returns the expected result.

This only affects some files on the mount point; other files work fine.

At the same time, other client machines mounting the same gluster mount point from the same gluster node have no problem: the files reported as corrupted on the failing client are accessible as normal from the other clients.

Some processes on the failing client have become zombies and are stuck waiting on disk I/O.

A reboot of the client machine fixes the problem until it hits heavy load again.
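The symptoms above (every byte reads as NUL, or the file reads back empty although ls -la reports the expected size) could be scanned for with a short Python sketch that compares the size from os.stat() with what read() actually returns. The names classify and scan are hypothetical. Files still being written by lame may be flagged spuriously, so it is best run on directories that are not being encoded into, or re-run on the flagged files. Running it against the same directory on the failing client and on an unloaded client should show whether the problem is confined to the failing client.

```python
import os
import sys
from typing import Optional


def classify(path: str, chunk_size: int = 1 << 20) -> Optional[str]:
    """Return a symptom label for a suspicious file, or None if it looks fine."""
    expected = os.stat(path).st_size  # the size that ls -la would report
    if expected == 0:
        return None  # genuinely empty files are not flagged
    read_total = 0
    all_nul = True
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            read_total += len(chunk)
            if all_nul and chunk.count(0) != len(chunk):
                all_nul = False
    if read_total < expected:
        return f"short read: {read_total} of {expected} bytes"
    if all_nul:
        return "every byte is NUL"
    return None


def scan(root: str) -> dict[str, str]:
    """Walk `root` and map each suspicious file path to its symptom."""
    findings = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                symptom = classify(path)
            except OSError as exc:
                symptom = f"error: {exc}"
            if symptom:
                findings[path] = symptom
    return findings


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: scan_mount.py DIRECTORY")
    else:
        for path, symptom in sorted(scan(sys.argv[1]).items()):
            print(f"{path}: {symptom}")
```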

Comment 6 shishir gowda 2010-07-27 06:38:05 UTC

*** This bug has been marked as a duplicate of bug 815 (GLUSTER-815, bug 762547) ***

