Bug 773493 - File size initially wrong after a replica volume's image failure and repair.
Summary: File size initially wrong after a replica volume's image failure and repair.
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.2.5
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-01-11 23:30 UTC by Jeff Byers
Modified: 2014-12-14 19:40 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-14 19:40:33 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Jeff Byers 2012-01-11 23:30:47 UTC
File size initially wrong after a replica volume's image failure and repair.

During testing, we noticed a strange situation where a
file's data was correct, but the size indicated was
incorrect. Later, without any writes -- only an 'ls' or
'dir' command -- the file's size became correct.

This can be reproduced by:

1) Start with a GlusterFS replica volume in duplex (both nodes up).

2) Generate a test file of a specific size 'N'. Give plenty
of time for the file I/O to complete and be flushed.

3) Shut down one of the GlusterFS nodes such that the
replica volume becomes simplex.

4) Verify that the file is still there and good and has size
'N', both by an 'ls' or 'dir' command, and by reading the
entire file.

5) Regenerate the same test file of a specific size not
equal to 'N', say 'N * 2'. Again, give plenty of time for the
file I/O to complete and be flushed.

6) Allow the failed GlusterFS node back up so that the
GlusterFS replica volume becomes duplex. Do not start the
forced healing with "find ... stat ..." yet.

7) Re-verify the file by reading its contents fully before
doing an 'ls' or 'dir' command on it. The file's data will be
correct, but it will appear to have the size 'N', not 'N * 2'.

8) Do an 'ls' or 'dir' and see that the size is shown as 'N * 2'.

9) Re-verify the same file and now see that both the file
data and size are correct, 'N * 2'.
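
The check in steps 7 to 9 can be sketched roughly as below. This is only an
illustration, not the test program used for this report: the function names
are hypothetical, and it assumes a Linux client where listing a directory
and stat-ing each entry approximates what 'ls -l' does. On a healthy volume
all three values agree; in the failure described above the first value
would be 'N' and the other two 'N * 2'.

```python
import os

CHUNK = 64 * 1024  # read size; 64K matches the verifier described in this report


def bytes_until_eof(path):
    """Count the bytes in a file by reading to EOF, never consulting stat()."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                return total
            total += len(chunk)


def list_directory(directory):
    """Enumerate a directory and stat every entry, roughly like 'ls -l'."""
    with os.scandir(directory) as entries:
        return [(entry.name, entry.stat().st_size) for entry in entries]


def size_report(path):
    """Return (size read before listing, listed size, size read after listing)."""
    directory = os.path.dirname(os.path.abspath(path))
    before = bytes_until_eof(path)            # step 7: read before any 'ls'
    listed = dict(list_directory(directory))  # step 8: the 'ls'
    after = bytes_until_eof(path)             # step 9: read again
    return before, listed[os.path.basename(path)], after
```

The path passed to size_report() should be on the client mount point, not
on a brick.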

=====

This was first tested using GlusterFS 3.2.3, but was retested
with 3.2.5 -- the newer version made no difference.

The only thing that seems to avoid this problem is setting
'performance.stat-prefetch' to 'off'; it defaults to 'on'.
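
For reference, the option is changed with the 'gluster volume set' command.
A minimal sketch of scripting that from a test harness follows; the helper
names and the volume name 'testvol' are hypothetical, and the command has
to be run on a node where the gluster CLI is available.

```python
import subprocess


def stat_prefetch_command(volume, enabled):
    """Build the gluster CLI argv that sets performance.stat-prefetch."""
    value = "on" if enabled else "off"
    return ["gluster", "volume", "set", volume,
            "performance.stat-prefetch", value]


def set_stat_prefetch(volume, enabled):
    """Run the command; raises CalledProcessError if gluster reports failure."""
    subprocess.run(stat_prefetch_command(volume, enabled), check=True)


if __name__ == "__main__":
    # 'testvol' is a placeholder volume name.
    print(" ".join(stat_prefetch_command("testvol", False)))
```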

It seems like there might be a problem here with a file's
cached metadata that could result in some strange and
transient problems.

Is this a known issue that is documented somewhere that I
missed?

I would rather be using the GlusterFS volume default
options, but for now only trust 'performance.stat-prefetch'
being set to 'off', despite the expected performance loss.

~ Jeff Byers ~

Comment 1 Pranith Kumar K 2012-01-12 06:50:18 UTC
hi Jeff,
    Thanks for logging the bug.
I have some questions about steps 6 and 7 in the steps you provided.
How long did you wait after step 6 completed?
In step 7, what command are you using to read the file, and how are you verifying that its size is N and not N*2?

Do you have a script that tests this?

Pranith

Comment 2 Jeff Byers 2012-01-12 13:11:46 UTC
When the full test is running, the client is writing
and reading files continuously: writing a batch, then
verifying the batch. When reproducing manually, the wait
is human time: seconds or minutes.

The commands that generate the file and verify the file
are home grown. The generator takes a size and file
name, writes a header and fills in the rest with a data
pattern. The verifier takes the name, and from the
file header determines the expected size, reads
and verifies the data pattern in 64K chunks, and
verifies that the file is the correct size by seeing
EOF when it should. Note that the verifier program
does not use 'stat' information for the file size,
the file contents itself determines the size.
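
To make the description concrete, here is a minimal sketch of a generator
and verifier of this kind. It is not the actual in-house program: the
header layout (an 8-byte big-endian total size) and the data pattern are
assumptions chosen only for illustration.

```python
import struct

HEADER = struct.Struct(">Q")  # hypothetical header: total file size, 8 bytes
CHUNK = 64 * 1024             # verify in 64K chunks, as described above


def pattern(offset, length):
    """Deterministic data pattern: each byte is its file offset modulo 251."""
    return bytes((offset + i) % 251 for i in range(length))


def generate(path, size):
    """Write a file of exactly `size` bytes: header first, then the pattern."""
    if size < HEADER.size:
        raise ValueError("size is smaller than the header")
    with open(path, "wb") as f:
        f.write(HEADER.pack(size))
        offset = HEADER.size
        while offset < size:
            n = min(CHUNK, size - offset)
            f.write(pattern(offset, n))
            offset += n


def verify(path):
    """Return None if the file is good, else a short failure description.

    The expected size comes from the file's own header, not from stat().
    """
    with open(path, "rb") as f:
        head = f.read(HEADER.size)
        if len(head) < HEADER.size:
            return "short header"
        (expected,) = HEADER.unpack(head)
        offset = HEADER.size
        while offset < expected:
            n = min(CHUNK, expected - offset)
            data = f.read(n)
            if data != pattern(offset, len(data)):
                return "data mismatch near offset %d" % offset
            if len(data) < n:
                return "early EOF at %d, expected %d" % (offset + len(data), expected)
            offset += n
        if f.read(1):
            return "data past expected size %d" % expected
    return None
```

With this layout, the failure in step 7 would show up as verify() returning
an "early EOF" message even though every byte read so far matched.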

There are test scripts and programs, but they are tied
into a larger system and might be difficult to extract
from it. The file generator and verifier may be
free-standing enough, though.

~ Jeff Byers ~

Comment 3 Kaushal 2012-02-29 06:47:02 UTC
Hi Jeff,

From what I understand about your verifier, it gets the size the generator wrote to the header of the file and then verifies that data of the given size is present and correct. Is this correct?
But this still doesn't clarify Pranith's doubt about how the size is N and not N*2. If the verifier says the data is alright, then the size it reads from the header must be N*2. Where is the size shown as N?
Also, from where are you verifying the files: from the mountpoint or from the bricks?

If possible, can you also provide the file generator and verifier?

Kaushal

Comment 4 Jeff Byers 2012-03-01 01:50:15 UTC
(In reply to comment #3)
> Hi Jeff,
> 
> From what I understand about your verifier, it gets the size the generator
> wrote to the header of the file and then verifies that the data of given size
> is present and correct. Is this correct? 
> But this still doesn't clarify Pranith's doubt about how size is N not N*2. If
> the verifier says data is alright, then the size it reads from the header must
> be N*2. Where is the size shown as N?
> Also, from where are you verifying the files. From the mountpoint or the
> bricks?
> 
> If it is possible, can you also provide the file generator & verifier.
> 
> Kaushal

Hi Jeff,

    From what I understand about your verifier, it gets
    the size the generator wrote to the header of the
    file and then verifies that the data of given size
    is present and correct. Is this correct?

Jeff: Yes

    But this still doesn't clarify Pranith's doubt
    about how size is N not N*2. If the verifier says
    data is alright, then the size it reads from the
    header must be N*2. Where is the size shown as N?

Jeff: I wasn't clear enough. The data was correct up to
N, but the verifier's 64KB read loop reached EOF
before N*2 worth of data had been read. But if one
does a "dir", or waits a while, and the verifier is
rerun, all of the data is then there.

In our test scripts, we went so far as to add a retry
loop to the verify step and, halfway through the retries,
issue a "dir", which then allowed the verify to pass. We did
this until we determined that 'performance.stat-prefetch=off'
would resolve the problem completely.
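
The retry workaround had roughly the following shape. This is a sketch
under assumptions, not the real test script: the function name, the retry
count, and the delay are hypothetical, and the directory scan stands in
for the "dir" that in our setup is issued from the Windows host over CIFS.
Whether a local scan triggers the same refresh has not been established.

```python
import os
import time


def verify_with_retries(path, check, attempts=6, delay=1.0):
    """Call check(path) up to `attempts` times, listing the directory halfway.

    `check` is any callable returning True when the file verifies.
    Returns the 1-based attempt that passed, or None if none did.
    """
    directory = os.path.dirname(os.path.abspath(path))
    for attempt in range(1, attempts + 1):
        if check(path):
            return attempt
        if attempt == attempts // 2:
            # Stand-in for the mid-retry 'dir': enumerate and stat each entry.
            with os.scandir(directory) as entries:
                for entry in entries:
                    entry.stat()
        if attempt < attempts:
            time.sleep(delay)
    return None
```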

    Also, from where are you verifying the files. From the
    mount-point or the bricks?

Jeff: All of the I/O is being done from the mount-point
of the volumes. Our configuration is 2-way GlusterFS
replica volume, with local GlusterFS clients and Samba
arbitrated by CTDB, with the host being Windows using
CIFS.

    If it is possible, can you also provide the file
    generator & verifier.

We are looking at that, and are going to try to
simplify the environment enough that we could send
something over to you.

~ Jeff Byers ~

Comment 5 Amar Tumballi 2012-07-11 05:20:24 UTC
Jeff, can you check if the issue still exists with 3.3.0 release?

Comment 6 Jeff Byers 2012-07-11 13:29:36 UTC
Can you point me to the change list, or where in the source the fix was implemented so that, when I have time, I can add logging to make sure that the code path with the fix was executed? Thanks.

Comment 7 Andre Klapper 2012-11-10 15:09:35 UTC
Amar: Can you answer comment 6, please?

Comment 8 Pranith Kumar K 2013-02-22 10:38:55 UTC
> We are looking at that, and are going to try to
> simplify the environment enough that we could send
> something over to you.
> 
> ~ Jeff Byers ~

Jeff,
     Could you let us know if you have a simpler test case that can re-create the issue?

Pranith

Comment 9 Jeff Byers 2013-03-14 23:27:24 UTC
(In reply to comment #8)
> > We are looking at that, and are going to try to
> > simplify the environment enough that we could send
> > something over to you.
> > 
> > ~ Jeff Byers ~
> 
> Jeff,
>      Could you let us know if you have a simpler test case that can
> re-create the issue?
> 
> Pranith

We believe that we provided some simplified test scripts
and instructions on how to use them to you guys about a
year ago.

The test engineer remembers getting authorization to do
this, and putting something together. Unfortunately, it
has been too long, and we cannot find any local copies of
what we sent over to you.

Do you remember communicating with, and/or receiving
anything from our David McBride? 

We do have our original programs and scripts of course,
but the work to make them externally useful would need
to be repeated.

What is the likelihood of any use being made of any
test scripts and programs we would provide now?

Comment 10 Niels de Vos 2014-11-27 14:54:44 UTC
The version that this bug was reported against no longer receives updates from the Gluster Community. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug.

If there is no update before 9 December 2014, this bug will be closed automatically.

