142849 – fcntl appends over NFS/GFS: data corruption

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	gfs
Sub Component:
Version:	4
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Ben Marzinski
QA Contact:	GFS Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	164915

Reported:	2004-12-14 17:39 UTC by Derek Anderson
Modified:	2010-01-12 03:01 UTC
CC List:	2 users
Fixed In Version:	RHBA-2006-0234
Clone Of:
Environment:
Last Closed:	2006-03-09 19:45:28 UTC
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2006:0234	0	normal	SHIPPED_LIVE	GFS-kernel bug fix update	2006-03-09 05:00:00 UTC

Description Derek Anderson 2004-12-14 17:39:27 UTC

Description of problem:
Setup:
- link-10: mounts GFS at /mnt/gfs0 and exports it via NFS as /mnt/gfs0
  with options *(rw,sync).
- bench-01: mounts the NFS export /mnt/gfs0 at /mnt/link-10/mnt/gfs0
  with default options.
- Run genesis from the sistina-test tree.

On the first append operation genesis is getting a data comparison
error.  An od on the NFS machine shows the appended bytes to be null;
an od on the GFS machine of the same file shows the expected bytes
written.

This same test was run over straight NFS without error.

The problem occurs only with fcntl-style locking.  The same test with
flock ran without error.

This can also be reproduced with the accordion tool, which does all
append operations, with some truncs thrown in.
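
For readers without the sistina-test tree, the access pattern can be
approximated with a minimal sketch like the one below.  It is not genesis
or accordion; `locked_append` and its `style` parameter are hypothetical
names chosen for illustration.  It appends a payload under either an
fcntl-style or a flock-style lock and reads the file back, so the result
can be compared on the NFS client and on the GFS node.

```python
import fcntl
import os


def locked_append(path, payload, style="fcntl"):
    """Append payload under an exclusive lock and return the file contents.

    style="fcntl" takes a POSIX lock through lockf(); anything else takes
    a BSD lock through flock().  Hypothetical helper, not part of genesis.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        if style == "fcntl":
            fcntl.lockf(fd, fcntl.LOCK_EX)   # fcntl-style lock on the whole file
        else:
            fcntl.flock(fd, fcntl.LOCK_EX)   # flock-style lock
        os.write(fd, payload)                # O_APPEND places this at end of file
        size = os.fstat(fd).st_size
        return os.pread(fd, size, 0)         # read everything back from offset 0
    finally:
        os.close(fd)                         # closing the descriptor drops the lock
```

On a healthy filesystem the returned bytes end with the payload just
written.  In the failure reported here, the bytes read back on the NFS
client would be expected to end in nulls instead.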

====================================
The command line on the NFS mounter:
====================================
[root@bench-01 bin]# ./genesis -w /mnt/link-10/mnt/gfs0 -s 10s -k -L fcntl -v
genesis starting with:
Working dir:       /mnt/link-10/mnt/gfs0
Iterations:        0
Run time:          0s
Random Seed:       22760
Number of files:   1024
File Size:         10
Filename Length:   25
Number of dirs:    10
Locking:           fcntl
Skip shrink        true
APPEND: gendir_8/klspndyjwgcsetqycykjnmxn
*** DATA COMPARISON ERROR gendir_8/klspndyjwgcsetqycykjnmxn ***
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 0
    1st  7 expected bytes:  A:22760
    1st  7 actual bytes:    .......

genesis (22760) exited with status 1

=================================================
Run od on the file in error from the NFS machine:
=================================================
[root@bench-01 bin]# od -c /mnt/link-10/mnt/gfs0/gendir_8/klspndyjwgcsetqycykjnmxn
0000000   C   :   2   2   7   6   0   :   b   e  \0  \0  \0  \0  \0  \0
0000020  \0
0000021

# This guy sees the 7 appended bytes as null

=================================================
Run od on the file in error from the GFS machine:
=================================================
[root@link-10 gfs0]# od -c /mnt/gfs0/gendir_8/klspndyjwgcsetqycykjnmxn
0000000   C   :   2   2   7   6   0   :   b   e   A   :   2   2   7   6
0000020   0
0000021

# This shows the expected 7 bytes properly appended to the file.

Version-Release number of selected component (if applicable):
DEVEL.1102693630 (built Dec 10 2004 09:48:40)

How reproducible:
100%

Comment 1 Ken Preslan 2005-01-05 22:34:53 UTC

Let me make sure I know exactly what you're trying to do.  I think
you're trying to see if there is consistency between accesses to a
local filesystem and accesses to that same filesystem exported by NFS
to another machine.  Right?

Does this work with Ext2/3 as the underlying filesystem?

Comment 2 Dean Jansa 2005-04-19 15:29:56 UTC

> Does this work with Ext2/3 as the underlying filesystem?

It looks that way.  I just tried with Ext3 and got no error.

Comment 3 Kiersten (Kerri) Anderson 2005-07-18 20:05:25 UTC

Is this still a problem?  Defect is in NEEDINFO state.

Comment 4 Dean Jansa 2005-11-04 17:09:08 UTC

This is still a problem:

GFS fs exported via NFS to fore:
(Cluster is ia64 - DLM, fore is x86_64)

RHEL4U2 on all nodes.

On fore (NFS client):
link-13:/mnt/vedder on /mnt/vedder type nfs (rw,addr=10.15.84.163)

On link-13 (NFS server, serving up the GFS fs via NFS):
[root@link-13 fore]# exportfs -v
/mnt/vedder     fore.lab.msp.redhat.com(rw,wdelay,root_squash)


----------------- Test Results ----------------
: fore; accordion -i 10 -L fcntl testfile_fcntl

accordion starting:
Iterations:       10
Run time:         0s
Lock type:        fcntl
File size:        1024
Extend size:      1
Random truncate:  No
Use lseek:        No
Release Interval: 0
Random seed:      11474
Filelist:
----------------------------------------------------
/mnt/vedder/fore/testfile_fcntl
*** DATA COMPARISON ERROR testfile_fcntl ***
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 0
    1st  1 expected bytes:  W
    1st  1 actual bytes:    .

child (11474) exited with status 1


After the error, take a look at the file contents:

File on NFS CLIENT:
: fore; od -c testfile_fcntl
0000000   W   W   W   W   W   W   W   W   W   W   W  \0
0000014

File on GFS:
[root@link-13]# od -c testfile_fcntl 
0000000   W   W   W   W   W   W   W   W   W   W   W   W
0000014




Run the same test, using flocks:

: fore; accordion -i 10 -L flock testfile_flock

accordion starting:
Iterations:       10
Run time:         0s
Lock type:        flock
File size:        1024
Extend size:      1
Random truncate:  No
Use lseek:        No
Release Interval: 0
Random seed:      11491
Filelist:
----------------------------------------------------
/mnt/vedder/fore/testfile_flock
child (11491) exited with status 0

After test run: 
: fore; od -c testfile_flock 
0000000   W   W   W   W   W   W   W   W   W   W
0000012

[root@link-13]# od -c testfile_flock 
0000000   W   W   W   W   W   W   W   W   W   W
0000012

Comment 5 Ben Marzinski 2005-12-07 23:35:38 UTC

Well, this doesn't even need NFS... or locking. I can get this with lock_nolock
on one machine.  Simply do fcntl locking, write to a file, and instead of
reading, do a sendfile on what you just wrote. sendfile will not give you the
correct data. Doing a read will.

Comment 6 Ben Marzinski 2005-12-09 00:11:11 UTC

And it doesn't even need the fcntls.
sendfile doesn't correctly deal with stuffed inodes.  Sometimes it will work,
sometimes not. Unstuffed inodes always seem to work fine, so any file
3865 bytes big or larger should always work.
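
The write-then-sendfile check described in comments 5 and 6 might be
sketched as follows.  This is an illustrative approximation, not the test
actually used: `read_via_sendfile` and `write_then_compare` are
hypothetical names, and a local socket pair stands in for the NFS server's
network socket.

```python
import os
import socket


def read_via_sendfile(path, chunk=4096):
    """Return the file's bytes as delivered by sendfile(2) into a socket."""
    size = os.path.getsize(path)
    tx, rx = socket.socketpair()
    in_fd = os.open(path, os.O_RDONLY)
    received = bytearray()
    try:
        offset = 0
        while offset < size:
            # Send a bounded chunk so the socket buffer cannot fill up
            # before we drain it on the receiving side.
            sent = os.sendfile(tx.fileno(), in_fd, offset,
                               min(chunk, size - offset))
            if sent == 0:
                break
            offset += sent
            while len(received) < offset:
                received += rx.recv(offset - len(received))
        return bytes(received)
    finally:
        os.close(in_fd)
        tx.close()
        rx.close()


def write_then_compare(path, payload):
    """Write payload, then fetch it with read() and with sendfile()."""
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        plain = f.read()
    return plain, read_via_sendfile(path)
```

If the two returned values differ for a small file on a GFS mount, that
would match the behaviour described above.  On other filesystems they
should be identical.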

Comment 7 Ben Marzinski 2005-12-10 00:32:59 UTC

O.k. I have a fix. Here's what's wrong. If anyone has a better solution, I'm all
ears.

When you do gfs_write on a stuffed inode, you don't update the page cache,
because the inodes are stored in the buffer cache. This doesn't affect reads,
because gfs special cases the stuffed reads.  Unfortunately, sendfile needs to
use the page cache, because it relies on the destination socket's sendpage
routine to work. So my fix is: after you do a write on a stuffed inode, if the
first page of the file is cached, mark the cached page as not uptodate. (It
appears from looking at the code that there is already an assumption that
stuffed inodes will never be more than a page in length.)
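
The interaction described in this comment can be illustrated with a toy
model.  This is not GFS code: the class, its fields, and the zero padding
of a stale page are assumptions made only to mirror the null bytes seen in
the od output earlier in this report.

```python
class StuffedInodeModel:
    """Toy model of a stuffed inode with two caches (illustrative only)."""

    def __init__(self, invalidate_on_write):
        self.buffer = b""            # data held with the inode in the buffer cache
        self.page = None             # cached first page, None if not cached
        self.page_uptodate = False
        self.invalidate_on_write = invalidate_on_write

    def write(self, data):
        # Writes go to the buffer cache only; the page cache is not updated.
        self.buffer += data
        if self.invalidate_on_write and self.page is not None:
            self.page_uptodate = False   # the fix: mark the cached page stale

    def read(self):
        # Stuffed reads are special-cased and served from the buffer cache.
        return self.buffer

    def sendfile(self):
        # sendfile goes through the page cache and refills only stale pages.
        if self.page is None or not self.page_uptodate:
            self.page = self.buffer
            self.page_uptodate = True
        # Assumption: a stale page shows zeros beyond the data it captured.
        return self.page.ljust(len(self.buffer), b"\0")
```

With `invalidate_on_write=False`, a write, a sendfile, and then an append
leave sendfile returning the old bytes followed by nulls while read
returns the full data.  With `invalidate_on_write=True` the two agree.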

Comment 10 Red Hat Bugzilla 2006-03-09 19:45:28 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0234.html
