Bug 237874 - NFS write corruption with PAE kernel
Summary: NFS write corruption with PAE kernel
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 6
Hardware: athlon
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 427887
Reported: 2007-04-25 19:49 UTC by Chris Schanzle
Modified: 2008-02-08 04:25 UTC (History)
2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2008-02-08 04:25:10 UTC
Type: ---
Embargoed:


Attachments
diffs of file corruption (10.07 KB, text/plain)
2007-04-27 23:34 UTC, Chris Schanzle

Description Chris Schanzle 2007-04-25 19:49:52 UTC
Description of problem:
Writing files via NFS with client running kernel-PAE results in frequent file
corruption.

System is an "AMD Athlon(tm) 64 X2 Dual Core Processor 4800+" on an ASUS A8N-SLI
Premium motherboard with 4GB of RAM.  I tried the PAE kernel to make the final GB
of RAM accessible.  My home directory is mounted from an FC6 NFS server and has
worked fine for about a year with the non-PAE kernel.

Version-Release number of selected component (if applicable):
kernel-PAE-2.6.20-1.2944.fc6.i686
kernel-2.6.20-1.2944.fc6.i686

How reproducible:
Not 100%, but easily reproducible and very noticeable in daily operation (RPM
builds fail, Mozilla corrupts inboxes and the address book, etc.).

Steps to Reproduce:
1. Download a large, verifiable tarball, such as Linus's kernel sources, to
local disk and verify its integrity:

cd /var/tmp
wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.20.7.tar.bz2
wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.20.7.tar.bz2.sign
gpg --verify linux-2.6.20.7.tar.bz2.sign

2. Extract it to local disk and generate known-good SHA-1 sums:
rm -rf linux-2.6.20.7 SUMS
tar xjf linux-2.6.20.7.tar.bz2
find linux-2.6.20.7 -type f -print0 | xargs -0 sha1sum > SUMS
rm -rf linux-2.6.20.7


3. Repeatedly extract the tarball onto the NFS filesystem (the NFS-mounted home
directory) and print every file whose SHA-1 sum does not match:
cd
n=0
while : ; do
  echo starting pass $((++n))
  rm -rf linux-2.6.20.7
  tar xjf /var/tmp/linux-2.6.20.7.tar.bz2
  sha1sum --check /var/tmp/SUMS | grep -v ': OK$'
done
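The loop in step 3 runs until interrupted. When comparing kernels (PAE vs. non-PAE) it can help to run a fixed number of passes and get a pass/fail exit status. The following is a minimal bash sketch, not part of the original report; `run_passes` is a hypothetical helper name, and it assumes GNU sha1sum (which prints `FILE: OK` for each good file) and a GNU tar recent enough to detect bzip2 compression on extraction (with an older tar, change `tar xf` to `tar xjf`). It must be run from the directory the tarball should be extracted into, because SUMS holds relative paths.

```shell
#!/bin/bash
# Hypothetical helper (not from the bug report): run PASSES
# extract-and-verify passes and report the mismatch count per pass.
# Usage: run_passes TARBALL SUMS TOPDIR PASSES
run_passes() {
    local tarball=$1 sums=$2 topdir=$3 passes=$4
    local n=0 total=0 bad
    while [ "$n" -lt "$passes" ]; do
        n=$((n + 1))
        # Start each pass from an empty tree, as in the original loop.
        rm -rf "$topdir"
        tar xf "$tarball"
        # sha1sum --check prints "FILE: OK" for each good file; count the
        # other stdout lines (FAILED entries). Warnings go to stderr.
        bad=$(sha1sum --check "$sums" 2>/dev/null | grep -vc ': OK$')
        echo "pass $n: $bad mismatches"
        total=$((total + bad))
    done
    # Exit status 1 if any pass saw a mismatch, 0 otherwise.
    return $((total > 0))
}
```

For example, `cd && run_passes /var/tmp/linux-2.6.20.7.tar.bz2 /var/tmp/SUMS linux-2.6.20.7 16`, run once under each kernel; a nonzero exit status means at least one pass saw a mismatch.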
  
Actual results:
For me, 40-50 files fail the SHA-1 check on each pass.

Expected results:
No corruption :-)

Additional information:
With the PAE kernel, the corruption occurs only on NFS; I cannot reproduce it on
local disk.
With the non-PAE kernel, I cannot reproduce the corruption at all.
The NFS server also runs FC6.

Comment 1 Chuck Ebbert 2007-04-27 21:54:23 UTC
What kind of corruption occurs? Please compare the known-good and bad files
and see if there is a pattern.


Comment 2 Chris Schanzle 2007-04-27 23:34:54 UTC
Created attachment 153676
diffs of file corruption

Comment 3 Chris Schanzle 2007-04-27 23:36:09 UTC
The tails of the files are corrupt.  View the attachment with less and note the
binary garbage.

To do this, I used basically the same script as above, but kept the known-good
tree in /var/tmp (i.e. skipped the last line of step 2 above).  Then:

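cd
n=0
while : ; do
  echo starting pass $((++n))
  rm -rf linux-2.6.20.7
  tar xjf /var/tmp/linux-2.6.20.7.tar.bz2
  # Names of files whose SHA-1 sum does not match the known-good list
  bad=$(sha1sum --check /var/tmp/SUMS | grep -v ': OK$' | awk -F: '{print $1}')
  # Save a text diff of each bad file against its known-good copy
  for i in $bad; do diff -au "/var/tmp/$i" "$i"; done > "/tmp/diffs.$n"
done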

I always get corruption on the first pass.
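Comment 1 asks for a pattern in the corruption. For damaged file tails, one thing worth checking is where the damage starts relative to the 4096-byte x86 page size. The following is a minimal bash sketch, not from the original report: the hypothetical `describe_corruption` helper uses `cmp -l`, which prints one line per differing byte with a 1-based decimal offset, and reports the first bad offset (0-based), its page number, its offset within the page, the number of differing bytes, and how many bytes remain from that point to the end of the file.

```shell
#!/bin/bash
# Hypothetical helper (not from the bug report): summarize where BAD
# first diverges from GOOD, assuming 4096-byte pages.
# Usage: describe_corruption GOOD BAD
describe_corruption() {
    local good=$1 bad=$2
    local size first count off
    # File size in bytes; $(( )) strips any padding wc adds.
    size=$(( $(wc -c < "$good") ))
    # cmp -l prints "OFFSET OLD NEW" per differing byte, OFFSET 1-based.
    first=$(cmp -l "$good" "$bad" 2>/dev/null | awk 'NR == 1 { print $1; exit }')
    if [ -z "$first" ]; then
        # Identical, or one file is a prefix of the other.
        echo "$bad: no differing bytes within the common length"
        return 0
    fi
    count=$(( $(cmp -l "$good" "$bad" 2>/dev/null | wc -l) ))
    off=$((first - 1))   # 0-based offset of the first bad byte
    echo "$bad: size=$size first_diff=$off page=$((off / 4096))" \
         "page_offset=$((off % 4096)) differing_bytes=$count" \
         "tail_bytes=$((size - off))"
}
```

It can be called from the loop above in place of (or next to) `diff`, e.g. `for i in $bad; do describe_corruption "/var/tmp/$i" "$i"; done`. A `page_offset` of 0 across many files would suggest the damage lines up with page boundaries; that is a hypothesis to check, not a conclusion from this report.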

Comment 4 Chris Schanzle 2007-05-07 19:47:48 UTC
*blush*

I had previously run memtest86+ on this system for days without error, but that
was about six months ago.

The other day I ran memtest86+ again, and within minutes it found memory errors
in one DIMM.  I replaced the DIMM, and a short memtest run passed.

Shockingly, I am *still* getting corruption in the file tails:

starting pass 1
sha1sum: WARNING: 7 of 21282 computed checksums did NOT match
starting pass 2
sha1sum: WARNING: 11 of 21282 computed checksums did NOT match
starting pass 3
sha1sum: WARNING: 8 of 21282 computed checksums did NOT match
starting pass 4
sha1sum: WARNING: 8 of 21282 computed checksums did NOT match
starting pass 5

I'll continue to validate the hardware and/or try another similar box.

Comment 5 Chris Schanzle 2007-05-08 20:54:31 UTC
An overnight (14-hour) memtest86+ run found no errors on either the repaired
system or a second, identically configured system.

I can replicate the problem on both systems with the PAE kernel, but not with the
i686 kernel (kernel-2.6.20-1.2944.fc6.i686; 16 passes OK).

Comment 6 Jon Stanley 2008-01-08 01:52:22 UTC
(This is a mass-update to all current FC6 kernel bugs in NEW state)

Hello,

I'm reviewing this bug list as part of the kernel bug triage project, an attempt
to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug; however, this version of Fedora is no longer
maintained.

Please attempt to reproduce this bug with a current version of Fedora (presently
Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a
few days if there is no further information lodged.

Thanks for using Fedora!

Comment 7 Jon Stanley 2008-02-08 04:25:10 UTC
Per the previous comment in this bug, I am closing it as INSUFFICIENT_DATA,
since no information has been lodged for over 30 days.

Please re-open this bug or file a new one if you can provide the requested data,
and thanks for filing the original report!

