Bug 52543 - c programs compiled over nfs mounts on 2.4 linux kernels get corrupted
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: nfs-server
Version: 7.1
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Steve Dickson
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2001-08-24 20:36 UTC by John Spencer
Modified: 2007-04-18 16:36 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-08-11 10:39:15 UTC
Embargoed:


Description John Spencer 2001-08-24 20:36:29 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)

Description of problem:
My testing seems to indicate that when the compile runs under a Linux 2.4 
kernel, bad data is written to the NFS-mounted partition, and it doesn't 
matter whether the NFS server runs any version of Linux or Solaris.  If 
you examine a compiled test file right away, you'll see good data.  If you 
wait a few seconds or look from another machine, the file contents will 
have changed.  This seems to indicate that some caching is causing the 
problem.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Compile a program using gcc on a 2.4 Linux kernel on an NFS-mounted 
partition.
2. Use od to examine the contents of the file.
3. If you examine the contents immediately, the data is correct.  If you 
wait a few seconds or examine the contents from another machine, the file 
contents will have changed.
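
Step 3 can also be checked mechanically instead of comparing od dumps by eye. The following is a minimal sketch, not part of the original report: the function names `file_checksum` and `changed_after_delay` and the simple multiplicative checksum are assumptions made for illustration. It reopens the file for each pass so the client has a chance to revalidate its cached data, but whether a cached or a server copy is actually read depends on the client and mount options.

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Compute a simple rolling checksum over every byte of the file.
 * Returns 0 on success and stores the result in *sum_out, -1 on error. */
int file_checksum(const char *path, uint32_t *sum_out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    uint32_t sum = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        sum = sum * 31u + (uint32_t)c;

    int err = ferror(fp);
    fclose(fp);
    if (err)
        return -1;

    *sum_out = sum;
    return 0;
}

/* Checksum the file, wait, then checksum it again.
 * Returns 1 if the contents changed, 0 if they stayed the same, -1 on error. */
int changed_after_delay(const char *path, unsigned delay_seconds)
{
    uint32_t before, after;

    if (file_checksum(path, &before) != 0)
        return -1;
    sleep(delay_seconds);
    if (file_checksum(path, &after) != 0)
        return -1;

    return before != after;
}
```

Calling `changed_after_delay(path, 10)` right after the compile finishes would be expected to return 1 on an affected client and 0 on a local file system, going by the behaviour described above.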
	

Actual Results:  bad data in files compiled using gcc on nfs mounted 
partitions on the Linux 2.4 kernel

Expected Results:  good data, data that stays the same and does not change 
on nfs mounted partitions
if the file is compiled locally it contains good data

Additional info:

My testing seems to indicate that when run under Linux 2.4 Kernel, bad data
is written and it doesn't seem to matter if the disk is on any version
of Linux or Solaris.

All 3 of these 2.4 kernels show the same problem:

Linux version 2.4.0-4GB (root.de) (gcc version 2.95.2
19991024 (release)) #1 Wed Jan 24 15:55:09 GMT 2001

Linux version 2.4.3-20mdk (chmou.com) (gcc version
egcs-2.91.66 19990314/Linux (egcs-1.1.2 release / Linux-Mandrake 8.0))
#1 Sun Apr 15 23:03:10 CEST 2001

Linux version 2.4.2-2 (root.redhat.com) (gcc version 2.96
20000731 (Red Hat Linux 7.1 2.96-79)) #1 Sun Apr 8 20:41:30 EDT 2001

This version of Linux does not:

Linux version 2.2.17-21mdk (chmou.com) (gcc version
2.95.3 19991030 (prerelease)) #1 Thu Oct 5 13:16:08 CEST 2000



Run the program with one file pathname argument.  Then use od to examine
the file contents.  Here's what you should get if all is working
correctly:

    $ ./test.linux foo.while
    $ od -x foo.while 
    0000000 f00d baad f00d baad f00d 600d f00d 600d
    0000020 f00d 600d f00d 600d f00d 600d f00d 600d
    *
    6543440 f00d 600d f00d 600d 0000 0000 0000 0000
    6543460


When run under a 2.4 kernel, you get the following results:

    $ ./test.linux /s/package/bgriffin/foo.casex
    $ od -x /s/package/bgriffin/foo.casex       
    0000000 f00d 600d f00d 600d f00d 600d f00d 600d
    *
    6543440 f00d 600d f00d 600d 0000 0000 0000 0000
    6543460

NOTE:  If you examine the file right away, you'll see good data.  If you
wait a few seconds or look from another machine, the file contents will
change.  This seems to indicate that there is some caching going on that
is causing the problem.



To compile the program:

    $ gcc -c -g test.c
    $ gcc -o test.linux test.o

Comment 1 Jeff Elam 2001-12-20 22:44:22 UTC
We have the same problem here at Intel with kernel 2.4.2-2 using Mentor 
Modelsim.  This problem did not exist in 6.2 with the same version of 
Modelsim.  The test case provided in this bug report reproduces the problem 
perfectly.

However, the problem is not limited to NFS.  We also have new problems with 
Modelsim on the 2.4.2-2 kernel over AFS (both Transarc and OpenAFS), and the 
test case reproduces those problems perfectly as well.  With Transarc AFS, the 
process hangs completely, right at the mmap() call.  While it is hung, any ps, 
top, or ls /proc attempt by any other user also hangs.  Shutdown hangs, but 
reboot -f works (though not cleanly, of course).

With OpenAFS, it doesn't hang, but it behaves similarly to the NFS corruption.  
When the resulting file is viewed locally, it looks OK.  When it is viewed from 
another system, it is corrupt.  With NFS, the corrupted file written to the 
network server eventually overrides the local copy, and the corruption appears 
locally after a few seconds.  Under OpenAFS, the local output remains OK because 
it stays in the AFS cache, while every other client sees the corrupt file.

Under the 2.4.9-12 kernel, the NFS corruption no longer occurs.  However, the 
OpenAFS corruption of the output written to the network still happens exactly 
as before.  Transarc AFS for 2.4.9-12 doesn't exist yet, so I couldn't test 
that one.
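
The source of the test case is not included in this report.  Since the Transarc hang occurs at the mmap() call, the failing path presumably involves writing a file through a shared memory mapping.  The following is a minimal sketch of such a writer and a read()-based checker, written under that assumption; it is not the original test.c, and the function names `write_pattern_mmap` and `count_mismatches`, the single repeating 16-bit pattern, and the file layout are all hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill the file with `words` copies of a 16-bit pattern through a
 * MAP_SHARED mapping (hypothetical stand-in for the missing test.c).
 * Returns 0 on success, -1 on error. */
int write_pattern_mmap(const char *path, size_t words, uint16_t pattern)
{
    if (words == 0)
        return -1;                      /* mmap() rejects a zero length */
    size_t len = words * sizeof(uint16_t);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {   /* size the file before mapping */
        close(fd);
        return -1;
    }

    uint16_t *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    for (size_t i = 0; i < words; i++)
        map[i] = pattern;

    int rc = msync(map, len, MS_SYNC);      /* push dirty pages to the file */
    if (munmap(map, len) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Re-read the file with read() and count bytes that differ from the
 * repeating pattern. Returns the count, or -1 on error. */
long count_mismatches(const char *path, uint16_t pattern)
{
    unsigned char expect[sizeof pattern];
    memcpy(expect, &pattern, sizeof pattern);   /* native byte order */

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char buf[8192];
    long bad = 0;
    size_t off = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++, off++)
            if (buf[i] != expect[off % sizeof pattern])
                bad++;

    close(fd);
    return n < 0 ? -1 : bad;
}
```

Running `write_pattern_mmap` on the affected client and `count_mismatches` on a second client (or on the same client after a delay) separates the write path from the verification path, which matches how the corruption was observed here: correct locally at first, wrong when read from elsewhere.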

