Bug 170423

Summary:

Cache invalidation bug in nfs v3

Product:

Red Hat Enterprise Linux 4

Reporter:

Peter K <cap>

Component:

kernel

Assignee:

Steve Dickson <steved>

Status:

CLOSED ERRATA

QA Contact:

Brian Brock <bbrock>

Severity:

high

Docs Contact:

Priority:

medium

Version:

4.0

CC:

aaron, herrold, jay.hilliard, jbaron, nixon, staubach

Target Milestone:

---

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

RHSA-2006-0132

Doc Type:

Bug Fix

Last Closed:

2006-03-07 20:20:01 UTC

Bug Blocks:

168429

Attachments:

Description	Flags
Purposed Upstream Patch	none
ethereal trace of a working test run.	none

Description Peter K 2005-10-11 16:00:53 UTC

Description of problem:
A trivial sequence of normal commands can make a file permanently different on
two NFS v3 clients that mount (rw) a common directory. Permanent here means that
no (known) amount of waiting fixes it; only cache invalidation (updating the file)
or a reboot helps.

Version-Release number of selected component (if applicable):
The problem has been reproduced on both 32- and 64-bit and with both
2.6.9-5.0.5smp and 2.6.9-11smp (it's a 2.6 bug with a fix on the way)

How reproducible:
very reproducible, but on some systems and kernels it can be somewhat sensitive
to timing

Steps to Reproduce:
requirements: two clients (n1, n2) that rw-mount a common nfs v3 directory

steps:
n1 $ echo foo > file

n2 $ cat file
foo

n1 $ touch .
n1 $ echo fxx > file

n2 $ touch file
n2 $ cat file
foo

Actual results:
"file" on n2 still contains "foo" and will not be updated to "fxx" unless the
file is updated (a touch on the writing client, for example), the directory is
remounted, or the system is rebooted

Expected results:
"file" should be updated with "fxx"
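
The steps above can be sketched as a small script run on n1. This is only an illustration: the function name repro_once, the REMOTE variable, and the "ssh n2" default are assumptions, not part of the original report, and the shared directory must have the same path on both clients.

```shell
#!/bin/sh
# Sketch of the reproduction steps, driven from the first client (n1).
# REMOTE is the command prefix used to run a command line on the second
# client; "ssh n2" is a placeholder host, not a value from the report.
REMOTE=${REMOTE:-"ssh n2"}

# repro_once DIR: DIR is the shared directory (same path on both clients,
# no spaces). Prints "<first read> <second read>" as seen by the second
# client and exits non-zero if the second read is stale.
repro_once() (
    dir=$1
    cd "$dir" || exit 2
    echo foo > file
    first=$($REMOTE "cat $dir/file")        # second client reads "foo"
    touch .
    echo fxx > file                         # same size, new content
    second=$($REMOTE "touch $dir/file; cat $dir/file")
    echo "$first $second"
    [ "$second" = fxx ]                     # false when the bug hits
)
```

On an affected pair of clients, a call such as repro_once /path/to/shared/dir would be expected to print "foo foo" when the bug hits and "foo fxx" otherwise. Because the bug is timing sensitive, a single run that passes does not show that the clients are unaffected.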

Additional info:
if the size changes (foo -> fxxx for example) the problem does not occur. This is
because of separate handling of isize and mtime in fs/nfs/inode.c
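
A minimal local sketch of why the foo -> fxx case is special: the rewrite leaves the file size unchanged, so a size comparison alone cannot notice it. The helper name file_size is hypothetical, and the demonstration runs on a scratch file with no NFS involved.

```shell
#!/bin/sh
# file_size FILE: print the size of FILE in bytes (hypothetical helper).
file_size() {
    n=$(wc -c < "$1")
    echo $((n))          # arithmetic expansion strips any padding from wc
}

# Local demonstration on a scratch file.
tmp=$(mktemp)
echo foo  > "$tmp"; file_size "$tmp"    # "foo" plus a newline
echo fxx  > "$tmp"; file_size "$tmp"    # same size as before
echo fxxx > "$tmp"; file_size "$tmp"    # one byte larger
rm -f "$tmp"
```

The first two sizes are identical, so in the reported scenario only the mtime can tell the client that the contents changed.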

the bug was first reported to the nfs ml by a colleague of mine a few days ago:
http://marc.theaimsgroup.com/?l=linux-nfs&m=112860356727402&w=2

it was also sent to lkml:
http://lkml.org/lkml/2005/10/11/47

where Trond spotted it and started on a fix:
http://lkml.org/lkml/2005/10/11/101

Comment 1 Peter K 2005-10-14 13:30:02 UTC

There is now a fix from Trond:
 http://lkml.org/lkml/2005/10/13/142

I have just verified that it fixes the problem for us.

Comment 2 Steve Dickson 2005-10-17 13:10:47 UTC

Created attachment 120056 [details]
Purposed Upstream Patch

Comment 5 Steve Dickson 2005-11-09 21:48:08 UTC

I spent most of today trying to reproduce this problem and was unable
to... at least with the scenario described in the first Bug Comment.
Note the caching in the 2.6.14 kernel is much different than in the
RHEL4 kernel... so this might have been something that was introduced
in a later kernel.... So unless I'm able to find a reproducer, I'll have to
mark this bug as NOTABUG

Comment 6 Peter K 2005-11-10 08:53:43 UTC

I'm really surprised; you are the first person I've heard of who is unable
to reproduce this. I've (as I wrote initially) reproduced it on 2.6.9-5.0.5smp,
2.6.9-11smp (both x86_64) and kernel.org 2.6.13.2. With the fix (that Trond
found) at least 2.6.13.2 works OK. If I remember correctly, people have also
reproduced it on a bunch of FC and Debian machines.

This bug is not just a minor annoyance to us; it makes it impossible for the
climate model CCSM to be run correctly (without modifications) on our cluster.
We also have a Red Hat support issue depending on this bugzilla.

To make sure that the bug is still there, I did a new test (on two machines
in a cluster). This is the exact copy-pasted output (including timestamps) for
both machines:

[root@n9 test]# uname -r
2.6.9-5.0.5.ELsmp
[root@n9 test]# pwd
/home/test
[root@n9 test]# mount | grep home
h1:/tornado_home on /home type nfs (rw,nosuid,nodev,hard,tcp,addr=192.168.11.221)
[root@n9 test]# grep home /etc/fstab
h1:/tornado_home        /home                   nfs     defaults,nosuid,nodev,hard,tcp  0 0
[root@n9 test]# date ; echo foo > file
Thu Nov 10 09:44:04 CET 2005
[root@n9 test]# date ; echo fxx > file
Thu Nov 10 09:44:25 CET 2005
[root@n9 test]# date ; touch .
Thu Nov 10 09:44:34 CET 2005
[root@n9 test]# date ; cat file
Thu Nov 10 09:45:08 CET 2005
fxx
[root@n9 test]#

[root@n10 test]# uname -r
2.6.9-5.0.5.ELsmp
[root@n10 test]# pwd
/home/test
[root@n10 test]# mount | grep home
h1:/tornado_home on /home type nfs (rw,nosuid,nodev,hard,tcp,addr=192.168.11.221)
[root@n10 test]# grep home /etc/fstab
h1:/tornado_home        /home                   nfs     defaults,nosuid,nodev,hard,tcp  0 0
[root@n10 test]# date ; cat file
Thu Nov 10 09:44:10 CET 2005
foo
[root@n10 test]# date ; touch file
Thu Nov 10 09:44:43 CET 2005
[root@n10 test]# date ; cat file
Thu Nov 10 09:44:49 CET 2005
foo
[root@n10 test]# date ; cat file
Thu Nov 10 09:45:12 CET 2005
foo
[root@n10 test]# date ; cat file
Thu Nov 10 09:46:52 CET 2005
foo
[root@n10 test]#

Comment 7 Peter K 2005-11-10 16:07:18 UTC

-== Summary as of 17.00 CET 20051110 ==-
* I have noticed that the bug only bites sometimes
* I can still reproduce it on all machines though (including 2.6.9-22.0.1)
* how idle the hosts are may make a difference
* shouldn't you look at the upstream patch regardless, since Trond considers it a
bug and the fix seems to handle a forgotten case?

somewhat chaotic information follows, including a fully automatic way to reproduce:


increasing the number of machines and kernels I've tested, I've now
noticed that it's not 100% reproducible... it seems to be a lot easier to
reproduce if the machine is idle and you have two xterms (one on each host). If I
script the process (using ssh) it's harder to reproduce, and on two 2.6.9-22.0.1
machines it's a lot harder (but always possible). It might be that the -22.0.1
kernel is better, or only that those two machines aren't 100% idle.

here's the script I use and it has so far never failed to reproduce the problem
(but it's usually only 1 in 3 that goes wrong when automated like this):

it prints out foo followed by the new value fxx if everything was ok, it prints
out foo followed by foo if the bug hits.

[cap@tornado cap]$ uname -r
2.6.9-22.0.1.ELsmp
[cap@tornado cap]$ mount | grep rossby3
d2:/nobackup/rossby3 on /nobackup/rossby3 type nfs (rw,nosuid,nodev,hard,tcp,addr=192.168.11.232)
[cap@tornado cap]$ grep rossby3 /etc/fstab
d2:/nobackup/rossby3    /nobackup/rossby3       nfs     defaults,nosuid,nodev,hard,tcp  0 0

[cap@tornado cap]$ for i in 1 2 3 1 2 3 1 2 3; do
    echo foo > file ; sleep $i
    echo -n "$i " ; ssh dunder cat /nobackup/rossby3/cap/file ; sleep $i
    touch . ; sleep $i
    echo fxx > file ; sleep $i
    echo -n "$i " ; ssh dunder "touch /nobackup/rossby3/cap/file ; sleep $i ; cat /nobackup/rossby3/cap/file"
    sleep $i
done
1 foo
1 fxx
2 foo
2 fxx
3 foo
3 fxx
1 foo
1 fxx
2 foo
2 fxx
3 foo
3 foo
1 foo
1 fxx
2 foo
2 fxx
3 foo
3 foo
[cap@tornado cap]$ cat file
fxx
[cap@tornado cap]$ ssh dunder cat /nobackup/rossby3/cap/file
foo
[cap@tornado cap]$ ssh dunder cat /nobackup/rossby3/cap/file
foo
[cap@tornado cap]$ ssh dunder cat /nobackup/rossby3/cap/file
foo
[cap@tornado cap]$ touch file
[cap@tornado cap]$ ssh dunder cat /nobackup/rossby3/cap/file
fxx
[cap@tornado cap]$

note how "file" stays foo after a loop like this until it's touched (on the
writing client).
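
To put a number on the failure rate, the loop's output can be piped through a small filter. The sketch below is an illustration only; count_stale is a hypothetical helper name, and it assumes the output format shown above (two lines per iteration, each "<delay> <content>").

```shell
#!/bin/sh
# count_stale: read the loop output on stdin and report how many
# iterations still showed "foo" on the second read (hypothetical helper).
count_stale() {
    awk '
        NR % 2 == 1 { next }                    # odd lines: first read
        { total++; if ($2 == "foo") stale++ }   # even lines: read after the rewrite
        END { printf "%d of %d iterations stale\n", stale, total }
    '
}
```

Fed the 18 lines of output shown above, this should report 2 of 9 iterations stale (the two runs with a 3 second delay that printed foo twice).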

I know that this isn't very nice and clean but it's at least fully automated
(you'll have to change the hostname and filename in the loop though).
Unfortunately I'm going on vacation now (5 weeks in Australia =) so I can't follow
up much more on this. I'll see if a colleague of mine can "take over".

Comment 8 Steve Dickson 2005-11-10 16:42:46 UTC

Created attachment 120884 [details]
ethereal trace of a working test run.

I was using later kernels (2.6.9-22) on both my clients so
I backed off to 2.6.9-5.0.5.ELsmp kernels and I'm still not
able to reproduce this issue... Here is what I was doing:

pro1$ uname -r
2.6.9-5.0.5.ELsmp
pro1$ cd /mnt/xeon5/home/tmp
pro1$ date ; echo foo > file
Thu Nov 10 11:36:19 EST 2005
pro1$ date ; echo fxx > file ; touch .
Thu Nov 10 11:36:34 EST 2005

pro5$ uname -r
2.6.9-5.0.5.ELsmp
pro5$ cd /mnt/xeon5/home/tmp
pro5$ date ; cat file
Thu Nov 10 11:36:24 EST 2005
foo
pro5$ date ; cat file
Thu Nov 10 11:36:39 EST 2005
fxx

I also made sure the clocks on both clients were sync-ed via ntpdate.

Now I'm not ready to give up on this yet... so I've attached a
bzip2-ed ethereal trace (captured on the server so both clients
could be traced) of a working run. If possible, I would like you
to do the same so they can be compared....

btw, what server are you using?

Comment 9 Jay Hilliard 2005-11-10 20:12:59 UTC

I'm also able to reproduce this on 2.6.9-22.  A patch for this kernel is welcome.

Comment 10 Steve Dickson 2005-11-10 21:06:12 UTC

Unfortunately the upstream patch in Comment #2 needs 8 other upstream
patches for it to apply cleanly... which I'm not against doing since one, I've
already done the work and two, it would move the RHEL4 cache code
closer to upstream (for better or worse ;-) ). But since I can't reproduce the
problem I have no way of verifying if these patches actually fix the bug...

So would anybody be willing to download a pre-U3 test kernel
from my people page to see if one, there are any regressions
and two, to see if it actually fixes the caching bug?

If so I could probably have something ready by later tonight
or early tomorrow (depending on our build system)

Comment 11 Peter K 2005-11-10 22:47:37 UTC

regarding the server we use: I have tried 2.6.9-5.0.5smp and 2.6.9-11smp, and
now we are running 2.6.13.2

if you have a test kernel for me to try I'll test it (if there's time before I
leave); otherwise I'll try to get someone else to try it out.

Comment 12 Steve Dickson 2005-11-11 11:00:36 UTC

RHEL4 U3 Test kernels that have the patch in
Comment #2 as well as a number of other patches that are
needed for that one patch are available in:
http://people.redhat.com/steved/bz170423/

Please let me know asap if these fix the caching issue
you're seeing... tia...

Comment 13 Jay Hilliard 2005-11-15 00:21:24 UTC

This certainly fixes the problem.  I was able to duplicate the original problem
on a network appliance, Solaris8, Solaris9, and MacOSX nfs mounts.  The patched
kernel works! Thanks!

Comment 14 Steve Dickson 2005-11-16 11:34:47 UTC

Cool.... Thank you very much for your effort... It's definitely appreciated!!

Comment 18 Aaron Straus 2006-01-05 20:24:55 UTC

Hi, just wanted to mention we do see this in a real workload here on RHEL4 ES
(2.6.9-22.0.1.ELsmp).  I'll try a patched kernel ASAP.  Thanks!

Comment 22 Red Hat Bugzilla 2006-03-07 20:20:02 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html

Comment 25 Issue Tracker 2007-06-19 08:35:54 UTC

Internal Status set to 'Resolved'
Status set to: Closed by Client
Resolution set to: 'RHEL 4 U4'

This event sent from IssueTracker by uthomas 
 issue 81774