Red Hat Bugzilla – Bug 198298
Kernel Oops when umount last NFS share
Last modified: 2007-11-30 17:11:37 EST
Description of problem:
I get a kernel Oops when I try to unmount the last NFS share.
This may belong to the NFS person, or the umount person.
Version-Release number of selected component (if applicable):
Kernel: 2.6.17-1.2139_FC4smp i686
How reproducible:
Very: It happens every time I execute /bin/umount /net/asml1
when this is the last/only NFS mount and it's not being used.
Steps to Reproduce:
1. cd /net/asml1 (force a NFS mount)
2. cd ~ (make sure it's no longer in use)
3. /bin/umount /net/asml1
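The "last/only NFS mount" condition in the steps above can be checked before unmounting. This is a minimal sketch, not part of the original report: it assumes the mounts table has the /proc/mounts layout (filesystem type in the third field), and the helper names count_nfs_mounts and umount_if_last_nfs are hypothetical.

```shell
#!/bin/sh
# Count mounted NFS filesystems listed in a mounts table.
# $1: path to the table (defaults to /proc/mounts). Only the types
# "nfs" and "nfs4" are counted, so autofs and nfsd entries are ignored.
count_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { n++ } END { print n + 0 }' \
        "${1:-/proc/mounts}"
}

# Hypothetical helper: unmount $1 only when exactly one NFS mount
# remains, which is the condition the report says triggers the oops.
umount_if_last_nfs() {
    if [ "$(count_nfs_mounts)" -eq 1 ]; then
        /bin/umount "$1"
    else
        echo "not the last NFS mount; skipping $1" >&2
        return 1
    fi
}
```

Running count_nfs_mounts before step 3 shows whether /net/asml1 really is the only NFS mount at the time of the umount.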
This has been a problem on the dozen or so kernels I've tried, both SMP and
non-SMP. In June it was causing kernel panics that took 5-10 minutes to recover
from, and often system crashes. I have installed the latest kernel but I haven't
rebooted, partly because this is a production server and partly because I
suspect the problem will go away for a day or two and then start crashing the
kernel again. It's probably a complex problem caused by a confluence of
several factors. However, it is extremely reproducible and does not crash the
server (yet), so I can do a lot of testing.
Created attachment 132200 [details]
Kernel Oops Messages
Created attachment 132261 [details]
system message addresses run through ksymoops
(In reply to comment #2)
> Created an attachment (id=132261) 
> system message addresses run through ksymoops
This looks more like what I'd expect to see in a panic than
an oops, strange.
(In reply to comment #0)
> How reproducible:
> Very: It happens every time I execute /bin/umount /net/asml1
> when this is the last/only NFS mount and it's not being used.
So this host has only one export.
Is that correct?
How should I make the server look to attempt to duplicate
(In reply to comment #4)
> (In reply to comment #0)
> > How reproducible:
> > Very: It happens every time I execute /bin/umount /net/asml1
> > when this is the last/only NFS mount and it's not being used.
> So this host has only one export.
> Is that correct?
> How should I make the server look to attempt to duplicate
I've installed FC4 and yum updated to the latest updates.
Kernel is 2.6.17-1.2142_FC4.
util-linux is 2.12p-9.14.
Server is FC4 running 2.6.16-2121_FC6 and is named eagle.
So far I tried:
eagle with 2 exports:
And with 1 export:
And no problem seen.
More information needed.
Here are the exports in my auto.net file:
asml1 -fstype=nfs,soft,timeo=0,ro asml1:/usr/asm
asml2 -fstype=nfs,soft,timeo=0,ro asml2:/usr/asm
asml1_data -fstype=nfs,soft,timeo=0,ro asml1:/usr/asm/data.5465
asml2_data -fstype=nfs,soft,timeo=0,ro asml2:/usr/asm/data.9974
These two machines are Solaris 6 boxes. I hope that doesn't make a
difference but I'm trying to be helpful.
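To see at a glance which server and export each map key points at, the entries can be split into fields. This is only a sketch: it assumes every entry has the three-field layout shown above (key, -options, server:path), and the function name list_map_targets is hypothetical.

```shell
#!/bin/sh
# Print "key server path" for each entry of an autofs map read from stdin.
# Comment lines and lines with fewer than three fields are skipped.
list_map_targets() {
    awk '!/^#/ && NF >= 3 {
        split($3, loc, ":")      # loc[1] = server, loc[2] = exported path
        print $1, loc[1], loc[2]
    }'
}
```

For example, list_map_targets < auto.net would show that asml1 and asml1_data both come from the same server, asml1.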
I updated via yum, installed the latest kernel and had the same problem.
Current Kernel: 2.6.17-1.2142_FC4smp
rpm -qa util-linux: util-linux-2.12p-9.14
When I said it's highly repeatable I meant I can cause the error on my box
Would it be possible to get a bzip2-ed binary tethereal (or snoop) trace?
Something similar to tethereal -w /tmp/data.pcap <client>
Created attachment 133005 [details]
15 seconds of tethereal: Contains three kernel oops events in rapid succession triggered by unmounting an NFS partition.
[This comment added as part of a mass-update to all open FC4 kernel bugs]
FC4 has now transitioned to the Fedora legacy project, which will continue to
release security related updates for the kernel. As this bug is not security
related, it is unlikely to be fixed in an update for FC4, and has been migrated
Please retest with Fedora Core 5.
I have not been able to cause the oops for about a month now.
I hope one of the latest updates I applied in July or August
has fixed it but have not proven anything. It's OK to take
this off the list.