Bug 198298 - Kernel Oops when umount last NFS share
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Assigned To: Ian Kent
QA Contact: Brock Organ
Reported: 2006-07-10 17:26 EDT by Jason Welter
Modified: 2007-11-30 17:11 EST
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2006-09-18 09:23:38 EDT


Attachments
Kernel Oops Messages (7.28 KB, text/plain)
2006-07-10 17:26 EDT, Jason Welter
system message addresses run through ksymoops (22.30 KB, text/plain)
2006-07-11 14:39 EDT, Jason Welter
15 seconds of tethereal: contains three kernel oops events in rapid succession, triggered by unmounting the NFS partition (13.99 MB, application/x-gzip-compressed)
2006-07-25 13:30 EDT, Jason Welter

Description Jason Welter 2006-07-10 17:26:51 EDT
Description of problem:
I get a kernel Oops when I try to unmount the last NFS share.
This may belong to the NFS maintainer or the umount maintainer.

Version-Release number of selected component (if applicable):
Kernel: 2.6.17-1.2139_FC4smp i686
Umount: mount-2.12p

How reproducible:
Very:  It happens every time I execute /bin/umount /net/asml1
when this is the last/only NFS mount and it's not being used.

Steps to Reproduce:
1.  cd /net/asml1 (forces an NFS mount)
2.  cd ~ (make sure it's no longer in use)
3.  /bin/umount /net/asml1
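The report says the oops happens only when /net/asml1 is the last (only) NFS mount. A minimal sketch for checking that precondition before step 3, assuming the standard Linux /proc/mounts layout (device, mount point, type, options, dump, pass on each line); the function names are hypothetical and are not part of util-linux or autofs:

```python
# Hypothetical helpers (not from util-linux or autofs): inspect text in the
# /proc/mounts layout "device mountpoint fstype options dump pass".

def nfs_mount_points(mounts_text):
    """Return the mount points whose filesystem type is nfs or nfs4."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        # The autofs mount on /net itself has type "autofs" and is ignored.
        if fstype in ("nfs", "nfs4"):
            points.append(mount_point)
    return points


def is_only_nfs_mount(path, mounts_text):
    """True if `path` is NFS-mounted and no other NFS mount exists."""
    return nfs_mount_points(mounts_text) == [path]
```

For example, `is_only_nfs_mount("/net/asml1", open("/proc/mounts").read())` should return True just before step 3 if the reported precondition holds.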
  
Actual results:

Expected results:

Additional info:
This has been a problem for the dozen or so kernels I've tried, both SMP and
non-SMP.  In June it was causing kernel panics that took 5-10 minutes to recover
from, and it often crashed the system.  I have installed the latest kernel but I
haven't rebooted, partly because this is a production server and partly because I
suspect the problem will go away for a day or two and then start up again,
crashing the kernel.  It's probably a complex problem caused by a confluence of
several factors.  However, it is extremely reproducible and does not crash the
server (yet), so I can do a lot of testing.
Comment 1 Jason Welter 2006-07-10 17:26:51 EDT
Created attachment 132200 [details]
Kernel Oops Messages
Comment 2 Jason Welter 2006-07-11 14:39:17 EDT
Created attachment 132261 [details]
system message addresses run through ksymoops
Comment 3 Ian Kent 2006-07-20 23:42:48 EDT
(In reply to comment #2)
> Created an attachment (id=132261) [edit]
> system message addresses run through ksymoops
> 

This looks more like what I'd expect to see in a panic than
an oops, strange.
Comment 4 Ian Kent 2006-07-20 23:53:24 EDT
(In reply to comment #0)
> 
> How reproducible:
> Very:  It happens every time I execute /bin/umount /net/asml1
> when this is the last/only NFS mount and it's not being used.

So this host has only one export.
Is that correct?

How should I make the server look to attempt to duplicate
this?
Comment 5 Ian Kent 2006-07-21 00:12:48 EDT
(In reply to comment #4)
> (In reply to comment #0)
> > 
> > How reproducible:
> > Very:  It happens every time I execute /bin/umount /net/asml1
> > when this is the last/only NFS mount and it's not being used.
> 
> So this host has only one export.
> Is that correct?
> 
> How should I make the server look to attempt to duplicate
> this?
> 

I've installed FC4 and yum updated to the latest updates.
Kernel is 2.6.17-1.2142_FC4.
util-linux is 2.12p-9.14.

Server is FC4 running 2.6.16-2121_FC6 and is named eagle.

So far I tried:

eagle with 2 exports:
/boot   *(sync)
/autofs *(rw,sync,no_root_squash)

cd /net/eagle
cd
umount /net/eagle/boot
umount /net/eagle/autofs

And with 1 export:
/       *(rw,sync,no_root_squash)

cd /net/eagle
cd
umount /net/eagle

And no problem seen.

More information needed.
Comment 6 Jason Welter 2006-07-21 10:22:29 EDT
Here are the exports in my auto.net file:
asml1           -fstype=nfs,soft,timeo=0,ro     asml1:/usr/asm
asml2           -fstype=nfs,soft,timeo=0,ro     asml2:/usr/asm
asml1_data      -fstype=nfs,soft,timeo=0,ro     asml1:/usr/asm/data.5465
asml2_data      -fstype=nfs,soft,timeo=0,ro     asml2:/usr/asm/data.9974
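Each line above is a simple autofs map entry: a key (which becomes the directory under /net, e.g. /net/asml1), an option list introduced by `-`, and a server:path location. A minimal sketch that splits such entries so the mount options can be compared side by side; `parse_map_line` is a hypothetical helper, and it assumes only the single-location form used in this file (no multi-mount entries, wildcards, or continuation lines):

```python
# Hypothetical parser for simple autofs map lines of the form
#   key  -opt1,opt2=value  server:/path
# Multi-mount entries, wildcards and continuation lines are not handled.

def parse_map_line(line):
    """Split one map entry into (key, options dict, location)."""
    fields = line.split()
    if len(fields) == 2:
        key, location = fields
        raw_opts = ""
    elif len(fields) == 3 and fields[1].startswith("-"):
        key, raw_opts, location = fields
        raw_opts = raw_opts[1:]  # drop the leading '-'
    else:
        raise ValueError("unsupported map entry: %r" % line)

    options = {}
    for opt in filter(None, raw_opts.split(",")):
        name, sep, value = opt.partition("=")
        # Flags such as "soft" or "ro" carry no value; store them as True.
        options[name] = value if sep else True
    return key, options, location
```

Applied to the first entry, this yields the key `asml1`, the location `asml1:/usr/asm`, and the options `fstype=nfs`, `soft`, `timeo=0`, and `ro`, which all four entries share.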

These two machines are Solaris 6 boxes.  I hope that doesn't make a
difference but I'm trying to be helpful.

I updated via yum, installed the latest kernel and had the same problem.
Current Kernel:     2.6.17-1.2142_FC4smp
rpm -qa util-linux: util-linux-2.12p-9.14

When I said it's highly repeatable I meant I can cause the error on my box
very easily.
Comment 7 Steve Dickson 2006-07-24 14:20:44 EDT
Would it be possible to get a bzip2-ed binary tethereal (or snoop) trace?
Something similar to: tethereal -w /tmp/data.pcap <client>
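A minimal sketch of preparing such a capture for upload, assuming tethereal wrote a classic libpcap file (its usual `-w` output at the time); the helper names are hypothetical. It checks the libpcap magic number so that a text dump is not sent by mistake, then compresses the file with Python's standard bz2 module:

```python
import bz2
import shutil

# Magic numbers of a classic (microsecond) libpcap file, in either byte order.
PCAP_MAGICS = {b"\xa1\xb2\xc3\xd4", b"\xd4\xc3\xb2\xa1"}


def is_binary_pcap(path):
    """True if the file starts with a libpcap global-header magic number."""
    with open(path, "rb") as f:
        return f.read(4) in PCAP_MAGICS


def bzip2_file(src, dst=None):
    """Compress src into dst (default: src + '.bz2'), keep src, return dst."""
    dst = dst or src + ".bz2"
    with open(src, "rb") as fin, bz2.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Running `bzip2 /tmp/data.pcap` produces the same kind of file but removes the original by default. The sketch keeps the original and only adds the format check.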
Comment 8 Jason Welter 2006-07-25 13:30:37 EDT
Created attachment 133005 [details]
15 seconds of tethereal: contains three kernel oops events in rapid succession, triggered by unmounting the NFS partition.
Comment 9 Dave Jones 2006-09-16 23:27:40 EDT
[This comment added as part of a mass-update to all open FC4 kernel bugs]

FC4 has now transitioned to the Fedora legacy project, which will continue to
release security related updates for the kernel.  As this bug is not security
related, it is unlikely to be fixed in an update for FC4, and has been migrated
to FC5.

Please retest with Fedora Core 5.

Thank you.
Comment 10 Jason Welter 2006-09-18 08:49:05 EDT
I have not been able to cause the oops for about a month now.
I hope one of the latest updates I applied in July or August
has fixed it but have not proven anything.  It's OK to take
this off the list.
