174335 – kernel oops after autofs fails to connect via smbfs

Summary: kernel oops after autofs fails to connect via smbfs

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	4
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Jeff Moyer
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-11-28 05:25 UTC by Jason Welter
Modified:	2007-11-30 22:11 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-05-05 14:59:07 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
System Log From Previous Boot to Crash & Restart (28.41 KB, application/x-zip-compressed) 2005-11-29 18:49 UTC, Jason Welter

Description Jason Welter 2005-11-28 05:25:21 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7

Description of problem:
I've got a fully updated Fedora Core 4 server that crashes hard every week or two. Every 5 minutes I use autofs to read and delete log files on 17 XP boxes and 6 NT4 SP6 boxes, as well as on a couple of other Windows file servers. The first sign of a problem is that smbmount stops working. Then the server becomes so unresponsive that only a hard power-off will fix it, and that does fix it...for a few days.

I've been updating my kernel as often as a new one is released.  Currently I'm running 2.6.14-1.1637_FC4smp.


Version-Release number of selected component (if applicable):
autofs-4.1.4-5

How reproducible:
Sometimes

Steps to Reproduce:
1. I wait 7-10 days.

Actual Results:  The mounts quit working. If I'm at work, I restart the server; if not, I get a call 1-2 hours later when every process on the server grinds to a halt.

Expected Results:  The server should not crash, even if autofs quits working.

Additional info:

This is the system log from the last crash.  I have logs from three other crashes
over the last month:

################################################################################
Nov 25 15:05:34 poseidon automount[14437]: failed to mount /win/prober01
Nov 25 15:05:41 poseidon automount[14451]: >> Error connecting to xxx.xxx.xxx.xxx (No route to host)
Nov 25 15:05:41 poseidon automount[14451]: >> 14453: Connection to SAW4341 failed
Nov 25 15:05:41 poseidon automount[14451]: >> SMB connection failed
Nov 25 15:05:41 poseidon automount[14451]: mount(generic): failed to mount //SAW4341/fabdata (type smbfs) on /win/prober01
Nov 25 15:05:41 poseidon automount[14451]: failed to mount /win/prober01
Nov 25 15:07:55 poseidon kernel: BUG: spinlock lockup on CPU#1, smbmnt/14461, f8b7c790 (Not tainted)
Nov 25 15:07:55 poseidon kernel:  [<c01decc3>] __spin_lock_debug+0xac/0xcf
Nov 25 15:07:55 poseidon kernel:  [<c01ded32>] _raw_spin_lock+0x4c/0x6a
Nov 25 15:07:55 poseidon kernel:  [<f8b75251>] smbiod_register_server+0xd/0x39 [smbfs]
Nov 25 15:07:55 poseidon kernel:  [<f8b743da>] smb_fill_super+0x23b/0x3b5 [smbfs]
Nov 25 15:07:55 poseidon kernel:  [<c01d9aba>] idr_get_new_above_int+0x5e/0xe9
Nov 25 15:07:55 poseidon kernel:  [<c017de5f>] get_filesystem+0xf/0x36
Nov 25 15:07:55 poseidon kernel:  [<c0169d70>] sget+0x161/0x16d
Nov 25 15:07:55 poseidon kernel:  [<c016a420>] set_anon_super+0x0/0xa1
Nov 25 15:07:55 poseidon kernel:  [<c016a6cf>] get_sb_nodev+0x37/0x71
Nov 25 15:07:55 poseidon kernel:  [<c016a84a>] do_kern_mount+0xaf/0x14a
Nov 25 15:07:55 poseidon kernel:  [<f8b7419f>] smb_fill_super+0x0/0x3b5 [smbfs]
Nov 25 15:07:55 poseidon kernel:  [<c017f314>] do_new_mount+0x6b/0x90
Nov 25 15:07:55 poseidon kernel:  [<c017f991>] do_mount+0x18b/0x1a9
Nov 25 15:07:55 poseidon kernel:  [<c017fd62>] sys_mount+0x77/0xae
Nov 25 15:07:55 poseidon kernel:  [<c01039e1>] syscall_call+0x7/0xb
Nov 25 15:57:41 poseidon kernel: input: AT Translated Set 2 keyboard on isa0060/serio0
Nov 25 16:01:30 poseidon syslogd 1.4.1: restart.
Nov 25 16:01:30 poseidon kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 25 16:01:30 poseidon kernel: Linux version 2.6.14-1.1637_FC4smp (bhcompile.redhat.com) (gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)) #1 SMP Wed Nov 9 18:34:11 EST 2005

Comment 1 Jeff Moyer 2005-11-28 17:47:09 UTC

Unfortunately, this isn't enough information to debug the problem.  I need to
see what's going on on the other CPUs.  From the trace above, this really looks
like an smbfs bug.

Next time this happens, please get the output from sysrq-t.  Thanks.

Comment 2 Jason Welter 2005-11-29 16:43:57 UTC

I turned off hyperthreading and bumped the Samba debug level to 9.
If it doesn't crash this weekend then I'll know more.

How do I get the output from sysrq-t?

Comment 3 Jeff Moyer 2005-11-29 16:47:44 UTC

So long as the system is not completely hung, you can do the following, as root:

# sysctl -w kernel/sysrq=1
# echo t > /proc/sysrq-trigger

Or, from the console, you can hit <Alt><Sysrq>t

The output will be logged in /var/log/messages.

-Jeff
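A minimal sketch of the same procedure wrapped in a shell function, which additionally copies the kernel ring buffer to a file so the task dump is kept even if the box later has to be powered off. It assumes root privileges and a kernel with magic SysRq support; the function name `dump_tasks` and its default output path are hypothetical. On a busy system the dump can exceed the ring buffer, so /var/log/messages (as noted above) may hold the more complete copy.

```shell
#!/bin/sh
# Sketch only: capture a sysrq-t task dump while the machine still responds.
# Assumes: run as root; kernel built with magic SysRq support.
# dump_tasks and its default output path are hypothetical names.

dump_tasks() {
    out="${1:-/tmp/sysrq-t.txt}"        # hypothetical default destination

    # Enable all SysRq functions (equivalent to: sysctl -w kernel/sysrq=1).
    sysctl -w kernel.sysrq=1 >/dev/null || return 1

    # 't' makes the kernel printk the state and stack of every task.
    # Reading /proc/sysrq-trigger always returns an empty file; only writes matter.
    echo t > /proc/sysrq-trigger || return 1

    # Give klogd/syslogd a moment, then save the kernel ring buffer.
    sleep 2
    dmesg > "$out" || return 1
    echo "task dump saved to $out"
}
```

Usage, as root: `dump_tasks /root/sysrq-t.$(date +%s).txt`. This only helps while the system still accepts commands; once it is fully hung, neither this nor the console key combination can run.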

Comment 4 Jason Welter 2005-11-29 18:49:49 UTC

Created attachment 121607 [details]
System Log From Previous Boot to Crash & Restart

Here's the latest system log with Samba at debug level 9.

Comment 5 Jason Welter 2005-11-29 18:52:32 UTC

1. I get an empty file when I run the sysctl and echo commands.
2. The system crashed again. It was completely unresponsive to the keyboard,
so I couldn't have retrieved the sysrq output even if it had generated anything.
3. I've attached the system log covering the most recent boot through the crash.

Comment 6 Jeff Moyer 2005-11-29 19:16:42 UTC

/proc/sysrq-trigger is always going to be an empty file.  When echoing to it, it
should generate kernel printk's, and those should show up on the console and in
the logs.

Nov 29 13:21:31 poseidon kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000001
Nov 29 13:21:31 poseidon kernel: EIP is at smbiod+0xef/0x18a [smbfs]
Nov 29 13:21:31 poseidon kernel: Call Trace:
Nov 29 13:21:31 poseidon kernel:  [<c01341b6>] autoremove_wake_function+0x0/0x37
Nov 29 13:21:31 poseidon kernel:  [<f8b75565>] smbiod+0x0/0x18a [smbfs]
Nov 29 13:21:31 poseidon kernel:  [<c0101d5d>] kernel_thread_helper+0x5/0xb

and then a few minutes later you get your crash:

Nov 29 13:24:17 poseidon kernel: BUG: spinlock lockup on CPU#0, smbmnt/3140, f8b7c790 (Not tainted)

This is definitely not an autofs bug. The smbfs code is pretty much abandoned. Is
there any way you can use cifs in your environment?

Thanks.
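When sifting a multi-day log such as the one attached in comment 4, a small filter for the kernel messages quoted above makes it easier to separate the first fault (the NULL-pointer oops in smbiod) from the spinlock lockup that follows minutes later. A minimal sketch: the function name `find_kernel_faults` is hypothetical, and the match strings are simply the ones that appear in this report's log excerpts.

```shell
#!/bin/sh
# Sketch: print kernel fault markers from a syslog file, in log order.
# find_kernel_faults is a hypothetical helper; the patterns are copied
# from the oops and lockup lines quoted in this report.

find_kernel_faults() {
    log="${1:-/var/log/messages}"
    grep -E 'Unable to handle kernel|EIP is at|BUG: spinlock lockup' "$log"
}
```

Usage: `find_kernel_faults /var/log/messages`. Because grep keeps the syslog timestamps, the output shows the ordering Jeff points out: the oops at 13:21:31 precedes the lockup at 13:24:17.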

Comment 7 Jason Welter 2005-11-30 16:44:04 UTC

I just finished researching cifs and implemented it (which consisted of a few
changes to auto.windows and hosts). It was very simple and looks good. I
won't know for sure for a couple of days. Thanks for the suggestion.
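The switch described above can be sketched as autofs map entries. This is an illustration under assumptions, not the reporter's actual configuration: it assumes `/win` is served by the `auto.windows` map, that credentials are kept in a hypothetical `/etc/cifs.creds` file read through mount.cifs's `credentials=` option, and that the `hosts` change was presumably needed so names such as SAW4341 resolve without NetBIOS lookup. The helper `cifs_map_entry` is hypothetical; it prints one map line per share, which is convenient when the same pattern repeats across two dozen Windows machines.

```shell
#!/bin/sh
# Sketch: emit autofs map lines that mount Windows shares with cifs
# instead of smbfs. cifs_map_entry and /etc/cifs.creds are hypothetical.
# Map line format: <key> <-mount options> <location>. The leading ':' in
# the location tells automount this is not an NFS host:path, so mount
# receives //server/share.

cifs_map_entry() {
    key="$1"                        # directory name under /win, e.g. prober01
    server="$2"                     # Windows host; must resolve via DNS or /etc/hosts
    share="$3"                      # share name on that host
    creds="${4:-/etc/cifs.creds}"   # username=/password= file for mount.cifs
    printf '%s\t-fstype=cifs,credentials=%s\t://%s/%s\n' \
        "$key" "$creds" "$server" "$share"
}
```

Usage: `cifs_map_entry prober01 SAW4341 fabdata >> /etc/auto.windows`, with a master map line such as `/win /etc/auto.windows`.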

Comment 8 Dave Jones 2006-02-03 05:16:12 UTC

This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.

Comment 9 John Thacker 2006-05-05 14:59:07 UTC

Closing per last comment.
Note that cifs apparently works and smbfs is deprecated.

Comment 10 Jason Welter 2006-05-05 15:32:35 UTC

cifs has been working in a production environment for 6 months now.
I have no need of smbfs any more.  I don't know if it works with new
kernels because I'm no longer using it.
