Red Hat Bugzilla – Bug 174335
kernel oops after autofs fails to connect via smbfs
Last modified: 2007-11-30 17:11:18 EST
Description of problem:
I've got a fully updated Fedora Core 4 server crashing hard every week or two. I use autofs to read and delete log files on 17 XP boxes and 6 NT4 SP6 boxes, as well as a couple of other Windows file servers, every 5 minutes. The first indication of a problem is that smbmount stops working; then the server becomes unresponsive to the point where only a power slam will fix it, and it does fix it... for a few days.
I've been updating my kernel as often as a new one is released. Currently I'm running 2.6.14-1.1637_FC4smp.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. I wait 7-10 days
Actual Results: The mounts quit working. If I'm at work I restart the server; if not, I get a call after 1-2 hours when every process on the server grinds to a halt.
Expected Results: The server should not crash, even if autofs quits working.
This is the system log from the last crash. I have logs from three other crashes
over the last month:
Nov 25 15:05:34 poseidon automount: failed to mount /win/prober01
Nov 25 15:05:41 poseidon automount: >> Error connecting to xxx.xxx.xxx.xxx (No route to host)
Nov 25 15:05:41 poseidon automount: >> 14453: Connection to SAW4341 failed
Nov 25 15:05:41 poseidon automount: >> SMB connection failed
Nov 25 15:05:41 poseidon automount: mount(generic): failed to mount //SAW4341/fabdata (type smbfs) on /win/prober01
Nov 25 15:05:41 poseidon automount: failed to mount /win/prober01
Nov 25 15:07:55 poseidon kernel: BUG: spinlock lockup on CPU#1, smbmnt/14461, f8b7c790 (Not tainte
Nov 25 15:07:55 poseidon kernel: [<c01decc3>] __spin_lock_debug+0xac/0xcf
Nov 25 15:07:55 poseidon kernel: [<c01ded32>] _raw_spin_lock+0x4c/0x6a
Nov 25 15:07:55 poseidon kernel: [<f8b75251>] smbiod_register_server+0xd/0x39 [smbfs]
Nov 25 15:07:55 poseidon kernel: [<f8b743da>] smb_fill_super+0x23b/0x3b5 [smbfs]
Nov 25 15:07:55 poseidon kernel: [<c01d9aba>] idr_get_new_above_int+0x5e/0xe9
Nov 25 15:07:55 poseidon kernel: [<c017de5f>] get_filesystem+0xf/0x36
Nov 25 15:07:55 poseidon kernel: [<c0169d70>] sget+0x161/0x16d
Nov 25 15:07:55 poseidon kernel: [<c016a420>] set_anon_super+0x0/0xa1
Nov 25 15:07:55 poseidon kernel: [<c016a6cf>] get_sb_nodev+0x37/0x71
Nov 25 15:07:55 poseidon kernel: [<c016a84a>] do_kern_mount+0xaf/0x14a
Nov 25 15:07:55 poseidon kernel: [<f8b7419f>] smb_fill_super+0x0/0x3b5 [smbfs]
Nov 25 15:07:55 poseidon kernel: [<c017f314>] do_new_mount+0x6b/0x90
Nov 25 15:07:55 poseidon kernel: [<c017f991>] do_mount+0x18b/0x1a9
Nov 25 15:07:55 poseidon kernel: [<c017fd62>] sys_mount+0x77/0xae
Nov 25 15:07:55 poseidon kernel: [<c01039e1>] syscall_call+0x7/0xb
Nov 25 15:57:41 poseidon kernel: input: AT Translated Set 2 keyboard on isa0060/serio0
Nov 25 16:01:30 poseidon syslogd 1.4.1: restart.
Nov 25 16:01:30 poseidon kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 25 16:01:30 poseidon kernel: Linux version 2.6.14-1.1637_FC4smp (firstname.lastname@example.org
dhat.com) (gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)) #1 SMP Wed Nov 9 18:34:11 EST 2005
Unfortunately, this isn't enough information to debug the problem. I need to
see what's going on on the other CPUs. From the trace above, this really looks
like an smbfs bug.
Next time this happens, please get the output from sysrq-t. Thanks.
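To illustrate why the trace points at smbfs: frames that belong to a loadable module carry the module name in square brackets at the end of the line. The following is only a minimal sketch, assuming the frame layout seen in the log above; the names FRAME_RE, parse_frames and modules_in_trace are made up for this example and are not part of any kernel tool.

```python
import re

# Matches stack frames such as:
#   [<f8b75251>] smbiod_register_server+0xd/0x39 [smbfs]
# The trailing "[module]" part is optional; core kernel frames lack it.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[A-Za-z0-9_.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module) per stack frame; module is None for core kernel code."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("symbol"), match.group("module")))
    return frames


def modules_in_trace(log_text):
    """Return the set of loadable modules that appear in the trace."""
    return {module for _, module in parse_frames(log_text) if module}


# A few lines copied from the Nov 25 trace in this report.
sample = """\
Nov 25 15:07:55 poseidon kernel: [<c01ded32>] _raw_spin_lock+0x4c/0x6a
Nov 25 15:07:55 poseidon kernel: [<f8b75251>] smbiod_register_server+0xd/0x39 [smbfs]
Nov 25 15:07:55 poseidon kernel: [<f8b743da>] smb_fill_super+0x23b/0x3b5 [smbfs]
Nov 25 15:07:55 poseidon kernel: [<c017fd62>] sys_mount+0x77/0xae
"""

print(parse_frames(sample)[1])   # frame directly above the spinlock call
print(modules_in_trace(sample))  # modules named anywhere in the trace
```

In this excerpt the only module that shows up is smbfs, and the frame that called into the spinlock code is smbiod_register_server, which is consistent with the reading above.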
I turned off hyperthreading and bumped the Samba debug level to 9.
If it doesn't crash this weekend then I'll know more.
How do I get the output from sysrq-t?
So long as the system is not completely hung, you can do the following, as root:
# sysctl -w kernel/sysrq=1
# echo t > /proc/sysrq-trigger
Or, from the console, you can hit <Alt><Sysrq>t
The output will be logged in /var/log/messages.
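If the same request needs to be issued from a script (for example from the job that already runs every 5 minutes), the two commands above can be sketched in Python. This is only an illustration: request_task_dump is a hypothetical helper, it assumes the standard procfs locations /proc/sys/kernel/sysrq and /proc/sysrq-trigger, and it has to run as root.

```python
from pathlib import Path


def request_task_dump(enable_path="/proc/sys/kernel/sysrq",
                      trigger_path="/proc/sysrq-trigger"):
    """Enable SysRq and ask the kernel to dump all task states (the 't' command).

    Nothing is read back from the trigger file; the dump is written to the
    kernel log, not to the file itself.
    """
    # Same effect as: sysctl -w kernel/sysrq=1
    Path(enable_path).write_text("1\n")
    # Same effect as: echo t > /proc/sysrq-trigger
    Path(trigger_path).write_text("t\n")
```

Calling request_task_dump() with no arguments uses the real procfs paths; the parameters exist only so the function can be tried against ordinary files first.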
Created attachment 121607 [details]
System Log From Previous Boot to Crash & Restart
Here's the Latest System Log with SMB at debug level=9
1. I get an empty file when I try sysctl and echo.
2. The system crashed again. It was completely unresponsive to the keyboard
so I couldn't have retrieved the sysctl output even if it generated anything.
3. I've attached the latest system log from a very recent boot to the crash.
/proc/sysrq-trigger is always going to be an empty file. When echoing to it, it
should generate kernel printk's, and those should show up on the console and in
Nov 29 13:21:31 poseidon kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000001
Nov 29 13:21:31 poseidon kernel: EIP is at smbiod+0xef/0x18a [smbfs]
Nov 29 13:21:31 poseidon kernel: Call Trace:
Nov 29 13:21:31 poseidon kernel: [<c01341b6>] autoremove_wake_function+0x0/0x37
Nov 29 13:21:31 poseidon kernel: [<f8b75565>] smbiod+0x0/0x18a [smbfs]
Nov 29 13:21:31 poseidon kernel: [<c0101d5d>] kernel_thread_helper+0x5/0xb
and then a few minutes later you get your crash:
Nov 29 13:24:17 poseidon kernel: BUG: spinlock lockup on CPU#0, smbmnt/3140, f8b7c790 (Not tainted)
This is definitely not an autofs bug; it is in smbfs, and the smbfs code is
pretty much abandoned. Is there any way you can use cifs in your environment?
I just finished researching cifs and implemented it (which consisted of a few
changes to auto.windows and hosts). It was very simple and looks good. I
won't know for sure for a couple of days. Thanks for the suggestion.
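For anyone making the same switch, the filesystem type in an autofs map entry is the obvious thing to change. The sketch below is an assumption-laden illustration, not the reporter's actual change: the contents of auto.windows are not shown in this report, so the map entry is hypothetical (modelled on the share named in the log above), smbfs_entry_to_cifs is a made-up helper, and the hosts changes are not covered.

```python
import re


def smbfs_entry_to_cifs(map_line):
    """Swap fstype=smbfs for fstype=cifs in one autofs map line.

    Only the filesystem type is changed. Any other mount options are left
    alone and may need manual review, since smbfs and cifs do not accept
    exactly the same option names.
    """
    return re.sub(r"\bfstype=smbfs\b", "fstype=cifs", map_line)


# Hypothetical entry; the real auto.windows is not part of this report.
old_entry = "prober01 -fstype=smbfs ://SAW4341/fabdata"
print(smbfs_entry_to_cifs(old_entry))
```

Lines that do not mention fstype=smbfs pass through unchanged, so the function can be applied to every line of a map file.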
This is a mass-update to all currently open kernel bugs.
A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.
Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.
This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.
Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.
If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.
Closing per last comment.
Note that cifs apparently works and smbfs is deprecated.
cifs has been working in a production environment for 6 months now.
I have no need of smbfs any more. I don't know if it works with new
kernels because I'm no longer using it.