From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7

Description of problem:
I've got a fully updated Fedora Core 4 server crashing hard every week or two. Every 5 minutes I use autofs to read and delete log files on 17 XP boxes and 6 NT4 SP6 boxes, as well as a couple of other Windows file servers. The first indication of a problem is that smbmount stops working; then the server becomes unresponsive to the point where only a power slam will fix it, and it does fix it... for a few days. I've been updating my kernel as often as a new one is released. Currently I'm running 2.6.14-1.1637_FC4smp.

Version-Release number of selected component (if applicable):
autofs-4.1.4-5

How reproducible:
Sometimes

Steps to Reproduce:
1. I wait 7-10 days

Actual Results:
The mounts quit working. If I'm at work I restart; if not, I get a call after 1-2 hours when every process on the server grinds to a halt.

Expected Results:
The server should not crash, even if autofs quits working.

Additional info:
This is the system log from the last crash.
I have logs from three other crashes over the last month:

################################################################################
Nov 25 15:05:34 poseidon automount[14437]: failed to mount /win/prober01
Nov 25 15:05:41 poseidon automount[14451]: >> Error connecting to xxx.xxx.xxx.xxx (No route to host)
Nov 25 15:05:41 poseidon automount[14451]: >> 14453: Connection to SAW4341 failed
Nov 25 15:05:41 poseidon automount[14451]: >> SMB connection failed
Nov 25 15:05:41 poseidon automount[14451]: mount(generic): failed to mount //SAW4341/fabdata (type smbfs) on /win/prober01
Nov 25 15:05:41 poseidon automount[14451]: failed to mount /win/prober01
Nov 25 15:07:55 poseidon kernel: BUG: spinlock lockup on CPU#1, smbmnt/14461, f8b7c790 (Not tainted)
Nov 25 15:07:55 poseidon kernel: [<c01decc3>] __spin_lock_debug+0xac/0xcf
Nov 25 15:07:55 poseidon kernel: [<c01ded32>] _raw_spin_lock+0x4c/0x6a
Nov 25 15:07:55 poseidon kernel: [<f8b75251>] smbiod_register_server+0xd/0x39 [smbfs]
Nov 25 15:07:55 poseidon kernel: [<f8b743da>] smb_fill_super+0x23b/0x3b5 [smbfs]
Nov 25 15:07:55 poseidon kernel: [<c01d9aba>] idr_get_new_above_int+0x5e/0xe9
Nov 25 15:07:55 poseidon kernel: [<c017de5f>] get_filesystem+0xf/0x36
Nov 25 15:07:55 poseidon kernel: [<c0169d70>] sget+0x161/0x16d
Nov 25 15:07:55 poseidon kernel: [<c016a420>] set_anon_super+0x0/0xa1
Nov 25 15:07:55 poseidon kernel: [<c016a6cf>] get_sb_nodev+0x37/0x71
Nov 25 15:07:55 poseidon kernel: [<c016a84a>] do_kern_mount+0xaf/0x14a
Nov 25 15:07:55 poseidon kernel: [<f8b7419f>] smb_fill_super+0x0/0x3b5 [smbfs]
Nov 25 15:07:55 poseidon kernel: [<c017f314>] do_new_mount+0x6b/0x90
Nov 25 15:07:55 poseidon kernel: [<c017f991>] do_mount+0x18b/0x1a9
Nov 25 15:07:55 poseidon kernel: [<c017fd62>] sys_mount+0x77/0xae
Nov 25 15:07:55 poseidon kernel: [<c01039e1>] syscall_call+0x7/0xb
Nov 25 15:57:41 poseidon kernel: input: AT Translated Set 2 keyboard on isa0060/serio0
Nov 25 16:01:30 poseidon syslogd 1.4.1: restart.
Nov 25 16:01:30 poseidon kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 25 16:01:30 poseidon kernel: Linux version 2.6.14-1.1637_FC4smp (bhcompile.redhat.com) (gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)) #1 SMP Wed Nov 9 18:34:11 EST 2005
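For anyone comparing several crash logs like the one above, a small script can pull the call-trace frames out of the syslog text and show which kernel modules they name. This is only a rough sketch: the regular expression is an assumption based on the frame format visible in this log, and the function names `parse_frames` and `modules_in_trace` are made up for illustration.

```python
import re

# Matches a call-trace frame as it appears in this log, for example:
#   "[<f8b75251>] smbiod_register_server+0xd/0x39 [smbfs]"
# The trailing "[module]" part is optional (absent for core kernel symbols).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module) pairs for every call-trace frame in log_text.

    module is None when the frame does not name a loadable module.
    """
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("symbol"), match.group("module")))
    return frames


def modules_in_trace(log_text):
    """Return the set of loadable modules named in the trace."""
    return {module for _, module in parse_frames(log_text) if module}
```

Run against the trace above, the only module this would report is smbfs, since every other frame is a core kernel symbol.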
Unfortunately, this isn't enough information to debug the problem. I need to see what's going on on the other CPUs. From the trace above, this really looks like an smbfs bug. Next time this happens, please get the output from sysrq-t. Thanks.
I turned off hyperthreading and bumped the Samba debug level to 9. If it doesn't crash this weekend then I'll know more. How do I get the output from sysrq-t?
So long as the system is not completely hung, you can do the following, as root:

# sysctl -w kernel/sysrq=1
# echo t > /proc/sysrq-trigger

Or, from the console, you can hit <Alt><Sysrq>t

The output will be logged in /var/log/messages.

-Jeff
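The same two steps can be scripted, which may help if the dump has to be requested by a watchdog job rather than by hand. The sketch below is an assumption-laden illustration, not a tested tool: it writes directly to `/proc/sys/kernel/sysrq` (the file behind the `kernel.sysrq` sysctl) and `/proc/sysrq-trigger`, must run as root, and the function name `request_task_dump` is hypothetical. The paths are parameters only so the function can be exercised against ordinary files.

```python
from pathlib import Path

# Real procfs locations on Linux; writing to them requires root.
SYSRQ_ENABLE = Path("/proc/sys/kernel/sysrq")
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def request_task_dump(enable_path=SYSRQ_ENABLE, trigger_path=SYSRQ_TRIGGER):
    """Enable SysRq, then ask the kernel for a task dump (the 't' command).

    Nothing useful is returned: the kernel prints the dump to its own log,
    not to the trigger file.
    """
    Path(enable_path).write_text("1\n")   # same effect as: sysctl -w kernel/sysrq=1
    Path(trigger_path).write_text("t\n")  # same effect as: echo t > /proc/sysrq-trigger
```

After calling `request_task_dump()` as root, look for the task list in the kernel log (for example /var/log/messages), not in the file that was written to.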
Created attachment 121607
System Log From Previous Boot to Crash & Restart

Here's the latest system log with SMB at debug level=9
1. I get an empty file when I try sysctl and echo.
2. The system crashed again. It was completely unresponsive to the keyboard so I couldn't have retrieved the sysctl output even if it generated anything.
3. I've attached the latest system log from a very recent boot to the crash.
/proc/sysrq-trigger is always going to be an empty file. Echoing to it should generate kernel printk's, and those should show up on the console and in the logs.

In your latest log I see:

Nov 29 13:21:31 poseidon kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000001
Nov 29 13:21:31 poseidon kernel: EIP is at smbiod+0xef/0x18a [smbfs]
Nov 29 13:21:31 poseidon kernel: Call Trace:
Nov 29 13:21:31 poseidon kernel: [<c01341b6>] autoremove_wake_function+0x0/0x37
Nov 29 13:21:31 poseidon kernel: [<f8b75565>] smbiod+0x0/0x18a [smbfs]
Nov 29 13:21:31 poseidon kernel: [<c0101d5d>] kernel_thread_helper+0x5/0xb

and then a few minutes later you get your crash:

Nov 29 13:24:17 poseidon kernel: BUG: spinlock lockup on CPU#0, smbmnt/3140, f8b7c790 (Not tainted)

This is definitely not an autofs bug; the oops is in smbfs, and the smbfs code is pretty much abandoned. Is there any way you can use cifs in your environment?

Thanks.
I just finished researching cifs and implemented it (which consisted of a few changes to auto.windows and hosts). It was very simple and looks good. I won't know for sure for a couple of days. Thanks for the suggestion.
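For readers making the same switch, the map side of the change can be as small as swapping the filesystem type in each autofs entry. The sketch below is hypothetical: the reporter's actual auto.windows is not shown in this bug, so the sample entry is reconstructed from the log line "failed to mount //SAW4341/fabdata (type smbfs) on /win/prober01", and `switch_to_cifs` is an illustrative name. Any smbfs-specific mount options in a real map may also need translating, and the hosts change mentioned above is not covered.

```python
import re

# Hypothetical auto.windows entry; the share and key come from the crash log,
# the rest of the layout is assumed.
OLD_MAP = "prober01  -fstype=smbfs  ://SAW4341/fabdata\n"


def switch_to_cifs(map_text):
    """Replace -fstype=smbfs with -fstype=cifs throughout autofs map text."""
    return re.sub(r"(-fstype=)smbfs\b", r"\1cifs", map_text)


print(switch_to_cifs(OLD_MAP), end="")
```

Entries that use another filesystem type are left untouched, so the function can be run over a whole map file.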
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

Thank you.
Closing per last comment. Note that cifs apparently works and smbfs is deprecated.
cifs has been working in a production environment for 6 months now. I have no need of smbfs any more. I don't know whether smbfs works with newer kernels because I'm no longer using it.