Bug 174335 - kernel oops after autofs fails to connect via smbfs
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Jeffrey Moyer
Brock Organ
Reported: 2005-11-28 00:25 EST by Jason Welter
Modified: 2007-11-30 17:11 EST
Doc Type: Bug Fix
Last Closed: 2006-05-05 10:59:07 EDT

Attachments
System Log From Previous Boot to Crash & Restart (28.41 KB, application/x-zip-compressed)
2005-11-29 13:49 EST, Jason Welter

Description Jason Welter 2005-11-28 00:25:21 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7

Description of problem:
I've got a fully updated Fedora Core 4 server crashing hard every week or two.  I use autofs to read & delete log files on 17 XP boxes and 6 NT4SP6 boxes, as well as a couple of other Windows file servers, every 5 minutes.  The first indication of a problem is that smbmount stops working.  Then the server becomes unresponsive to the point where only a power slam will fix it, and it does fix it...for a few days.

I've been updating my kernel as often as a new one is released.  Currently I'm running 2.6.14-1.1637_FC4smp.


Version-Release number of selected component (if applicable):
autofs-4.1.4-5

How reproducible:
Sometimes

Steps to Reproduce:
1. I wait 7-10 days

Actual Results:  The mounts quit working.  If I'm at work I restart the server; if not, I get a call after 1-2 hours, when every process on the server has ground to a halt.

Expected Results:  The server should not crash, even if autofs quits working.

Additional info:

This is the system log from the last crash.  I have logs from three other crashes
over the last month:

################################################################################
Nov 25 15:05:34 poseidon automount[14437]: failed to mount /win/prober01
Nov 25 15:05:41 poseidon automount[14451]: >> Error connecting to xxx.xxx.xxx.xxx (No route to host)
Nov 25 15:05:41 poseidon automount[14451]: >> 14453: Connection to SAW4341 failed
Nov 25 15:05:41 poseidon automount[14451]: >> SMB connection failed
Nov 25 15:05:41 poseidon automount[14451]: mount(generic): failed to mount //SAW4341/fabdata (type smbfs) on /win/prober01
Nov 25 15:05:41 poseidon automount[14451]: failed to mount /win/prober01
Nov 25 15:07:55 poseidon kernel: BUG: spinlock lockup on CPU#1, smbmnt/14461, f8b7c790 (Not tainted)
Nov 25 15:07:55 poseidon kernel:  [<c01decc3>] __spin_lock_debug+0xac/0xcf
Nov 25 15:07:55 poseidon kernel:  [<c01ded32>] _raw_spin_lock+0x4c/0x6a
Nov 25 15:07:55 poseidon kernel:  [<f8b75251>] smbiod_register_server+0xd/0x39 [smbfs]
Nov 25 15:07:55 poseidon kernel:  [<f8b743da>] smb_fill_super+0x23b/0x3b5 [smbfs]
Nov 25 15:07:55 poseidon kernel:  [<c01d9aba>] idr_get_new_above_int+0x5e/0xe9
Nov 25 15:07:55 poseidon kernel:  [<c017de5f>] get_filesystem+0xf/0x36
Nov 25 15:07:55 poseidon kernel:  [<c0169d70>] sget+0x161/0x16d
Nov 25 15:07:55 poseidon kernel:  [<c016a420>] set_anon_super+0x0/0xa1
Nov 25 15:07:55 poseidon kernel:  [<c016a6cf>] get_sb_nodev+0x37/0x71
Nov 25 15:07:55 poseidon kernel:  [<c016a84a>] do_kern_mount+0xaf/0x14a
Nov 25 15:07:55 poseidon kernel:  [<f8b7419f>] smb_fill_super+0x0/0x3b5 [smbfs]
Nov 25 15:07:55 poseidon kernel:  [<c017f314>] do_new_mount+0x6b/0x90
Nov 25 15:07:55 poseidon kernel:  [<c017f991>] do_mount+0x18b/0x1a9
Nov 25 15:07:55 poseidon kernel:  [<c017fd62>] sys_mount+0x77/0xae
Nov 25 15:07:55 poseidon kernel:  [<c01039e1>] syscall_call+0x7/0xb
Nov 25 15:57:41 poseidon kernel: input: AT Translated Set 2 keyboard on isa0060/serio0
Nov 25 16:01:30 poseidon syslogd 1.4.1: restart.
Nov 25 16:01:30 poseidon kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 25 16:01:30 poseidon kernel: Linux version 2.6.14-1.1637_FC4smp (bhcompile@hs20-bc1-4.build.redhat.com) (gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)) #1 SMP Wed Nov 9 18:34:11 EST 2005
Comment 1 Jeffrey Moyer 2005-11-28 12:47:09 EST
Unfortunately, this isn't enough information to debug the problem.  I need to
see what's going on on the other CPUs.  From the trace above, this really looks
like an smbfs bug.

Next time this happens, please get the output from sysrq-t.  Thanks.
Comment 2 Jason Welter 2005-11-29 11:43:57 EST
I turned off hyperthreading and bumped the Samba debug level to 9.
If it doesn't crash this weekend then I'll know more.

How do I get the output from sysrq-t?
Comment 3 Jeffrey Moyer 2005-11-29 11:47:44 EST
So long as the system is not completely hung, you can do the following, as root:

# sysctl -w kernel/sysrq=1
# echo t > /proc/sysrq-trigger

Or, from the console, you can hit <Alt><Sysrq>t

The output will be logged in /var/log/messages.

-Jeff
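
The two commands above can be wrapped in a small helper that also prints whatever reached the log afterwards. This is only a sketch: the function name and the PROC_ROOT, LOG_FILE and SETTLE_SECS variables are made up for illustration, and the defaults assume the paths mentioned in this comment. It has to run as root, and it is no help once the machine is completely hung.

```shell
#!/bin/sh
# Sketch: request a task-state dump (sysrq-t) and print what was logged.
# PROC_ROOT, LOG_FILE and SETTLE_SECS are hypothetical overrides so the
# function can be tried against scratch files before using it for real.
capture_sysrq_t() {
    proc_root="${PROC_ROOT:-/proc}"
    log_file="${LOG_FILE:-/var/log/messages}"

    # Remember the current log length so only new lines are printed.
    before=$(wc -l < "$log_file") || return 1

    # Same effect as "sysctl -w kernel/sysrq=1": enable the sysrq interface.
    echo 1 > "$proc_root/sys/kernel/sysrq" || return 1

    # Ask the kernel to dump the state of every task.
    echo t > "$proc_root/sysrq-trigger" || return 1

    # Give the logger a moment to write the dump, then show the new lines.
    sleep "${SETTLE_SECS:-2}"
    tail -n "+$((before + 1))" "$log_file"
}
```

Running `capture_sysrq_t > /root/sysrq-t.txt` as root would save the new log lines to a file that can be attached here (the output path is just an example).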
Comment 4 Jason Welter 2005-11-29 13:49:49 EST
Created attachment 121607
System Log From Previous Boot to Crash & Restart

Here's the latest system log, with SMB at debug level 9.
Comment 5 Jason Welter 2005-11-29 13:52:32 EST
1. I get an empty file when I try sysctl and echo.
2. The system crashed again.  It was completely unresponsive to the keyboard
so I couldn't have retrieved the sysctl output even if it generated anything.
3. I've attached the latest system log from a very recent boot to the crash.
Comment 6 Jeffrey Moyer 2005-11-29 14:16:42 EST
/proc/sysrq-trigger is always going to be an empty file.  When echoing to it, it
should generate kernel printk's, and those should show up on the console and in
the logs.

Nov 29 13:21:31 poseidon kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000001
Nov 29 13:21:31 poseidon kernel: EIP is at smbiod+0xef/0x18a [smbfs]
Nov 29 13:21:31 poseidon kernel: Call Trace:
Nov 29 13:21:31 poseidon kernel:  [<c01341b6>] autoremove_wake_function+0x0/0x37
Nov 29 13:21:31 poseidon kernel:  [<f8b75565>] smbiod+0x0/0x18a [smbfs]
Nov 29 13:21:31 poseidon kernel:  [<c0101d5d>] kernel_thread_helper+0x5/0xb

and then a few minutes later you get your crash:

Nov 29 13:24:17 poseidon kernel: BUG: spinlock lockup on CPU#0, smbmnt/3140, f8b7c790 (Not tainted)

This is definitely not an autofs bug.  Both faults are in smbfs, and that code
is pretty much abandoned.  Is there any way you can use cifs in your environment?

Thanks.
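
To pull just these two kinds of kernel messages out of a large syslog file, something like the following may help. It is a rough sketch: the function name is made up, the default path assumes logs go to /var/log/messages, and the patterns only cover the two messages quoted in this comment.

```shell
#!/bin/sh
# Sketch: list kernel oops and spinlock-lockup lines from a syslog file.
# The patterns match only the two message types quoted above.
find_kernel_faults() {
    grep -E 'kernel: (BUG: spinlock lockup|Unable to handle kernel)' \
        "${1:-/var/log/messages}"
}
```

Comparing the timestamps of the matches shows whether an oops in smbiod comes before each lockup, as it did on Nov 29.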
Comment 7 Jason Welter 2005-11-30 11:44:04 EST
I just finished researching cifs and implemented it (which consisted of a few
changes to auto.windows and hosts).  It was very simple and looks good.  I
won't know for sure for a couple of days.  Thanks for the suggestion.
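
For anyone making the same switch, the map change might look roughly like the sketch below. The actual contents of auto.windows were not posted here, so the sample map line in the comment is hypothetical (it reuses the mount point and share from the log above), and mount options that differ between smbfs and cifs would still need checking by hand.

```shell
#!/bin/sh
# Sketch: print an autofs map with fstype=smbfs rewritten to fstype=cifs.
# The result goes to stdout, so the original map file is left untouched.
# Hypothetical map line before the change:
#   prober01 -fstype=smbfs ://SAW4341/fabdata
smbfs_to_cifs() {
    sed 's/fstype=smbfs/fstype=cifs/g' "$1"
}
```

For example, `smbfs_to_cifs /etc/auto.windows > /tmp/auto.windows.cifs` writes the converted map to a scratch file for review (both paths are assumptions).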
Comment 8 Dave Jones 2006-02-03 00:16:12 EST
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.
Comment 9 John Thacker 2006-05-05 10:59:07 EDT
Closing per last comment.
Note that cifs apparently works and smbfs is deprecated.
Comment 10 Jason Welter 2006-05-05 11:32:35 EDT
cifs has been working in a production environment for 6 months now.
I have no need of smbfs any more.  I don't know if it works with new
kernels because I'm no longer using it.
