Description of problem:
Kernel panic when a Samba share served from this box is accessed. Does not happen with kernels before 2.6.17-1.2174_FC5. Message dumped to console is:
'Kernel Panic not syncing: Aiee, killing interrupt handler'
There is nothing in the logs; the kernel must panic before anything can be written to the log files. The box completely locks up.

Version-Release number of selected component (if applicable):
2.6.17-1.2187_FC5
samba-3.0.23a-1.fc5.1

How reproducible:
Every time.

Steps to Reproduce:
Access a Samba share on the machine when it is running kernel version 2.6.17-1.2187_FC5.

Actual results:
Kernel panic.

Expected results:
Normal operation.

Additional info:
This is a pretty standard machine. The system has some large XFS file systems which are served via Samba.
I forgot to say that if I shut Samba down or do not allow clients to access the shares, the machine seems to work fine. I can browse the file systems, start an X session, browse the Web and do network stuff. It seems to be just Samba and the new kernel causing the panic.
Created attachment 136474 [details] /proc/cpuinfo
Created attachment 136475 [details] /proc/interrupts
Created attachment 136476 [details] lsmod lspci /proc/version
I have just done some testing and it is not just Samba traffic that causes the panic. Any network traffic to a service on the machine causes a panic. For example, if I access a web page via Apache on this machine, it panics. I am going to see if I can get a null modem cable so I can capture the panic.
I get the panic serving up NFS from the 2187 kernel. I mount and chdir a few levels into the mount fine, but when I try to do a list, the server panics. Clients running the 2187 kernel do not panic. FWIW the 3001 build at http://people.redhat.com/davej/kernels/Fedora/FC5/ does not exhibit this problem for me.
By any chance are you using the forcedeth network driver? http://forums.fedoraforum.org/showthread.php?p=611858#post611858 I am, so it could be this.
Created attachment 136598 [details] ftpd panic png
Created attachment 136599 [details] nfsd panic png
Created attachment 136600 [details] sshd panic png
These kernel panics occur on a machine using e1000.ko. I am unable to recreate the panic on machines using 3c59x.ko and e100.ko. I've attached the pngs for reference, I guess, since the test kernel already appears to have fixed (or at least addressed) any bug.
I'm the other guy at http://forums.fedoraforum.org/showthread.php?p=611858#post611858 with the forcedeth driver. Running 32-bit on an AMD.
Created attachment 136606 [details] cpuinfo /proc/cpuinfo
Created attachment 136607 [details] interrupts /proc/interrupts
Created attachment 136608 [details] lspci lspci
Probably the same problem as my bug 206901. Yes, I'm using forcedeth on one of my NICs. Running smp 32 bit kernel.
I have installed the 3001 build from http://people.redhat.com/davej/kernels/Fedora/FC5/ and I can confirm also that there seems to be no problem with this kernel.
Also getting kernel panics late during the boot process (just after samba starts), using the forcedeth driver. Couldn't find a newer kernel at davej's site.... looks like he has taken them down.
"Me too". Upgraded to kernel-smp-2.6.17-1.2187_FC5 (skipped 2174) and have had 3 hard freezes since. I'm trying to figure out how best to see the panic dumps (my symptom is X just freezes and I see no panic debug output). And "me too" on using e1000. I have an Intel PRO/1000 MT Server NIC with jumbo frames on. And my hunch is the bug is network related: Crash: #1: can't remember #2: froze immediately when I released the mouse button to drop some files in Nautilus on an XP file share. #3: froze when I opened an ssh window from XP to linux and ran dmesg: it output about 2 pages of dmesg and froze It doesn't appear to be samba specific, so someone change the summary. If everyone here has e1000 (esp the higher end ones with all the advanced features) then I think we have found the culprit. I am now running 2174 and it hasn't crashed yet (24 hours, light usage but lots of NFS and remote MythTV). I no longer have the previous kernel installed but will go back to that if 2174 crashes again. If I can capture a panic dump I will post it here.
PS: I don't think I'm using "forcedeth", in fact I have no idea what that is. But if it's a mod, lsmod doesn't show it on my system.
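For anyone else unsure which driver their NIC uses: forcedeth is the driver for NVIDIA nForce onboard Ethernet, and e1000 is the Intel gigabit driver. The following is a minimal sketch of one way to list the driver bound to each interface. It assumes sysfs is mounted at /sys and exposes a device/driver symlink per interface, and that a Python 3 interpreter is available (not a given on a stock FC5 box). The function name nic_drivers is made up for illustration. `ethtool -i eth0` reports the same information where ethtool is installed.

```python
import os


def nic_drivers(sysfs_net="/sys/class/net"):
    """Map each network interface to the kernel driver bound to it.

    Assumes the sysfs layout <sysfs_net>/<iface>/device/driver, where
    'driver' is a symlink whose final path component is the driver name.
    """
    drivers = {}
    for iface in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, iface, "device", "driver")
        # Virtual interfaces such as lo have no device/driver link; skip them.
        if os.path.islink(link):
            drivers[iface] = os.path.basename(os.readlink(link))
    return drivers


if __name__ == "__main__":
    for iface, driver in nic_drivers().items():
        print(f"{iface}: {driver}")
```

An interface listed with e1000 or forcedeth matches the hardware reported as affected in this bug.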
pza wrote in Comment #18
> Couldn't find a newer kernel at davej's
> site.... looks like he has taken them down.

The kernel has gone from unofficial testing to official testing. It's now at http://download.fedora.redhat.com/pub/fedora/linux/core/updates/testing/5/ as build 2189. That means if you find bugs in it, you can file them with Bugzilla, whereas with unofficial tests you could only complain about it on the lists.

The announcement is at https://www.redhat.com/archives/fedora-test-list/2006-September/msg00746.html

P.S. bart can probably close this bug as "Rawhide" if no one objects.
I am also having this panic with 2187 and the forcedeth driver on a Sun Ultra 20.
I've also tried this kernel and experience a hard lock when any network traffic goes over my eth0 link, which is using the e1000 driver. I have no problems at all with data traversing my eth1 link, which is using the natsemi driver. I have reverted to the 2174 kernel for the moment. No problems with that one.
I'm experiencing the exact same problem (kernel panic) with 2.6.17-1.2187_FC5 on a x86_64 box while trying to transfer files over HTTP or SCP. The funny thing is the machine has two ethernet NICs and the problem occurs only when using the integrated nvidia controller. Serving files to the other network (via the second NIC) works fine!
This just happened on a friend's system too, when upgraded to FC5 2187. His system has an onboard e1000. So this bug is not just limited to high-end MT e1000 server cards. If it occurs with all e1000's then this is a very nasty bug indeed. As an interesting note, my friend had his box (a router/firewall) unplugged from the LAN (via e1000) for a few days, though it was still on the WAN (via tulip). It was 100% stable until the e1000/LAN was plugged in, then it had about 10 panics in 2 days. As a follow-up to comment #19, my own system has not crashed at all in over a week using 2174. My friend's system also seems stable (for 6 hours so far) with 2174. If this bug has been closed-rawhide, does that mean an erratum will be issued for FC5? I can't see any mention of anyone here actually finding the source of this bug and what the code fixes are. Has it been fixed upstream?
"Me too" We've been setting up all of our new servers with FC5 and things had been very stable until we updated 2 of our systems with 2187. Both systems frequently crashed on samba share access, and when samba was shut down, everything appeared to be stable again (Unfortunately samba is a vital service on these systems). I pulled the 2174 rpm from our stable system, installed it on the unstable box, and we've been stable ever since. All of our systems have at least one e1000 interface in use. My nic is reported by lspci as "Intel Corporation 82545GM Gigabit Ethernet Controller (rev 04)".
Follow-up to comment #21, where this was set to CLOSED RAWHIDE. I checked the 2189 link and details and I see no mention of anything related to this bug. Has someone tested 2189 or newer and shown this bug to be fixed there? Does anyone have any info on the cause/fix? I still don't see why this bug is closed. Everyone with an e1000 is going to get bitten by this one -- I predict a new "me too" posted every day.
Hi,

It was closed as Rawhide because the dev kernel 3001 and the kernel in Rawhide fix the problems with the forcedeth Ethernet driver causing this lockup. I have tested 3001 and have been running this kernel for the last few weeks with no problems at all. The 3001 kernel seems to be very stable.

Regards
Daniel
Has anyone here tested 2200 to see if that fixes this bug? I don't have the guts to do this on my machines -- they are all production servers.
2200 works with no problems at all on my machine. I have been up for 3 days so far with no problems.
Just updated one of our newly deployed FC5 systems with .2200 and started up 4 rsync sessions. On .2187 just one session panicked the server in a matter of seconds. As of this moment, 2 of the 4 have completed successfully and the other two are still in progress (fairly large transfer). So from my experience, 2200 fixed the issue.
I have also applied the .2200 x86_64 kernel and the system has been running stable for over 24 hours now. (Under .2187, the e1000 driver or something associated with it locked the system.)
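To summarise the builds mentioned in the comments above, here is a rough sketch that checks a kernel release string against what has been reported in this thread. The build sets are taken only from the comments here (2187 panics; 2174, 2200 and 3001 were reported stable), not from any changelog. The helper names and the regular expression are illustrative assumptions. The full version string of the 3001 build is not given in this thread, so the pattern matches only on the build number after the hyphen.

```python
import re

# Build numbers as reported by commenters in this thread only.
REPORTED_PANIC = {2187}
REPORTED_STABLE = {2174, 2200, 3001}


def build_number(release):
    """Extract the build number from a release such as 2.6.17-1.2187_FC5."""
    match = re.search(r"-\d+\.(\d+)", release)
    return int(match.group(1)) if match else None


def thread_status(release):
    """Describe what this thread reports about the given kernel release."""
    build = build_number(release)
    if build in REPORTED_PANIC:
        return "reported to panic"
    if build in REPORTED_STABLE:
        return "reported stable"
    return "not reported in this thread"


if __name__ == "__main__":
    import platform
    print(thread_status(platform.release()))
```

Build 2189 falls into the "not reported" case because no one here has posted a test result for it.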