Description of problem:
Kernel crashes a few seconds after starting the hercules-390 emulator.

Version-Release number of selected component (if applicable):
2.6.17-1.2174_FC5 #1 SMP Tue Aug 8 15:36:02 EDT 2006 ppc64 ppc64 ppc64 GNU/Linux
hercules-3.04.1-3.fc5

How reproducible:
Tried to reproduce twice in a row; both attempts resulted in a crash.

Steps to Reproduce:
1. Start hercules (with network support and CTCAs defined over the tun/tap driver).
2. IPL the operating system (MVS or VM/370). Start TCPIP in MVS or VM/370.
3. Use any networking function in MVS or VM.

Actual results:
System crash with kernel BUG in skb_gso_segment at net/core/dev.c:1206

Expected results:
Normal system operation

Additional info:
hercules.cnf loads network support and defines the CTCA:

LDMOD dyninst.so hdt3270.so hdt3505.so hdt3525.so hdt1403.so hdt3088.so hdt3420.so hdteq.so
044A,044B 3088 CTCI /dev/net/tun 1492 192.168.66.2 192.168.66.1 255.255.255.0

Hardware platform: IBM pSeries 9117-570, Linux for ppc64 running in an LPAR with 0.5 real CPU and 3GB of RAM.

lspci output:
00:02.4 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
d0:01.0 Mass storage controller: Promise Technology, Inc. 20275 (rev 01)
0001:00:02.4 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
0001:d0:01.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
0001:d0:01.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
0002:00:02.2 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
0002:00:02.4 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
0002:c8:01.0 SCSI storage controller: Mylex Corporation AcceleRAID 600/500/400/Sapphire support Device (rev 04)
0002:d0:01.0 Mass storage controller: Promise Technology, Inc. 20275 (rev 01)
Created attachment 136352
Debug information about the kernel crash (stack backtrace, registers, etc.)
This was fixed ages ago. We really need to update the xen code in FC5.
*** Bug 206753 has been marked as a duplicate of this bug. ***
Same problem on an i686 SMP system using the newer kernel 2.6.17-1.2187_FC5smp.

BT from the last crash:

PID: 2646  TASK: cd0c7150  CPU: 0  COMMAND: "httpd"
 #0 [cdf1ea98] crash_kexec at c0444941
 #1 [cdf1eae0] die at c040547a
 #2 [cdf1eb20] do_invalid_op at c0405bf5
 #3 [cdf1ebd0] error_code (via invalid_op) at c04049d5
    EAX: 00000000  EBX: cd91b124  ECX: 000111a3  EDX: 000111a3  EBP: 000111a3
    DS:  007b      ESI: cd91b124  ES:  007b      EDI: 00000008
    CS:  0060      EIP: c05be839  ERR: ffffffff  EFLAGS: 00010297
 #4 [cdf1ec04] skb_gso_segment at c05be839
 #5 [cdf1ec18] dev_hard_start_xmit at c05bf955
 #6 [cdf1ec3c] __qdisc_run at c05cdf7a
 #7 [cdf1ec5c] dev_queue_xmit at c05c1371
 #8 [cdf1ec78] ip_output at c05de9ba
 #9 [cdf1eca4] ip_queue_xmit at c05de1ee
#10 [cdf1ed20] tcp_transmit_skb at c05eba0c
#11 [cdf1ed70] tcp_push_one at c05ed4c6
#12 [cdf1ed88] tcp_sendmsg at c05e3bb2
#13 [cdf1ee18] do_sock_write at c05b58ef
#14 [cdf1ee34] sock_writev at c05b7a4b
#15 [cdf1ef24] do_readv_writev at c046b73c
#16 [cdf1ef8c] vfs_writev at c046b877
#17 [cdf1ef9c] sys_writev at c046bce1
#18 [cdf1efb8] system_call at c0403e38
    EAX: ffffffda  EBX: 00000011  ECX: bffa3778  EDX: 00000004
    DS:  007b      ESI: 00000004  ES:  007b      EDI: 002e5ff4
    SS:  007b      ESP: bffa35b0  EBP: bffa35d8
    CS:  0073      EIP: 00ba9410  ERR: 00000092  EFLAGS: 00000246

crash> whatis skb_gso_segment
struct sk_buff *skb_gso_segment(struct sk_buff *, int);

Looks like that function is defined in the Xen patch that the rpmbuild process applies to the core kernel code...
Just to note: the same kernel version (2.6.17-1.2187_FC5smp #1 SMP Mon Sep 11 02:07:57 EDT 2006 ppc ppc ppc GNU/Linux) on a 32-bit PPC machine (7025-F50) does NOT crash.
The new version doesn't crash on my i386 machine:

Linux wooded.hillside.co.uk 2.6.17-1.2187_FC5 #1 Mon Sep 11 01:17:06 EDT 2006 i686 i686 i386 GNU/Linux

and this version is running happily on my x86_64:

Linux rose.cantweb.co.uk 2.6.17-1.2174_FC5 #1 SMP Tue Aug 8 15:30:44 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

The invalid op looks like a compiler error... is it?
The bug is in the NAT code so it only shows up if you have the NAT module loaded. As I said before, the bug is already fixed in rawhide pending another Xen update for FC5.
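For anyone wanting to check whether a given machine is exposed, a rough way is to look for NAT modules in /proc/modules. The following is only a minimal Python sketch: the module names iptable_nat and ip_nat are an assumption (this report does not name the module), and loaded_nat_modules is a hypothetical helper.

```python
# Assumption: these are the NAT-related module names on the affected
# kernels. They are not taken from this bug report; adjust as needed.
NAT_MODULES = {"iptable_nat", "ip_nat"}


def loaded_nat_modules(proc_modules_text):
    """Return the sorted NAT module names found in /proc/modules-style text."""
    loaded = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            loaded.add(fields[0])  # the first column is the module name
    return sorted(loaded & NAT_MODULES)
```

Pass it the contents of /proc/modules, for example loaded_nat_modules(open("/proc/modules").read()). An empty list suggests the NAT code is not loaded on that machine.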
*** Bug 204220 has been marked as a duplicate of this bug. ***
The 2.6.18 (2189) kernel in FC5 testing should cure this.
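To check a running kernel against that build, the release string can be compared numerically. This is a hedged Python sketch: treating 2.6.18-1.2189 as the first fixed FC5 build is an assumption based only on the comment above, and parse_release and probably_fixed are hypothetical helper names.

```python
import re

# Assumption: 2.6.18-1.2189 is the first FC5 build carrying the fix,
# going only by the comment above. It is not an official statement.
FIRST_FIXED = (2, 6, 18, 1, 2189)


def parse_release(release):
    """Parse strings such as '2.6.17-1.2174_FC5' or '2.6.18-1.2200.fc5'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def probably_fixed(release):
    """True if the release is at or after the assumed first fixed build."""
    return parse_release(release) >= FIRST_FIXED
```

On a live system the release string can be taken from platform.release() in the standard library, which returns the same value as uname -r.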
I'm getting "switchroot: mount failed" when booting the 2.6.18 kernel from testing. Unfortunately I don't have access to the console myself, so getting any further debugging is going to be hard.

The root FS is a software RAID5 running on 3 SATA drives on the following interfaces:

00:1f.2 IDE interface: Intel Corporation 6300ESB SATA Storage Controller (rev 02)
03:03.0 RAID bus controller: Silicon Image, Inc. Adaptec AAR-1210SA SATA HostRAID Controller (rev 02)

Is there an archive of old FC5 updates anywhere so that I can downgrade to a working kernel in the meantime? It seems that old updates are not available in the updates repository. :(
You need to make sure that you've upgraded the xen package as well as the kernel. If the problem persists, please file a new bug. Thanks.
*** Bug 209910 has been marked as a duplicate of this bug. ***
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem, please check that you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem.

Thank you.
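The device-mapper and lvm2 check mentioned above can be scripted. This is a minimal Python sketch, assuming the input comes from rpm -q with a query format that prints one "NAME VERSION-RELEASE" pair per line; duplicate_packages is a hypothetical helper, not part of any Fedora tool.

```python
from collections import defaultdict

# Assumed input, one "NAME VERSION-RELEASE" pair per line, e.g. from:
#   rpm -q --qf '%{NAME} %{VERSION}-%{RELEASE}\n' device-mapper lvm2


def duplicate_packages(rpm_output):
    """Map package name to its versions, for names with more than one version."""
    versions = defaultdict(set)
    for line in rpm_output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skips "package X is not installed" lines
            name, version = parts
            versions[name].add(version)
    return {name: sorted(found)
            for name, found in versions.items() if len(found) > 1}
```

Versions are collected in a set, so the same version installed for two architectures on a multilib system is counted once and is not reported as a duplicate.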
Kernel 2.6.18-1.2200.fc5.ppc64 seems to fix the problem. Connections initiated through iptables NAT no longer crash the kernel.

Alexey bozy
I can confirm that the new kernel 2.6.18-1.2200.fc5 stayed up today on my machine, so the bug I reported appears to be fixed. Many thanks for all your efforts.