Red Hat Bugzilla – Bug 206630
Kernel BUG in skb_gso_segment and crash.
Last modified: 2007-11-30 17:11:43 EST
Description of problem:
The kernel crashes a few seconds after starting the hercules-390 emulator.
Version-Release number of selected component (if applicable):
2.6.17-1.2174_FC5 #1 SMP Tue Aug 8 15:36:02 EDT 2006 ppc64 ppc64 ppc64 GNU/Linux
How reproducible:
Tried to reproduce twice in a row; both attempts resulted in a crash.
Steps to Reproduce:
1. Start hercules (with network support and CTCAs defined over the tun/tap driver)
2. IPL the operating system (MVS or VM/370), then start TCPIP in MVS or VM/370
3. Use any networking function in MVS or VM
Actual results:
System crash with kernel BUG in skb_gso_segment at net/core/dev.c:1206
Expected results:
Normal system operation
Additional info:
hercules.cnf loads network support and defines CTCA:
LDMOD dyninst.so hdt3270.so hdt3505.so hdt3525.so hdt1403.so hdt3088.so
044A,044B 3088 CTCI /dev/net/tun 1492 192.168.66.2 192.168.66.1 255.255.255.0
Hardware platform: IBM pSeries 9117-570, Linux for ppc64 running in an LPAR with
0.5 real CPU and 3GB of RAM. lspci output:
00:02.4 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
d0:01.0 Mass storage controller: Promise Technology, Inc. 20275 (rev 01)
0001:00:02.4 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
0001:d0:01.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
0001:d0:01.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
0002:00:02.2 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
0002:00:02.4 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
0002:c8:01.0 SCSI storage controller: Mylex Corporation AcceleRAID 600/500/400/Sapphire support Device (rev 04)
0002:d0:01.0 Mass storage controller: Promise Technology, Inc. 20275 (rev 01)
Created attachment 136352
Debug information about the kernel crash (stack backtrace, registers, etc.)
This was fixed ages ago. We really need to update the Xen code in FC5.
*** Bug 206753 has been marked as a duplicate of this bug. ***
Same problem on an i686 SMP system using the newer kernel 2.6.17-1.2187_FC5smp.
Backtrace from the last crash:
PID: 2646 TASK: cd0c7150 CPU: 0 COMMAND: "httpd"
#0 [cdf1ea98] crash_kexec at c0444941
#1 [cdf1eae0] die at c040547a
#2 [cdf1eb20] do_invalid_op at c0405bf5
#3 [cdf1ebd0] error_code (via invalid_op) at c04049d5
EAX: 00000000 EBX: cd91b124 ECX: 000111a3 EDX: 000111a3 EBP: 000111a3
DS: 007b ESI: cd91b124 ES: 007b EDI: 00000008
CS: 0060 EIP: c05be839 ERR: ffffffff EFLAGS: 00010297
#4 [cdf1ec04] skb_gso_segment at c05be839
#5 [cdf1ec18] dev_hard_start_xmit at c05bf955
#6 [cdf1ec3c] __qdisc_run at c05cdf7a
#7 [cdf1ec5c] dev_queue_xmit at c05c1371
#8 [cdf1ec78] ip_output at c05de9ba
#9 [cdf1eca4] ip_queue_xmit at c05de1ee
#10 [cdf1ed20] tcp_transmit_skb at c05eba0c
#11 [cdf1ed70] tcp_push_one at c05ed4c6
#12 [cdf1ed88] tcp_sendmsg at c05e3bb2
#13 [cdf1ee18] do_sock_write at c05b58ef
#14 [cdf1ee34] sock_writev at c05b7a4b
#15 [cdf1ef24] do_readv_writev at c046b73c
#16 [cdf1ef8c] vfs_writev at c046b877
#17 [cdf1ef9c] sys_writev at c046bce1
#18 [cdf1efb8] system_call at c0403e38
EAX: ffffffda EBX: 00000011 ECX: bffa3778 EDX: 00000004
DS: 007b ESI: 00000004 ES: 007b EDI: 002e5ff4
SS: 007b ESP: bffa35b0 EBP: bffa35d8
CS: 0073 EIP: 00ba9410 ERR: 00000092 EFLAGS: 00000246
crash> whatis skb_gso_segment
struct sk_buff *skb_gso_segment(struct sk_buff *, int);
It looks like that function is defined in the Xen patch that the rpmbuild process
applies to the core kernel code.
Just to note: the same kernel version (2.6.17-1.2187_FC5smp #1 SMP Mon Sep
11 02:07:57 EDT 2006 ppc ppc ppc GNU/Linux) on a 32-bit PPC machine (7025-F50)
does NOT crash.
The new version doesn't crash on my i386 machine:
Linux wooded.hillside.co.uk 2.6.17-1.2187_FC5 #1 Mon Sep 11 01:17:06 EDT 2006 i686 i686 i386
and version
Linux rose.cantweb.co.uk 2.6.17-1.2174_FC5 #1 SMP Tue Aug 8 15:30:44 EDT 2006 x86_64 x86_64
is running happily on my x86_64 machine.
The invalid op looks like a compiler error. Is it?
The bug is in the NAT code, so it only shows up if you have the NAT module
loaded. As I said before, the bug is already fixed in rawhide; the fix for FC5
is pending another Xen update.
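For anyone wanting to check whether a given machine has NAT loaded, here is a rough Python sketch that scans /proc/modules-style text. The module names `ip_nat` and `iptable_nat` are an assumption based on usual 2.6.17-era netfilter naming; they are not stated anywhere in this report, and the helper name is made up for illustration.

```python
# Assumed NAT-related module names (not taken from this bug report).
NAT_MODULES = {"ip_nat", "iptable_nat"}


def loaded_nat_modules(proc_modules_text):
    """Return the set of assumed NAT module names present in /proc/modules text."""
    found = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0] in NAT_MODULES:
            found.add(fields[0])
    return found
```

On a live system the input would be the contents of /proc/modules, e.g. `loaded_nat_modules(open("/proc/modules").read())`. An empty result only means none of the assumed names were found.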
*** Bug 204220 has been marked as a duplicate of this bug. ***
The 2.6.18 (2189) kernel in FC5 testing should cure this.
I'm getting "switchroot: mount failed" when booting the 2.6.18 kernel from
testing. Unfortunately I don't have access to the console myself, so getting
any further debugging is going to be hard.
The root FS is a software RAID5 running on 3 SATA drives on the following controllers:
00:1f.2 IDE interface: Intel Corporation 6300ESB SATA Storage Controller (rev 02)
03:03.0 RAID bus controller: Silicon Image, Inc. Adaptec AAR-1210SA SATA HostRAID Controller (rev 02)
Is there an archive of old FC5 updates anywhere so that I can downgrade to a
working kernel in the meantime? It seems that old updates are not available in
the updates repository. :(
You need to make sure that you've upgraded the xen package as well as the
kernel. If the problem persists, please file a new bug. Thanks.
*** Bug 209910 has been marked as a duplicate of this bug. ***
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.
Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.
This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in Bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.
Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.
In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed. See bug 207474 for further details.
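One possible way to do that check is sketched below in Python. It assumes the input is the text produced by a query such as `rpm -q --qf '%{NAME} %{VERSION}-%{RELEASE}\n' device-mapper lvm2`. The function name is hypothetical, and whether several versions are actually a problem on a particular system is a question for bug 207474, not something this sketch decides.

```python
from collections import defaultdict


def packages_with_multiple_versions(rpm_query_text):
    """Map package name -> sorted versions, keeping only names with more than one version.

    Expects one "NAME VERSION-RELEASE" pair per line (assumed query format).
    """
    versions = defaultdict(set)
    for line in rpm_query_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            # Skip blank lines and messages such as "package lvm2 is not installed".
            continue
        name, version = parts
        versions[name].add(version)
    return {name: sorted(v) for name, v in versions.items() if len(v) > 1}
```

An empty dictionary means each queried package appeared with at most one version in the supplied text.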
If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.
If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.
Kernel 2.6.18-1.2200.fc5.ppc64 seems to fix the problem. Connections initiated
through iptables NAT no longer crash the kernel.
I can confirm that the new kernel 2.6.18-1.2200.fc5 stayed up today on my machine,
so the bug I reported appears to be fixed.
Many thanks for all your efforts.
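For scripting a check against the build confirmed above, a small Python sketch follows. It treats 2.6.18-1.2200.fc5 as the first known-good build because that is the one reporters confirmed here. The earlier 2.6.18 (2189) testing kernel was only said to be expected to cure the problem, so the sketch does not count it. The function name and the parsing pattern are illustrative assumptions about the `uname -r` strings seen in this thread.

```python
import re

# Build that commenters in this thread report as stable (2.6.18-1.2200.fc5).
FIXED_BUILD = (2, 6, 18, 1, 2200)

# Leading numeric part of strings like "2.6.17-1.2174_FC5" or "2.6.18-1.2200.fc5.ppc64".
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)")


def at_or_after_fixed_build(release):
    """Return True if a `uname -r` style string is at or after FIXED_BUILD."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError("unrecognised kernel release string: %r" % release)
    return tuple(int(g) for g in match.groups()) >= FIXED_BUILD
```

The comparison uses only the numeric prefix, so suffixes such as `_FC5smp` or `.fc5.ppc64` are ignored.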