Bug 206630

Summary: Kernel BUG in skb_gso_segment and crash.
Product: [Fedora] Fedora
Reporter: Alexey Bozrikov <a>
Component: kernel
Assignee: Herbert Xu <herbert.xu>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: bugs-redhat, davej, master, pc, steve, ville.lindfors, wtogami
Hardware: ppc64
OS: Linux
Fixed In Version: 2.6.18-1.2200.fc5
Doc Type: Bug Fix
Last Closed: 2006-10-17 06:45:26 UTC
Attachments:
debug information about kernel crash (stack backtrace, registers etc) (flags: none)

Description Alexey Bozrikov 2006-09-15 12:45:11 UTC
Description of problem:
The kernel crashes a few seconds after starting the hercules-390 emulator.

Version-Release number of selected component (if applicable):
2.6.17-1.2174_FC5 #1 SMP Tue Aug 8 15:36:02 EDT 2006 ppc64 ppc64 ppc64 GNU/Linux
hercules-3.04.1-3.fc5

How reproducible:
Tried to reproduce twice in a row; both attempts resulted in a crash.

Steps to Reproduce:
1. Start hercules (with network support and CTCAs defined over the tun/tap driver).
2. IPL the operating system (MVS or VM/370) and start TCPIP in it.
3. Use any networking function in MVS or VM.
  
Actual results:
system crash with kernel BUG in skb_gso_segment at net/core/dev.c:1206

Expected results:
normal system operation

Additional info:
hercules.cnf loads network support and defines CTCA:
LDMOD dyninst.so hdt3270.so hdt3505.so hdt3525.so hdt1403.so hdt3088.so hdt3420.so hdteq.so
044A,044B 3088 CTCI /dev/net/tun 1492 192.168.66.2  192.168.66.1  255.255.255.0

Hardware platform: IBM pSeries 9117-570, running Linux for ppc64 in an LPAR
with 0.5 real CPU and 3GB of RAM. lspci output:
00:02.4 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
d0:01.0 Mass storage controller: Promise Technology, Inc. 20275 (rev 01)
0001:00:02.4 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
0001:d0:01.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
0001:d0:01.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
0002:00:02.2 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
0002:00:02.4 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
0002:c8:01.0 SCSI storage controller: Mylex Corporation AcceleRAID 600/500/400/Sapphire support Device (rev 04)
0002:d0:01.0 Mass storage controller: Promise Technology, Inc. 20275 (rev 01)

Comment 1 Alexey Bozrikov 2006-09-15 12:45:11 UTC
Created attachment 136352
debug information about kernel crash (stack backtrace, registers etc)

Comment 2 Herbert Xu 2006-09-17 23:50:39 UTC
This was fixed ages ago.  We really need to update the xen code in FC5.

Comment 3 Herbert Xu 2006-09-17 23:56:42 UTC
*** Bug 206753 has been marked as a duplicate of this bug. ***

Comment 4 Need Real Name 2006-09-20 05:23:04 UTC
Same problem on an i686 SMP system using the newer kernel 2.6.17-1.2187_FC5smp.

BT from the last crash:
PID: 2646   TASK: cd0c7150  CPU: 0   COMMAND: "httpd"
 #0 [cdf1ea98] crash_kexec at c0444941
 #1 [cdf1eae0] die at c040547a
 #2 [cdf1eb20] do_invalid_op at c0405bf5
 #3 [cdf1ebd0] error_code (via invalid_op) at c04049d5
    EAX: 00000000  EBX: cd91b124  ECX: 000111a3  EDX: 000111a3  EBP: 000111a3
    DS:  007b      ESI: cd91b124  ES:  007b      EDI: 00000008
    CS:  0060      EIP: c05be839  ERR: ffffffff  EFLAGS: 00010297
 #4 [cdf1ec04] skb_gso_segment at c05be839
 #5 [cdf1ec18] dev_hard_start_xmit at c05bf955
 #6 [cdf1ec3c] __qdisc_run at c05cdf7a
 #7 [cdf1ec5c] dev_queue_xmit at c05c1371
 #8 [cdf1ec78] ip_output at c05de9ba
 #9 [cdf1eca4] ip_queue_xmit at c05de1ee
#10 [cdf1ed20] tcp_transmit_skb at c05eba0c
#11 [cdf1ed70] tcp_push_one at c05ed4c6
#12 [cdf1ed88] tcp_sendmsg at c05e3bb2
#13 [cdf1ee18] do_sock_write at c05b58ef
#14 [cdf1ee34] sock_writev at c05b7a4b
#15 [cdf1ef24] do_readv_writev at c046b73c
#16 [cdf1ef8c] vfs_writev at c046b877
#17 [cdf1ef9c] sys_writev at c046bce1
#18 [cdf1efb8] system_call at c0403e38
    EAX: ffffffda  EBX: 00000011  ECX: bffa3778  EDX: 00000004
    DS:  007b      ESI: 00000004  ES:  007b      EDI: 002e5ff4
    SS:  007b      ESP: bffa35b0  EBP: bffa35d8
    CS:  0073      EIP: 00ba9410  ERR: 00000092  EFLAGS: 00000246

crash> whatis skb_gso_segment
struct sk_buff *skb_gso_segment(struct sk_buff *, int);

It looks like that function is defined in the Xen patch that the rpmbuild
process applies to the core kernel code.

Comment 5 Alexey Bozrikov 2006-09-20 06:18:07 UTC
Just to note: the same kernel version (2.6.17-1.2187_FC5smp #1 SMP Mon Sep
11 02:07:57 EDT 2006 ppc ppc ppc GNU/Linux) does NOT crash on a 32-bit PPC
machine (7025-F50).

Comment 6 Peter Collinson 2006-09-20 18:54:23 UTC
The new version doesn't crash on my i386 machine:

Linux wooded.hillside.co.uk 2.6.17-1.2187_FC5 #1 Mon Sep 11 01:17:06 EDT 2006 i686 i686 i386 GNU/Linux

and version

Linux rose.cantweb.co.uk 2.6.17-1.2174_FC5 #1 SMP Tue Aug 8 15:30:44 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

is running happily on my x86_64 machine.

The invalid op looks like a compiler error. Is it?

Comment 7 Herbert Xu 2006-09-21 01:04:18 UTC
The bug is in the NAT code so it only shows up if you have the NAT module
loaded.  As I said before, the bug is already fixed in rawhide pending another
Xen update for FC5.
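
Since the crash depends on a NAT module being loaded, a quick way to see
whether a given machine is exposed is to look at /proc/modules. The following
is only a minimal sketch: the function name is made up for illustration, and
treating any module with a "nat" component in its name (for example
iptable_nat or ip_nat) as NAT-related is an assumption, not something stated
in this bug.

```python
def loaded_nat_modules(proc_modules_text):
    """Return names of loaded modules that have 'nat' as a '_'-separated part.

    proc_modules_text is the content of /proc/modules, where the first
    whitespace-separated field of each line is the module name.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # Match iptable_nat, ip_nat, ip_nat_ftp, ... but not e.g. natsemi.
        if fields and "nat" in fields[0].lower().split("_"):
            names.append(fields[0])
    return names


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            found = loaded_nat_modules(f.read())
    except OSError:
        found = []  # /proc/modules not available on this system
    print("NAT-related modules loaded:", ", ".join(found) or "none")
```

An empty result would be consistent with the reports above of machines that
run the same kernel without crashing, though it does not prove that was the
reason.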

Comment 8 Herbert Xu 2006-09-21 01:06:29 UTC
*** Bug 204220 has been marked as a duplicate of this bug. ***

Comment 9 Herbert Xu 2006-09-27 10:57:37 UTC
The 2.6.18 (2189) kernel in FC5 testing should cure this.

Comment 10 Steve Hill 2006-09-27 14:54:46 UTC
I'm getting "switchroot: mount failed" when booting the 2.6.18 kernel from
testing.  Unfortunately I don't have access to the console myself, so getting
any further debugging is going to be hard.

The root FS is a software RAID5 running on 3 SATA drives on the following
interfaces:
00:1f.2 IDE interface: Intel Corporation 6300ESB SATA Storage Controller (rev 02)
03:03.0 RAID bus controller: Silicon Image, Inc. Adaptec AAR-1210SA SATA HostRAID Controller (rev 02)

Is there an archive of old FC5 updates anywhere so that I can downgrade to a
working kernel in the meantime?  It seems that old updates are not available in
the updates repository. :(

Comment 11 Herbert Xu 2006-09-28 08:54:58 UTC
You need to make sure that you've upgraded the xen package as well as the
kernel.  If the problem persists, please file a new bug.  Thanks.

Comment 12 Herbert Xu 2006-10-13 11:42:31 UTC
*** Bug 209910 has been marked as a duplicate of this bug. ***

Comment 13 Dave Jones 2006-10-17 00:05:13 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in Bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 14 Alexey Bozrikov 2006-10-17 06:45:26 UTC
Kernel 2.6.18-1.2200.fc5.ppc64 seems to fix the problem. Connections initiated
through iptables NAT no longer crash the kernel.

Alexey
bozy
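
For anyone checking other machines, the kernel release strings quoted in this
bug can be compared against the "Fixed In Version" value (2.6.18-1.2200.fc5).
This is a rough sketch only: the helper names are hypothetical, and it assumes
that Fedora release strings start with "X.Y.Z-A.B" and that comparing those
five numbers is a good enough ordering.

```python
import platform
import re

# Taken from this bug's "Fixed In Version: 2.6.18-1.2200.fc5" field.
FIXED = (2, 6, 18, 1, 2200)

# Leading "X.Y.Z-A.B" part, e.g. "2.6.17-1.2174" in "2.6.17-1.2174_FC5".
_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)")


def release_key(release):
    """Turn a release string such as '2.6.18-1.2200.fc5' into a tuple of ints."""
    m = _RELEASE.match(release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in m.groups())


def has_fix(release):
    """True if the release is at or after the Fixed In Version of this bug."""
    return release_key(release) >= FIXED


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    try:
        print(running, "has the fix" if has_fix(running) else "predates the fix")
    except ValueError as err:
        print(err)
```

Note that comment 9 suggests the 2189 test kernel may already have contained
the fix; the sketch deliberately uses the released version recorded in the
header instead.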

Comment 15 Peter Collinson 2006-10-17 17:45:26 UTC
I can confirm that the new kernel 2.6.18-1.2200.fc5 stayed up today on my machine, so the bug I
reported appears to be fixed.

Many thanks for all your efforts.