Bug 432604 - Crash due to GSO bits propagating into RESET packet
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Hardware: All Linux
Priority: low Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Herbert Xu
QA Contact: Martin Jenner
Reported: 2008-02-13 05:24 EST by Ian Campbell
Modified: 2008-02-14 09:35 EST (History)
Doc Type: Bug Fix
Last Closed: 2008-02-14 09:35:05 EST
Description Ian Campbell 2008-02-13 05:24:08 EST
Description of problem:

Customers are reporting a crash in tcp_tso_segment, which I believe is the
problem described in this thread:
http://www.mail-archive.com/netdev@vger.kernel.org/msg31538.html and fixed by
upstream commit 

BUG: unable to handle kernel NULL pointer dereference at virtual address 0000001c
printing eip:
00e19000 -> *pde = 00000003:0a1e0027
00e1d000 -> *pme = 00000003:0a22e067
00dcf000 -> *pte = 00000000:00000000
Oops: 0000 1
SMP
last sysfs file: /class/net/eth2/address
Modules linked in: xfrm4_mode_tunnel(U) esp4(U) iptable_raw(U) xt_comment(U)
xt_policy(U) ipt_ULOG(U) ipt_TTL(U) ipt_ttl(U) ipt_TOS(U) ipt_tos(U)
ipt_TCPMSS(U) ipt_SAME(U) ipt_REJECT(U) ipt_REDIRECT(U) ipt_recent(U)
ipt_owner(U) ipt_NETMAP(U) ipt_MASQUERADE(U) ipt_LOG(U) ipt_iprange(U)
ipt_hashlimit(U) ipt_ECN(U) ipt_ecn(U) ipt_DSCP(U) ipt_dscp(U) ipt_CLUSTERIP(U)
ipt_ah(U) ipt_addrtype(U) ip_nat_tftp(U) ip_nat_snmp_basic(U) ip_nat_sip(U)
ip_nat_pptp(U) ip_nat_irc(U) ip_nat_h323(U) ip_nat_ftp(U) ip_nat_amanda(U)
ip_conntrack_tftp(U) ip_conntrack_sip(U) ip_conntrack_pptp(U)
ip_conntrack_netbios_ns(U) ip_conntrack_irc(U) ip_conntrack_h323(U)
ip_conntrack_ftp(U) ts_kmp(U) ip_conntrack_amanda(U) xt_tcpmss(U) xt_pkttype(U)
xt_physdev(U) bridge(U) xt_NFQUEUE(U) xt_multiport(U) xt_MARK(U) xt_mark(U)
xt_mac(U) xt_limit(U) xt_length(U) xt_helper(U) xt_dccp(U) xt_conntrack(U)
xt_CONNMARK(U) xt_connmark(U) xt_CLASSIFY(U) xt_tcpudp(U) xt_state(U)
iptable_nat(U) ip_nat(U) ip_conntrack(U) iptable_mangle(U) nfnetlink(U)
iptable_filter(U) ip_tables(U) x_tables(U) deflate(U) zlib_deflate(U) twofish(U)
serpent(U) aes(U) blowfish(U) des(U) sha256(U) md5(U) crypto_null(U) af_key(U)
dm_multipath(U) dm_mod(U) xennet(U) ext3(U) jbd(U) xenblk(U)
CPU: 0
EIP: 0061:[<c05bfc58>] Not tainted VLI
EFLAGS: 00010202 (2.6.18-8.1.8.el5.xs4.0.1.0xen #1)
EIP is at tcp_tso_segment+0x1a0/0x22c
eax: 0000b2f9 ebx: 00000000 ecx: 00000000 edx: 0f260000
esi: c2922054 edi: 22ea63ef ebp: 00000014 esp: c06fbe58
ds: 007b es: 007b ss: 0069
Process swapper (pid: 0, ti=c06fb000 task=c0653940 task.ti=c06c7000)
Stack: 00000000 c2928780 0000fa7f 0000056c ffff0000 c2cf0e00 0000501e 00000000
00000000 c05d79ed c2cf0e00 c0682c20 c079dac0 00000000 c059c0e8 c2cf0e00
c3d49d00 c2cf0e30 00000000 c05e3dd1 c2cf0e00 c3d49d00 c2cf0e30 00000000
Call Trace:
[<c05d79ed>] inet_gso_segment+0xed/0x181
[<c059c0e8>] skb_gso_segment+0xfb/0x132
[<c05e3dd1>] xfrm4_output_finish+0x45/0x95
[<c05e3e6f>] xfrm4_output+0x4e/0x53
[<c05b7c5b>] ip_forward+0x1d9/0x22e
[<c05b6a95>] ip_rcv+0x3ef/0x429
[<c059c509>] netif_receive_skb+0x2dd/0x355
[<c501fd01>] netif_poll+0x8e5/0xa52 [xennet]
[<c059df0f>] net_rx_action+0x96/0x185
[<c041ffd3>] __do_softirq+0x5e/0xc3
[<c040679c>] do_softirq+0x56/0xae
[<c040673d>] do_IRQ+0xa5/0xae
[<c053a155>] evtchn_do_upcall+0x64/0x9b
[<c0404ec5>] hypervisor_callback+0x3d/0x48
[<c0407fd1>] raw_safe_halt+0x8c/0xaf
[<c0402bca>] xen_idle+0x22/0x2e
[<c0402ce9>] cpu_idle+0x91/0xab
[<c06cc799>] start_kernel+0x381/0x388
Code: 6c 55 ff 73 1c e8 49 74 f1 ff 89 c2 66 31 c0 c1 e2 10 83 c4 0c 01 d0 15 ff
ff 00 00 f7 d0 c1 e8 10 66 89 46 10 8b 1b 03 7c 24 0c <8b> 73 1c 89 f8 0f c8 80
66 0d 7f 89 46 04 83 3b 00 75 90 0f b7
EIP: [<c05bfc58>] tcp_tso_segment+0x1a0/0x22c SS:ESP 0069:c06fbe58
<0>Kernel panic - not syncing: Fatal exception in interrupt

Version-Release number of selected component (if applicable):

The problem was reported to us against 2.6.18-8.1.8.el5xen, but I think it still
persists in -53.1.13.el5. However, I'm not convinced this is a Xen-specific
problem, although Xen might be the main way you would actually run into it.

How reproducible:

I'm still trying to get a precise reproduction scenario from the customers. It
appears to involve particular firewall settings.

Additional info:

I think the problem may also affect RHEL4 where the fix would
Comment 1 Ian Campbell 2008-02-13 05:30:18 EST
A link to the root of the ML thread might be handy:
Comment 2 Herbert Xu 2008-02-14 06:24:50 EST
The back trace has nothing to do with the thread you've quoted since it doesn't
contain send_reset in the call chain.  The back trace instead shows a bogus
packet received from xen-front.  Somehow your xen-front driver is emitting a
packet that has NULL in skb->h which is causing this crash.  So I suggest you
double-check that the xen-front driver you're using is identical to what's
currently in RHEL5.
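
The check described above (whether send_reset appears anywhere in the call
chain) can be repeated mechanically on any saved oops text. The following is a
minimal sketch in Python, not part of the original report: the helper name
trace_symbols, the regular expression, and the shortened SAMPLE trace are
illustrative assumptions, with the sample frames copied from the back trace in
the description.

```python
import re

# Matches call-trace frames such as "[<c05d79ed>] inet_gso_segment+0xed/0x181".
# The pattern is an assumption based on the 2.6.18 oops format shown in this bug.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def trace_symbols(oops_text):
    """Return the function names found in an oops call trace, in order."""
    return FRAME_RE.findall(oops_text)


# A few frames copied from the trace in the description (shortened sample).
SAMPLE = """\
[<c05d79ed>] inet_gso_segment+0xed/0x181
[<c059c0e8>] skb_gso_segment+0xfb/0x132
[<c05e3dd1>] xfrm4_output_finish+0x45/0x95
[<c501fd01>] netif_poll+0x8e5/0xa52 [xennet]
"""

symbols = trace_symbols(SAMPLE)

# Substring match, so a frame like tcp_v4_send_reset would also be caught.
has_send_reset = any("send_reset" in name for name in symbols)
has_netif_poll = "netif_poll" in symbols

print("send_reset in call chain:", has_send_reset)
print("netif_poll (xennet) in call chain:", has_netif_poll)
```

Running the same helper over the full oops text would list every frame, which
makes it easy to see that the packet enters through the xennet receive path
rather than through a reset being sent.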

Comment 3 Ian Campbell 2008-02-14 09:06:03 EST
Ah, you are quite right. Shame :-(

The kernel is 2.6.18-8.1.8.el5xen kernel with the patches from bugs #234375,
#158657, #247265, #248515, #234375, #234375 and #251905 applied. Of those only
#234375 and #251905 are relevant to netfront but don't appear to be related to
this particular issue.

We are receiving the same reports on the Debian kernel which we ship,
which is basically the linux-2.6.18-xen.hg tree so I think this isn't a Red Hat
specific issue. I will continue to investigate.
