1047693 – [abrt] WARNING: CPU: 0 PID: 3630 at net/core/dev.c:2218 skb_warn_bad_offload+0xcd/0xda()

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	20
Hardware:	x86_64
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Rashid Khan
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:	https://retrace.fedoraproject.org/faf...
Whiteboard:	abrt_hash:eb22252d112155a7c3ec5fed53c...
Depends On:
Blocks:	1050742

Reported:	2014-01-01 21:14 UTC by Tony
Modified:	2016-08-22 09:21 UTC (History)
CC List:	15 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-12-10 15:00:53 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
File: dmesg (72.18 KB, text/plain) 2014-01-01 21:14 UTC, Tony

Description Tony 2014-01-01 21:14:25 UTC

Description of problem:
This is a server (Intel DQ77KB) that I've been using to run a few VMs (FreeBSD, Fedora, and an XP VM) using KVM and Open vSwitch. Since moving to Fedora 20, and especially since some updates that I installed in the past hour, this machine has become unstable.

Additional info:
reporter:       libreport-2.1.10
WARNING: CPU: 0 PID: 3630 at net/core/dev.c:2218 skb_warn_bad_offload+0xcd/0xda()
: caps=(0x00000008801948c9, 0x0000000000000000) len=1898 data_len=1832 gso_size=1448 gso_type=5 ip_summed=0
Modules linked in: vhost_net vhost macvtap macvlan tun fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE bnep bluetooth ip6t_REJECT xt_conntrack cfg80211 rfkill openvswitch vxlan ip_tunnel gre libcrc32c ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_hdmi snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev hid_logitech_dj snd_hda_intel microcode snd_hda_codec snd_hwdep i2c_i801 snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer lpc_ich mfd_core snd soundcore shpchp e1000e ptp pps_core mei_me mei tpm_tis tpm tpm_bios nfsd auth_rpcgss nfs_acl lockd sunrpc raid1 i915 i2c_algo_bit drm_kms_helper drm i2c_core video
CPU: 0 PID: 3630 Comm: vhost-3613 Not tainted 3.12.5-302.fc20.x86_64 #1
Hardware name:                  /DQ77KB, BIOS KBQ7710H.86A.0052.2013.0708.1336 07/08/2013
 0000000000000009 ffff88041e203a10 ffffffff81662d11 ffff88041e203a58
 ffff88041e203a48 ffffffff810691dd ffff8803f39d6c00 ffff8803cf1b8000
 0000000000000005 0000000000000000 ffff8803f39d6c00 ffff88041e203aa8
Call Trace:
 <IRQ>  [<ffffffff81662d11>] dump_stack+0x45/0x56
 [<ffffffff810691dd>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff8106924c>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff81308743>] ? ___ratelimit+0x93/0x100
 [<ffffffff816652a2>] skb_warn_bad_offload+0xcd/0xda
 [<ffffffff81566d01>] __skb_gso_segment+0x71/0xc0
 [<ffffffff8156700a>] dev_hard_start_xmit+0x18a/0x570
 [<ffffffff815859f0>] sch_direct_xmit+0xe0/0x1c0
 [<ffffffff815675e9>] dev_queue_xmit+0x1f9/0x4a0
 [<ffffffffa0507ecb>] netdev_send+0x4b/0xc0 [openvswitch]
 [<ffffffffa05033d2>] ? ovs_masked_flow_lookup+0x122/0x260 [openvswitch]
 [<ffffffffa050783d>] ovs_vport_send+0x1d/0x80 [openvswitch]
 [<ffffffffa04fe16a>] do_output+0x2a/0x50 [openvswitch]
 [<ffffffffa04fe613>] do_execute_actions+0x2e3/0xb20 [openvswitch]
 [<ffffffff810a43d2>] ? enqueue_task_fair+0x412/0x660
 [<ffffffffa04fee7b>] ovs_execute_actions+0x2b/0x30 [openvswitch]
 [<ffffffffa05022e8>] ovs_dp_process_received_packet+0x88/0x100 [openvswitch]
 [<ffffffff8109ab37>] ? try_to_wake_up+0xe7/0x290
 [<ffffffffa05077aa>] ovs_vport_receive+0x2a/0x30 [openvswitch]
 [<ffffffffa0508211>] netdev_frame_hook+0xc1/0x120 [openvswitch]
 [<ffffffff81565072>] __netif_receive_skb_core+0x252/0x820
 [<ffffffff81565658>] __netif_receive_skb+0x18/0x60
 [<ffffffff8156617e>] process_backlog+0xae/0x180
 [<ffffffff81565a49>] net_rx_action+0x149/0x240
 [<ffffffff8106e747>] __do_softirq+0xf7/0x240
 [<ffffffff8167361c>] call_softirq+0x1c/0x30
 <EOI>  [<ffffffff810146a5>] do_softirq+0x55/0x90
 [<ffffffff81564d58>] netif_rx_ni+0x28/0x30
 [<ffffffffa06236e1>] tun_get_user+0x401/0x820 [tun]
 [<ffffffffa0623b5a>] tun_sendmsg+0x5a/0x80 [tun]
 [<ffffffffa063dc9c>] handle_tx+0x1bc/0x530 [vhost_net]
 [<ffffffffa063e045>] handle_tx_kick+0x15/0x20 [vhost_net]
 [<ffffffffa062bdb2>] vhost_worker+0xf2/0x190 [vhost]
 [<ffffffffa062bcc0>] ? vhost_dev_reset_owner+0x30/0x30 [vhost]
 [<ffffffff8108b0d0>] kthread+0xc0/0xd0
 [<ffffffff8108b010>] ? insert_kthread_work+0x40/0x40
 [<ffffffff81671cbc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8108b010>] ? insert_kthread_work+0x40/0x40
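For readers trying to interpret the detail line that follows the WARNING (len, data_len, gso_size, gso_type, ip_summed), here is a minimal Python sketch that pulls those numbers out and labels them. The gso_type bit names are an assumption based on the 3.x-era include/linux/skbuff.h and may be numbered differently in other kernel versions; the helper names are hypothetical and not part of any tool mentioned in this report.

```python
import re

# Assumption: bit layout of gso_type as in 3.x-era include/linux/skbuff.h.
# Other kernel versions may assign these bits differently.
GSO_TYPE_BITS = {0: "SKB_GSO_TCPV4", 1: "SKB_GSO_UDP", 2: "SKB_GSO_DODGY"}

# ip_summed values from include/linux/skbuff.h.
IP_SUMMED = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}

FIELD_RE = re.compile(r"\b(len|data_len|gso_size|gso_type|ip_summed)=(\d+)")


def parse_bad_offload(line):
    """Return the numeric fields of a skb_warn_bad_offload detail line."""
    return {name: int(value) for name, value in FIELD_RE.findall(line)}


def decode_gso_type(value):
    """Name the known gso_type bits; unknown bits are reported as hex."""
    names = [name for bit, name in sorted(GSO_TYPE_BITS.items())
             if value & (1 << bit)]
    known_mask = sum(1 << bit for bit in GSO_TYPE_BITS)
    if value & ~known_mask:
        names.append(hex(value & ~known_mask))
    return names


def describe(line):
    """Summarise one detail line in a small dictionary."""
    fields = parse_bad_offload(line)
    return {
        # Linear (non-paged) part of the skb: total length minus paged data.
        "headlen": fields["len"] - fields["data_len"],
        "gso_size": fields["gso_size"],
        "gso_type": decode_gso_type(fields["gso_type"]),
        "ip_summed": IP_SUMMED.get(fields["ip_summed"], "unknown"),
    }


sample = (": caps=(0x00000008801948c9, 0x0000000000000000) len=1898 "
          "data_len=1832 gso_size=1448 gso_type=5 ip_summed=0")
print(describe(sample))
```

Under those assumptions, the line from this report would read as a TCPv4 GSO packet flagged as coming from an untrusted source (a guest), with no checksum state set.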

Comment 1 Tony 2014-01-01 21:14:31 UTC

Created attachment 844264 [details]
File: dmesg

Comment 2 Michele Baldessari 2014-01-02 09:09:06 UTC

This was claimed to be fixed with 3.10:
http://openvswitch.org/pipermail/discuss/2013-May/009977.html

Tony, 

can you give us a few more details about the networking setup?
- ip a
- ethtool -k <interface>

Where <interface> is each interface that is involved here. Also 'cat /proc/net/bonding/*' if you are using bonding.
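The data requested above could be gathered in one pass with a small script. This is only a sketch: the function names are hypothetical, and the interface list (em1, em2 in the example) has to be supplied by whoever runs it. It uses nothing beyond the `ip a` and `ethtool -k <interface>` commands named in this comment.

```python
import subprocess


def diagnostic_commands(interfaces):
    """Build the command lines requested in the comment above."""
    commands = [["ip", "a"]]
    for ifname in interfaces:
        commands.append(["ethtool", "-k", ifname])
    return commands


def run_all(commands):
    """Run each command and return a list of (command line, output) pairs.

    A missing binary is reported in the output instead of raising.
    """
    results = []
    for cmd in commands:
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True,
                                  check=False)
            output = proc.stdout + proc.stderr
        except FileNotFoundError:
            output = "command not found: " + cmd[0] + "\n"
        results.append((" ".join(cmd), output))
    return results


# Example (not executed here):
#   for title, output in run_all(diagnostic_commands(["em1", "em2"])):
#       print("# " + title)
#       print(output)
```

Keeping the command builder separate from the runner makes it easy to review what will be executed before running anything as root.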

Thanks,
Michele

Comment 3 Tony 2014-01-02 13:52:24 UTC

Sure... There are currently four VMs running, so four OVS vnet? interfaces. The two physical interfaces are em1: VM Trunk, em2: Management.

[root@muscaria ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether e8:40:f2:e3:1c:b5 brd ff:ff:ff:ff:ff:ff
    inet 10.1.4.30/24 brd 10.1.4.255 scope global dynamic em2
       valid_lft 5998sec preferred_lft 5998sec
    inet6 2607:f2c0:f00e:8f0a:ea40:f2ff:fee3:1cb5/128 scope global dynamic 
       valid_lft 86385sec preferred_lft 86385sec
    inet6 fe80::ea40:f2ff:fee3:1cb5/64 scope link 
       valid_lft forever preferred_lft forever
3: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP group default qlen 1000
    link/ether e8:40:f2:e3:1c:b6 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ea40:f2ff:fee3:1cb6/64 scope link 
       valid_lft forever preferred_lft forever
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default 
    link/ether 76:5b:1f:56:aa:38 brd ff:ff:ff:ff:ff:ff
5: ovs_DMZbr0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default 
    link/ether e8:40:f2:e3:1c:b6 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7423:24ff:fe73:ac44/64 scope link 
       valid_lft forever preferred_lft forever
6: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UNKNOWN group default qlen 500
    link/ether fe:54:00:c4:dc:2c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fec4:dc2c/64 scope link 
       valid_lft forever preferred_lft forever
7: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UNKNOWN group default qlen 500
    link/ether fe:54:00:e1:ca:15 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fee1:ca15/64 scope link 
       valid_lft forever preferred_lft forever
14: vnet2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UNKNOWN group default qlen 500
    link/ether fe:54:00:b6:3a:ee brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:feb6:3aee/64 scope link 
       valid_lft forever preferred_lft forever
16: vnet3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UNKNOWN group default qlen 500
    link/ether fe:54:00:6e:6a:63 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe6e:6a63/64 scope link 
       valid_lft forever preferred_lft forever


Comment 4 Michele Baldessari 2014-01-02 14:31:38 UTC

Thanks Tony,

can you also get me the output of:
ethtool -i em1
ethtool -i em2
ethtool -k em1
ethtool -k em2

With that I should have enough to raise it upstream.

thanks,
Michele

Comment 5 Tony 2014-01-02 16:19:19 UTC

[root@muscaria ~]# ethtool -i em1
driver: e1000e
version: 2.3.2-k
firmware-version: 2.1-3
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
[root@muscaria ~]# ethtool -i em2
driver: e1000e
version: 2.3.2-k
firmware-version: 0.13-4
bus-info: 0000:00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no



[root@muscaria ~]# ethtool -k em1
Features for em1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
[root@muscaria ~]# ethtool -k em2
Features for em2:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
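To compare feature listings like the two above without reading them line by line, the `ethtool -k` text can be parsed into a dictionary. The sketch below assumes the "name: on|off [fixed]" layout shown in this comment; the function names are hypothetical.

```python
def parse_features(text):
    """Map feature name -> (enabled, fixed) from `ethtool -k` output."""
    features = {}
    for line in text.splitlines():
        name, sep, state = line.strip().partition(":")
        words = state.split()
        # Skip the "Features for ...:" header and anything that is not
        # a "name: on/off" pair.
        if not sep or not words or words[0] not in ("on", "off"):
            continue
        features[name] = (words[0] == "on", "[fixed]" in words)
    return features


def adjustable_on(features):
    """Names of features that are enabled and not marked [fixed]."""
    return [name for name, (enabled, fixed) in features.items()
            if enabled and not fixed]


sample = """Features for em1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
tcp-segmentation-offload: on
generic-segmentation-offload: on
large-receive-offload: off [fixed]
highdma: on [fixed]
"""

print(adjustable_on(parse_features(sample)))
```

The features returned by `adjustable_on` are the ones that could still be switched off with `ethtool -K` when narrowing down which offload triggers the warning.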

Comment 6 Tony 2014-01-02 17:09:06 UTC

Thank You.

Comment 7 Michele Baldessari 2014-01-02 17:40:10 UTC

Hi Tony,

how quickly can you reproduce this warning? I ask because there might be
one patch worth trying before taking it to netdev/ovs-dev. Namely:
commit 3cdb35b074142c915a463c535839886ae08fdfd4
Author: Pravin B Shelar <pshelar>
Date:   Fri Oct 25 15:12:33 2013 -0700

    openvswitch: Enable all GSO features on internal port.
    
    OVS already can handle all types of segmentation offloads that
    are supported by the kernel.
    Following patch specifically enables UDP and IPV6 segmentation
    offloads.
    
    Signed-off-by: Pravin B Shelar <pshelar>
    Signed-off-by: Jesse Gross <jesse>


If you can reproduce it quickly, it is worth trying a kernel with the above patch.
If not, I'll ping upstream.

thanks,
Michele

Comment 8 Tony 2014-01-03 00:03:38 UTC

I can see if that helps, but likely not until the weekend.

Comment 9 Tony 2014-01-12 20:37:13 UTC

...THIS weekend. So, with the patch applied, the errors continue. The most recent:

WARNING: CPU: 3 PID: 3192 at net/core/dev.c:2218 skb_warn_bad_offload+0xcd/0xda()

Comment 10 Tony 2014-01-20 20:02:15 UTC

Michele, is there anything else to try? Any more offload types to enable?

Comment 11 Michele Baldessari 2014-01-20 21:01:49 UTC

Hi Tony,

(catching up on my backlog)  How quickly can you reproduce this issue?
Does it also happen on 3.13? (http://alt.fedoraproject.org/pub/alt/rawhide-kernel-nodebug/x86_64/)

thanks,
Michele

Comment 12 Tony 2014-01-21 03:09:57 UTC

Yup :( A few since booting with 3.13 already.

Comment 13 Alexander Panov 2014-01-21 10:49:44 UTC

Hello,

I'm experiencing the same problem:
Jan 21 05:29:32 host47 kernel: [13775187.131611] WARNING: at net/core/dev.c:1919 skb_warn_bad_offload+0xc2/0xcf()
Jan 21 05:29:33 host47 kernel: [13775187.131619] Hardware name: AS -2042G-6RF
Jan 21 05:29:34 host47 kernel: [13775187.131625] : caps=(0x0000000000115829, 0x0000000000000000) len=7161 data_len=7109 gso_size=1448 gso_type=5 ip_summed=0
Jan 21 05:29:35 host47 kernel: [13775187.131627] Modules linked in: iptable_nat raid1 iptable_mangle ebt_arp ebtable_nat ebt_limit ebt_ip ebtable_filter ebtables ipt_MASQUERADE nf_nat xt_conntrack lockd sunrpc bridge 8021q garp stp llc bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ip6t_REJECT ib_core nf_conntrack_ipv6 nf_defrag_ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6table_filter nf_conntrack_ipv4 ip6_tables nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM binfmt_misc raid10 microcode serio_raw sp5100_tco amd64_edac_mod edac_core i2c_piix4 fam15h_power k10temp edac_mce_amd i2c_core vhost_net tun macvtap macvlan ixgbe kvm_amd kvm mdio igb ptp pps_core dca raid0 crc32c_intel ghash_clmulni_intel usb_storage mpt2sas raid_class scsi_transport_sas [last unloaded: iptable_mangle]
Jan 21 05:29:36 host47 kernel: [13775187.131711] Pid: 26764, comm: vhost-26741 Tainted: G        W    3.6.8-2.cs.fc17.x86_64 #1
Jan 21 05:29:37 host47 kernel: [13775187.131845] Call Trace:
Jan 21 05:29:38 host47 kernel: [13775187.131847]  <IRQ>  [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
Jan 21 05:29:39 host47 kernel: [13775187.131874]  [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
Jan 21 05:29:40 host47 kernel: [13775187.131884]  [<ffffffff8161bc6d>] skb_warn_bad_offload+0xc2/0xcf
Jan 21 05:29:41 host47 kernel: [13775187.131892]  [<ffffffff8150f720>] skb_gso_segment+0x220/0x290
Jan 21 05:29:42 host47 kernel: [13775187.131900]  [<ffffffff81512949>] dev_hard_start_xmit+0x239/0x690
Jan 21 05:29:43 host47 kernel: [13775187.131910]  [<ffffffff8151312f>] dev_queue_xmit+0x38f/0x610
Jan 21 05:29:44 host47 kernel: [13775187.131924]  [<ffffffffa038dd7f>] br_dev_queue_push_xmit+0x7f/0xd0 [bridge]
Jan 21 05:29:45 host47 kernel: [13775187.131935]  [<ffffffffa038e052>] br_forward_finish+0x22/0x60 [bridge]
Jan 21 05:29:46 host47 kernel: [13775187.131945]  [<ffffffffa038e0ed>] __br_forward+0x5d/0xb0 [bridge]
Jan 21 05:29:47 host47 kernel: [13775187.131955]  [<ffffffffa038e2fd>] br_forward+0x5d/0x70 [bridge]
Jan 21 05:29:48 host47 kernel: [13775187.131964]  [<ffffffffa038f036>] br_handle_frame_finish+0x1f6/0x290 [bridge]
Jan 21 05:29:49 host47 kernel: [13775187.131975]  [<ffffffffa038f246>] br_handle_frame+0x176/0x260 [bridge]
Jan 21 05:29:50 host47 kernel: [13775187.131984]  [<ffffffff81510cc6>] __netif_receive_skb+0x226/0x8a0
Jan 21 05:29:51 host47 kernel: [13775187.131993]  [<ffffffff8106d33e>] ? run_timer_softirq+0x3e/0x350
Jan 21 05:29:52 host47 kernel: [13775187.132000]  [<ffffffff815113f2>] process_backlog+0xb2/0x180
Jan 21 05:29:53 host47 kernel: [13775187.132007]  [<ffffffff81511f89>] net_rx_action+0x149/0x230
Jan 21 05:29:54 host47 kernel: [13775187.132015]  [<ffffffff810654c0>] __do_softirq+0xd0/0x210
Jan 21 05:30:25 host47 kernel: [13775199.706346]  [<ffffffffa00cfee5>] handle_tx_kick+0x15/0x20 [vhost_net]
Jan 21 05:30:26 host47 kernel: [13775199.706357]  [<ffffffffa00cc83d>] vhost_worker+0xed/0x190 [vhost_net]
Jan 21 05:30:27 host47 kernel: [13775199.706368]  [<ffffffffa00cc750>] ? memory_access_ok.isra.11+0xd0/0xd0 [vhost_net]
Jan 21 05:30:28 host47 kernel: [13775199.706375]  [<ffffffff8107fde3>] kthread+0x93/0xa0
Jan 21 05:30:29 host47 kernel: [13775199.706383]  [<ffffffff81627f04>] kernel_thread_helper+0x4/0x10
Jan 21 05:30:30 host47 kernel: [13775199.706390]  [<ffffffff8107fd50>] ? kthread_freezable_should_stop+0x70/0x70
Jan 21 05:30:31 host47 kernel: [13775199.706406]  [<ffffffff81627f00>] ? gs_change+0x13/0x13
Jan 21 05:30:32 host47 kernel: [13775199.706408] ---[ end trace f788b6ed554f1608 ]---
Jan 21 05:30:33 host47 kernel: [13775199.709236] ------------[ cut here ]------------



I have several servers that have been experiencing this issue for quite some time now.

The machines are Supermicro H8QG6 with

3.6.8-2.fc17.x86_64 #1 SMP Tue Jun 25 20:49:58 EEST 2013 x86_64 x86_64 x86_64 GNU/Linux

There are two 10G interfaces (p1p1 and p1p2: 03:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)) in an LACP bond (bond0), which carries a VLAN (bond0.10), which in turn is attached to a bridge interface (br10).

I've tried turning off every offload option on all of them, but with no success:
ethtool -k br10
Offload parameters for br10:
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: off

Comment 14 Tony 2014-01-23 12:13:55 UTC

Yes, still happening with 3.13.

Comment 15 Tony 2014-01-25 21:40:46 UTC

https://retrace.fedoraproject.org/faf/reports/322143/

Reports that made it through to the abrt server.

Comment 16 Tony 2014-02-17 23:51:28 UTC

Still happening w/ 3.14.0-0.rc2.git3.2.fc21.x86_64 #1 SMP Thu Feb 13 19:01:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Comment 17 Tony 2014-03-10 22:07:23 UTC

How to proceed? Can anything be done?

Comment 18 Michele Baldessari 2014-04-01 20:19:24 UTC

Hi Tony,

sorry for the delay; I've raised the issue upstream. I'll ping you if more
information is needed.

regards,
Michele

Comment 19 Justin M. Forbes 2014-05-21 19:39:25 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.14.4-200.fc20.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 20 Reilly Hall 2014-06-16 14:39:39 UTC

Description of problem:
I started up a VM under libvirt/KVM/qemu and immediately started accumulating these kernel oopses.  These are the same VMs I've been running for a while, but this is the first time I have booted them under this kernel.  They were running fine with 3.14.4; I am unsure about 3.14.5 or 3.14.6, as I am not completely sure when I last booted them.

The VMs appear to be running fine and so does the host machine, but I cannot be certain.  Thankfully these are my personal VMs and none of this is production affecting.

Version-Release number of selected component:
kernel

Additional info:
reporter:       libreport-2.2.2
cmdline:        BOOT_IMAGE=/vmlinuz-3.14.7-200.fc20.x86_64 root=UUID=1480ee94-2b82-4f59-8789-d071313a416c ro vconsole.font=latarcyrheb-sun16 rhgb quiet clocksource=hpet elevator=deadline LANG=en_US.UTF-8
kernel:         3.14.7-200.fc20.x86_64
runlevel:       N 5
type:           Kerneloops

Truncated backtrace:
WARNING: CPU: 1 PID: 4392 at net/core/dev.c:2238 skb_warn_bad_offload+0xcd/0xda()
r8169: caps=(0x0000000100004180, 0x0000000000000000) len=15165 data_len=15099 gso_size=1448 gso_type=5 ip_summed=1
Modules linked in: vhost_net vhost macvtap macvlan tcp_diag inet_diag ipt_MASQUERADE xt_CHECKSUM tun iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6t_rpfilter ip6t_REJECT xt_conntrack bnep bluetooth 6lowpan_iphc cfg80211 rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw it87 hwmon_vid iTCO_wdt iTCO_vendor_support ppdev vfat fat x86_pkg_temp_thermal coretemp kvm_intel kvm snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel microcode snd_hda_codec i2c_i801 snd_hwdep snd_seq snd_seq_device snd_pcm lpc_ich mfd_core mei_me mei snd_timer snd shpchp soundcore parport_pc parport nfsd auth_rpcgss nfs_acl lockd sunrpc i915 i2c_algo_bit drm_kms_helper drm r8169 mii i2c_core video
CPU: 1 PID: 4392 Comm: qemu-system-x86 Not tainted 3.14.7-200.fc20.x86_64 #1
Hardware name: Shuttle Inc. SH87R/FH87, BIOS 2.01 02/21/2014
 0000000000000000 000000004560ae5c ffff88081fa43838 ffffffff816f04b2
 ffff88081fa43880 ffff88081fa43870 ffffffff8108a1cd ffff88078c34cc00
 ffff8807f2b42000 0000000000000005 0000000000000001 ffff88078c34cc00
Call Trace:
 <IRQ>  [<ffffffff816f04b2>] dump_stack+0x45/0x56
 [<ffffffff8108a1cd>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff8108a24c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffff8135a6b3>] ? ___ratelimit+0x93/0x100
 [<ffffffff816f2baf>] skb_warn_bad_offload+0xcd/0xda
 [<ffffffff815e5c99>] __skb_gso_segment+0x79/0xc0
 [<ffffffff815e5fca>] dev_hard_start_xmit+0x18a/0x5d0
 [<ffffffff816076d0>] sch_direct_xmit+0xe0/0x1c0
 [<ffffffff815e6611>] __dev_queue_xmit+0x201/0x4c0
 [<ffffffff815e68e0>] dev_queue_xmit+0x10/0x20
 [<ffffffff81624c99>] ip_finish_output+0x339/0x440
 [<ffffffff81626148>] ip_output+0x58/0x90
 [<ffffffff81621edb>] ip_forward_finish+0x8b/0x1c0
 [<ffffffff81622352>] ip_forward+0x342/0x440
 [<ffffffff8161ffad>] ip_rcv_finish+0x7d/0x350
 [<ffffffff816208f8>] ip_rcv+0x298/0x3d0
 [<ffffffff815e47d6>] __netif_receive_skb_core+0x646/0x830
 [<ffffffff81616faa>] ? nf_iterate+0xaa/0xc0
 [<ffffffff815e49d8>] __netif_receive_skb+0x18/0x60
 [<ffffffff815e4a60>] netif_receive_skb_internal+0x40/0xc0
 [<ffffffff815e4afc>] netif_receive_skb+0x1c/0x70
 [<ffffffffa0497933>] br_handle_frame_finish+0x1f3/0x3f0 [bridge]
 [<ffffffffa049ed6d>] br_nf_pre_routing_finish+0x1bd/0x3d0 [bridge]
 [<ffffffffa049f21c>] br_nf_pre_routing+0x29c/0x660 [bridge]
 [<ffffffffa0497740>] ? br_handle_local_finish+0x70/0x70 [bridge]
 [<ffffffff81616faa>] nf_iterate+0xaa/0xc0
 [<ffffffffa0497740>] ? br_handle_local_finish+0x70/0x70 [bridge]
 [<ffffffff81617044>] nf_hook_slow+0x84/0x140
 [<ffffffffa0497740>] ? br_handle_local_finish+0x70/0x70 [bridge]
 [<ffffffffa0497ca8>] br_handle_frame+0x178/0x230 [bridge]
 [<ffffffff815e4402>] __netif_receive_skb_core+0x272/0x830
 [<ffffffff8109c931>] ? send_sigqueue+0x101/0x1e0
 [<ffffffff815e49d8>] __netif_receive_skb+0x18/0x60
 [<ffffffff815e569e>] process_backlog+0xae/0x180
 [<ffffffff815e4e79>] net_rx_action+0x149/0x240
 [<ffffffff8108f8c5>] __do_softirq+0xf5/0x2a0
 [<ffffffff8170219c>] do_softirq_own_stack+0x1c/0x30
 <EOI>  [<ffffffff8108fb15>] do_softirq+0x55/0x60
 [<ffffffff815e4094>] netif_rx_ni+0x34/0x70
 [<ffffffffa04eb864>] tun_get_user+0x424/0x890 [tun]
 [<ffffffffa04ebdcb>] tun_chr_aio_write+0x7b/0xa0 [tun]
 [<ffffffff811e93d9>] do_sync_readv_writev+0x59/0xa0
 [<ffffffff811ea933>] do_readv_writev+0xc3/0x240
 [<ffffffff811eab30>] vfs_writev+0x30/0x60
 [<ffffffff811eaca9>] SyS_writev+0x59/0xf0
 [<ffffffff81700869>] system_call_fastpath+0x16/0x1b

Comment 22 Alexey Ivanov 2014-07-08 23:34:52 UTC

I have a similar problem on RHEL6 2.6.32-431.11.2.el6.x86_64 with the bonding and ixgbe drivers:

------------[ cut here ]------------
WARNING: at net/core/dev.c:1907 skb_warn_bad_offload+0xc2/0xf0() (Not tainted)
Hardware name: UCSC-C220-M3L
bonding: caps=(0xf1d3a5, 0x0) len=1618 data_len=1552 ip_summed=1
Modules linked in: ip_vs_rr ip_vs libcrc32c autofs4 acpi_cpufreq freq_table mperf bonding 8021q garp stp llc iptable_filter iptable_mangle ip_tables ip6table_filter xt_MARK xt_multiport ip6table_mangle ip6_tables ipv6 iTCO_wdt iTCO_vendor_support microcode ipmi_devintf power_meter sb_edac edac_core lpc_ich mfd_core i2c_i801 sg ixgbe mdio igb dca i2c_algo_bit i2c_core ptp pps_core ext4 jbd2 mbcache raid1 sd_mod crc_t10dif isci libsas scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81071e27>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff81071f16>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff8145b442>] ? skb_warn_bad_offload+0xc2/0xf0
 [<ffffffff81460531>] ? __skb_gso_segment+0x71/0xc0
 [<ffffffff81460593>] ? skb_gso_segment+0x13/0x20
 [<ffffffff8146063b>] ? dev_hard_start_xmit+0x9b/0x480
 [<ffffffffa0192093>] ? ipt_post_routing_hook+0x23/0x30 [iptable_mangle]
 [<ffffffff814898e9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81460c5d>] ? dev_queue_xmit+0x1bd/0x320
 [<ffffffff8149a6d8>] ? ip_finish_output+0x148/0x310
 [<ffffffffa02ab8c0>] ? dst_output+0x0/0x20 [ip_vs]
 [<ffffffff8149a958>] ? ip_output+0xb8/0xc0
 [<ffffffffa02ac91a>] ? ip_vs_dr_xmit+0x17a/0x1b0 [ip_vs]
 [<ffffffffa02a5e82>] ? ip_vs_in+0x202/0x3b0 [ip_vs]
 [<ffffffff814898e9>] ? nf_iterate+0x69/0xb0
 [<ffffffff814946e0>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff81489aa6>] ? nf_hook_slow+0x76/0x120
 [<ffffffff814946e0>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff81494a0a>] ? ip_local_deliver+0x5a/0xa0
 [<ffffffff81493f0d>] ? ip_rcv_finish+0x12d/0x440
 [<ffffffff81494495>] ? ip_rcv+0x275/0x350
 [<ffffffff8145b9bb>] ? __netif_receive_skb+0x4ab/0x750
 [<ffffffff8145f628>] ? netif_receive_skb+0x58/0x60
 [<ffffffff8145f730>] ? napi_skb_finish+0x50/0x70
 [<ffffffff81460e99>] ? napi_gro_receive+0x39/0x50
 [<ffffffffa019e6bf>] ? ixgbe_poll+0x54f/0x12c0 [ixgbe]
 [<ffffffff8106621c>] ? rebalance_domains+0x3cc/0x5a0
 [<ffffffff81094fbd>] ? insert_work+0x6d/0xb0
 [<ffffffff81460fb3>] ? net_rx_action+0x103/0x2f0
 [<ffffffff8107a8e1>] ? __do_softirq+0xc1/0x1e0
 [<ffffffff810e6eb0>] ? handle_IRQ_event+0x60/0x170
 [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
 [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
 [<ffffffff8107a795>] ? irq_exit+0x85/0x90
 [<ffffffff81531605>] ? do_IRQ+0x75/0xf0
 [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff812e0bee>] ? intel_idle+0xde/0x170
 [<ffffffff812e0bd1>] ? intel_idle+0xc1/0x170
 [<ffffffff81426b67>] ? cpuidle_idle_call+0xa7/0x140
 [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
 [<ffffffff8152143c>] ? start_secondary+0x2ac/0x2ef
---[ end trace 825299608e8eaa5b ]---

Current workaround is to turn off GSO along with all other hw optimizations on eth* interfaces.
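A workaround of this kind could be scripted as below. This is a sketch under assumptions: the interface names are placeholders, the helper name is hypothetical, and the feature list (tso, gso, gro, all short names that `ethtool -K` accepts) is only one guess at what "all other hw optimizations" covers on a given system. The script only builds and prints the commands; it does not run them.

```python
def disable_offload_commands(interfaces, features=("tso", "gso", "gro")):
    """Build one `ethtool -K <dev> <feature> off ...` command per interface."""
    commands = []
    for ifname in interfaces:
        cmd = ["ethtool", "-K", ifname]
        for feature in features:
            cmd += [feature, "off"]
        commands.append(cmd)
    return commands


# Dry run: print the commands so they can be reviewed before being
# executed as root. "eth0" and "eth1" are placeholder names.
for cmd in disable_offload_commands(["eth0", "eth1"]):
    print(" ".join(cmd))
```

Note that settings changed with `ethtool -K` do not persist across reboots, so a workaround like this would have to be reapplied at boot by whatever mechanism the distribution provides.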

Comment 23 John Kinsella 2014-07-24 01:34:13 UTC

Running into this as well, subscribing.

Comment 24 Tony 2014-07-24 11:36:24 UTC

This is still happening. Here are some details:

[root@muscaria oops-2014-07-10-09:51:34-20624-3]# cat reason 
WARNING: CPU: 0 PID: 1890 at net/core/dev.c:2233 skb_warn_bad_offload+0xcd/0xda()
[root@muscaria oops-2014-07-10-09:51:34-20624-3]# uname -a
Linux muscaria 3.16.0-0.rc4.git2.2.fc22.x86_64 #1 SMP Wed Jul 9 22:19:54 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@muscaria oops-2014-07-10-09:51:34-20624-3]# ls -la .. | grep oops | wc -l
20
[root@muscaria oops-2014-07-10-09:51:34-20624-3]#

Comment 25 Justin M. Forbes 2014-12-10 15:00:53 UTC

This bug is being closed with INSUFFICIENT_DATA as there has not been a response in over 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 26 Rashid Khan 2015-02-23 14:45:11 UTC

I am not sure what information I can provide about Comment 24 above. 
If there is something needed from me please do ask it more explicitly.
Thanks
Rashid

Comment 27 Matyas Koszik 2015-07-08 14:20:25 UTC

I've just noticed this bug when starting up a FreeBSD guest, so I did some digging, and found out that this is actually a bug in FreeBSD's virtio implementation: http://www.spinics.net/lists/netdev/msg293976.html

Comment 28 higkoo 2016-08-22 08:58:48 UTC

I ran into this error when using LVS.

The machine is both an LVS master and an LVS real server. If I run:
ethtool -K bond0 gro off gso off
ethtool -K eth0 tso off gro off gso off
ethtool -K eth1 tso off gro off gso off

then it crashes:

Linux version 3.16.0-4-amd64 (debian-kernel.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt25-2+deb8u3 (2016-07-02)

Command line: BOOT_IMAGE=/boot/vmlinuz-3.16.0-4-amd64 root=UUID=6e817684-b791-4099-aa24-6e45f2f58997 ro net.ifnames=0 thash_entries=1048576 rhash_entries=1048576 biosdevname=0 nohz=off enforcing=0 ipv6.disable_ipv6=1 nmi_watchdog=0 selinux=0 transparent_hugepage=never cgroup_enable=memory swapaccount=1 vga=771

Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
DMI: Dell Inc. PowerEdge R430/0HFG24, BIOS 1.5.4 10/05/2015

WARNING: CPU: 9 PID: 0 at /build/linux-7z1rSb/linux-3.16.7-ckt25/net/core/dev.c:2247 skb_warn_bad_offload+0xc6/0xd1()
ixgbe: caps=(0x00000806602083b3, 0x0000000000000000) len=1494 data_len=1440 gso_size=1440 gso_type=1 ip_summed=1
Modules linked in: binfmt_misc xt_multiport iptable_filter ip_tables x_tables dell_rbu ip_vs_wrr ip_vs nf_conntrack crc32c_generic bonding x86_pkg_temp_thermal intel_powerclamp intel_rapl coretemp kvm_intel kvm crc32_pclmul ttm drm_kms_helper aesni_intel aes_x86_64 lrw drm gf128mul glue_helper i2c_algo_bit iTCO_wdt ablk_helper i2c_core cryptd iTCO_vendor_support dcdbas evdev pcspkr wmi acpi_power_meter lpc_ich shpchp mei_me processor mfd_core mei thermal_sys button ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 xfs libcrc32c sr_mod cdrom sg sd_mod crc_t10dif crct10dif_generic ahci libahci crct10dif_pclmul ehci_pci tg3 ixgbe ehci_hcd megaraid_sas crct10dif_common libata dca ptp crc32c_intel usbcore pps_core scsi_mod libphy usb_common mdio
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G        W     3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2+deb8u3
Hardware name: Dell Inc. PowerEdge R430/0HFG24, BIOS 1.5.4 10/05/2015
 0000000000000000 ffffffff8150e08f ffff88087e523a50 0000000000000009
 ffffffff81067777 ffff8807b1441500 ffff88087e523aa0 0000000000000001
 0000000000000001 ffff8807b1441500 ffffffff810677dc ffffffff817774d0
Call Trace:
 <IRQ>  [<ffffffff8150e08f>] ? dump_stack+0x5d/0x78
 [<ffffffff81067777>] ? warn_slowpath_common+0x77/0x90
 [<ffffffff810677dc>] ? warn_slowpath_fmt+0x4c/0x50
 [<ffffffff8150f7be>] ? skb_warn_bad_offload+0xc6/0xd1
 [<ffffffff81420241>] ? __skb_gso_segment+0x71/0xc0
 [<ffffffff8142058a>] ? dev_hard_start_xmit+0x16a/0x560
 [<ffffffff81417b00>] ? __skb_tx_hash+0x100/0x100
 [<ffffffff81440b39>] ? sch_direct_xmit+0xc9/0x1a0
 [<ffffffff81420b74>] ? __dev_queue_xmit+0x1f4/0x4c0
 [<ffffffffa028e608>] ? bond_xmit_slave_id+0x88/0x120 [bonding]
 [<ffffffffa02924ab>] ? bond_start_xmit+0x15b/0x420 [bonding]
 [<ffffffff814206ff>] ? dev_hard_start_xmit+0x2df/0x560
 [<ffffffff81420cc4>] ? __dev_queue_xmit+0x344/0x4c0
 [<ffffffff8145a41b>] ? ip_finish_output+0x69b/0x850
 [<ffffffffa04e1265>] ? ip_vs_dr_xmit+0xd5/0x1d0 [ip_vs]
 [<ffffffffa04d8f0a>] ? ip_vs_in+0x28a/0x5d0 [ip_vs]
 [<ffffffff814559f0>] ? ip_rcv_finish+0x350/0x350
 [<ffffffff8144f465>] ? nf_iterate+0x65/0xa0
 [<ffffffff814559f0>] ? ip_rcv_finish+0x350/0x350
 [<ffffffff8144f516>] ? nf_hook_slow+0x76/0x130
 [<ffffffff814559f0>] ? ip_rcv_finish+0x350/0x350
 [<ffffffff81455d7b>] ? ip_local_deliver+0x6b/0x90
 [<ffffffff8141eae3>] ? __netif_receive_skb_core+0x543/0x750
 [<ffffffff8141f905>] ? process_backlog+0x95/0x160
 [<ffffffff8141f0f0>] ? net_rx_action+0x140/0x240
 [<ffffffff8106c621>] ? __do_softirq+0xf1/0x290
 [<ffffffff8106c9f5>] ? irq_exit+0x95/0xa0
 [<ffffffff815155fd>] ? call_function_single_interrupt+0x6d/0x80
 <EOI>  [<ffffffff8101155e>] ? __switch_to+0xde/0x5a0
 [<ffffffff813dfb02>] ? cpuidle_enter_state+0x52/0xc0
 [<ffffffff813dfaf8>] ? cpuidle_enter_state+0x48/0xc0
 [<ffffffff810a82e8>] ? cpu_startup_entry+0x2f8/0x400
 [<ffffffff81042c9f>] ? start_secondary+0x20f/0x2d0
---[ end trace 205378674dfd0cbc ]---
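
For anyone comparing these reports, the detail line printed under the WARNING can be decoded mechanically. The following is a minimal Python sketch, not kernel code. The function name parse_bad_offload, the regex group names and the two lookup tables are assumptions for illustration; the bit names follow my reading of include/linux/skbuff.h in 3.x kernels, and only the low five gso_type bits are named, with anything else reported numerically.

```python
import re

# Low bits of skb_shinfo(skb)->gso_type (assumed from 3.x include/linux/skbuff.h).
GSO_TYPE_BITS = {
    0x01: "SKB_GSO_TCPV4",
    0x02: "SKB_GSO_UDP",
    0x04: "SKB_GSO_DODGY",
    0x08: "SKB_GSO_TCP_ECN",
    0x10: "SKB_GSO_TCPV6",
}

# Values of skb->ip_summed (assumed from the same header).
IP_SUMMED = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}

DETAIL_RE = re.compile(
    r"^(?P<driver>[^:]*): "
    r"caps=\((?P<caps_dev>0x[0-9a-fA-F]+), (?P<caps_sk>0x[0-9a-fA-F]+)\) "
    r"len=(?P<len>\d+) data_len=(?P<data_len>\d+) "
    r"gso_size=(?P<gso_size>\d+) gso_type=(?P<gso_type>\d+) "
    r"ip_summed=(?P<ip_summed>\d+)$"
)


def parse_bad_offload(line):
    """Decode the detail line that follows a skb_warn_bad_offload WARNING."""
    m = DETAIL_RE.match(line.strip())
    if m is None:
        raise ValueError("not a skb_warn_bad_offload detail line")
    gso_type = int(m.group("gso_type"))
    ip_summed = int(m.group("ip_summed"))
    known_mask = sum(GSO_TYPE_BITS)  # OR of all named bits (they are distinct)
    return {
        "driver": m.group("driver") or None,  # empty when no driver was printed
        "caps_dev": int(m.group("caps_dev"), 16),
        "caps_sk": int(m.group("caps_sk"), 16),
        "len": int(m.group("len")),
        "data_len": int(m.group("data_len")),
        "gso_size": int(m.group("gso_size")),
        "gso_flags": [n for b, n in sorted(GSO_TYPE_BITS.items()) if gso_type & b],
        "gso_unknown_bits": gso_type & ~known_mask,
        "ip_summed": IP_SUMMED.get(ip_summed, str(ip_summed)),
    }


if __name__ == "__main__":
    line = ("ixgbe: caps=(0x00000806602083b3, 0x0000000000000000) len=1494 "
            "data_len=1440 gso_size=1440 gso_type=1 ip_summed=1")
    info = parse_bad_offload(line)
    print(info["driver"], info["gso_flags"], info["ip_summed"])
```

For the ixgbe line above, gso_type=1 decodes to SKB_GSO_TCPV4 and ip_summed=1 to CHECKSUM_UNNECESSARY. As far as I can tell, the warning fires on the transmit path when a GSO packet reaches software segmentation with ip_summed other than CHECKSUM_PARTIAL, which would be consistent with these values, but that reading is an assumption and not something confirmed in this report.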

Comment 29 higkoo 2016-08-22 09:21:34 UTC

Background:
The basic setup of our webcdn is currently as follows.
Two servers, each with a bond0 interface bonding eth4 and eth5:
auto bond0
iface bond0 inet static
slaves eth4 eth5
bond-mode balance-rr
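The servers run active-active via LVS-DR and keepalived. Server 1 is the master; its LVS forwards to port 80 on itself and on server 2. Server 2 is the hot backup; its LVS forwards traffic only to its own port 80.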
 
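Problem:
The following errors appear at irregular intervals in the log of the webcdn LVS master server (they do not appear on the hot backup server):
[Wed Jul 27 11:51:51 2016] ixgbe 0000:04:00.1 eth5: Detected Tx Unit Hang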
  Tx Queue             <11>
  TDH, TDT             <15d>, <b6>
  next_to_use          <b6>
  next_to_clean        <15d>
tx_buffer_info[next_to_clean]
  time_stamp           <123e1619f>
  jiffies              <123e16669>
[Wed Jul 27 11:51:51 2016] ixgbe 0000:04:00.1 eth5: tx hang 1 detected on queue 5, resetting adapter
……
[Wed Jul 27 11:51:51 2016] ixgbe 0000:04:00.1 eth5: tx hang 1 detected on queue 11, resetting adapter
[Wed Jul 27 11:51:51 2016] ixgbe 0000:04:00.1 eth5: initiating reset due to tx timeout
……
[Wed Jul 27 11:51:51 2016] ixgbe 0000:04:00.1 eth5: initiating reset due to tx timeout
[Wed Jul 27 11:51:51 2016] ixgbe 0000:04:00.1 eth5: Reset adapter
[Wed Jul 27 11:51:52 2016] ixgbe 0000:04:00.1 eth5: tx hang 2 detected on queue 3, resetting adapter
[Wed Jul 27 11:51:53 2016] ixgbe 0000:04:00.1 eth5: detected SFP+: 6
[Wed Jul 27 11:51:53 2016] ixgbe 0000:04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
 
To work around this, we tried turning off TSO, GRO and GSO on the NICs with these commands:
ethtool -K bond0 gro off gso off
ethtool -K eth4 tso off gro off gso off
ethtool -K eth5 tso off gro off gso off
 
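After doing so, however, the server became abnormally sluggish, packet loss and load rose, and the kernel logged a large number of errors like the following: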
WARNING: CPU: 9 PID: 0 at /build/linux-7z1rSb/linux-3.16.7-ckt25/net/core/dev.c:2247 skb_warn_bad_offload+0xc6/0xd1()
ixgbe: caps=(0x00000806602083b3, 0x0000000000000000) len=1494 data_len=1440 gso_size=1440 gso_type=1 ip_summed=1
Modules linked in: binfmt_misc xt_multiport iptable_filter ip_tables x_tables dell_rbu ip_vs_wrr ip_vs nf_conntrack crc32c_generic bonding x86_pkg_temp_thermal intel_powerclamp intel_rapl coretemp kvm_intel kvm crc32_pclmul ttm drm_kms_helper aesni_intel aes_x86_64 lrw drm gf128mul glue_helper i2c_algo_bit iTCO_wdt ablk_helper i2c_core cryptd iTCO_vendor_support dcdbas evdev pcspkr wmi acpi_power_meter lpc_ich shpchp mei_me processor mfd_core mei thermal_sys button ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 xfs libcrc32c sr_mod cdrom sg sd_mod crc_t10dif crct10dif_generic ahci libahci crct10dif_pclmul ehci_pci tg3 ixgbe ehci_hcd megaraid_sas crct10dif_common libata dca ptp crc32c_intel usbcore pps_core scsi_mod libphy usb_common mdio
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G        W     3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2+deb8u3
Hardware name: Dell Inc. PowerEdge R430/0HFG24, BIOS 1.5.4 10/05/2015
0000000000000000 ffffffff8150e08f ffff88087e523a50 0000000000000009
ffffffff81067777 ffff8807b1441500 ffff88087e523aa0 0000000000000001
0000000000000001 ffff8807b1441500 ffffffff810677dc ffffffff817774d0
Call Trace:
<IRQ>  [<ffffffff8150e08f>] ? dump_stack+0x5d/0x78
[<ffffffff81067777>] ? warn_slowpath_common+0x77/0x90
[<ffffffff810677dc>] ? warn_slowpath_fmt+0x4c/0x50
[<ffffffff8150f7be>] ? skb_warn_bad_offload+0xc6/0xd1
[<ffffffff81420241>] ? __skb_gso_segment+0x71/0xc0
[<ffffffff8142058a>] ? dev_hard_start_xmit+0x16a/0x560
[<ffffffff81417b00>] ? __skb_tx_hash+0x100/0x100
[<ffffffff81440b39>] ? sch_direct_xmit+0xc9/0x1a0
[<ffffffff81420b74>] ? __dev_queue_xmit+0x1f4/0x4c0
[<ffffffffa028e608>] ? bond_xmit_slave_id+0x88/0x120 [bonding]
[<ffffffffa02924ab>] ? bond_start_xmit+0x15b/0x420 [bonding]
[<ffffffff814206ff>] ? dev_hard_start_xmit+0x2df/0x560
[<ffffffff81420cc4>] ? __dev_queue_xmit+0x344/0x4c0
[<ffffffff8145a41b>] ? ip_finish_output+0x69b/0x850
[<ffffffffa04e1265>] ? ip_vs_dr_xmit+0xd5/0x1d0 [ip_vs]
[<ffffffffa04d8f0a>] ? ip_vs_in+0x28a/0x5d0 [ip_vs]
[<ffffffff814559f0>] ? ip_rcv_finish+0x350/0x350
[<ffffffff8144f465>] ? nf_iterate+0x65/0xa0
[<ffffffff814559f0>] ? ip_rcv_finish+0x350/0x350
[<ffffffff8144f516>] ? nf_hook_slow+0x76/0x130
[<ffffffff814559f0>] ? ip_rcv_finish+0x350/0x350
[<ffffffff81455d7b>] ? ip_local_deliver+0x6b/0x90
[<ffffffff8141eae3>] ? __netif_receive_skb_core+0x543/0x750
[<ffffffff8141f905>] ? process_backlog+0x95/0x160
[<ffffffff8141f0f0>] ? net_rx_action+0x140/0x240
[<ffffffff8106c621>] ? __do_softirq+0xf1/0x290
             
Oddly, only the master server (the machine holding the LVS VIP) shows this error. Turning off TSO, GRO and GSO on the hot backup server does not trigger it.
 
Other server information:
uname -a:
Linux bilibili-w-01 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2+deb8u3 (2016-07-02) x86_64 GNU/Linux
                     ethtool -i eth4:
ethtool -i eth4
driver: ixgbe
version: 3.19.1-k
firmware-version: 0x80000827
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
ethtool -k eth4:
Features for eth4:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: on [fixed]
        tx-checksum-sctp: on
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
busy-poll: on [fixed]
                     ip a:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 120.xx.xx.xx(VIP)/32 brd 120.xx.xx.xx scope global lo:88-27
       valid_lft forever preferred_lft forever
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether a0:36:9f:98:5a:78 brd ff:ff:ff:ff:ff:ff
    inet 120.xx.xx.xx/28 brd 120.xx.xx.xx scope global bond0
       valid_lft forever preferred_lft forever
    inet 120.xx.xx.xx(VIP)/32 scope global bond0
       valid_lft forever preferred_lft forever
ipvsadm -Ln (master server):
ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4194304)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  120.xx.xx.xx:80 wrr
  -> 120.xx.xx.xx:80             Route   10     283921     100287   
  -> 127.0.0.1:80                 Route   10     283834     100067
