Description of problem:
This is a server (Intel DQ77KB) that I've been using to run a few VMs (FreeBSD, Fedora, an XP VM) using KVM and OpenVSwitch. Since moving to Fedora 20, especially after some updates that I installed in the past hour, this machine has become unstable.

Additional info:
reporter: libreport-2.1.10
WARNING: CPU: 0 PID: 3630 at net/core/dev.c:2218 skb_warn_bad_offload+0xcd/0xda()
: caps=(0x00000008801948c9, 0x0000000000000000) len=1898 data_len=1832 gso_size=1448 gso_type=5 ip_summed=0
Modules linked in: vhost_net vhost macvtap macvlan tun fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE bnep bluetooth ip6t_REJECT xt_conntrack cfg80211 rfkill openvswitch vxlan ip_tunnel gre libcrc32c ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_hdmi snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev hid_logitech_dj snd_hda_intel microcode snd_hda_codec snd_hwdep i2c_i801 snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer lpc_ich mfd_core snd soundcore shpchp e1000e ptp pps_core mei_me mei tpm_tis tpm tpm_bios nfsd auth_rpcgss nfs_acl lockd sunrpc raid1 i915 i2c_algo_bit drm_kms_helper drm i2c_core video
CPU: 0 PID: 3630 Comm: vhost-3613 Not tainted 3.12.5-302.fc20.x86_64 #1
Hardware name: /DQ77KB, BIOS KBQ7710H.86A.0052.2013.0708.1336 07/08/2013
 0000000000000009 ffff88041e203a10 ffffffff81662d11 ffff88041e203a58
 ffff88041e203a48 ffffffff810691dd ffff8803f39d6c00 ffff8803cf1b8000
 0000000000000005 0000000000000000 ffff8803f39d6c00 ffff88041e203aa8
Call Trace:
 <IRQ>  [<ffffffff81662d11>] dump_stack+0x45/0x56
 [<ffffffff810691dd>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff8106924c>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff81308743>] ? ___ratelimit+0x93/0x100
 [<ffffffff816652a2>] skb_warn_bad_offload+0xcd/0xda
 [<ffffffff81566d01>] __skb_gso_segment+0x71/0xc0
 [<ffffffff8156700a>] dev_hard_start_xmit+0x18a/0x570
 [<ffffffff815859f0>] sch_direct_xmit+0xe0/0x1c0
 [<ffffffff815675e9>] dev_queue_xmit+0x1f9/0x4a0
 [<ffffffffa0507ecb>] netdev_send+0x4b/0xc0 [openvswitch]
 [<ffffffffa05033d2>] ? ovs_masked_flow_lookup+0x122/0x260 [openvswitch]
 [<ffffffffa050783d>] ovs_vport_send+0x1d/0x80 [openvswitch]
 [<ffffffffa04fe16a>] do_output+0x2a/0x50 [openvswitch]
 [<ffffffffa04fe613>] do_execute_actions+0x2e3/0xb20 [openvswitch]
 [<ffffffff810a43d2>] ? enqueue_task_fair+0x412/0x660
 [<ffffffffa04fee7b>] ovs_execute_actions+0x2b/0x30 [openvswitch]
 [<ffffffffa05022e8>] ovs_dp_process_received_packet+0x88/0x100 [openvswitch]
 [<ffffffff8109ab37>] ? try_to_wake_up+0xe7/0x290
 [<ffffffffa05077aa>] ovs_vport_receive+0x2a/0x30 [openvswitch]
 [<ffffffffa0508211>] netdev_frame_hook+0xc1/0x120 [openvswitch]
 [<ffffffff81565072>] __netif_receive_skb_core+0x252/0x820
 [<ffffffff81565658>] __netif_receive_skb+0x18/0x60
 [<ffffffff8156617e>] process_backlog+0xae/0x180
 [<ffffffff81565a49>] net_rx_action+0x149/0x240
 [<ffffffff8106e747>] __do_softirq+0xf7/0x240
 [<ffffffff8167361c>] call_softirq+0x1c/0x30
 <EOI>  [<ffffffff810146a5>] do_softirq+0x55/0x90
 [<ffffffff81564d58>] netif_rx_ni+0x28/0x30
 [<ffffffffa06236e1>] tun_get_user+0x401/0x820 [tun]
 [<ffffffffa0623b5a>] tun_sendmsg+0x5a/0x80 [tun]
 [<ffffffffa063dc9c>] handle_tx+0x1bc/0x530 [vhost_net]
 [<ffffffffa063e045>] handle_tx_kick+0x15/0x20 [vhost_net]
 [<ffffffffa062bdb2>] vhost_worker+0xf2/0x190 [vhost]
 [<ffffffffa062bcc0>] ? vhost_dev_reset_owner+0x30/0x30 [vhost]
 [<ffffffff8108b0d0>] kthread+0xc0/0xd0
 [<ffffffff8108b010>] ? insert_kthread_work+0x40/0x40
 [<ffffffff81671cbc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8108b010>] ? insert_kthread_work+0x40/0x40
Created attachment 844264 [details] File: dmesg
This was claimed to be fixed with 3.10:
http://openvswitch.org/pipermail/discuss/2013-May/009977.html

Tony, can you get us a few more details about the networking setup?
- ip a
- ethtool -k <interface>

Where <interface> is each interface that is involved here. Also 'cat /proc/net/bonding/*' if you are using bonding.

Thanks,
Michele
Sure... There are currently four VMs running, so four OVS vnet? interfaces. The two physical interfaces are em1: VM Trunk, em2: Management.

[root@muscaria ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether e8:40:f2:e3:1c:b5 brd ff:ff:ff:ff:ff:ff
    inet 10.1.4.30/24 brd 10.1.4.255 scope global dynamic em2
       valid_lft 5998sec preferred_lft 5998sec
    inet6 2607:f2c0:f00e:8f0a:ea40:f2ff:fee3:1cb5/128 scope global dynamic
       valid_lft 86385sec preferred_lft 86385sec
    inet6 fe80::ea40:f2ff:fee3:1cb5/64 scope link
       valid_lft forever preferred_lft forever
3: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP group default qlen 1000
    link/ether e8:40:f2:e3:1c:b6 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ea40:f2ff:fee3:1cb6/64 scope link
       valid_lft forever preferred_lft forever
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 76:5b:1f:56:aa:38 brd ff:ff:ff:ff:ff:ff
5: ovs_DMZbr0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether e8:40:f2:e3:1c:b6 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7423:24ff:fe73:ac44/64 scope link
       valid_lft forever preferred_lft forever
6: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UNKNOWN group default qlen 500
    link/ether fe:54:00:c4:dc:2c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fec4:dc2c/64 scope link
       valid_lft forever preferred_lft forever
7: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UNKNOWN group default qlen 500
    link/ether fe:54:00:e1:ca:15 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fee1:ca15/64 scope link
       valid_lft forever preferred_lft forever
14: vnet2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UNKNOWN group default qlen 500
    link/ether fe:54:00:b6:3a:ee brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:feb6:3aee/64 scope link
       valid_lft forever preferred_lft forever
16: vnet3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UNKNOWN group default qlen 500
    link/ether fe:54:00:6e:6a:63 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe6e:6a63/64 scope link
       valid_lft forever preferred_lft forever
Thanks Tony,

can you also get me the output of:
ethtool -i em1
ethtool -i em2
ethtool -k em1
ethtool -k em2

With that I should have enough to raise it upstream.

thanks,
Michele
[root@muscaria ~]# ethtool -i em1
driver: e1000e
version: 2.3.2-k
firmware-version: 2.1-3
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

[root@muscaria ~]# ethtool -i em2
driver: e1000e
version: 2.3.2-k
firmware-version: 0.13-4
bus-info: 0000:00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

[root@muscaria ~]# ethtool -k em1
Features for em1:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]

[root@muscaria ~]# ethtool -k em2
Features for em2:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
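For anyone comparing `ethtool -k` output across several interfaces, the listings above can be reduced to the features that are both enabled and changeable (that is, not marked `[fixed]`). The following is only a rough Python sketch: the function names and the shortened `SAMPLE` text are made up for illustration, and it assumes the `name: on|off [fixed]` line layout shown in this report.

```python
def parse_ethtool_features(text):
    """Parse `ethtool -k` style output into {feature: (enabled, fixed)}.

    Assumes each feature line looks like "name: on" or "name: off [fixed]",
    as in the output pasted in this report.
    """
    features = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        name, _, value = line.partition(":")
        tokens = value.split()
        # Header lines such as "Features for em1:" have no on/off value.
        if not tokens or tokens[0] not in ("on", "off"):
            continue
        features[name.strip()] = (tokens[0] == "on", "[fixed]" in tokens)
    return features


def changeable_enabled(features):
    """Return the sorted names of features that are on and not [fixed]."""
    return sorted(n for n, (on, fixed) in features.items() if on and not fixed)


# Shortened, illustrative excerpt of the em1 output above.
SAMPLE = """Features for em1:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
tcp-segmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
highdma: on [fixed]
"""

if __name__ == "__main__":
    for name in changeable_enabled(parse_ethtool_features(SAMPLE)):
        print(name)
```

The list it prints is the set of offloads that could in principle be toggled with `ethtool -K` when narrowing down which one is involved.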
Thank You.
Hi Tony,

how quickly can you reproduce this warning? I ask because there might be one patch worth trying before taking it on netdev/ovs-dev. Namely:

commit 3cdb35b074142c915a463c535839886ae08fdfd4
Author: Pravin B Shelar <pshelar>
Date:   Fri Oct 25 15:12:33 2013 -0700

    openvswitch: Enable all GSO features on internal port.

    OVS already can handle all types of segmentation offloads that
    are supported by the kernel.
    Following patch specifically enables UDP and IPV6 segmentation
    offloads.

    Signed-off-by: Pravin B Shelar <pshelar>
    Signed-off-by: Jesse Gross <jesse>

If you can reproduce it quickly it is worth trying a kernel with the above patch. If not I'll ping upstream.

thanks,
Michele
I can see if that helps, but likely not until the weekend.
...THIS weekend. So, with the patch applied, the errors continue. The most recent: WARNING: CPU: 3 PID: 3192 at net/core/dev.c:2218 skb_warn_bad_offload+0xcd/0xda()
Michele, is there anything else to try? Any more offload types to enable?
Hi Tony, (catching up on my backlog) How quickly can you reproduce this issue? Does it also happen on 3.13? (http://alt.fedoraproject.org/pub/alt/rawhide-kernel-nodebug/x86_64/) thanks, Michele
Yup :( A few since booting with 3.13 already.
Hello,

I'm experiencing the same problem:

Jan 21 05:29:32 host47 kernel: [13775187.131611] WARNING: at net/core/dev.c:1919 skb_warn_bad_offload+0xc2/0xcf()
Jan 21 05:29:33 host47 kernel: [13775187.131619] Hardware name: AS -2042G-6RF
Jan 21 05:29:34 host47 kernel: [13775187.131625] : caps=(0x0000000000115829, 0x0000000000000000) len=7161 data_len=7109 gso_size=1448 gso_type=5 ip_summed=0
Jan 21 05:29:35 host47 kernel: [13775187.131627] Modules linked in: iptable_nat raid1 iptable_mangle ebt_arp ebtable_nat ebt_limit ebt_ip ebtable_filter ebtables ipt_MASQUERADE nf_nat xt_conntrack lockd sunrpc bridge 8021q garp stp llc bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ip6t_REJECT ib_core nf_conntrack_ipv6 nf_defrag_ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6table_filter nf_conntrack_ipv4 ip6_tables nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM binfmt_misc raid10 microcode serio_raw sp5100_tco amd64_edac_mod edac_core i2c_piix4 fam15h_power k10temp edac_mce_amd i2c_core vhost_net tun macvtap macvlan ixgbe kvm_amd kvm mdio igb ptp pps_core dca raid0 crc32c_intel ghash_clmulni_intel usb_storage mpt2sas raid_class scsi_transport_sas [last unloaded: iptable_mangle]
Jan 21 05:29:36 host47 kernel: [13775187.131711] Pid: 26764, comm: vhost-26741 Tainted: G W 3.6.8-2.cs.fc17.x86_64 #1
Jan 21 05:29:37 host47 kernel: [13775187.131845] Call Trace:
Jan 21 05:29:38 host47 kernel: [13775187.131847] <IRQ> [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
Jan 21 05:29:39 host47 kernel: [13775187.131874] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
Jan 21 05:29:40 host47 kernel: [13775187.131884] [<ffffffff8161bc6d>] skb_warn_bad_offload+0xc2/0xcf
Jan 21 05:29:41 host47 kernel: [13775187.131892] [<ffffffff8150f720>] skb_gso_segment+0x220/0x290
Jan 21 05:29:42 host47 kernel: [13775187.131900] [<ffffffff81512949>] dev_hard_start_xmit+0x239/0x690
Jan 21 05:29:43 host47 kernel: [13775187.131910] [<ffffffff8151312f>] dev_queue_xmit+0x38f/0x610
Jan 21 05:29:44 host47 kernel: [13775187.131924] [<ffffffffa038dd7f>] br_dev_queue_push_xmit+0x7f/0xd0 [bridge]
Jan 21 05:29:45 host47 kernel: [13775187.131935] [<ffffffffa038e052>] br_forward_finish+0x22/0x60 [bridge]
Jan 21 05:29:46 host47 kernel: [13775187.131945] [<ffffffffa038e0ed>] __br_forward+0x5d/0xb0 [bridge]
Jan 21 05:29:47 host47 kernel: [13775187.131955] [<ffffffffa038e2fd>] br_forward+0x5d/0x70 [bridge]
Jan 21 05:29:48 host47 kernel: [13775187.131964] [<ffffffffa038f036>] br_handle_frame_finish+0x1f6/0x290 [bridge]
Jan 21 05:29:49 host47 kernel: [13775187.131975] [<ffffffffa038f246>] br_handle_frame+0x176/0x260 [bridge]
Jan 21 05:29:50 host47 kernel: [13775187.131984] [<ffffffff81510cc6>] __netif_receive_skb+0x226/0x8a0
Jan 21 05:29:51 host47 kernel: [13775187.131993] [<ffffffff8106d33e>] ? run_timer_softirq+0x3e/0x350
Jan 21 05:29:52 host47 kernel: [13775187.132000] [<ffffffff815113f2>] process_backlog+0xb2/0x180
Jan 21 05:29:53 host47 kernel: [13775187.132007] [<ffffffff81511f89>] net_rx_action+0x149/0x230
Jan 21 05:29:54 host47 kernel: [13775187.132015] [<ffffffff810654c0>] __do_softirq+0xd0/0x210
Jan 21 05:30:25 host47 kernel: [13775199.706346] [<ffffffffa00cfee5>] handle_tx_kick+0x15/0x20 [vhost_net]
Jan 21 05:30:26 host47 kernel: [13775199.706357] [<ffffffffa00cc83d>] vhost_worker+0xed/0x190 [vhost_net]
Jan 21 05:30:27 host47 kernel: [13775199.706368] [<ffffffffa00cc750>] ? memory_access_ok.isra.11+0xd0/0xd0 [vhost_net]
Jan 21 05:30:28 host47 kernel: [13775199.706375] [<ffffffff8107fde3>] kthread+0x93/0xa0
Jan 21 05:30:29 host47 kernel: [13775199.706383] [<ffffffff81627f04>] kernel_thread_helper+0x4/0x10
Jan 21 05:30:30 host47 kernel: [13775199.706390] [<ffffffff8107fd50>] ? kthread_freezable_should_stop+0x70/0x70
Jan 21 05:30:31 host47 kernel: [13775199.706406] [<ffffffff81627f00>] ? gs_change+0x13/0x13
Jan 21 05:30:32 host47 kernel: [13775199.706408] ---[ end trace f788b6ed554f1608 ]---
Jan 21 05:30:33 host47 kernel: [13775199.709236] ------------[ cut here ]------------

I have several servers that have been experiencing this issue for quite some time now. The machines are Supermicro H8QG6 with
3.6.8-2.fc17.x86_64 #1 SMP Tue Jun 25 20:49:58 EEST 2013 x86_64 x86_64 x86_64 GNU/Linux

There are two 10g interfaces (p1p1 and p1p2: 03:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)) that are in LACP bonding (bond0), which has a VLAN (bond0.10), which has a bridge interface (br10).

I've tried turning off every offload option on all of them, but with no success:

ethtool -k br10
Offload parameters for br10:
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: off
Yes, still happening with 3.13.
https://retrace.fedoraproject.org/faf/reports/322143/ Reports that made it through to the abrt server.
Still happening w/ 3.14.0-0.rc2.git3.2.fc21.x86_64 #1 SMP Thu Feb 13 19:01:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
How to proceed? Can anything be done?
Hi Tony, sorry for the delay, I've raised the issue upstream. I'll ping you if more info is needed. regards, Michele
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.14.4-200.fc20. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.
Description of problem:
I started up a VM under libvirt/KVM/qemu and I just started accumulating these kernel oopses. These are the same VMs I've been running for a while. This is the first time I booted them up under this kernel, however. They were running fine with 3.14.4; I'm unsure about 3.14.5 or 3.14.6, as I am not completely sure when I last booted them up.

The VMs appear to be running fine and so does the host machine, but I cannot be certain. Thankfully these are my personal VMs and none of this is production affecting.

Version-Release number of selected component:
kernel

Additional info:
reporter: libreport-2.2.2
cmdline: BOOT_IMAGE=/vmlinuz-3.14.7-200.fc20.x86_64 root=UUID=1480ee94-2b82-4f59-8789-d071313a416c ro vconsole.font=latarcyrheb-sun16 rhgb quiet clocksource=hpet elevator=deadline LANG=en_US.UTF-8
kernel: 3.14.7-200.fc20.x86_64
runlevel: N 5
type: Kerneloops

Truncated backtrace:
WARNING: CPU: 1 PID: 4392 at net/core/dev.c:2238 skb_warn_bad_offload+0xcd/0xda()
r8169: caps=(0x0000000100004180, 0x0000000000000000) len=15165 data_len=15099 gso_size=1448 gso_type=5 ip_summed=1
Modules linked in: vhost_net vhost macvtap macvlan tcp_diag inet_diag ipt_MASQUERADE xt_CHECKSUM tun iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6t_rpfilter ip6t_REJECT xt_conntrack bnep bluetooth 6lowpan_iphc cfg80211 rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw it87 hwmon_vid iTCO_wdt iTCO_vendor_support ppdev vfat fat x86_pkg_temp_thermal coretemp kvm_intel kvm snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel microcode snd_hda_codec i2c_i801 snd_hwdep snd_seq snd_seq_device snd_pcm lpc_ich mfd_core mei_me mei snd_timer snd shpchp soundcore parport_pc parport nfsd auth_rpcgss nfs_acl lockd sunrpc i915 i2c_algo_bit drm_kms_helper drm r8169 mii i2c_core video
CPU: 1 PID: 4392 Comm: qemu-system-x86 Not tainted 3.14.7-200.fc20.x86_64 #1
Hardware name: Shuttle Inc. SH87R/FH87, BIOS 2.01 02/21/2014
 0000000000000000 000000004560ae5c ffff88081fa43838 ffffffff816f04b2
 ffff88081fa43880 ffff88081fa43870 ffffffff8108a1cd ffff88078c34cc00
 ffff8807f2b42000 0000000000000005 0000000000000001 ffff88078c34cc00
Call Trace:
 <IRQ>  [<ffffffff816f04b2>] dump_stack+0x45/0x56
 [<ffffffff8108a1cd>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff8108a24c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffff8135a6b3>] ? ___ratelimit+0x93/0x100
 [<ffffffff816f2baf>] skb_warn_bad_offload+0xcd/0xda
 [<ffffffff815e5c99>] __skb_gso_segment+0x79/0xc0
 [<ffffffff815e5fca>] dev_hard_start_xmit+0x18a/0x5d0
 [<ffffffff816076d0>] sch_direct_xmit+0xe0/0x1c0
 [<ffffffff815e6611>] __dev_queue_xmit+0x201/0x4c0
 [<ffffffff815e68e0>] dev_queue_xmit+0x10/0x20
 [<ffffffff81624c99>] ip_finish_output+0x339/0x440
 [<ffffffff81626148>] ip_output+0x58/0x90
 [<ffffffff81621edb>] ip_forward_finish+0x8b/0x1c0
 [<ffffffff81622352>] ip_forward+0x342/0x440
 [<ffffffff8161ffad>] ip_rcv_finish+0x7d/0x350
 [<ffffffff816208f8>] ip_rcv+0x298/0x3d0
 [<ffffffff815e47d6>] __netif_receive_skb_core+0x646/0x830
 [<ffffffff81616faa>] ? nf_iterate+0xaa/0xc0
 [<ffffffff815e49d8>] __netif_receive_skb+0x18/0x60
 [<ffffffff815e4a60>] netif_receive_skb_internal+0x40/0xc0
 [<ffffffff815e4afc>] netif_receive_skb+0x1c/0x70
 [<ffffffffa0497933>] br_handle_frame_finish+0x1f3/0x3f0 [bridge]
 [<ffffffffa049ed6d>] br_nf_pre_routing_finish+0x1bd/0x3d0 [bridge]
 [<ffffffffa049f21c>] br_nf_pre_routing+0x29c/0x660 [bridge]
 [<ffffffffa0497740>] ? br_handle_local_finish+0x70/0x70 [bridge]
 [<ffffffff81616faa>] nf_iterate+0xaa/0xc0
 [<ffffffffa0497740>] ? br_handle_local_finish+0x70/0x70 [bridge]
 [<ffffffff81617044>] nf_hook_slow+0x84/0x140
 [<ffffffffa0497740>] ? br_handle_local_finish+0x70/0x70 [bridge]
 [<ffffffffa0497ca8>] br_handle_frame+0x178/0x230 [bridge]
 [<ffffffff815e4402>] __netif_receive_skb_core+0x272/0x830
 [<ffffffff8109c931>] ? send_sigqueue+0x101/0x1e0
 [<ffffffff815e49d8>] __netif_receive_skb+0x18/0x60
 [<ffffffff815e569e>] process_backlog+0xae/0x180
 [<ffffffff815e4e79>] net_rx_action+0x149/0x240
 [<ffffffff8108f8c5>] __do_softirq+0xf5/0x2a0
 [<ffffffff8170219c>] do_softirq_own_stack+0x1c/0x30
 <EOI>  [<ffffffff8108fb15>] do_softirq+0x55/0x60
 [<ffffffff815e4094>] netif_rx_ni+0x34/0x70
 [<ffffffffa04eb864>] tun_get_user+0x424/0x890 [tun]
 [<ffffffffa04ebdcb>] tun_chr_aio_write+0x7b/0xa0 [tun]
 [<ffffffff811e93d9>] do_sync_readv_writev+0x59/0xa0
 [<ffffffff811ea933>] do_readv_writev+0xc3/0x240
 [<ffffffff811eab30>] vfs_writev+0x30/0x60
 [<ffffffff811eaca9>] SyS_writev+0x59/0xf0
 [<ffffffff81700869>] system_call_fastpath+0x16/0x1b
I have a similar problem in RHEL6 2.6.32-431.11.2.el6.x86_64 with the bonding and ixgbe drivers:

------------[ cut here ]------------
WARNING: at net/core/dev.c:1907 skb_warn_bad_offload+0xc2/0xf0() (Not tainted)
Hardware name: UCSC-C220-M3L
bonding: caps=(0xf1d3a5, 0x0) len=1618 data_len=1552 ip_summed=1
Modules linked in: ip_vs_rr ip_vs libcrc32c autofs4 acpi_cpufreq freq_table mperf bonding 8021q garp stp llc iptable_filter iptable_mangle ip_tables ip6table_filter xt_MARK xt_multiport ip6table_mangle ip6_tables ipv6 iTCO_wdt iTCO_vendor_support microcode ipmi_devintf power_meter sb_edac edac_core lpc_ich mfd_core i2c_i801 sg ixgbe mdio igb dca i2c_algo_bit i2c_core ptp pps_core ext4 jbd2 mbcache raid1 sd_mod crc_t10dif isci libsas scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81071e27>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff81071f16>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff8145b442>] ? skb_warn_bad_offload+0xc2/0xf0
 [<ffffffff81460531>] ? __skb_gso_segment+0x71/0xc0
 [<ffffffff81460593>] ? skb_gso_segment+0x13/0x20
 [<ffffffff8146063b>] ? dev_hard_start_xmit+0x9b/0x480
 [<ffffffffa0192093>] ? ipt_post_routing_hook+0x23/0x30 [iptable_mangle]
 [<ffffffff814898e9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81460c5d>] ? dev_queue_xmit+0x1bd/0x320
 [<ffffffff8149a6d8>] ? ip_finish_output+0x148/0x310
 [<ffffffffa02ab8c0>] ? dst_output+0x0/0x20 [ip_vs]
 [<ffffffff8149a958>] ? ip_output+0xb8/0xc0
 [<ffffffffa02ac91a>] ? ip_vs_dr_xmit+0x17a/0x1b0 [ip_vs]
 [<ffffffffa02a5e82>] ? ip_vs_in+0x202/0x3b0 [ip_vs]
 [<ffffffff814898e9>] ? nf_iterate+0x69/0xb0
 [<ffffffff814946e0>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff81489aa6>] ? nf_hook_slow+0x76/0x120
 [<ffffffff814946e0>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff81494a0a>] ? ip_local_deliver+0x5a/0xa0
 [<ffffffff81493f0d>] ? ip_rcv_finish+0x12d/0x440
 [<ffffffff81494495>] ? ip_rcv+0x275/0x350
 [<ffffffff8145b9bb>] ? __netif_receive_skb+0x4ab/0x750
 [<ffffffff8145f628>] ? netif_receive_skb+0x58/0x60
 [<ffffffff8145f730>] ? napi_skb_finish+0x50/0x70
 [<ffffffff81460e99>] ? napi_gro_receive+0x39/0x50
 [<ffffffffa019e6bf>] ? ixgbe_poll+0x54f/0x12c0 [ixgbe]
 [<ffffffff8106621c>] ? rebalance_domains+0x3cc/0x5a0
 [<ffffffff81094fbd>] ? insert_work+0x6d/0xb0
 [<ffffffff81460fb3>] ? net_rx_action+0x103/0x2f0
 [<ffffffff8107a8e1>] ? __do_softirq+0xc1/0x1e0
 [<ffffffff810e6eb0>] ? handle_IRQ_event+0x60/0x170
 [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
 [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
 [<ffffffff8107a795>] ? irq_exit+0x85/0x90
 [<ffffffff81531605>] ? do_IRQ+0x75/0xf0
 [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff812e0bee>] ? intel_idle+0xde/0x170
 [<ffffffff812e0bd1>] ? intel_idle+0xc1/0x170
 [<ffffffff81426b67>] ? cpuidle_idle_call+0xa7/0x140
 [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
 [<ffffffff8152143c>] ? start_secondary+0x2ac/0x2ef
---[ end trace 825299608e8eaa5b ]---

The current workaround is to turn off GSO along with all other hardware optimizations on the eth* interfaces.
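The workaround described above (turning off GSO and the other offloads on the eth* interfaces) could be scripted roughly as follows. This is only a sketch in Python that builds the `ethtool -K` command lines without running them. The interface names `eth0` and `eth1` are placeholders, and the choice of `tso`, `gso` and `gro` is an assumption about which offloads are meant; whether disabling them actually helps seems to depend on the setup.

```python
import shlex


def offload_disable_commands(interfaces, features=("tso", "gso", "gro")):
    """Build one `ethtool -K <iface> <feature> off ...` argv per interface.

    Nothing is executed here; the caller decides whether to run the commands
    (for example with subprocess.run(argv, check=True), as root).
    """
    commands = []
    for iface in interfaces:
        argv = ["ethtool", "-K", iface]
        for feature in features:
            argv += [feature, "off"]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # Placeholder interface names; substitute the real slave interfaces.
    for argv in offload_disable_commands(["eth0", "eth1"]):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the commands first makes it easy to review them, and to re-enable one offload at a time afterwards to see which setting matters.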
Running into this as well, subscribing.
This is still happening. Here are some details:

[root@muscaria oops-2014-07-10-09:51:34-20624-3]# cat reason
WARNING: CPU: 0 PID: 1890 at net/core/dev.c:2233 skb_warn_bad_offload+0xcd/0xda()[root@muscaria oops-2014-07-10-09:51:34-20624-3]# uname -a
Linux muscaria 3.16.0-0.rc4.git2.2.fc22.x86_64 #1 SMP Wed Jul 9 22:19:54 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@muscaria oops-2014-07-10-09:51:34-20624-3]# ls -la .. | grep oops | wc -l
20
[root@muscaria oops-2014-07-10-09:51:34-20624-3]#
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in over 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.
I am not sure what information I can provide about Comment 24 above. If there is something needed from me please do ask it more explicitly. Thanks Rashid
I've just noticed this bug when starting up a FreeBSD guest, so I did some digging, and found out that this is actually a bug in FreeBSD's virtio implementation: http://www.spinics.net/lists/netdev/msg293976.html
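To make the `gso_type=` and `ip_summed=` values in the warnings in this report easier to read, here is a small Python sketch that decodes them. The bit values are my assumption based on `include/linux/skbuff.h` in 3.x-era kernels (they have changed in later kernels), so treat the mapping as illustrative and check it against the source of the kernel actually running.

```python
import re

# Assumption: SKB_GSO_* bits as defined in 3.x-era include/linux/skbuff.h.
SKB_GSO_FLAGS = {
    0x01: "SKB_GSO_TCPV4",
    0x02: "SKB_GSO_UDP",
    0x04: "SKB_GSO_DODGY",
    0x08: "SKB_GSO_TCP_ECN",
    0x10: "SKB_GSO_TCPV6",
}

# skb->ip_summed values (CHECKSUM_* constants).
IP_SUMMED = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}

FIELD_RE = re.compile(r"(\w+)=(\d+)")


def decode_bad_offload(line):
    """Decode the numeric fields of a skb_warn_bad_offload detail line."""
    fields = {key: int(value) for key, value in FIELD_RE.findall(line)}
    result = dict(fields)
    gso_type = fields.get("gso_type")
    if gso_type is not None:
        result["gso_flags"] = [
            name for bit, name in sorted(SKB_GSO_FLAGS.items()) if gso_type & bit
        ]
        # Bits this table does not know about are reported, not guessed.
        result["gso_unknown_bits"] = gso_type & ~sum(SKB_GSO_FLAGS)
    if "ip_summed" in fields:
        result["ip_summed"] = IP_SUMMED.get(fields["ip_summed"], "unknown")
    return result


if __name__ == "__main__":
    sample = ("caps=(0x00000008801948c9, 0x0000000000000000) len=1898 "
              "data_len=1832 gso_size=1448 gso_type=5 ip_summed=0")
    print(decode_bad_offload(sample))
```

Under that assumption, `gso_type=5` decodes to `SKB_GSO_TCPV4` plus `SKB_GSO_DODGY`, and `ip_summed=0` to `CHECKSUM_NONE`. As far as I understand, `SKB_GSO_DODGY` marks packets coming from an untrusted source such as a guest's virtio device, which would fit the FreeBSD virtio explanation.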
I hit this error when using LVS. The machine is both the LVS master and an LVS real server. If I run:

ethtool -K bond0 gro off gso off
ethtool -K eth0 tso off gro off gso off
ethtool -K eth1 tso off gro off gso off

then it crashes:

Linux version 3.16.0-4-amd64 (debian-kernel.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt25-2+deb8u3 (2016-07-02)
Command line: BOOT_IMAGE=/boot/vmlinuz-3.16.0-4-amd64 root=UUID=6e817684-b791-4099-aa24-6e45f2f58997 ro net.ifnames=0 thash_entries=1048576 rhash_entries=1048576 biosdevname=0 nohz=off enforcing=0 ipv6.disable_ipv6=1 nmi_watchdog=0 selinux=0 transparent_hugepage=never cgroup_enable=memory swapaccount=1 vga=771
Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
DMI: Dell Inc. PowerEdge R430/0HFG24, BIOS 1.5.4 10/05/2015

WARNING: CPU: 9 PID: 0 at /build/linux-7z1rSb/linux-3.16.7-ckt25/net/core/dev.c:2247 skb_warn_bad_offload+0xc6/0xd1()
ixgbe: caps=(0x00000806602083b3, 0x0000000000000000) len=1494 data_len=1440 gso_size=1440 gso_type=1 ip_summed=1
Modules linked in: binfmt_misc xt_multiport iptable_filter ip_tables x_tables dell_rbu ip_vs_wrr ip_vs nf_conntrack crc32c_generic bonding x86_pkg_temp_thermal intel_powerclamp intel_rapl coretemp kvm_intel kvm crc32_pclmul ttm drm_kms_helper aesni_intel aes_x86_64 lrw drm gf128mul glue_helper i2c_algo_bit iTCO_wdt ablk_helper i2c_core cryptd iTCO_vendor_support dcdbas evdev pcspkr wmi acpi_power_meter lpc_ich shpchp mei_me processor mfd_core mei thermal_sys button ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 xfs libcrc32c sr_mod cdrom sg sd_mod crc_t10dif crct10dif_generic ahci libahci crct10dif_pclmul ehci_pci tg3 ixgbe ehci_hcd megaraid_sas crct10dif_common libata dca ptp crc32c_intel usbcore pps_core scsi_mod libphy usb_common mdio
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G W 3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2+deb8u3
Hardware name: Dell Inc. PowerEdge R430/0HFG24, BIOS 1.5.4 10/05/2015
 0000000000000000 ffffffff8150e08f ffff88087e523a50 0000000000000009
 ffffffff81067777 ffff8807b1441500 ffff88087e523aa0 0000000000000001
 0000000000000001 ffff8807b1441500 ffffffff810677dc ffffffff817774d0
Call Trace:
 <IRQ>
 [<ffffffff8150e08f>] ? dump_stack+0x5d/0x78
 [<ffffffff81067777>] ? warn_slowpath_common+0x77/0x90
 [<ffffffff810677dc>] ? warn_slowpath_fmt+0x4c/0x50
 [<ffffffff8150f7be>] ? skb_warn_bad_offload+0xc6/0xd1
 [<ffffffff81420241>] ? __skb_gso_segment+0x71/0xc0
 [<ffffffff8142058a>] ? dev_hard_start_xmit+0x16a/0x560
 [<ffffffff81417b00>] ? __skb_tx_hash+0x100/0x100
 [<ffffffff81440b39>] ? sch_direct_xmit+0xc9/0x1a0
 [<ffffffff81420b74>] ? __dev_queue_xmit+0x1f4/0x4c0
 [<ffffffffa028e608>] ? bond_xmit_slave_id+0x88/0x120 [bonding]
 [<ffffffffa02924ab>] ? bond_start_xmit+0x15b/0x420 [bonding]
 [<ffffffff814206ff>] ? dev_hard_start_xmit+0x2df/0x560
 [<ffffffff81420cc4>] ? __dev_queue_xmit+0x344/0x4c0
 [<ffffffff8145a41b>] ? ip_finish_output+0x69b/0x850
 [<ffffffffa04e1265>] ? ip_vs_dr_xmit+0xd5/0x1d0 [ip_vs]
 [<ffffffffa04d8f0a>] ? ip_vs_in+0x28a/0x5d0 [ip_vs]
 [<ffffffff814559f0>] ? ip_rcv_finish+0x350/0x350
 [<ffffffff8144f465>] ? nf_iterate+0x65/0xa0
 [<ffffffff814559f0>] ? ip_rcv_finish+0x350/0x350
 [<ffffffff8144f516>] ? nf_hook_slow+0x76/0x130
 [<ffffffff814559f0>] ? ip_rcv_finish+0x350/0x350
 [<ffffffff81455d7b>] ? ip_local_deliver+0x6b/0x90
 [<ffffffff8141eae3>] ? __netif_receive_skb_core+0x543/0x750
 [<ffffffff8141f905>] ? process_backlog+0x95/0x160
 [<ffffffff8141f0f0>] ? net_rx_action+0x140/0x240
 [<ffffffff8106c621>] ? __do_softirq+0xf1/0x290
 [<ffffffff8106c9f5>] ? irq_exit+0x95/0xa0
 [<ffffffff815155fd>] ? call_function_single_interrupt+0x6d/0x80
 <EOI>
 [<ffffffff8101155e>] ? __switch_to+0xde/0x5a0
 [<ffffffff813dfb02>] ? cpuidle_enter_state+0x52/0xc0
 [<ffffffff813dfaf8>] ? cpuidle_enter_state+0x48/0xc0
 [<ffffffff810a82e8>] ? cpu_startup_entry+0x2f8/0x400
 [<ffffffff81042c9f>] ? start_secondary+0x20f/0x2d0
---[ end trace 205378674dfd0cbc ]---
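The second line of the warning packs several skb fields into key=value pairs, and they are easier to compare across reports once decoded. Below is a minimal Python sketch for that. The function name parse_bad_offload and the derived keys headlen and ip_summed_name are made up for illustration. The ip_summed names are the CHECKSUM_* constants from include/linux/skbuff.h, and headlen is simply len minus data_len (the linear part of the skb).

```python
import re

# Values of skb->ip_summed, as defined in include/linux/skbuff.h.
IP_SUMMED_NAMES = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}


def parse_bad_offload(line):
    """Extract the decimal key=value fields from a skb_warn_bad_offload line.

    Hypothetical helper: the caps=(...) pair is ignored because it is
    hexadecimal and its bit layout depends on the kernel version.
    """
    fields = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}
    # Symbolic name for the checksum state of the skb.
    fields["ip_summed_name"] = IP_SUMMED_NAMES.get(fields.get("ip_summed"), "unknown")
    # Bytes in the linear area (headers), i.e. len - data_len.
    if "len" in fields and "data_len" in fields:
        fields["headlen"] = fields["len"] - fields["data_len"]
    return fields


if __name__ == "__main__":
    sample = ("ixgbe: caps=(0x00000806602083b3, 0x0000000000000000) "
              "len=1494 data_len=1440 gso_size=1440 gso_type=1 ip_summed=1")
    for key, value in sorted(parse_bad_offload(sample).items()):
        print(key, value)
```

For the line above this reports ip_summed as CHECKSUM_UNNECESSARY with 54 bytes of linear data. As far as I recall, on the transmit path __skb_gso_segment() expects CHECKSUM_PARTIAL and warns otherwise, which would be consistent with both this trace and the original report (ip_summed=0), but treat that reading as an assumption rather than a confirmed diagnosis.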
Background:

Our webcdn setup is currently as follows. There are two servers; on each, the bond0 interface bonds eth4 and eth5:

auto bond0
iface bond0 inet static
    slaves eth4 eth5
    bond-mode balance-rr

The servers run active-active using LVS-DR and keepalived. Server 1 is the master; its LVS forwards to port 80 on both itself and server 2. Server 2 is the hot backup; its LVS forwards traffic only to port 80 on the local machine.

Problem:

The following errors appear intermittently in the logs of the webcdn LVS master (they have not appeared on the hot backup):

[Wed Jul 27 11:51:51 2016] ixgbe 0000:04:00.1 eth5: Detected Tx Unit Hang
  Tx Queue             <11>
  TDH, TDT             <15d>, <b6>
  next_to_use          <b6>
  next_to_clean        <15d>
tx_buffer_info[next_to_clean]
  time_stamp           <123e1619f>
  jiffies              <123e16669>
[Wed Jul 27 11:51:51 2016] ixgbe 0000:04:00.1 eth5: tx hang 1 detected on queue 5, resetting adapter
...
[Wed Jul 27 11:51:51 2016] ixgbe 0000:04:00.1 eth5: tx hang 1 detected on queue 11, resetting adapter
[Wed Jul 27 11:51:51 2016] ixgbe 0000:04:00.1 eth5: initiating reset due to tx timeout
...
[Wed Jul 27 11:51:51 2016] ixgbe 0000:04:00.1 eth5: initiating reset due to tx timeout
[Wed Jul 27 11:51:51 2016] ixgbe 0000:04:00.1 eth5: Reset adapter
[Wed Jul 27 11:51:52 2016] ixgbe 0000:04:00.1 eth5: tx hang 2 detected on queue 3, resetting adapter
[Wed Jul 27 11:51:53 2016] ixgbe 0000:04:00.1 eth5: detected SFP+: 6
[Wed Jul 27 11:51:53 2016] ixgbe 0000:04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX

To work around this, we tried disabling TSO, GRO and GSO on the NICs with these commands:

ethtool -K bond0 gro off gso off
ethtool -K eth4 tso off gro off gso off
ethtool -K eth5 tso off gro off gso off

However, after doing so the server became abnormally sluggish, packet loss and load went up, and the kernel logged a large number of errors like the following:

WARNING: CPU: 9 PID: 0 at /build/linux-7z1rSb/linux-3.16.7-ckt25/net/core/dev.c:2247 skb_warn_bad_offload+0xc6/0xd1()
ixgbe: caps=(0x00000806602083b3, 0x0000000000000000) len=1494 data_len=1440 gso_size=1440 gso_type=1 ip_summed=1
Modules linked in: binfmt_misc xt_multiport iptable_filter ip_tables x_tables dell_rbu ip_vs_wrr ip_vs nf_conntrack crc32c_generic bonding x86_pkg_temp_thermal intel_powerclamp intel_rapl coretemp kvm_intel kvm crc32_pclmul ttm drm_kms_helper aesni_intel aes_x86_64 lrw drm gf128mul glue_helper i2c_algo_bit iTCO_wdt ablk_helper i2c_core cryptd iTCO_vendor_support dcdbas evdev pcspkr wmi acpi_power_meter lpc_ich shpchp mei_me processor mfd_core mei thermal_sys button ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 xfs libcrc32c sr_mod cdrom sg sd_mod crc_t10dif crct10dif_generic ahci libahci crct10dif_pclmul ehci_pci tg3 ixgbe ehci_hcd megaraid_sas crct10dif_common libata dca ptp crc32c_intel usbcore pps_core scsi_mod libphy usb_common mdio
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G W 3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2+deb8u3
Hardware name: Dell Inc. PowerEdge R430/0HFG24, BIOS 1.5.4 10/05/2015
 0000000000000000 ffffffff8150e08f ffff88087e523a50 0000000000000009
 ffffffff81067777 ffff8807b1441500 ffff88087e523aa0 0000000000000001
 0000000000000001 ffff8807b1441500 ffffffff810677dc ffffffff817774d0
Call Trace:
 <IRQ>
 [<ffffffff8150e08f>] ? dump_stack+0x5d/0x78
 [<ffffffff81067777>] ? warn_slowpath_common+0x77/0x90
 [<ffffffff810677dc>] ? warn_slowpath_fmt+0x4c/0x50
 [<ffffffff8150f7be>] ? skb_warn_bad_offload+0xc6/0xd1
 [<ffffffff81420241>] ? __skb_gso_segment+0x71/0xc0
 [<ffffffff8142058a>] ? dev_hard_start_xmit+0x16a/0x560
 [<ffffffff81417b00>] ? __skb_tx_hash+0x100/0x100
 [<ffffffff81440b39>] ? sch_direct_xmit+0xc9/0x1a0
 [<ffffffff81420b74>] ? __dev_queue_xmit+0x1f4/0x4c0
 [<ffffffffa028e608>] ? bond_xmit_slave_id+0x88/0x120 [bonding]
 [<ffffffffa02924ab>] ? bond_start_xmit+0x15b/0x420 [bonding]
 [<ffffffff814206ff>] ? dev_hard_start_xmit+0x2df/0x560
 [<ffffffff81420cc4>] ? __dev_queue_xmit+0x344/0x4c0
 [<ffffffff8145a41b>] ? ip_finish_output+0x69b/0x850
 [<ffffffffa04e1265>] ? ip_vs_dr_xmit+0xd5/0x1d0 [ip_vs]
 [<ffffffffa04d8f0a>] ? ip_vs_in+0x28a/0x5d0 [ip_vs]
 [<ffffffff814559f0>] ? ip_rcv_finish+0x350/0x350
 [<ffffffff8144f465>] ? nf_iterate+0x65/0xa0
 [<ffffffff814559f0>] ? ip_rcv_finish+0x350/0x350
 [<ffffffff8144f516>] ? nf_hook_slow+0x76/0x130
 [<ffffffff814559f0>] ? ip_rcv_finish+0x350/0x350
 [<ffffffff81455d7b>] ? ip_local_deliver+0x6b/0x90
 [<ffffffff8141eae3>] ? __netif_receive_skb_core+0x543/0x750
 [<ffffffff8141f905>] ? process_backlog+0x95/0x160
 [<ffffffff8141f0f0>] ? net_rx_action+0x140/0x240
 [<ffffffff8106c621>] ? __do_softirq+0xf1/0x290

The odd thing is that only the master (the machine holding the LVS VIP) shows this error. Disabling TSO, GRO and GSO on the hot backup server does not trigger the errors above.

Other information about the server:

uname -a:
Linux bilibili-w-01 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2+deb8u3 (2016-07-02) x86_64 GNU/Linux

ethtool -i eth4:
driver: ixgbe
version: 3.19.1-k
firmware-version: 0x80000827
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

ethtool -k eth4:
Features for eth4:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: on
    tx-checksum-ip-generic: off [fixed]
    tx-checksum-ipv6: on
    tx-checksum-fcoe-crc: on [fixed]
    tx-checksum-sctp: on
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
busy-poll: on [fixed]

ip a:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 120.xx.xx.xx(VIP)/32 brd 120.xx.xx.xx scope global lo:88-27
       valid_lft forever preferred_lft forever
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether a0:36:9f:98:5a:78 brd ff:ff:ff:ff:ff:ff
    inet 120.xx.xx.xx/28 brd 120.xx.xx.xx scope global bond0
       valid_lft forever preferred_lft forever
    inet 120.xx.xx.xx(VIP)/32 scope global bond0
       valid_lft forever preferred_lft forever

ipvsadm -Ln (master):
IP Virtual Server version 1.2.1 (size=4194304)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  120.xx.xx.xx:80 wrr
  -> 120.xx.xx.xx:80              Route   10     283921     100287
  -> 127.0.0.1:80                 Route   10     283834     100067
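Since several offloads are being toggled at once here, it may help to capture the ethtool -k output before and after the change and list what is still enabled and changeable. The following is a minimal Python sketch under that assumption. The function names parse_ethtool_features and changeable_and_on are hypothetical, and the sample text is a short excerpt of the eth4 output quoted above, not a fresh capture.

```python
def parse_ethtool_features(text):
    """Map feature name -> (enabled, fixed) from `ethtool -k` style text.

    Hypothetical helper: it only understands "name: on|off [fixed]" lines
    and skips everything else, including the "Features for ..." header.
    """
    features = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Features for") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        parts = value.split()
        if not parts or parts[0] not in ("on", "off"):
            continue
        features[name.strip()] = (parts[0] == "on", "[fixed]" in parts)
    return features


def changeable_and_on(features):
    """Return the sorted names of features that are on and not marked [fixed]."""
    return sorted(name for name, (on, fixed) in features.items() if on and not fixed)


if __name__ == "__main__":
    # Excerpt taken from the eth4 feature list in this report.
    sample = """Features for eth4:
rx-checksumming: on
tx-checksum-ip-generic: off [fixed]
generic-receive-offload: on
large-receive-offload: on
highdma: on [fixed]
"""
    for name in changeable_and_on(parse_ethtool_features(sample)):
        print(name)
```

Running the same function over the output captured on the master and on the hot backup after the ethtool -K commands would show whether the two machines really end up with the same set of enabled offloads.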