Bug 1266601

Summary: "hw csum failure"
Product: [Fedora] Fedora
Reporter: Stefan Ring <stefanrin>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 23
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-28 12:19:56 UTC
Type: Bug

Description Stefan Ring 2015-09-25 19:49:44 UTC
Description of problem:

I recently updated my kernel from 3.17.8 to 4.2.1. I could not upgrade earlier because I use ZFS on Linux, which had a long-standing incompatibility with anything 3.18+, so I don't know whether this happened with the versions in between. There is one other report, though, from someone who also started seeing this with 4.2: https://github.com/raspberrypi/linux/issues/1083

This is an unmodified kernel from http://pkgs.fedoraproject.org/cgit/kernel.git/commit/?h=f23&id=1dedebfc98c5866fcf6a3b65bed15fb594d957d1 rebuilt on/for F22.

Since the upgrade, I get this when another machine on the network sends DHCP requests:

[  192.721601] p35p1: hw csum failure
[  192.722550] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P           OE   4.2.1-300.str.fc22.x86_64 #1
[  192.722588] Hardware name: System manufacturer System Product Name/P5K WS, BIOS 1201    06/26/2008
[  192.722588]  0000000000000000 09b50d4da9aa776f ffff88022fc03998 ffffffff8177170a
[  192.722588]  0000000000000000 ffff8800365fa000 ffff88022fc039b8 ffffffff8165d250
[  192.722588]  ffffffff8164b190 ffff88021e62c600 ffff88022fc039e8 ffffffff81653dcc
[  192.722588] Call Trace:
[  192.722588]  <IRQ>  [<ffffffff8177170a>] dump_stack+0x45/0x57
[  192.722588]  [<ffffffff8165d250>] netdev_rx_csum_fault+0x40/0x50
[  192.722588]  [<ffffffff8164b190>] ? reqsk_fastopen_remove+0x160/0x160
[  192.722588]  [<ffffffff81653dcc>] __skb_checksum_complete+0xbc/0xd0
[  192.722588]  [<ffffffff81754278>] ipv6_mc_validate_checksum+0x98/0x150
[  192.722588]  [<ffffffff8164fffe>] skb_checksum_trimmed+0x9e/0x190
[  192.722588]  [<ffffffff81754459>] ipv6_mc_check_mld+0x129/0x340
[  192.722588]  [<ffffffffa0866447>] br_multicast_rcv+0x87/0xcc0 [bridge]
[  192.722588]  [<ffffffffa085d4d2>] br_handle_frame_finish+0x2a2/0x5f0 [bridge]
[  192.722588]  [<ffffffffa085d98a>] br_handle_frame+0x16a/0x290 [bridge]
[  192.722588]  [<ffffffff816602b4>] __netif_receive_skb_core+0x384/0xa00
[  192.722588]  [<ffffffff81752be0>] ? ipv6_gro_receive+0x230/0x320
[  192.722588]  [<ffffffff81660948>] __netif_receive_skb+0x18/0x60
[  192.722588]  [<ffffffff816609d0>] netif_receive_skb_internal+0x40/0xb0
[  192.722588]  [<ffffffff816615d5>] napi_gro_receive+0xb5/0xf0
[  192.722588]  [<ffffffffa0006373>] sky2_poll+0x613/0xda0 [sky2]
[  192.722588]  [<ffffffff81777b5e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
[  192.722588]  [<ffffffff814a3c28>] ? credit_entropy_bits+0x258/0x320
[  192.722588]  [<ffffffff814a3006>] ? __mix_pool_bytes+0x36/0x80
[  192.722588]  [<ffffffff81660ebc>] net_rx_action+0x20c/0x310
[  192.722588]  [<ffffffff810a281b>] __do_softirq+0xfb/0x290
[  192.722588]  [<ffffffff810a2bc9>] irq_exit+0x119/0x120
[  192.722588]  [<ffffffff8177ad28>] do_IRQ+0x58/0xe0
[  192.722588]  [<ffffffff81778c2b>] common_interrupt+0x6b/0x6b
[  192.722588]  <EOI>  [<ffffffff8101f07c>] ? mwait_idle+0x8c/0x140
[  192.722588]  [<ffffffff8101f61f>] arch_cpu_idle+0xf/0x20
[  192.722588]  [<ffffffff810dfc3a>] default_idle_call+0x2a/0x40
[  192.722588]  [<ffffffff810dff79>] cpu_startup_entry+0x2c9/0x320
[  192.722588]  [<ffffffff81767e4c>] rest_init+0x7c/0x80
[  192.722588]  [<ffffffff81d5702d>] start_kernel+0x49d/0x4be
[  192.722588]  [<ffffffff81d56120>] ? early_idt_handler_array+0x120/0x120
[  192.722588]  [<ffffffff81d56339>] x86_64_start_reservations+0x2a/0x2c
[  192.722588]  [<ffffffff81d56485>] x86_64_start_kernel+0x14a/0x16d
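
When reading a trace like the one above, it can help to separate the reliable frames from the ones the kernel marks with "?" (addresses found on the stack that may not be part of the actual call chain). The following is a minimal sketch in Python, not a kernel tool; the function name `parse_frames` and the regular expression are assumptions made for illustration and only cover the line format shown in this report.

```python
import re

# Matches frames such as:
#   [<ffffffffa0006373>] sky2_poll+0x613/0xda0 [sky2]
#   [<ffffffff8164b190>] ? reqsk_fastopen_remove+0x160/0x160
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+"                 # return address
    r"(?P<unreliable>\?\s+)?"             # optional "?" marker
    r"(?P<symbol>[\w.]+)"                 # function name
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"          # offset/size
    r"(?:\s+\[(?P<module>\w+)\])?"        # optional module name
)


def parse_frames(trace):
    """Return a list of (symbol, module, reliable) tuples, top frame first."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            reliable = match["unreliable"] is None
            frames.append((match["symbol"], match["module"], reliable))
    return frames


if __name__ == "__main__":
    sample = (
        "[  192.722588]  [<ffffffff8165d250>] netdev_rx_csum_fault+0x40/0x50\n"
        "[  192.722588]  [<ffffffff8164b190>] ? reqsk_fastopen_remove+0x160/0x160\n"
        "[  192.722588]  [<ffffffffa0006373>] sky2_poll+0x613/0xda0 [sky2]\n"
    )
    for symbol, module, reliable in parse_frames(sample):
        if reliable:
            print(symbol, "[%s]" % module if module else "")
```

Filtering on the reliable frames makes the receive path easier to follow: sky2_poll, then the bridge receive functions, then the MLD check that ends in netdev_rx_csum_fault.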

Version-Release number of selected component (if applicable):

4.2.1-300

How reproducible:

Boot the Fedora machine, let it connect to the network and remain idle, then plug a network cable into my MacBook.

Steps to Reproduce:
1. Start the Fedora machine. Let it connect to the network and remain idle.
2. Plug a network cable into the MacBook.

Actual results:

The stack trace shown above is logged.

Expected results:

Nothing is logged.

Additional info:

This is a simple bridge configuration. The physical Ethernet card is the only member attached to the bridge.

Comment 1 Stefan Ring 2015-11-19 19:49:01 UTC
The problem is still there with 4.2.6-300.fc23.x86_64, this time using the binaries from the Fedora update mirror.

Comment 2 Stefan Ring 2015-11-30 20:05:10 UTC
I just tried a 4.1.13 kernel for an experiment unrelated to this bug, and I can confirm that the faulty behavior does not occur with that version. It really does seem to have started with 4.2.

I can see some activity regarding MLD message validation and bridges in the git log leading up to v4.2. My first guess is that these changes are responsible.
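
For context, the ipv6_mc_validate_checksum frame in the trace verifies the ICMPv6 checksum of the MLD message. Below is a minimal user-space sketch in Python of that kind of check, assuming the standard Internet checksum (RFC 1071) computed over the IPv6 pseudo-header plus the ICMPv6 message. It is not the kernel's code; the function names, the sample addresses and the sample query are made up for illustration.

```python
import ipaddress
import struct

ICMPV6_NEXT_HEADER = 58  # IPv6 next-header value for ICMPv6


def ones_complement_sum(data):
    """16-bit ones' complement sum of data (padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def icmpv6_checksum(src, dst, message):
    """Checksum over the IPv6 pseudo-header and the ICMPv6 message.

    With the checksum field zeroed, this is the value to transmit.
    With the checksum field filled in, a valid message yields 0.
    """
    pseudo_header = (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + struct.pack("!I3xB", len(message), ICMPV6_NEXT_HEADER)
    )
    return (~ones_complement_sum(pseudo_header + message)) & 0xFFFF


def build_mld_query(src, dst):
    """Hypothetical MLDv1 general query (type 130) with a valid checksum."""
    # type, code, checksum placeholder, max response delay, reserved, group
    body = struct.pack("!BBHHH16s", 130, 0, 0, 1000, 0, bytes(16))
    checksum = icmpv6_checksum(src, dst, body)
    return body[:2] + struct.pack("!H", checksum) + body[4:]


if __name__ == "__main__":
    src, dst = "fe80::1", "ff02::1"  # assumed sample addresses
    message = build_mld_query(src, dst)
    print("valid:", icmpv6_checksum(src, dst, message) == 0)
```

The "hw csum failure" message concerns the case where the checksum value supplied by the network card does not agree with this kind of software computation, so a sketch like this only illustrates what is being compared, not where the 4.2 regression lies.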

Comment 3 Stefan Ring 2016-03-25 18:56:50 UTC
This is fixed by the following mainline kernel commits:

9b368814b336b0a1a479135eb2815edbc00efd3c
f8ffad69c9f8b8dfb0b633425d4ef4d2493ba61a
fdc5432a7b44ab7de17141beec19d946b9344e91

Comment 4 Josh Boyer 2016-03-28 12:19:56 UTC
Those are all in the 4.5 upstream kernel release.  F23 will be rebased to 4.5 around the 4.5.2 timeframe.