1572872 – 4.16.3-200 kernel update breaks ixgbe ethernet networking on Atom C3758 board

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	27
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1574153
Depends On:
Blocks:

Reported:	2018-04-28 10:17 UTC by Georg Sauthoff
Modified:	2018-11-30 22:31 UTC
CC List:	19 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2018-11-30 22:31:38 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description Georg Sauthoff 2018-04-28 10:17:15 UTC

Description of problem:
After upgrading the kernel to 4.16.3-200, the machine can no longer bring up its Ethernet ports.

Version-Release number of selected component (if applicable):
kernel-4.16.3-200.fc27.x86_64

How reproducible:
always

Steps to Reproduce:
1. upgrade from 4.15.17-300.fc27.x86_64 to kernel-4.16.3-200.fc27.x86_64
2. reboot into new kernel

Actual results:
NetworkManager fails to configure the Ethernet devices; the DHCP client keeps retrying but never seems to receive a response. As a result, the `ip addr` command does not show any IP addresses.

Expected results:
Working networking, i.e. NetworkManager immediately configures the Ethernet devices, and the DHCP client successfully sends requests and receives responses.

Additional info:
The network works perfectly fine with 4.15.17-300.fc27.x86_64. The onboard ethernet ports are recognized by the ixgbe driver.

In other words, networking worked before the `yum update` system update, and it still works after the update when booting the previous kernel, 4.15.17-300.fc27.x86_64.

The system journal doesn't show any obvious error messages related to the ixgbe driver when booting kernel-4.16.3-200.fc27.x86_64.

Comment 1 Joe Doss 2018-05-11 04:15:01 UTC

This issue is also present in Fedora 28 Atomic Host running 4.16.3-301.fc28.x86_64 on the same Intel Atom C3758 board with the Intel C3000 SoC Quad Gigabit Ethernet. NetworkManager fails to obtain an IP address via DHCP, and setting a static IP address results in a total lack of network connectivity. After being online for ~1 hour, the kernel logs this error:

# dmesg
*snip*
[ 6745.359830] ixgbe 0000:05:00.0 eno1: Detected Tx Unit Hang 
                 Tx Queue             <2>
                 TDH, TDT             <64>, <65>
                 next_to_use          <65>
                 next_to_clean        <64>
               tx_buffer_info[next_to_clean]
                 time_stamp           <100624b4e>
                 jiffies              <1006259c0>
[ 6745.382128] ixgbe 0000:05:00.0 eno1: tx hang 1 detected on queue 2, resetting adapter
[ 6745.385034] ixgbe 0000:05:00.0 eno1: initiating reset due to tx timeout
[ 6745.387961] ixgbe 0000:05:00.0 eno1: Reset adapter
[ 6745.390533] ixgbe 0000:05:00.0 eno1: NIC Link is Down
[ 6749.114555] ixgbe 0000:05:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: None

The NIC then resets, and after that the network works as it should. Here is the relevant output from journald:

May 10 22:33:06 atomic kernel: ixgbe 0000:05:00.0 eno1: Detected Tx Unit Hang 
                                        Tx Queue             <2>
                                        TDH, TDT             <64>, <65>
                                        next_to_use          <65>
                                        next_to_clean        <64>
                                      tx_buffer_info[next_to_clean]
                                        time_stamp           <100624b4e>
                                        jiffies              <1006259c0>
May 10 22:33:06 atomic kernel: ixgbe 0000:05:00.0 eno1: tx hang 1 detected on queue 2, resetting adapter
May 10 22:33:06 atomic kernel: ixgbe 0000:05:00.0 eno1: initiating reset due to tx timeout
May 10 22:33:06 atomic kernel: ixgbe 0000:05:00.0 eno1: Reset adapter
May 10 22:33:06 atomic kernel: ixgbe 0000:05:00.0 eno1: NIC Link is Down
May 10 22:33:09 atomic kernel: ixgbe 0000:05:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: None
May 10 22:33:09 atomic NetworkManager[911]: <info>  [1526009589.8657] device (eno1): carrier: link connected
May 10 22:33:12 atomic dhclient[1794]: DHCPDISCOVER on eno1 to 255.255.255.255 port 67 interval 15 (xid=0x8fc9a106)
May 10 22:33:27 atomic dhclient[1794]: DHCPDISCOVER on eno1 to 255.255.255.255 port 67 interval 21 (xid=0x8fc9a106)
May 10 22:33:28 atomic dhclient[1794]: DHCPREQUEST on eno1 to 255.255.255.255 port 67 (xid=0x8fc9a106)
May 10 22:33:28 atomic dhclient[1794]: DHCPOFFER from 192.168.1.1
May 10 22:33:28 atomic dhclient[1794]: DHCPACK from 192.168.1.1 (xid=0x8fc9a106)
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.0856] dhcp4 (eno1):   address 192.168.1.196
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.0856] dhcp4 (eno1):   plen 24 (255.255.255.0)
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.0856] dhcp4 (eno1):   gateway 192.168.1.1
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.0857] dhcp4 (eno1):   lease time 7200
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.0857] dhcp4 (eno1):   nameserver '192.168.1.1'
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.0857] dhcp4 (eno1):   domain name 'inf7.net'
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.0858] dhcp4 (eno1): state changed unknown -> bound
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.0873] device (eno1): state change: ip-config -> ip-check (reason 'none', sys-iface-stat>
May 10 22:33:28 atomic dbus-daemon[905]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-di>
May 10 22:33:28 atomic dhclient[1794]: bound to 192.168.1.196 -- renewal in 3039 seconds.
May 10 22:33:28 atomic systemd[1]: Starting Network Manager Script Dispatcher Service...
May 10 22:33:28 atomic dbus-daemon[905]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
May 10 22:33:28 atomic systemd[1]: Started Network Manager Script Dispatcher Service.
May 10 22:33:28 atomic audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dis>
May 10 22:33:28 atomic nm-dispatcher[1806]: req:1 'pre-up' [eno1]: new request (1 scripts)
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.1253] device (eno1): state change: ip-check -> secondaries (reason 'none', sys-iface-st>
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.1259] device (eno1): state change: secondaries -> activated (reason 'none', sys-iface-s>
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.1261] manager: NetworkManager state is now CONNECTED_LOCAL
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.1313] manager: NetworkManager state is now CONNECTED_SITE
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.1315] policy: set 'eno1' (eno1) as default for IPv4 routing and DNS
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.1316] policy: set 'eno1' (eno1) as default for IPv6 routing and DNS
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.1385] device (eno1): Activation: successful, device activated.
May 10 22:33:28 atomic NetworkManager[911]: <info>  [1526009608.1400] manager: NetworkManager state is now CONNECTED_GLOBAL
May 10 22:33:28 atomic nm-dispatcher[1806]: req:2 'up' [eno1]: new request (6 scripts)
May 10 22:33:28 atomic nm-dispatcher[1806]: req:2 'up' [eno1]: start running ordered scripts...
May 10 22:33:28 atomic nm-dispatcher[1806]: req:3 'connectivity-change': new request (6 scripts)
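
For anyone comparing hang reports, the Tx Unit Hang block above can be reduced to a few numbers. The following is a minimal Python sketch, not part of any ixgbe tooling. The function names `parse_tx_hang` and `stall_jiffies` are hypothetical, and the sketch assumes the field layout shown in the dmesg output above (values in angle brackets, with `time_stamp` and `jiffies` printed in hexadecimal).

```python
import re

def parse_tx_hang(text):
    """Collect the angle-bracket fields of one Tx Unit Hang block.

    Returns a dict mapping the field label to a list of raw value strings.
    """
    fields = {}
    for line in text.splitlines():
        # Field lines look like "label   <value>" or "label   <v1>, <v2>".
        m = re.match(r"\s*(.+?)\s+((?:<[0-9a-fA-F]+>(?:,\s*)?)+)\s*$", line)
        if m:
            fields[m.group(1)] = re.findall(r"<([0-9a-fA-F]+)>", m.group(2))
    return fields

def stall_jiffies(fields):
    """Jiffies between the stuck buffer's time_stamp and the hang check."""
    # Assumption: both values are hexadecimal, as the letters in them suggest.
    return int(fields["jiffies"][0], 16) - int(fields["time_stamp"][0], 16)

sample = """\
[ 6745.359830] ixgbe 0000:05:00.0 eno1: Detected Tx Unit Hang
  Tx Queue             <2>
  TDH, TDT             <64>, <65>
  next_to_use          <65>
  next_to_clean        <64>
tx_buffer_info[next_to_clean]
  time_stamp           <100624b4e>
  jiffies              <1006259c0>
"""

fields = parse_tx_hang(sample)
print(fields["Tx Queue"], fields["TDH, TDT"], stall_jiffies(fields))
# prints: ['2'] ['64', '65'] 3698
```

Converting the 3698 jiffies to seconds requires the kernel's CONFIG_HZ value, which is not stated in this report.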

Comment 2 Joe Doss 2018-05-29 03:50:27 UTC

The issue persists in 4.16.8-300.fc28.

Comment 3 Joe Doss 2018-05-29 06:03:26 UTC

Compiling the ixgbe driver from source using the most current stable version [1] results in the NICs working as expected.

[1] https://sourceforge.net/projects/e1000/files/ixgbe%20stable/5.3.7/

Comment 4 Joe Doss 2018-06-14 21:50:11 UTC

The issue persists in 4.17.0-200.fc28.

Comment 5 Joe Doss 2018-06-16 00:57:02 UTC

There is an upstream thread about this problem [1], and according to this Arch Linux forum post [2], setting CONFIG_INET_ESP_OFFLOAD=n and CONFIG_INET6_ESP_OFFLOAD=n fixes it. I have built a kernel with both options unset and verified that the change works.

Georg Sauthoff, please give this kernel RPM [3] a try and see whether it fixes your issue while upstream works out the problem.

1: https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20180528/thread.html#13004
2: https://bbs.archlinux.org/viewtopic.php?pid=1790848#p1790848
3: https://copr.fedorainfracloud.org/coprs/jdoss/kernel/build/767725/
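
To check whether a given kernel build has the two options from the forum post enabled, the build config can be inspected. A minimal Python sketch follows. The helper name `esp_offload_settings` is hypothetical, the sample config fragment is made up for illustration, and reading the real config from `/boot/config-<release>` is an assumption about where the distribution installs it.

```python
import re

OPTIONS = ("CONFIG_INET_ESP_OFFLOAD", "CONFIG_INET6_ESP_OFFLOAD")

def esp_offload_settings(config_text):
    """Map each option to its value ('y' or 'm'), or 'n' if unset or absent."""
    settings = {name: "n" for name in OPTIONS}
    for line in config_text.splitlines():
        # Enabled options appear as NAME=value; unset ones appear as
        # "# NAME is not set" and therefore never match this pattern.
        m = re.match(r"(CONFIG_\w+)=(.*)$", line.strip())
        if m and m.group(1) in settings:
            settings[m.group(1)] = m.group(2)
    return settings

# Made-up fragment for illustration only, not taken from a real kernel.
sample = """\
CONFIG_INET_ESP=m
CONFIG_INET_ESP_OFFLOAD=m
# CONFIG_INET6_ESP_OFFLOAD is not set
"""

print(esp_offload_settings(sample))
# prints: {'CONFIG_INET_ESP_OFFLOAD': 'm', 'CONFIG_INET6_ESP_OFFLOAD': 'n'}

# On a running system (assumed path):
#   import platform
#   with open(f"/boot/config-{platform.release()}") as f:
#       print(esp_offload_settings(f.read()))
```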

Comment 6 Joe Doss 2018-07-16 14:32:54 UTC

It looks like this was fixed in 4.18-rc1 and later with these commits:

> e433f3a5e272625c166d780f79ecc8fe456a5fc9 ixgbe: Use CONFIG_XFRM_OFFLOAD instead of CONFIG_XFRM
>
> de7a7e34e27c029fbb3c4e764db045548629b834 ixgbe: Move ipsec init function to before reset call
>
> e9f655ee97f14b4f5eba7b6b5a56a7c298573e67 ixgbe: Avoid loopback and fix boolean logic in ipsec_stop_data
>
> 421d954c4f1e9afd55bc65398bfc64ceba38df21 ixgbe: Fix bit definitions and add support for testing for ipsec support

Any chance we can get these applied to 4.17.x?

Comment 7 Justin M. Forbes 2018-07-23 15:19:31 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.17.7-100.fc27.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28.

If you experience different issues, please open a new bug report for those.

Comment 8 Georg Sauthoff 2018-07-27 07:47:58 UTC

As expected, it's still reproducible with kernel-4.17.7-100.fc27.x86_64.

See also Joe Doss' comment (Comment 6) for a possible way to get this fixed in 4.17 kernels.

Comment 9 Joe Doss 2018-08-03 02:08:10 UTC

4.18.0-0.rc7.git1.1.fc29.x86_64 has been working well on my Intel Atom C3758 board. Unfortunately fresh Fedora 28 Server installs are dead in the water until you manually upgrade to the F29 4.18.x kernel.

Comment 10 Phil Wiggum 2018-08-05 18:38:00 UTC

*** Bug 1574153 has been marked as a duplicate of this bug. ***

Comment 11 Phil Wiggum 2018-08-05 18:46:02 UTC

Reproducible on 4.17.11-100.fc27.x86_64.

It starts working after being online for ~1 hour, once the adapter resets.

# dmesg | grep ixgbe
[    2.951894] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 5.1.0-k
[    2.951897] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
[    3.343793] ixgbe 0000:06:00.0: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8 XDP Queue count = 0
[    3.470491] ixgbe 0000:06:00.0: MAC: 6, PHY: 27, PBA No: 030000-000
[    3.470494] ixgbe 0000:06:00.0: ac:1f:6b:45:e3:68
[    3.514770] ixgbe 0000:06:00.0: Intel(R) 10 Gigabit Network Connection
[    3.893210] ixgbe 0000:06:00.1: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8 XDP Queue count = 0
[    4.020436] ixgbe 0000:06:00.1: MAC: 6, PHY: 27, PBA No: 030000-000
[    4.020439] ixgbe 0000:06:00.1: ac:1f:6b:45:e3:69
[    4.064631] ixgbe 0000:06:00.1: Intel(R) 10 Gigabit Network Connection
[    4.443039] ixgbe 0000:07:00.0: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8 XDP Queue count = 0
[    4.570460] ixgbe 0000:07:00.0: MAC: 6, PHY: 27, PBA No: 030000-000
[    4.570463] ixgbe 0000:07:00.0: ac:1f:6b:45:e3:6a
[    4.614748] ixgbe 0000:07:00.0: Intel(R) 10 Gigabit Network Connection
[    4.996452] ixgbe 0000:07:00.1: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8 XDP Queue count = 0
[    5.123467] ixgbe 0000:07:00.1: MAC: 6, PHY: 27, PBA No: 030000-000
[    5.123470] ixgbe 0000:07:00.1: ac:1f:6b:45:e3:6b
[    5.167752] ixgbe 0000:07:00.1: Intel(R) 10 Gigabit Network Connection
[    5.169524] ixgbe 0000:06:00.0 eno1: renamed from eth0
[    5.181681] ixgbe 0000:07:00.1 eno4: renamed from eth3
[    5.194117] ixgbe 0000:06:00.1 eno2: renamed from eth1
[    5.204581] ixgbe 0000:07:00.0 eno3: renamed from eth2
[   10.397232] ixgbe 0000:06:00.0: registered PHC device on eno1
[   10.617219] ixgbe 0000:06:00.1: registered PHC device on eno2
[   10.833491] ixgbe 0000:07:00.0: registered PHC device on eno3
[   11.064703] ixgbe 0000:07:00.1: registered PHC device on eno4
[   14.497787] ixgbe 0000:06:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: RX/TX
[ 5457.922789] ixgbe 0000:06:00.0 eno1: Detected Tx Unit Hang 
[ 5457.944579] ixgbe 0000:06:00.0 eno1: tx hang 1 detected on queue 4, resetting adapter
[ 5457.944587] ixgbe 0000:06:00.0 eno1: initiating reset due to tx timeout
[ 5457.944687] ixgbe 0000:06:00.0 eno1: Reset adapter
[ 5457.946663] ixgbe 0000:06:00.0 eno1: NIC Link is Down
[ 5461.574214] ixgbe 0000:06:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: RX/TX
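
The delay between link-up and the first Tx hang can be read directly from the dmesg timestamps above. A minimal Python sketch of that arithmetic follows. The function name `seconds_until_first_hang` is hypothetical, and it assumes the `[seconds.micros]` timestamp prefix and the message wording shown in this output.

```python
import re

LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

def seconds_until_first_hang(dmesg_text, ifname):
    """Seconds from the first 'NIC Link is Up' to the first Tx hang on ifname.

    Returns None if either message is missing for that interface.
    """
    link_up = None
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        timestamp, message = float(m.group(1)), m.group(2)
        if f" {ifname}: " not in message:
            continue
        if link_up is None and "NIC Link is Up" in message:
            link_up = timestamp
        elif link_up is not None and "Detected Tx Unit Hang" in message:
            return timestamp - link_up
    return None

sample_dmesg = """\
[   14.497787] ixgbe 0000:06:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: RX/TX
[ 5457.922789] ixgbe 0000:06:00.0 eno1: Detected Tx Unit Hang
[ 5457.944579] ixgbe 0000:06:00.0 eno1: tx hang 1 detected on queue 4, resetting adapter
"""

delay = seconds_until_first_hang(sample_dmesg, "eno1")
print(round(delay, 1), "seconds, about", round(delay / 60), "minutes")
# prints: 5443.4 seconds, about 91 minutes
```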

Comment 12 Georg Sauthoff 2018-09-19 06:59:24 UTC

As expected, the 4.18.7-100.fc27.x86_64 kernel update (available from stable) fixes this issue for me.
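
Taken together, the kernels reported in this bug suggest a rule of thumb: 4.16.x and 4.17.x builds showed the problem, while 4.15.17 and 4.18.x did not. Below is a minimal Python sketch of that check, assuming release strings in the form they appear in this report. The function names are hypothetical, and the range is inferred only from the versions mentioned in the comments, not from upstream; a 4.16 or 4.17 kernel built with the ESP offload options unset (Comment 5) would not be affected.

```python
import re

def kernel_series(release):
    """Return (major, minor) from a string like '4.17.7-100.fc27.x86_64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))

def in_reported_bad_range(release):
    """True for the 4.16 and 4.17 series, the ones reported broken here."""
    return (4, 16) <= kernel_series(release) < (4, 18)

# Kernel versions mentioned in this report.
reported = [
    "4.15.17-300.fc27.x86_64",
    "4.16.3-200.fc27.x86_64",
    "4.17.11-100.fc27.x86_64",
    "4.18.0-0.rc7.git1.1.fc29.x86_64",
    "4.18.7-100.fc27.x86_64",
]
for release in reported:
    print(release, in_reported_bad_range(release))
```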

Comment 13 Ben Cotton 2018-11-27 14:39:26 UTC

This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30 Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 27 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 14 Ben Cotton 2018-11-30 22:31:38 UTC

Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.