Description of problem:
After upgrading the kernel to 4.16.3-200, the machine can no longer bring up its ethernet ports.

Version-Release number of selected component (if applicable):
kernel-4.16.3-200.fc27.x86_64

How reproducible:
always

Steps to Reproduce:
1. upgrade from 4.15.17-300.fc27.x86_64 to kernel-4.16.3-200.fc27.x86_64
2. reboot into new kernel

Actual results:
NetworkManager fails to configure the ethernet devices; the DHCP client tries and retries but doesn't seem to receive any response. Thus, the `ip addr` command doesn't show any IP addresses.

Expected results:
Proper networking, i.e. NetworkManager immediately configures the ethernet devices, and the DHCP client successfully sends requests and receives responses.

Additional info:
The network works perfectly fine with 4.15.17-300.fc27.x86_64. That means it worked before the `yum update` system update, and it still works after the update when booting into the previous kernel 4.15.17-300.fc27.x86_64. The onboard ethernet ports are handled by the ixgbe driver. The system journal doesn't show any obvious error messages related to the ixgbe driver when booting kernel-4.16.3-200.fc27.x86_64.
This issue is also present in Fedora 28 Atomic Host using 4.16.3-301.fc28.x86_64 on the same Intel Atom C3758 board with the Intel C3000 SoC Quad Gigabit Ethernet. NetworkManager fails to obtain an IP address via DHCP, and setting a static IP results in a total lack of network connectivity. After being online for ~1 hour, the kernel logs this error:

# dmesg
*snip*
[ 6745.359830] ixgbe 0000:05:00.0 eno1: Detected Tx Unit Hang Tx Queue <2> TDH, TDT <64>, <65> next_to_use <65> next_to_clean <64> tx_buffer_info[next_to_clean] time_stamp <100624b4e> jiffies <1006259c0>
[ 6745.382128] ixgbe 0000:05:00.0 eno1: tx hang 1 detected on queue 2, resetting adapter
[ 6745.385034] ixgbe 0000:05:00.0 eno1: initiating reset due to tx timeout
[ 6745.387961] ixgbe 0000:05:00.0 eno1: Reset adapter
[ 6745.390533] ixgbe 0000:05:00.0 eno1: NIC Link is Down
[ 6749.114555] ixgbe 0000:05:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: None

The NIC then resets, after which the network works as it should. Here is the relevant output from journald:

May 10 22:33:06 atomic kernel: ixgbe 0000:05:00.0 eno1: Detected Tx Unit Hang Tx Queue <2> TDH, TDT <64>, <65> next_to_use <65> next_to_clean <64> tx_buffer_info[next_to_clean] time_stamp <100624b4e> jiffies <1006259c0>
May 10 22:33:06 atomic kernel: ixgbe 0000:05:00.0 eno1: tx hang 1 detected on queue 2, resetting adapter
May 10 22:33:06 atomic kernel: ixgbe 0000:05:00.0 eno1: initiating reset due to tx timeout
May 10 22:33:06 atomic kernel: ixgbe 0000:05:00.0 eno1: Reset adapter
May 10 22:33:06 atomic kernel: ixgbe 0000:05:00.0 eno1: NIC Link is Down
May 10 22:33:09 atomic kernel: ixgbe 0000:05:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: None
May 10 22:33:09 atomic NetworkManager[911]: <info> [1526009589.8657] device (eno1): carrier: link connected
May 10 22:33:12 atomic dhclient[1794]: DHCPDISCOVER on eno1 to 255.255.255.255 port 67 interval 15 (xid=0x8fc9a106)
May 10 22:33:27 atomic dhclient[1794]: DHCPDISCOVER on eno1 to 255.255.255.255 port 67 interval 21 (xid=0x8fc9a106)
May 10 22:33:28 atomic dhclient[1794]: DHCPREQUEST on eno1 to 255.255.255.255 port 67 (xid=0x8fc9a106)
May 10 22:33:28 atomic dhclient[1794]: DHCPOFFER from 192.168.1.1
May 10 22:33:28 atomic dhclient[1794]: DHCPACK from 192.168.1.1 (xid=0x8fc9a106)
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.0856] dhcp4 (eno1): address 192.168.1.196
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.0856] dhcp4 (eno1): plen 24 (255.255.255.0)
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.0856] dhcp4 (eno1): gateway 192.168.1.1
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.0857] dhcp4 (eno1): lease time 7200
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.0857] dhcp4 (eno1): nameserver '192.168.1.1'
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.0857] dhcp4 (eno1): domain name 'inf7.net'
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.0858] dhcp4 (eno1): state changed unknown -> bound
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.0873] device (eno1): state change: ip-config -> ip-check (reason 'none', sys-iface-stat>
May 10 22:33:28 atomic dbus-daemon[905]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-di>
May 10 22:33:28 atomic dhclient[1794]: bound to 192.168.1.196 -- renewal in 3039 seconds.
May 10 22:33:28 atomic systemd[1]: Starting Network Manager Script Dispatcher Service...
May 10 22:33:28 atomic dbus-daemon[905]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
May 10 22:33:28 atomic systemd[1]: Started Network Manager Script Dispatcher Service.
May 10 22:33:28 atomic audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dis>
May 10 22:33:28 atomic nm-dispatcher[1806]: req:1 'pre-up' [eno1]: new request (1 scripts)
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.1253] device (eno1): state change: ip-check -> secondaries (reason 'none', sys-iface-st>
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.1259] device (eno1): state change: secondaries -> activated (reason 'none', sys-iface-s>
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.1261] manager: NetworkManager state is now CONNECTED_LOCAL
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.1313] manager: NetworkManager state is now CONNECTED_SITE
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.1315] policy: set 'eno1' (eno1) as default for IPv4 routing and DNS
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.1316] policy: set 'eno1' (eno1) as default for IPv6 routing and DNS
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.1385] device (eno1): Activation: successful, device activated.
May 10 22:33:28 atomic NetworkManager[911]: <info> [1526009608.1400] manager: NetworkManager state is now CONNECTED_GLOBAL
May 10 22:33:28 atomic nm-dispatcher[1806]: req:2 'up' [eno1]: new request (6 scripts)
May 10 22:33:28 atomic nm-dispatcher[1806]: req:2 'up' [eno1]: start running ordered scripts...
May 10 22:33:28 atomic nm-dispatcher[1806]: req:3 'connectivity-change': new request (6 scripts)
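To spot this hang in a saved journal or dmesg dump, the fields of the "Detected Tx Unit Hang" message can be decoded with a short Python sketch. It assumes the driver prints TDH/TDT, next_to_use, next_to_clean, time_stamp and jiffies in hexadecimal; `HANG_RE` and `parse_tx_hang` are hypothetical names, not part of any existing tool. In the message above, TDT is one ahead of TDH, i.e. the hardware stopped consuming a single queued descriptor.

```python
import re

# Hypothetical pattern for the ixgbe "Detected Tx Unit Hang" message.
# Assumption: the index/register fields are printed in hexadecimal.
# \s+ lets it match both the one-line journal form and multi-line dmesg output.
HANG_RE = re.compile(
    r"(?P<dev>\S+): Detected Tx Unit Hang\s+"
    r"Tx Queue <(?P<queue>\d+)>\s+"
    r"TDH, TDT <(?P<tdh>[0-9a-f]+)>, <(?P<tdt>[0-9a-f]+)>\s+"
    r"next_to_use <(?P<ntu>[0-9a-f]+)>\s+"
    r"next_to_clean <(?P<ntc>[0-9a-f]+)>\s+"
    r"tx_buffer_info\[next_to_clean\]\s+"
    r"time_stamp <(?P<ts>[0-9a-f]+)>\s+"
    r"jiffies <(?P<jiffies>[0-9a-f]+)>"
)


def parse_tx_hang(text):
    """Return the decoded fields of a Tx hang message, or None if absent."""
    m = HANG_RE.search(text)
    if m is None:
        return None
    tdh = int(m["tdh"], 16)
    tdt = int(m["tdt"], 16)
    return {
        "device": m["dev"],
        "queue": int(m["queue"]),
        "tdh": tdh,
        "tdt": tdt,
        "next_to_use": int(m["ntu"], 16),
        "next_to_clean": int(m["ntc"], 16),
        # Descriptors queued by the driver but not yet consumed by the
        # hardware (ring wrap-around is ignored for simplicity).
        "pending": tdt - tdh,
        # Age of the oldest outstanding buffer, in jiffies.
        "stalled_jiffies": int(m["jiffies"], 16) - int(m["ts"], 16),
    }
```

For the journal line above this reports queue 2, one pending descriptor and a stall of 3698 jiffies (roughly 3.7 s if the kernel runs with CONFIG_HZ=1000, which is an assumption here).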
The issue persists in 4.16.8-300.fc28.
Compiling the ixgbe driver from source using the most current stable version [1] results in the NICs working as expected.

[1] https://sourceforge.net/projects/e1000/files/ixgbe%20stable/5.3.7/
The issue persists in 4.17.0-200.fc28.
There is an upstream thread about this problem [1], and per this Arch Linux forum post [2], setting CONFIG_INET_ESP_OFFLOAD=n and CONFIG_INET6_ESP_OFFLOAD=n fixes the problem. I have built a kernel with these options unset and verified that the change works. Georg Sauthoff, give this kernel RPM [3] a try and see if it fixes your issues while upstream figures out the problem.

1: https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20180528/thread.html#13004
2: https://bbs.archlinux.org/viewtopic.php?pid=1790848#p1790848
3: https://copr.fedorainfracloud.org/coprs/jdoss/kernel/build/767725/
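A quick way to see whether a given kernel build has the two ESP offload options enabled is to read its Kconfig file. The Python sketch below assumes the config is installed as /boot/config-<release> (as Fedora does); `parse_kconfig`, `esp_offload_settings` and `check_running_kernel` are hypothetical helper names. It only reports the settings; it does not change them.

```python
import os
import platform

# The two options the Arch Linux forum post reports disabling as a workaround.
ESP_OFFLOAD_OPTIONS = ("CONFIG_INET_ESP_OFFLOAD", "CONFIG_INET6_ESP_OFFLOAD")
NOT_SET = " is not set"


def parse_kconfig(text):
    """Map option name -> value ('y', 'm', ...); '# X is not set' maps to 'n'."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            opts[line[2:-len(NOT_SET)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def esp_offload_settings(text):
    """Return the value of each ESP offload option, treating absent as 'n'."""
    opts = parse_kconfig(text)
    return {name: opts.get(name, "n") for name in ESP_OFFLOAD_OPTIONS}


def check_running_kernel():
    """Read the config of the running kernel (path layout is an assumption)."""
    path = os.path.join("/boot", "config-" + platform.release())
    with open(path) as f:
        return esp_offload_settings(f.read())
```

On an affected machine, `print(check_running_kernel())` shows both values: 'y' or 'm' means the offload code is built, 'n' means the option is not set.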
It looks like this was fixed in 4.18-rc1 and later with these commits:

> e433f3a5e272625c166d780f79ecc8fe456a5fc9 ixgbe: Use CONFIG_XFRM_OFFLOAD instead of CONFIG_XFRM
>
> de7a7e34e27c029fbb3c4e764db045548629b834 ixgbe: Move ipsec init function to before reset call
>
> e9f655ee97f14b4f5eba7b6b5a56a7c298573e67 ixgbe: Avoid loopback and fix boolean logic in ipsec_stop_data
>
> 421d954c4f1e9afd55bc65398bfc64ceba38df21 ixgbe: Fix bit definitions and add support for testing for ipsec support

Any chance we can get these applied to 4.17.x?
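Going only by the versions reported in this bug (4.15.17 works, 4.16.x and 4.17.x fail, 4.18.x works) and by the fixes landing in 4.18-rc1, a kernel release string can be classified with a small Python sketch. The helper names are hypothetical, and the check ignores distribution backports, so a 4.17.x build carrying the commits above would still be flagged as affected.

```python
import re

# Per this report: 4.15.17 works, 4.16.x and 4.17.x fail, and the ixgbe
# IPsec fixes are in 4.18-rc1 and later.
FIRST_BROKEN = (4, 16)
FIRST_FIXED = (4, 18)


def kernel_version(release):
    """Extract (major, minor) from a `uname -r` style string."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def likely_affected(release):
    """True if the mainline base version falls in the broken range.

    Ignores distribution backports: a patched 4.17.x build is still
    reported as affected.
    """
    return FIRST_BROKEN <= kernel_version(release) < FIRST_FIXED
```

For example, `likely_affected("4.17.11-100.fc27.x86_64")` is true, while `likely_affected("4.18.7-100.fc27.x86_64")` is false.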
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs. Fedora 27 has now been rebased to 4.17.7-100.fc27. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28. If you experience different issues, please open a new bug report for those.
As expected, it's still reproducible with kernel-4.17.7-100.fc27.x86_64. See also Joe Doss' comment (Comment 6) for a possible route to getting this fixed with 4.17 kernels.
4.18.0-0.rc7.git1.1.fc29.x86_64 has been working well on my Intel Atom C3758 board. Unfortunately, fresh Fedora 28 Server installs are dead in the water until you manually upgrade to the F29 4.18.x kernel.
*** Bug 1574153 has been marked as a duplicate of this bug. ***
Reproducible on 4.17.11-100.fc27.x86_64. The network starts working after being online for ~1 hour, following an adapter reset.

# dmesg | grep ixgbe
[ 2.951894] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 5.1.0-k
[ 2.951897] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
[ 3.343793] ixgbe 0000:06:00.0: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8 XDP Queue count = 0
[ 3.470491] ixgbe 0000:06:00.0: MAC: 6, PHY: 27, PBA No: 030000-000
[ 3.470494] ixgbe 0000:06:00.0: ac:1f:6b:45:e3:68
[ 3.514770] ixgbe 0000:06:00.0: Intel(R) 10 Gigabit Network Connection
[ 3.893210] ixgbe 0000:06:00.1: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8 XDP Queue count = 0
[ 4.020436] ixgbe 0000:06:00.1: MAC: 6, PHY: 27, PBA No: 030000-000
[ 4.020439] ixgbe 0000:06:00.1: ac:1f:6b:45:e3:69
[ 4.064631] ixgbe 0000:06:00.1: Intel(R) 10 Gigabit Network Connection
[ 4.443039] ixgbe 0000:07:00.0: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8 XDP Queue count = 0
[ 4.570460] ixgbe 0000:07:00.0: MAC: 6, PHY: 27, PBA No: 030000-000
[ 4.570463] ixgbe 0000:07:00.0: ac:1f:6b:45:e3:6a
[ 4.614748] ixgbe 0000:07:00.0: Intel(R) 10 Gigabit Network Connection
[ 4.996452] ixgbe 0000:07:00.1: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8 XDP Queue count = 0
[ 5.123467] ixgbe 0000:07:00.1: MAC: 6, PHY: 27, PBA No: 030000-000
[ 5.123470] ixgbe 0000:07:00.1: ac:1f:6b:45:e3:6b
[ 5.167752] ixgbe 0000:07:00.1: Intel(R) 10 Gigabit Network Connection
[ 5.169524] ixgbe 0000:06:00.0 eno1: renamed from eth0
[ 5.181681] ixgbe 0000:07:00.1 eno4: renamed from eth3
[ 5.194117] ixgbe 0000:06:00.1 eno2: renamed from eth1
[ 5.204581] ixgbe 0000:07:00.0 eno3: renamed from eth2
[ 10.397232] ixgbe 0000:06:00.0: registered PHC device on eno1
[ 10.617219] ixgbe 0000:06:00.1: registered PHC device on eno2
[ 10.833491] ixgbe 0000:07:00.0: registered PHC device on eno3
[ 11.064703] ixgbe 0000:07:00.1: registered PHC device on eno4
[ 14.497787] ixgbe 0000:06:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: RX/TX
[ 5457.922789] ixgbe 0000:06:00.0 eno1: Detected Tx Unit Hang
[ 5457.944579] ixgbe 0000:06:00.0 eno1: tx hang 1 detected on queue 4, resetting adapter
[ 5457.944587] ixgbe 0000:06:00.0 eno1: initiating reset due to tx timeout
[ 5457.944687] ixgbe 0000:06:00.0 eno1: Reset adapter
[ 5457.946663] ixgbe 0000:06:00.0 eno1: NIC Link is Down
[ 5461.574214] ixgbe 0000:06:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: RX/TX
As expected, the 4.18.7-100.fc27.x86_64 kernel update (available from stable) fixes this issue for me.
This message is a reminder that Fedora 27 is nearing its end of life. On 2018-Nov-30 Fedora will stop maintaining and issuing updates for Fedora 27. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '27'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 27 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.