Bug 1330893 - NetworkManager.service never reaches its 'startup complete' state IFF MTU=9000 (ixgbe driver)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.2
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Beniamino Galvani
QA Contact: Desktop QE
Depends On:
Blocks:
Reported: 2016-04-27 05:13 EDT by Karsten Weiss
Modified: 2016-11-03 15:09 EDT

See Also:
Fixed In Version: NetworkManager-1.4.0-0.1.git20160606.b769b4df.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 15:09:12 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
NetworkManager logs, nm-online strace, systemd-analyze plots (130.35 KB, application/x-bzip)
2016-04-27 05:13 EDT, Karsten Weiss
[PATCH] device: remove pending dhcp actions also in IP_DONE state (2.05 KB, patch)
2016-05-02 11:31 EDT, Beniamino Galvani


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:2581 normal SHIPPED_LIVE Low: NetworkManager security, bug fix, and enhancement update 2016-11-03 08:08:07 EDT

Description Karsten Weiss 2016-04-27 05:13:31 EDT
Created attachment 1151252 [details]
NetworkManager logs, nm-online strace, systemd-analyze plots

Description of problem:

NetworkManager.service never reaches its 'startup complete' state, so
NetworkManager-wait-online.service (nm-online -s --timeout=30) always
times out and fails if and only if I set MTU=9000 on ens1f0 (ixgbe driver).

Version-Release number of selected component (if applicable):

(Full disclosure: I see this on a CentOS 7.2 system. So feel free to ignore
this bug report.)

NetworkManager-1.0.6-29.el7_2.x86_64
kernel-3.10.0-327.13.1.el7.x86_64
initscripts-9.49.30-1.el7_2.2.x86_64

How reproducible:

Always, if I use "MTU=9000" in /etc/sysconfig/network-scripts/ifcfg-ens1f0.

It works fine if I comment out this line (=> default MTU=1500).

Steps to Reproduce:
1. Set "MTU=9000" in /etc/sysconfig/network-scripts/ifcfg-ens1f0
2. Reboot
3. "systemctl --failed" will show NetworkManager-wait-online.service as a
failed service.
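
As a quick check for step 1 before rebooting, the configured MTU can be read back from the ifcfg file. This is only a minimal sketch: ifcfg_mtu is a hypothetical helper name (it is not part of NetworkManager or initscripts), and it assumes a plain, uncommented "MTU=<number>" line, falling back to the default of 1500 mentioned above when no such line exists.

```shell
# Print the MTU configured in an ifcfg file.
# Hypothetical helper; assumes a simple MTU=<number> line.
ifcfg_mtu() {
    # $1: path to the ifcfg file
    # Take the last uncommented MTU= line, drop the key, then strip
    # optional quotes and spaces around the value.
    ifcfg_mtu_value=$(sed -n 's/^[[:space:]]*MTU=//p' "$1" | tail -n 1 | tr -d "\"' ")
    # No uncommented MTU= line means the default applies.
    echo "${ifcfg_mtu_value:-1500}"
}
```

For example, "ifcfg_mtu /etc/sysconfig/network-scripts/ifcfg-ens1f0" would be expected to print 9000 on the affected configuration and 1500 once the line is commented out.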

Actual results:

NetworkManager-wait-online.service times out after 30s and fails during
startup.

Reason: "nm-online -s --timeout=30" is not able to connect to NetworkManager
because NM doesn't reach "startup complete" state. ("-s" : --wait-for-startup)

Please note that despite this, all the network devices are
actually configured correctly, including MTU=9000 on ens1f0!

Expected results:

NetworkManager-wait-online.service (nm-online) finishes successfully after
a reasonable number of seconds (as it does with MTU=1500).

Additional info:

# grep 'complete' NetworkManager_MTU1500_info.txt 
Apr 26 16:20:34 smtcfc0157 NetworkManager[1331]: <info>  startup complete
# grep 'complete' NetworkManager_MTU9000_info.txt 
#
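
The same check can be scripted across several captured logs. The sketch below is illustrative only: report_startup is a hypothetical helper name, and it assumes each file is a plain-text NetworkManager log in which a successful start contains the literal text "startup complete", as in the MTU=1500 log above.

```shell
# Report, for each given NetworkManager log file, whether the
# "startup complete" message was logged.
# Hypothetical helper; assumes plain-text logs.
report_startup() {
    for report_startup_file in "$@"; do
        if grep -q 'startup complete' "$report_startup_file"; then
            echo "$report_startup_file: startup complete"
        else
            echo "$report_startup_file: startup NOT reached"
        fi
    done
}
```

It could be called as "report_startup NetworkManager_MTU1500_info.txt NetworkManager_MTU9000_info.txt" to compare both runs in one go.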

Setting MTU=9000 seems to trigger a DHCPv4 renewal:

$ grep ens1f0 NetworkManager-dispatcher_MTU9000.txt |cut -d: -f4-
 ------------ Action ID 0x7f352c0031e0 'up' Interface ens1f0 Environment ------------
   DEVICE_IP_IFACE=ens1f0
   DEVICE_IFACE=ens1f0
   CONNECTION_FILENAME=/etc/sysconfig/network-scripts/ifcfg-ens1f0
 Dispatching action 'up' for ens1f0
 Dispatch 'up' on ens1f0 complete
 ------------ Action ID 0x7f352c003170 'dhcp4-change' Interface ens1f0 Environment ------------
   DEVICE_IP_IFACE=ens1f0
   DEVICE_IFACE=ens1f0
   CONNECTION_FILENAME=/etc/sysconfig/network-scripts/ifcfg-ens1f0
 Dispatching action 'dhcp4-change' for ens1f0
 Dispatch 'dhcp4-change' on ens1f0 complete

Compare this to the MTU=1500 case:

$ grep ens1f0 NetworkManager-dispatcher_MTU1500.txt |cut -d: -f4-
 ------------ Action ID 0x7f344c0031e0 'up' Interface ens1f0 Environment ------------
   DEVICE_IP_IFACE=ens1f0
   DEVICE_IFACE=ens1f0
   CONNECTION_FILENAME=/etc/sysconfig/network-scripts/ifcfg-ens1f0
 Dispatching action 'up' for ens1f0
 Dispatch 'up' on ens1f0 complete
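
To make the difference between the two runs easier to spot, the dispatcher action names can be extracted per interface. This is a rough sketch rather than an official tool: dispatcher_actions is a hypothetical helper name, and it assumes the log contains lines of the form "Dispatching action '<name>' for <interface>" exactly as shown above.

```shell
# List the dispatcher actions logged for one interface, one per line,
# in the order they appear.
# Hypothetical helper; assumes the "Dispatching action '...' for IFACE"
# line format seen in the NetworkManager-dispatcher logs.
dispatcher_actions() {
    # $1: interface name, $2: dispatcher log file
    sed -n "s/.*Dispatching action '\([^']*\)' for $1[[:space:]]*\$/\1/p" "$2"
}
```

Running "dispatcher_actions ens1f0 NetworkManager-dispatcher_MTU9000.txt" and the same command on the MTU1500 log, then comparing the two outputs, should show the extra 'dhcp4-change' action only in the MTU=9000 case.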

$ ethtool -i ens1f0
driver: ixgbe
version: 4.0.1-k-rh7.2
firmware-version: 0x80000868
bus-info: 0000:06:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

# nmcli d s
DEVICE  TYPE        STATE        CONNECTION  
eno1    ethernet    connected    team1-slave 
eno2    ethernet    connected    team1-slave 
eno3    ethernet    connected    team1-slave 
eno4    ethernet    connected    team1-slave 
ens1f0  ethernet    connected    private     
team1   team        connected    team1       
ens1f1  ethernet    unavailable  --          
ens5f0  ethernet    unavailable  --          
ens5f1  ethernet    unavailable  --          
ib0     infiniband  unmanaged    --          
ib1     infiniband  unmanaged    --          
lo      loopback    unmanaged    --          

# nmcli c s
NAME           UUID                                  TYPE            DEVICE 
System ens1f1  3ba7a201-5d77-d373-4bef-c46ac05ad53e  802-3-ethernet  --     
System ens5f1  d6a47d27-79f0-63fc-251b-e991514a87a6  802-3-ethernet  --     
team1-slave    24871ea9-4411-efbd-924f-49cd9fbda6e2  802-3-ethernet  eno3   
team1-slave    abf4c85b-57cc-4484-4fa9-b4a71689c359  802-3-ethernet  eno1   
team1          4293abb7-d898-84ff-dae6-bffba04cbee9  team            team1  
System ens5f0  c7ca5207-4897-488b-a379-6ba658e133cf  802-3-ethernet  --     
private        0720bdf0-87bd-7885-f805-bbeef9d40ecb  802-3-ethernet  ens1f0 
team1-slave    b186f945-cc80-911d-668c-b51be8596980  802-3-ethernet  eno2   
team1-slave    8e777a66-a032-83ef-59c9-77e69b94ede4  802-3-ethernet  eno4 

Increasing the NetworkManager-wait-online.service timeout does not help
as it will still time out. The boot process will just take more time.

Disabling NetworkManager-wait-online.service does not help as it is pulled
in by network.service anyway.

Setting MTU=9000 via DHCP doesn't help either.

I've attached some log files (info and debug) from NetworkManager
and NetworkManager-dispatcher, an strace of NetworkManager-wait-online's
nm-online, and a systemd-analyze plot.
Comment 2 Beniamino Galvani 2016-05-02 11:31 EDT
Created attachment 1152996 [details]
[PATCH] device: remove pending dhcp actions also in IP_DONE state

Untested fix.
Comment 3 Francesco Giudici 2016-05-10 09:31:14 EDT
LGTM
Comment 6 Vladimir Benes 2016-09-26 11:06:39 EDT
[root@hp-z240-01.ml2 ~]# systemctl --failed --all
  UNIT                               LOAD   ACTIVE SUB    DESCRIPTION
● NetworkManager-wait-online.service loaded failed failed Network Manager Wait Online

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

1 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.


The failure above occurs in 1.0.6; there is no failure in 1.4.0-11.

Verified with:
01:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
Comment 8 errata-xmlrpc 2016-11-03 15:09:12 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2581.html
