Bug 1330893 - NetworkManager.service never reaches its 'startup complete' state IFF MTU=9000 (ixgbe driver)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Beniamino Galvani
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-27 09:13 UTC by Karsten Weiss
Modified: 2016-11-03 19:09 UTC (History)

Fixed In Version: NetworkManager-1.4.0-0.1.git20160606.b769b4df.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-03 19:09:12 UTC
Target Upstream Version:
Embargoed:


Attachments
NetworkManager logs, nm-online strace, systemd-analyze plots (130.35 KB, application/x-bzip)
2016-04-27 09:13 UTC, Karsten Weiss
[PATCH] device: remove pending dhcp actions also in IP_DONE state (2.05 KB, patch)
2016-05-02 15:31 UTC, Beniamino Galvani


Links
System: Red Hat Product Errata
ID: RHSA-2016:2581
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Low: NetworkManager security, bug fix, and enhancement update
Last Updated: 2016-11-03 12:08:07 UTC

Description Karsten Weiss 2016-04-27 09:13:31 UTC
Created attachment 1151252
NetworkManager logs, nm-online strace, systemd-analyze plots

Description of problem:

NetworkManager.service never reaches its 'startup complete' state and
thus NetworkManager-wait-online.service (nm-online -s --timeout=30)
always times out and fails IFF I set MTU=9000 on ens1f0 (ixgbe driver).

Version-Release number of selected component (if applicable):

(Full disclosure: I see this on a CentOS 7.2 system. So feel free to ignore
this bug report.)

NetworkManager-1.0.6-29.el7_2.x86_64
kernel-3.10.0-327.13.1.el7.x86_64
initscripts-9.49.30-1.el7_2.2.x86_64

How reproducible:

Always, if I use "MTU=9000" in /etc/sysconfig/network-scripts/ifcfg-ens1f0.

It works fine if I comment out this line (=> default MTU=1500).

Steps to Reproduce:
1. Set "MTU=9000" in /etc/sysconfig/network-scripts/ifcfg-ens1f0
2. Reboot
3. "systemctl --failed" will show NetworkManager-wait-online.service as a
failed service.
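
To confirm which MTU an ifcfg file actually requests before rebooting, the file can be read as simple KEY=value lines. The following is only a minimal sketch: the helper name ifcfg_mtu and the sample file contents are hypothetical, and it assumes plain assignments without shell expansion. The fallback of 1500 is the default MTU mentioned above.

```python
def ifcfg_mtu(ifcfg_text, default=1500):
    """Return the MTU set in ifcfg-style text, or `default` if none is set.

    Commented-out lines are ignored. If MTU is assigned more than once,
    the last assignment wins, as it would when the file is sourced.
    """
    mtu = default
    for raw in ifcfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "MTU":
            mtu = int(value.strip().strip("\"'"))
    return mtu


# Hypothetical file contents, for illustration only.
with_jumbo = 'DEVICE=ens1f0\nBOOTPROTO=dhcp\nMTU=9000\n'
commented_out = 'DEVICE=ens1f0\nBOOTPROTO=dhcp\n#MTU=9000\n'

print(ifcfg_mtu(with_jumbo))
print(ifcfg_mtu(commented_out))
```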

Actual results:

NetworkManager-wait-online.service times out after 30s and fails during
startup.

Reason: "nm-online -s --timeout=30" is not able to connect to NetworkManager
because NM doesn't reach "startup complete" state. ("-s" : --wait-for-startup)

Please note that, despite this, all the network devices are
actually configured correctly - including the MTU=9000 on ens1f0!

Expected results:

NetworkManager-wait-online.service (nm-online) finishes successfully after
a reasonable number of seconds (as it does with MTU=1500).

Additional info:

# grep 'complete' NetworkManager_MTU1500_info.txt 
Apr 26 16:20:34 smtcfc0157 NetworkManager[1331]: <info>  startup complete
# grep 'complete' NetworkManager_MTU9000_info.txt 
#
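
The same check can be scripted when several log files need to be compared. This is a minimal sketch that assumes the log line format shown above; the function name reached_startup_complete is made up for illustration.

```python
import re

# Matches the line NetworkManager logs once startup has finished, e.g.
# "... NetworkManager[1331]: <info>  startup complete".
STARTUP_RE = re.compile(r"NetworkManager\[\d+\]:\s+<info>\s+startup complete\s*$")


def reached_startup_complete(log_text):
    """Return True if any line of the log reports 'startup complete'."""
    return any(STARTUP_RE.search(line) for line in log_text.splitlines())


mtu1500_log = (
    "Apr 26 16:20:34 smtcfc0157 NetworkManager[1331]: <info>  startup complete\n"
)
mtu9000_log = ""  # the grep above found no matching line in the MTU=9000 log

print(reached_startup_complete(mtu1500_log))
print(reached_startup_complete(mtu9000_log))
```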

Setting MTU=9000 seems to trigger a DHCPv4 renewal:

$ grep ens1f0 NetworkManager-dispatcher_MTU9000.txt |cut -d: -f4-
 ------------ Action ID 0x7f352c0031e0 'up' Interface ens1f0 Environment ------------
   DEVICE_IP_IFACE=ens1f0
   DEVICE_IFACE=ens1f0
   CONNECTION_FILENAME=/etc/sysconfig/network-scripts/ifcfg-ens1f0
 Dispatching action 'up' for ens1f0
 Dispatch 'up' on ens1f0 complete
 ------------ Action ID 0x7f352c003170 'dhcp4-change' Interface ens1f0 Environment ------------
   DEVICE_IP_IFACE=ens1f0
   DEVICE_IFACE=ens1f0
   CONNECTION_FILENAME=/etc/sysconfig/network-scripts/ifcfg-ens1f0
 Dispatching action 'dhcp4-change' for ens1f0
 Dispatch 'dhcp4-change' on ens1f0 complete

Compare this to the MTU=1500 case:

$ grep ens1f0 NetworkManager-dispatcher_MTU1500.txt |cut -d: -f4-
 ------------ Action ID 0x7f344c0031e0 'up' Interface ens1f0 Environment ------------
   DEVICE_IP_IFACE=ens1f0
   DEVICE_IFACE=ens1f0
   CONNECTION_FILENAME=/etc/sysconfig/network-scripts/ifcfg-ens1f0
 Dispatching action 'up' for ens1f0
 Dispatch 'up' on ens1f0 complete
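
The difference between the two dispatcher logs can also be extracted programmatically. The sketch below assumes the "Dispatching action '...' for <iface>" line format seen in the excerpts above; the helper name dispatched_actions is hypothetical.

```python
import re

# Captures the action name and the interface from dispatcher log lines.
ACTION_RE = re.compile(r"Dispatching action '([^']+)' for (\S+)")


def dispatched_actions(log_text, iface):
    """Return the dispatcher actions logged for `iface`, in order."""
    actions = []
    for line in log_text.splitlines():
        match = ACTION_RE.search(line)
        if match and match.group(2) == iface:
            actions.append(match.group(1))
    return actions


mtu9000 = """\
 Dispatching action 'up' for ens1f0
 Dispatch 'up' on ens1f0 complete
 Dispatching action 'dhcp4-change' for ens1f0
 Dispatch 'dhcp4-change' on ens1f0 complete
"""
mtu1500 = """\
 Dispatching action 'up' for ens1f0
 Dispatch 'up' on ens1f0 complete
"""

baseline = dispatched_actions(mtu1500, "ens1f0")
extra = [a for a in dispatched_actions(mtu9000, "ens1f0") if a not in baseline]
print(extra)  # ['dhcp4-change']
```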

$ ethtool -i ens1f0
driver: ixgbe
version: 4.0.1-k-rh7.2
firmware-version: 0x80000868
bus-info: 0000:06:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

# nmcli d s
DEVICE  TYPE        STATE        CONNECTION  
eno1    ethernet    connected    team1-slave 
eno2    ethernet    connected    team1-slave 
eno3    ethernet    connected    team1-slave 
eno4    ethernet    connected    team1-slave 
ens1f0  ethernet    connected    private     
team1   team        connected    team1       
ens1f1  ethernet    unavailable  --          
ens5f0  ethernet    unavailable  --          
ens5f1  ethernet    unavailable  --          
ib0     infiniband  unmanaged    --          
ib1     infiniband  unmanaged    --          
lo      loopback    unmanaged    --          

# nmcli c s
NAME           UUID                                  TYPE            DEVICE 
System ens1f1  3ba7a201-5d77-d373-4bef-c46ac05ad53e  802-3-ethernet  --     
System ens5f1  d6a47d27-79f0-63fc-251b-e991514a87a6  802-3-ethernet  --     
team1-slave    24871ea9-4411-efbd-924f-49cd9fbda6e2  802-3-ethernet  eno3   
team1-slave    abf4c85b-57cc-4484-4fa9-b4a71689c359  802-3-ethernet  eno1   
team1          4293abb7-d898-84ff-dae6-bffba04cbee9  team            team1  
System ens5f0  c7ca5207-4897-488b-a379-6ba658e133cf  802-3-ethernet  --     
private        0720bdf0-87bd-7885-f805-bbeef9d40ecb  802-3-ethernet  ens1f0 
team1-slave    b186f945-cc80-911d-668c-b51be8596980  802-3-ethernet  eno2   
team1-slave    8e777a66-a032-83ef-59c9-77e69b94ede4  802-3-ethernet  eno4 

Increasing the NetworkManager-wait-online.service timeout does not help
as it will still time out. The boot process will just take more time.

Disabling NetworkManager-wait-online.service does not help as it is pulled
in by network.service anyway.

Setting MTU=9000 via DHCP doesn't help either.

I've attached some log files (info and debug) from NetworkManager
and NetworkManager-dispatcher, a strace from NetworkManager-wait-online's
nm-online and a systemd-analyze plot.

Comment 2 Beniamino Galvani 2016-05-02 15:31:33 UTC
Created attachment 1152996
[PATCH] device: remove pending dhcp actions also in IP_DONE state

Untested fix.

Comment 3 Francesco Giudici 2016-05-10 13:31:14 UTC
LGTM

Comment 6 Vladimir Benes 2016-09-26 15:06:39 UTC
[root ~]# systemctl --failed --all
  UNIT                               LOAD   ACTIVE SUB    DESCRIPTION
● NetworkManager-wait-online.service loaded failed failed Network Manager Wait Online

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

1 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.


The failure above occurs with 1.0.6; there is no failure with 1.4.0-11.

was verified with:
01:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
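
For repeated verification runs, the "systemctl --failed" output can be checked automatically. This is only a sketch: it assumes the column layout shown in the output above (UNIT, LOAD, ACTIVE, SUB, DESCRIPTION), and the helper name failed_units is made up for illustration.

```python
def failed_units(systemctl_output):
    """Return unit names whose ACTIVE column reads 'failed'."""
    units = []
    for line in systemctl_output.splitlines():
        # Drop the leading status bullet so the unit name is the first field.
        fields = line.replace("●", " ").split()
        # A unit row has at least UNIT, LOAD, ACTIVE and SUB, and the unit
        # name contains a dot (for example ".service").
        if len(fields) >= 4 and "." in fields[0] and fields[2] == "failed":
            units.append(fields[0])
    return units


sample = """\
  UNIT                               LOAD   ACTIVE SUB    DESCRIPTION
● NetworkManager-wait-online.service loaded failed failed Network Manager Wait Online

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

1 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.
"""

print(failed_units(sample))  # ['NetworkManager-wait-online.service']
```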

Comment 8 errata-xmlrpc 2016-11-03 19:09:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2581.html

