Bug 1330893
Summary: | NetworkManager.service never reaches its 'startup complete' state IFF MTU=9000 (ixgbe driver) | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Karsten Weiss <knweiss>
Component: | NetworkManager | Assignee: | Beniamino Galvani <bgalvani>
Status: | CLOSED ERRATA | QA Contact: | Desktop QE <desktop-qa-list>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 7.2 | CC: | aloughla, atragler, bgalvani, fgiudici, lrintel, mjtrangoni, rkhan, thaller, vbenes
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | NetworkManager-1.4.0-0.1.git20160606.b769b4df.el7 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-11-03 19:09:12 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Created attachment 1152996
[PATCH] device: remove pending dhcp actions also in IP_DONE state

Untested fix.

LGTM

Applied to master:
https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=0b66eb298e1d6ac1ef516635e7055a6c819b4e09

nm-1-2:
https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=nm-1-2&id=21ca2cf0f68c79516a885928c33268bb3f12b47e

nm-1-0:
https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=nm-1-0&id=46ef667de8405f952fc641ad968e1c1dd3c245d5

    [root ~]# systemctl --failed --all
      UNIT                               LOAD   ACTIVE SUB    DESCRIPTION
    ● NetworkManager-wait-online.service loaded failed failed Network Manager Wait Online

    LOAD   = Reflects whether the unit definition was properly loaded.
    ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
    SUB    = The low-level unit activation state, values depend on unit type.

    1 loaded units listed.
    To show all installed unit files use 'systemctl list-unit-files'.

The failure above occurs in 1.0.6; there is no failure in 1.4.0-11.

Verified with:

    01:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2016-2581.html
Created attachment 1151252
NetworkManager logs, nm-online strace, systemd-analyze plots

Description of problem:
NetworkManager.service never reaches its 'startup complete' state, and thus NetworkManager-wait-online.service (nm-online -s --timeout=30) always times out and fails IFF I set MTU=9000 on ens1f0 (ixgbe driver).

Version-Release number of selected component (if applicable):
(Full disclosure: I see this on a CentOS 7.2 system. So feel free to ignore this bug report.)
NetworkManager-1.0.6-29.el7_2.x86_64
kernel-3.10.0-327.13.1.el7.x86_64
initscripts-9.49.30-1.el7_2.2.x86_64

How reproducible:
Always, if I use "MTU=9000" in /etc/sysconfig/network-scripts/ifcfg-ens1f0. It works fine if I comment out this line (=> default MTU=1500).

Steps to Reproduce:
1. Set "MTU=9000" in /etc/sysconfig/network-scripts/ifcfg-ens1f0
2. Reboot
3. "systemctl --failed" will show NetworkManager-wait-online.service as a failed service.

Actual results:
NetworkManager-wait-online.service times out after 30s and fails during startup.
Reason: "nm-online -s --timeout=30" is not able to connect to NetworkManager because NM doesn't reach "startup complete" state. ("-s" : --wait-for-startup)
Please note that despite this, all the network devices are actually configured correctly, including the MTU=9000 on ens1f0!

Expected results:
NetworkManager-wait-online.service (nm-online) finishes successfully after a reasonable number of seconds (as it does with MTU=1500).
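Step 1 of the reproduction (switching the MTU line in the ifcfg file) could be scripted roughly as follows. This is a minimal sketch, not part of NetworkManager or initscripts: the function name `set_ifcfg_mtu` is hypothetical, it assumes GNU sed (for `-i`), and it assumes the file ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper: set or replace the MTU= line in an ifcfg file.
# Usage: set_ifcfg_mtu FILE MTU
set_ifcfg_mtu() {
    file=$1
    mtu=$2
    if grep -q '^MTU=' "$file"; then
        # An uncommented MTU line exists: rewrite it in place (GNU sed -i).
        sed -i "s/^MTU=.*/MTU=$mtu/" "$file"
    else
        # No active MTU line yet: append one.
        printf 'MTU=%s\n' "$mtu" >> "$file"
    fi
}
```

For the setup in this report the call would be `set_ifcfg_mtu /etc/sysconfig/network-scripts/ifcfg-ens1f0 9000`, followed by a reboot. A commented-out `#MTU=...` line is left untouched and a new line is appended instead.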
Additional info:

    # grep 'complete' NetworkManager_MTU1500_info.txt
    Apr 26 16:20:34 smtcfc0157 NetworkManager[1331]: <info> startup complete
    # grep 'complete' NetworkManager_MTU9000_info.txt
    #

Setting MTU=9000 seems to trigger a DHCPv4 renewal:

    $ grep ens1f0 NetworkManager-dispatcher_MTU9000.txt |cut -d: -f4-
    ------------ Action ID 0x7f352c0031e0 'up' Interface ens1f0 Environment ------------
    DEVICE_IP_IFACE=ens1f0
    DEVICE_IFACE=ens1f0
    CONNECTION_FILENAME=/etc/sysconfig/network-scripts/ifcfg-ens1f0
    Dispatching action 'up' for ens1f0
    Dispatch 'up' on ens1f0 complete
    ------------ Action ID 0x7f352c003170 'dhcp4-change' Interface ens1f0 Environment ------------
    DEVICE_IP_IFACE=ens1f0
    DEVICE_IFACE=ens1f0
    CONNECTION_FILENAME=/etc/sysconfig/network-scripts/ifcfg-ens1f0
    Dispatching action 'dhcp4-change' for ens1f0
    Dispatch 'dhcp4-change' on ens1f0 complete

Compare this to the MTU=1500 case:

    $ grep ens1f0 NetworkManager-dispatcher_MTU1500.txt |cut -d: -f4-
    ------------ Action ID 0x7f344c0031e0 'up' Interface ens1f0 Environment ------------
    DEVICE_IP_IFACE=ens1f0
    DEVICE_IFACE=ens1f0
    CONNECTION_FILENAME=/etc/sysconfig/network-scripts/ifcfg-ens1f0
    Dispatching action 'up' for ens1f0
    Dispatch 'up' on ens1f0 complete

    $ ethtool -i ens1f0
    driver: ixgbe
    version: 4.0.1-k-rh7.2
    firmware-version: 0x80000868
    bus-info: 0000:06:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: yes
    supports-priv-flags: no

    # nmcli d s
    DEVICE  TYPE        STATE        CONNECTION
    eno1    ethernet    connected    team1-slave
    eno2    ethernet    connected    team1-slave
    eno3    ethernet    connected    team1-slave
    eno4    ethernet    connected    team1-slave
    ens1f0  ethernet    connected    private
    team1   team        connected    team1
    ens1f1  ethernet    unavailable  --
    ens5f0  ethernet    unavailable  --
    ens5f1  ethernet    unavailable  --
    ib0     infiniband  unmanaged    --
    ib1     infiniband  unmanaged    --
    lo      loopback    unmanaged    --

    # nmcli c s
    NAME           UUID                                  TYPE            DEVICE
    System ens1f1  3ba7a201-5d77-d373-4bef-c46ac05ad53e  802-3-ethernet  --
    System ens5f1  d6a47d27-79f0-63fc-251b-e991514a87a6  802-3-ethernet  --
    team1-slave    24871ea9-4411-efbd-924f-49cd9fbda6e2  802-3-ethernet  eno3
    team1-slave    abf4c85b-57cc-4484-4fa9-b4a71689c359  802-3-ethernet  eno1
    team1          4293abb7-d898-84ff-dae6-bffba04cbee9  team            team1
    System ens5f0  c7ca5207-4897-488b-a379-6ba658e133cf  802-3-ethernet  --
    private        0720bdf0-87bd-7885-f805-bbeef9d40ecb  802-3-ethernet  ens1f0
    team1-slave    b186f945-cc80-911d-668c-b51be8596980  802-3-ethernet  eno2
    team1-slave    8e777a66-a032-83ef-59c9-77e69b94ede4  802-3-ethernet  eno4

Increasing the NetworkManager-wait-online.service timeout does not help, as it will still time out. The boot process will just take more time.

Disabling NetworkManager-wait-online.service does not help, as it is pulled in by network.service anyway.

Setting MTU=9000 via DHCP doesn't help either.

I've attached some log files (info and debug) from NetworkManager and NetworkManager-dispatcher, an strace of NetworkManager-wait-online's nm-online, and a systemd-analyze plot.
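The dispatcher-log comparison above can be reduced to the list of actions dispatched for one interface. The following is only a sketch: the function name `list_dispatched_actions` is hypothetical, and it assumes the log lines have the "Dispatching action '...' for IFACE" form quoted in this report. The interface name is used as a regular expression, so names containing dots would match loosely.

```shell
#!/bin/sh
# Hypothetical helper: print one dispatched action per line for an interface.
# Usage: list_dispatched_actions LOGFILE IFACE
list_dispatched_actions() {
    log=$1
    iface=$2
    # Match lines such as:  Dispatching action 'up' for ens1f0
    # and print only the action name between the single quotes.
    sed -n "s/.*Dispatching action '\([^']*\)' for ${iface}[[:space:]]*\$/\1/p" "$log"
}
```

Run against NetworkManager-dispatcher_MTU9000.txt and NetworkManager-dispatcher_MTU1500.txt with `ens1f0` as the interface, this should list `up` and `dhcp4-change` for the first log and only `up` for the second, going by the excerpts quoted above.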