Bug 1787219 - [RHEL8.2] NetworkManager failed to bring up eth0 while launching new image in aws/Azure
Summary: [RHEL8.2] NetworkManager failed to bring up eth0 while launching new image in...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.2
Hardware: All
OS: Linux
urgent
urgent
Target Milestone: rc
: 8.2
Assignee: Beniamino Galvani
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-01-01 15:53 UTC by Xiao, Liang
Modified: 2020-04-28 16:54 UTC (History)
18 users (show)

Fixed In Version: NetworkManager-1.22.4-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-28 16:54:11 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)
messages (856.99 KB, text/plain)
2020-01-01 15:53 UTC, Xiao, Liang
no flags Details
Trace log of NetworkManager (175.69 KB, text/plain)
2020-01-02 09:30 UTC, Yuxin Sun
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:1847 0 None None None 2020-04-28 16:54:38 UTC

Description Xiao, Liang 2020-01-01 15:53:40 UTC
Created attachment 1649077 [details]
messages

In aws ec2 environment, update to RHEL-8.2.0-20200101.n.0 from RHEL-8.2.0-20191230.n.0, there are only NetworkManager and rhui pkgs updated.
Create a private AMI for RHEL-8.2.0-20200101.n.0 and the system failed to bring eth0 up when launching new instance from it.

Below is snippet from "/var/log/messages"(full log is attached):
<snip>
Jan  1 14:45:17 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889917.5481] manager: NetworkManager state is now CONNECTING
Jan  1 14:45:17 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889917.5483] device (eth0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Jan  1 14:45:17 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889917.5489] device (eth0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Jan  1 14:45:17 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889917.5492] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
Jan  1 14:45:17 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889917.5501] dhcp4 (eth0): state changed unknown -> fail
Jan  1 14:45:17 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889917.5501] device (eth0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
</snip>

Below is updating log:
Upgrading:
 NetworkManager                               x86_64                        1:1.22.2-1.el8                           repo0                                              2.3 M
 NetworkManager-libnm                         x86_64                        1:1.22.2-1.el8                           repo0                                              1.7 M
 NetworkManager-team                          x86_64                        1:1.22.2-1.el8                           repo0                                              134 k
 NetworkManager-tui                           x86_64                        1:1.22.2-1.el8                           repo0                                              305 k
 rh-amazon-rhui-client                        noarch                        3.0.23-1.el8                             rhui-client-config-server-8                         35 k

[root@ip-10-116-2-235 network-scripts]# cat ifcfg-ens3
# Generated by dracut initrd
NAME="ens3"
DEVICE="ens3"
ONBOOT="yes"
NETBOOT="yes"
UUID="522d46ea-91a5-4761-97f7-754afc9dda66"
IPV6INIT="yes"
BOOTPROTO="dhcp"
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
[root@ip-10-116-2-235 network-scripts]# cat ifcfg-eth0
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=eth0
HWADDR=0a:71:8b:7a:0f:4e
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
[root@ip-10-116-2-235 network-scripts]# cat ifcfg-ens3
# Generated by dracut initrd
NAME="ens3"
DEVICE="ens3"
ONBOOT="yes"
NETBOOT="yes"
UUID="522d46ea-91a5-4761-97f7-754afc9dda66"
IPV6INIT="yes"
BOOTPROTO="dhcp"
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"

Version-Release number of selected component (if applicable):

RHEL Version:
RHEL8.2(4.18.0-147.el8.x86_64)

How reproducible:
100%

Steps to Reproduce:
1. Start a RHEL8.2 and update it to the latest nightly build
2. Create AMI from it and try to launch it

Actual results:
system is not accessible
Expected results:
should be accessible

Additional info:
- No such issue in RHEL-8.2.0-20191230.n.0.

Comment 1 Xiao, Liang 2020-01-01 15:56:19 UTC
Correct kernel version.
RHEL Version:
RHEL8.2(4.18.0-167.el8.x86_64)

Comment 3 Thomas Haller 2020-01-02 09:06:46 UTC
Interestingly, at

> Jan  1 14:34:47 ip-10-116-2-40 NetworkManager[1418]: <info>  [1577889287.1216] device (eth0): state change: ip-config -> ip-check

it works, but later

> Jan  1 14:40:16 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889616.7393] device (eth0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')

it doesn't. (both times the same NM version 1.22.2-1.el8).



Can you provide full level=TRACE logfile that shows the issue?

See https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28 for hints about logging.



Or maybe better, can you provide access to such a machine? (E.g. via email: my GPG key https://tynq.net/49EA7C670E0850E7419514F629C2366E4DFC5728.gpg and my SSH public key https://tynq.net/id_rsa.pub).



A workaround is probably to switch the DHCP plugin. It would be interesting to know whether the issue is avoided by

  [main]
  dhcp=systemd

in /etc/NetworkManager/conf.d/99-my-workaround.conf





Thank you.

Comment 4 Yuxin Sun 2020-01-02 09:30:11 UTC
Created attachment 1649160 [details]
Trace log of NetworkManager

Hi Thomas,

I reproduce this issue in Azure and I tried with your steps in comment 3. 
1. Please check the attachment about the trace log.
2. The workaround worked well. eth0 can get ip address.
a). # cat /etc/NetworkManager/conf.d/99-my-workaround.conf
[main]
dhcp=systemd
b). # systemctl restart NetworkManager

Thanks!

Comment 6 Xiao, Liang 2020-01-02 10:31:23 UTC
(In reply to Thomas Haller from comment #3)
Hi Thomas,

Please checked yuxin's attached log which is dumped from Azure platform.

As there is no interactive console in aws, I mount the root volume to another host and copy the log out. The log show that eth0 is up with same NM version.
I think it is because they comes from different systems, below is a brief introduction:

We update system1 firstly, and it works well with the latest NM installed, the eth0 is up.
Then we create a private image from system1, the history messages was saved to this image too(eg. eth0 is up).
We start another system2 from this image, and the system is not accessible, so the failed log was appended to messages(eth0 failed).

> Interestingly, at
> 
> > Jan  1 14:34:47 ip-10-116-2-40 NetworkManager[1418]: <info>  [1577889287.1216] device (eth0): state change: ip-config -> ip-check
> 
> it works, but later
> 
> > Jan  1 14:40:16 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889616.7393] device (eth0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
> 
> it doesn't. (both times the same NM version 1.22.2-1.el8).
> 
>

Comment 10 Vladimir Benes 2020-01-07 12:12:06 UTC
We do see failures in various NM tests. It seems to be related:
ipv4_dhcp_client_id_set 
ipv4_dhcp_client_id_set_internal 
nmcli_reconnect_openvswitch_vlan_configuration 
NM_reboot_openvswitch_vlan_configuration

Comment 11 Beniamino Galvani 2020-01-08 16:47:57 UTC
 <debug> [1577956756.7399] dhcp4 (eth0): received NACK from 168.63.129.16
 <trace> [1577956756.7400] dhcp4 (eth0): client event 3
 <info>  [1577956756.7400] dhcp4 (eth0): state changed unknown -> fail

This looks similar to https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/325 and should be fixed by the upstream merge request indicated in the comments. The fix will be in the next package build of NM.

Comment 15 errata-xmlrpc 2020-04-28 16:54:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1847


Note You need to log in before you can comment on or make changes to this bug.