1787219 – [RHEL8.2] NetworkManager failed to bring up eth0 while launching new image in aws/Azure

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1787219 - [RHEL8.2] NetworkManager failed to bring up eth0 while launching new image in aws/Azure

Summary: [RHEL8.2] NetworkManager failed to bring up eth0 while launching new image in...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	NetworkManager
Sub Component:
Version:	8.2
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	8.2
Assignee:	Beniamino Galvani
QA Contact:	Desktop QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2020-01-01 15:53 UTC by Frank Liang
Modified:	2022-05-02 04:59 UTC (History)
CC List:	18 users (show)
Fixed In Version:	NetworkManager-1.22.4-1.el8
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-04-28 16:54:11 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
messages (856.99 KB, text/plain) 2020-01-01 15:53 UTC, Frank Liang	no flags	Details
Trace log of NetworkManager (175.69 KB, text/plain) 2020-01-02 09:30 UTC, Yuxin Sun	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHELPLAN-31426	0	None	None	None	2022-05-02 04:59:57 UTC
Red Hat Product Errata	RHBA-2020:1847	0	None	None	None	2020-04-28 16:54:38 UTC

Description Frank Liang 2020-01-01 15:53:40 UTC

Created attachment 1649077 [details]
messages

In aws ec2 environment, update to RHEL-8.2.0-20200101.n.0 from RHEL-8.2.0-20191230.n.0, there are only NetworkManager and rhui pkgs updated.
Create a private AMI for RHEL-8.2.0-20200101.n.0 and the system failed to bring eth0 up when launching new instance from it.

Below is snippet from "/var/log/messages"(full log is attached):
<snip>
Jan  1 14:45:17 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889917.5481] manager: NetworkManager state is now CONNECTING
Jan  1 14:45:17 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889917.5483] device (eth0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Jan  1 14:45:17 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889917.5489] device (eth0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Jan  1 14:45:17 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889917.5492] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
Jan  1 14:45:17 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889917.5501] dhcp4 (eth0): state changed unknown -> fail
Jan  1 14:45:17 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889917.5501] device (eth0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
</snip>

Below is updating log:
Upgrading:
 NetworkManager                               x86_64                        1:1.22.2-1.el8                           repo0                                              2.3 M
 NetworkManager-libnm                         x86_64                        1:1.22.2-1.el8                           repo0                                              1.7 M
 NetworkManager-team                          x86_64                        1:1.22.2-1.el8                           repo0                                              134 k
 NetworkManager-tui                           x86_64                        1:1.22.2-1.el8                           repo0                                              305 k
 rh-amazon-rhui-client                        noarch                        3.0.23-1.el8                             rhui-client-config-server-8                         35 k

[root@ip-10-116-2-235 network-scripts]# cat ifcfg-ens3
# Generated by dracut initrd
NAME="ens3"
DEVICE="ens3"
ONBOOT="yes"
NETBOOT="yes"
UUID="522d46ea-91a5-4761-97f7-754afc9dda66"
IPV6INIT="yes"
BOOTPROTO="dhcp"
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
[root@ip-10-116-2-235 network-scripts]# cat ifcfg-eth0
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=eth0
HWADDR=0a:71:8b:7a:0f:4e
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
[root@ip-10-116-2-235 network-scripts]# cat ifcfg-ens3
# Generated by dracut initrd
NAME="ens3"
DEVICE="ens3"
ONBOOT="yes"
NETBOOT="yes"
UUID="522d46ea-91a5-4761-97f7-754afc9dda66"
IPV6INIT="yes"
BOOTPROTO="dhcp"
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"

Version-Release number of selected component (if applicable):

RHEL Version:
RHEL8.2(4.18.0-147.el8.x86_64)

How reproducible:
100%

Steps to Reproduce:
1. Start a RHEL8.2 and update it to the latest nightly build
2. Create AMI from it and try to launch it

Actual results:
system is not accessible
Expected results:
should be accessible

Additional info:
- No such issue in RHEL-8.2.0-20191230.n.0.

Comment 1 Frank Liang 2020-01-01 15:56:19 UTC

Correct kernel version.
RHEL Version:
RHEL8.2(4.18.0-167.el8.x86_64)

Comment 3 Thomas Haller 2020-01-02 09:06:46 UTC

Interestingly, at

> Jan  1 14:34:47 ip-10-116-2-40 NetworkManager[1418]: <info>  [1577889287.1216] device (eth0): state change: ip-config -> ip-check

it works, but later

> Jan  1 14:40:16 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889616.7393] device (eth0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')

it doesn't. (both times the same NM version 1.22.2-1.el8).



Can you provide full level=TRACE logfile that shows the issue?

See https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28 for hints about logging.



Or maybe better, can you provide access to such a machine? (E.g. via email: my GPG key https://tynq.net/49EA7C670E0850E7419514F629C2366E4DFC5728.gpg and my SSH public key https://tynq.net/id_rsa.pub).



A workaround is probably to switch the DHCP plugin. It would be interesting to know whether the issue is avoided by

  [main]
  dhcp=systemd

in /etc/NetworkManager/conf.d/99-my-workaround.conf





Thank you.

Comment 4 Yuxin Sun 2020-01-02 09:30:11 UTC

Created attachment 1649160 [details]
Trace log of NetworkManager

Hi Thomas,

I reproduce this issue in Azure and I tried with your steps in comment 3. 
1. Please check the attachment about the trace log.
2. The workaround worked well. eth0 can get ip address.
a). # cat /etc/NetworkManager/conf.d/99-my-workaround.conf
[main]
dhcp=systemd
b). # systemctl restart NetworkManager

Thanks!

Comment 6 Frank Liang 2020-01-02 10:31:23 UTC

(In reply to Thomas Haller from comment #3)
Hi Thomas,

Please checked yuxin's attached log which is dumped from Azure platform.

As there is no interactive console in aws, I mount the root volume to another host and copy the log out. The log show that eth0 is up with same NM version.
I think it is because they comes from different systems, below is a brief introduction:

We update system1 firstly, and it works well with the latest NM installed, the eth0 is up.
Then we create a private image from system1, the history messages was saved to this image too(eg. eth0 is up).
We start another system2 from this image, and the system is not accessible, so the failed log was appended to messages(eth0 failed).

> Interestingly, at
> 
> > Jan  1 14:34:47 ip-10-116-2-40 NetworkManager[1418]: <info>  [1577889287.1216] device (eth0): state change: ip-config -> ip-check
> 
> it works, but later
> 
> > Jan  1 14:40:16 ip-10-116-2-134 NetworkManager[1021]: <info>  [1577889616.7393] device (eth0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
> 
> it doesn't. (both times the same NM version 1.22.2-1.el8).
> 
>

Comment 10 Vladimir Benes 2020-01-07 12:12:06 UTC

We do see failures in various NM tests. It seems to be related:
ipv4_dhcp_client_id_set 
ipv4_dhcp_client_id_set_internal 
nmcli_reconnect_openvswitch_vlan_configuration 
NM_reboot_openvswitch_vlan_configuration

Comment 11 Beniamino Galvani 2020-01-08 16:47:57 UTC

 <debug> [1577956756.7399] dhcp4 (eth0): received NACK from 168.63.129.16
 <trace> [1577956756.7400] dhcp4 (eth0): client event 3
 <info>  [1577956756.7400] dhcp4 (eth0): state changed unknown -> fail

This looks similar to https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/325 and should be fixed by the upstream merge request indicated in the comments. The fix will be in the next package build of NM.

Comment 15 errata-xmlrpc 2020-04-28 16:54:11 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1847

Note You need to log in before you can comment on or make changes to this bug.