Bug 1912236

Summary:

[Azure][RHEL-8.5] nm-cloud-setup.service makes network connectivity loss

Product:

Red Hat Enterprise Linux 8

Reporter:

Yuxin Sun <yuxisun>

Component:

NetworkManager

Assignee:

Thomas Haller <thaller>

Status:

CLOSED ERRATA

QA Contact:

Yuxin Sun <yuxisun>

Severity:

high

Docs Contact:

Marc Muehlfeld <mmuehlfe>

Priority:

high

Version:

8.4

CC:

acardace, atragler, bgalvani, fge, hhei, huzhao, jmaxwell, lrintel, pasik, rkhan, sdubewar, sukulkar, thaller, till, xialiu, xuli, yacao

Keywords:

Triaged, ZStream

Flags:

pm-rhel: mirror+

Hardware:

Unspecified

OS:

Unspecified

Fixed In Version:

NetworkManager-1.32.0-0.2.el8

Doc Type:

Bug Fix

Doc Text:

.`nm-cloud-setup` utility now sets the correct default route on Microsoft Azure

Previously, on Microsoft Azure, the `nm-cloud-setup` utility failed to detect the correct gateway of the cloud environment. As a consequence, the utility set an incorrect default route, and connectivity failed. This update fixes the problem. As a result, the `nm-cloud-setup` utility now sets the correct default route on Microsoft Azure.
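The Doc Text attributes the failure to detecting the wrong gateway. Azure reserves the first host address of every subnet (x.x.x.1 in a /24) for the default gateway, which matches the DHCP-provided `default via 10.0.0.1` route in comment 2. The bash sketch below derives that expected gateway from an address and prefix length. It assumes this Azure convention. The function name `azure_expected_gateway` is hypothetical, and the function is not the code used by nm-cloud-setup.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of nm-cloud-setup): print the first host
# address of an IPv4 subnet, which Azure reserves for the default gateway.
azure_expected_gateway() {
    local cidr=$1
    local addr=${cidr%/*} prefix=${cidr#*/}
    local o1 o2 o3 o4
    IFS=. read -r o1 o2 o3 o4 <<< "$addr"
    # Convert the dotted quad to a 32-bit integer.
    local ip=$(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
    # Netmask for the prefix length; bash arithmetic is 64-bit, so trim to 32 bits.
    local mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    # Network address plus one is the gateway under the Azure convention.
    local gw=$(( (ip & mask) + 1 ))
    printf '%d.%d.%d.%d\n' $(( (gw >> 24) & 255 )) $(( (gw >> 16) & 255 )) \
        $(( (gw >> 8) & 255 )) $(( gw & 255 ))
}
```

For the VM in comment 2, `azure_expected_gateway 10.0.0.4/24` prints `10.0.0.1`.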

Clones:

2013208 2014510 (view as bug list)

Last Closed:

2021-11-09 19:28:55 UTC

Type:

Bug

Bug Blocks:

1791212, 1935910, 2013208, 2014510

Attachments:

Description	Flags
trace log for comment 1	none

Description Yuxin Sun 2021-01-04 08:26:08 UTC

Description of problem:
On Azure, starting nm-cloud-setup.service breaks network connectivity: it is no longer possible to SSH into the VM, and the VM cannot fetch content from the Internet.

Version-Release number of selected components (if applicable):
NetworkManager-cloud-setup-1.30.0-0.5.el8.x86_64

# rpm -qa|grep NetworkManager
NetworkManager-libnm-1.30.0-0.5.el8.x86_64
NetworkManager-cloud-setup-1.30.0-0.5.el8.x86_64
NetworkManager-tui-1.30.0-0.5.el8.x86_64
NetworkManager-1.30.0-0.5.el8.x86_64
NetworkManager-team-1.30.0-0.5.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM on Azure and install NetworkManager-cloud-setup
2. Modify /usr/lib/systemd/system/nm-cloud-setup.service and uncomment the "Environment=NM_CLOUD_SETUP_AZURE=yes" line
3. systemctl start nm-cloud-setup.service
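Step 2 edits the packaged unit file under /usr/lib, and a package update can overwrite that file. A standard systemd drop-in can carry the same `Environment=NM_CLOUD_SETUP_AZURE=yes` setting instead. The bash sketch below writes such a drop-in. The function name and the file name `azure.conf` are hypothetical. The target directory is a parameter, so the function can be tried against a scratch directory first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: enable Azure support in nm-cloud-setup through a
# systemd drop-in instead of editing the unit file under /usr/lib.
write_nm_cloud_setup_dropin() {
    local unit_dir=${1:-/etc/systemd/system}
    local dropin_dir="$unit_dir/nm-cloud-setup.service.d"
    mkdir -p "$dropin_dir" || return 1
    # The drop-in adds this Environment= line to the [Service] section.
    cat > "$dropin_dir/azure.conf" <<'EOF'
[Service]
Environment=NM_CLOUD_SETUP_AZURE=yes
EOF
}
```

After running it as root, run `systemctl daemon-reload` before starting the service in step 3. `systemctl edit nm-cloud-setup.service` creates an equivalent override interactively.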

Actual results:
All the inbound/outbound network connections are broken.

Expected results:
The network connection should not be broken.

Additional info:

Comment 1 Thomas Haller 2021-01-06 11:08:36 UTC

Hi.

Could you please provide debug logs?


- In NetworkManager's configuration, set `level=TRACE` in the `[logging]` section and restart NetworkManager. See https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28 for details about logging.

- In nm-cloud-setup.service, set `Environment=NM_CLOUD_SETUP_LOG=TRACE` (and reload the unit file with `systemctl daemon-reload`).
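The two logging settings requested above can be written as configuration drop-ins. The bash sketch below does this. The file names `95-trace-logging.conf` and `trace-logging.conf` and the function name are hypothetical. Using a systemd drop-in for nm-cloud-setup, rather than editing the unit file, is an assumption of this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: enable TRACE logging for NetworkManager and nm-cloud-setup.
# The optional argument is a root prefix; by default files go under /etc.
enable_nm_trace_logging() {
    local root=${1:-}
    local nm_dir="$root/etc/NetworkManager/conf.d"
    local unit_dir="$root/etc/systemd/system/nm-cloud-setup.service.d"
    mkdir -p "$nm_dir" "$unit_dir" || return 1
    # NetworkManager reads the log level from the [logging] section.
    printf '[logging]\nlevel=TRACE\n' > "$nm_dir/95-trace-logging.conf"
    # nm-cloud-setup reads its log level from NM_CLOUD_SETUP_LOG.
    printf '[Service]\nEnvironment=NM_CLOUD_SETUP_LOG=TRACE\n' > "$unit_dir/trace-logging.conf"
}
```

Afterwards, run `systemctl daemon-reload` and `systemctl restart NetworkManager`. `nmcli general logging level TRACE domains ALL` raises the level at runtime without a restart, but the change does not persist.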


Please reproduce the issue. Afterwards (in the broken state), please also provide the output of `ip route`, `ip addr`, `ip rule`, `nmcli connection`, and `nmcli device`.
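The bash sketch below collects the requested command output in one pass, with each command printed as a header before its output. The function name is hypothetical. A missing tool produces a short note instead of aborting the run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each requested diagnostic command followed by
# its output, so the result can be pasted into the bug in one go.
collect_nm_diagnostics() {
    local cmd
    for cmd in "ip route" "ip addr" "ip rule" "nmcli connection" "nmcli device"; do
        echo "# $cmd"
        # Unquoted on purpose: each entry is a command plus its argument.
        $cmd 2>&1 || echo "(command failed or is not installed)"
    done
}
```

Usage: `collect_nm_diagnostics > nm-diagnostics.txt`, run in the broken state.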


Thank you.

Comment 2 Yuxin Sun 2021-01-06 12:29:37 UTC

Hi Thomas,

Here is the output in the broken state:

[root@wala84k26812290321-vm1 ~]# ip route
default via 10.0.0.1 dev eth0 proto dhcp metric 100
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.4 metric 100
168.63.129.16 via 10.0.0.1 dev eth0 proto dhcp metric 100
169.254.169.254 via 10.0.0.1 dev eth0 proto dhcp metric 100
[root@wala84k26812290321-vm1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:22:48:1c:60:0d brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.4/24 brd 10.0.0.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
[root@wala84k26812290321-vm1 ~]# ip rule
0:      from all lookup local
30400:  from 10.0.0.4 lookup 30400
32766:  from all lookup main
32767:  from all lookup default
[root@wala84k26812290321-vm1 ~]# nmcli connection
NAME         UUID                                  TYPE      DEVICE
System eth0  5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03  ethernet  eth0
ens3         17dbeb76-4897-49fc-bcb5-4ed19e3bb7a5  ethernet  --
[root@wala84k26812290321-vm1 ~]# nmcli device
DEVICE  TYPE      STATE      CONNECTION
eth0    ethernet  connected  System eth0
lo      loopback  unmanaged  --
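In the output above, `ip rule` contains the policy rule `from 10.0.0.4 lookup 30400`. Traffic sourced from the VM's address therefore consults table 30400 before the main table. Plain `ip route` shows only the main table, so the contents of table 30400 are not visible here. The bash sketch below lists the numeric tables referenced by `ip rule` so they can be dumped. The function name is hypothetical. Reading `ip rule` text from stdin is a choice that lets the function be checked against the output in this comment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ip rule` output on stdin and print the numeric
# routing tables it references. Named tables (local, main, default) are skipped.
policy_tables() {
    # The table is the field that follows the word "lookup".
    awk '{ for (i = 1; i < NF; i++)
               if ($i == "lookup" && $(i + 1) ~ /^[0-9]+$/) print $(i + 1) }' | sort -un
}
```

Usage: `ip rule | policy_tables | while read -r t; do echo "# table $t"; ip route show table "$t"; done`.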

Comment 3 Yuxin Sun 2021-01-06 12:46:18 UTC

Created attachment 1744878 [details]
trace log for comment 1

Comment 4 Yuxin Sun 2021-01-06 12:55:55 UTC

I created another VM (vm2, 10.0.0.5) in the same subnet as this test VM (vm1, 10.0.0.4). In the broken state, pinging vm2 from vm1 still works, but pinging vm1 from vm2 fails.

Comment 5 Thomas Haller 2021-01-11 08:32:16 UTC

From looking at the log, it seems nm-cloud-setup did what it was implemented to do, but apparently that is not what is correct.

Is it possible to provide access to such a machine so we can better understand what should be done on Azure?

Comment 6 Yuxin Sun 2021-01-12 03:26:32 UTC

(In reply to Thomas Haller from comment #5)
> from looking at the log, it seems nm-cloud-setup did what was implemented --
> but apparently not what would be correct.
> 
> Is it possible to provide access to such a machine so we can better
> understand what should be done on Azure?

Hi Thomas,

I've sent you an email explaining how to access the Azure VM serial console. Thanks!

Comment 7 Thomas Haller 2021-01-22 14:54:21 UTC

Thank you Yuxin Sun, for all the testing and the help.

Unfortunately, it doesn't seem that we have the capacity to fix this for 8.4. Moving to 8.5.

This, of course, probably makes nm-cloud-setup unusable on Azure for now, and it must be fixed.

Comment 12 Thomas Haller 2021-04-20 14:39:54 UTC

WIP: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/821

Comment 13 Thomas Haller 2021-04-20 15:55:08 UTC

Fixed on master.

Comment 29 Marc Muehlfeld 2021-07-29 11:49:29 UTC

Is this still a known issue? The ticket status is VERIFIED, which sounds like the bug has been fixed.

Comment 30 Thomas Haller 2021-07-29 12:49:01 UTC

The bug state is correct.

The issue will be fixed in rhel-8.5, so with rhel-8.5 it is no longer a (known) issue.

It was a known issue in rhel-8.4.


OK?

Comment 31 Marc Muehlfeld 2021-07-29 13:06:41 UTC

I changed the Doc Type to "Bug Fix".

Thomas, can you please provide the following information so that I can write the bug fix release note:
- Cause
- Consequence
- Fix
- Result

Thanks.

Comment 33 Thomas Haller 2021-10-04 11:25:46 UTC

The release notes lgtm. Thanks Marc!!

Comment 38 errata-xmlrpc 2021-11-09 19:28:55 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: NetworkManager security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4361