Bug 1979192 - NetworkManager configures wrong, spurious "local" route for IP address after DHCP address change
Summary: NetworkManager configures wrong, spurious "local" route for IP address after ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: beta
Target Release: ---
Assignee: Thomas Haller
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-05 08:16 UTC by yangfei
Modified: 2021-11-10 07:02 UTC (History)

Fixed In Version: NetworkManager-1.32.6-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-09 19:30:32 UTC
Type: Bug
Target Upstream Version:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2021:4361 | 0 | None | None | None | 2021-11-09 19:31:12 UTC
freedesktop.org Gitlab | NetworkManager NetworkManager merge_requests 935 | 0 | None | opened | [th/external-routes-no-sync] treat external routes specially and ignore them during sync | 2021-07-20 12:06:47 UTC

Description yangfei 2021-07-05 08:16:09 UTC
Description of problem:

After unplugging and replugging the cable 10 times, the DHCP server assigned a different IP address, but the previous IP address could still be connected to. On the OS side, "ip addr" shows only one IP address.

Version-Release number of selected component (if applicable):

kernel-4.18.0-305.el8.x86_64
NetworkManager-1.30.0-7.el8.x86_64

How reproducible:

[root@dell-per640-test ~]# uname -r
4.18.0-305.el8.x86_64
[root@dell-per640-test ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.4 (Ootpa)

[root@dell-per640-test ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens2f0 
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
NAME=ens2f0
DEVICE=ens2f0
ONBOOT=yes


[root@dell-per640-test ~]# systemctl status NetworkManager
● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-07-01 23:49:54 EDT; 8min ago
     Docs: man:NetworkManager(8)
 Main PID: 1874 (NetworkManager)
    Tasks: 3 (limit: 820752)
   Memory: 13.1M
   CGroup: /system.slice/NetworkManager.service
           └─1874 /usr/sbin/NetworkManager --no-daemon


[root@dell-per640-test ~]# ethtool -i ens2f0
driver: mlx5_core
version: 5.0-0
firmware-version: 14.28.1300 (MT_2420110004)
expansion-rom-version: 
bus-info: 0000:5e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

Steps to Reproduce:
1. Initially the IP address obtained from the DHCP server was 10.73.179.137; ping it from a client.
2. After unplugging and replugging the cable 10 times, the IP address changed to 10.73.179.138.
ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether e4:1d:2d:c0:85:a2 brd ff:ff:ff:ff:ff:ff
    inet 10.73.179.138/23 brd 10.73.179.255 scope global dynamic noprefixroute ens2f0
       valid_lft 42841sec preferred_lft 42841sec

Actual results:

Pinging 10.73.179.137 and 10.73.179.138 both succeeds, and ssh to 10.73.179.137 and to 10.73.179.138 reaches the same machine.

In the customer's testing, the more often the cable is unplugged and replugged, the more times the IP address changes, and every one of those IP addresses can still be pinged and connected to.

Expected results:

Only the one IP address shown by "ip addr" should be reachable.
 
Additional info:

The customer tried "ignore-carrier=no" as described in the KCS article below, and the issue did not reproduce after unplugging and replugging the cable 20 times.

https://access.redhat.com/solutions/894763

The NetworkManager.conf man page says: "Additionally, it will allow any active connection (whether static or dynamic) to remain active on the device when carrier is lost."
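
For illustration, a per-device ignore-carrier override might look like the drop-in parsed below. This is only a sketch: the section name `device-ens2f0`, the file contents, and the idea of placing it under /etc/NetworkManager/conf.d/ are assumptions (the KCS article's exact text is not quoted in this bug), and Python's configparser only approximates the keyfile parser that NetworkManager itself uses.

```python
import configparser

# Hypothetical drop-in contents (assumed, not taken from the KCS article).
SAMPLE_CONF = """\
[device-ens2f0]
match-device=interface-name:ens2f0
ignore-carrier=no
"""

def ignore_carrier_overrides(conf_text):
    """Map each [device*] section to its boolean ignore-carrier value, if set."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    overrides = {}
    for section in parser.sections():
        if section.startswith("device") and parser.has_option(section, "ignore-carrier"):
            overrides[section] = parser.getboolean(section, "ignore-carrier")
    return overrides

print(ignore_carrier_overrides(SAMPLE_CONF))
```

A value of False for a device would mean that carrier loss is not ignored for it, which matches the setting the customer tried.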

Comment 1 Thomas Haller 2021-07-05 15:51:51 UTC
hi,

when you are in that situation, what gives:

ip addr
ip -6 addr
ip -4 route show table all
ip -6 route show table all
ip -4 rule
ip -6 rule
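
To capture all six outputs in one go (for example once before and once after replugging the cable), something like the following sketch could be used. The helper names `run_command` and `collect_state` are made up for illustration; only the `ip` invocations come from the list above.

```python
import subprocess

# The six commands requested above, as argument vectors.
COMMANDS = [
    ["ip", "addr"],
    ["ip", "-6", "addr"],
    ["ip", "-4", "route", "show", "table", "all"],
    ["ip", "-6", "route", "show", "table", "all"],
    ["ip", "-4", "rule"],
    ["ip", "-6", "rule"],
]

def run_command(argv):
    """Run one command and return its standard output as text."""
    return subprocess.run(argv, capture_output=True, text=True, check=False).stdout

def collect_state(runner=run_command):
    """Return one text report with each command line followed by its output."""
    sections = []
    for argv in COMMANDS:
        sections.append("$ " + " ".join(argv) + "\n" + runner(argv))
    return "\n".join(sections)

if __name__ == "__main__":
    print(collect_state())
```

The `runner` parameter exists only so that the report format can be checked without access to the `ip` tool.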




Also, if you are about to reproduce, then please enable `level=TRACE` log first, reproduce, and provide complete logs. See https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/main/contrib/fedora/rpm/NetworkManager.conf#L27

thank you

Comment 2 yangfei 2021-07-06 03:00:29 UTC
For the results of the "ip" commands, please check the attachment ip.zip.

Comment 5 Thomas Haller 2021-07-12 08:51:10 UTC
Sidenote: the logfile from comment 4 does not have debug logs. https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/main/contrib/fedora/rpm/NetworkManager.conf#L27 tries to explain how to get the logs. Also, don't merely attach /var/log/messages, instead use `journalctl -b 0`.


The ip-commands from comment 3 are good, but it's not clear to me what "before" and "after" mean. Also, comment 0 talks about 10.73.179.137 while comment 3 shows entirely different IP addresses (100.x.y.z, which aren't even in the RFC 1918 private address ranges).



`after/ip4_route.log` is interesting. We see some leftover local routes that shouldn't be there.
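
One way to spot such leftover routes is to compare the "local" entries of the routing table with the addresses that are actually configured. The sketch below does this on text output. The sample strings are hypothetical, built from the addresses in comment 0 rather than copied from the attachment, and the function names are made up for illustration.

```python
import ipaddress
import re

# Hypothetical excerpts of `ip -4 addr` and `ip -4 route show table local`.
SAMPLE_ADDR = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
2: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
    inet 10.73.179.138/23 brd 10.73.179.255 scope global dynamic noprefixroute ens2f0
"""
SAMPLE_ROUTES = """\
broadcast 10.73.179.255 dev ens2f0 proto kernel scope link src 10.73.179.138
local 10.73.179.137 dev ens2f0 proto kernel scope host src 10.73.179.137
local 10.73.179.138 dev ens2f0 proto kernel scope host src 10.73.179.138
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
"""

def configured_addresses(ip_addr_output):
    """Return the IPv4 addresses found on 'inet' lines."""
    found = re.findall(r"\binet (\d+\.\d+\.\d+\.\d+)", ip_addr_output)
    return {ipaddress.ip_address(a) for a in found}

def stale_local_routes(ip_addr_output, ip_route_output):
    """Return destinations of 'local' routes that cover no configured address."""
    addrs = configured_addresses(ip_addr_output)
    stale = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "local":
            continue
        dest = ipaddress.ip_network(fields[1], strict=False)
        if not any(addr in dest for addr in addrs):
            stale.append(fields[1])
    return stale

print(stale_local_routes(SAMPLE_ADDR, SAMPLE_ROUTES))  # ['10.73.179.137']
```

With the sample data, the route for the old address 10.73.179.137 is reported because no interface carries that address any more.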


Would it be possible to reproduce this again and show the complete `journalctl -b 0`, but with `level=TRACE` log of NetworkManager?

Comment 6 yangfei 2021-07-12 08:54:11 UTC
"before" means before unplugging the cable.
"after" means after unplugging and then replugging the cable.

Comment 7 Thomas Haller 2021-07-12 08:57:15 UTC
is it possible to get a full log?

Comment 8 yangfei 2021-07-12 09:06:21 UTC
Let me clarify the environment: the attached data is from the customer side, while comment 0 was from my test environment, so let's focus on the customer's data.

I'll ask the customer to reproduce this again and provide the complete `journalctl -b 0` output with `level=TRACE` logging enabled in NetworkManager.

Comment 9 yangfei 2021-07-12 09:41:22 UTC
Hello Thomas, 

Was the log under sos_commands/logs useful to you?

Comment 11 Thomas Haller 2021-07-12 15:01:58 UTC
(In reply to Thomas Haller from comment #10)

At timestamp 1625538296.7987, we see that the route 100.2.34.50/32 is wrongly re-added.
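
To relate that log timestamp to the dates in this bug, it can be converted to wall-clock time. The sketch below assumes the value is seconds since the Unix epoch; the function name is made up for illustration.

```python
from datetime import datetime, timezone

def log_timestamp_to_utc(epoch_seconds):
    """Convert a Unix-epoch log timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

stamp = log_timestamp_to_utc(1625538296.7987)
print(stamp.strftime("%Y-%m-%d %H:%M:%S UTC"))  # 2021-07-06 02:24:56 UTC
```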

As such, this is like bug 1907661, which however was supposed to be fixed by 1:1.30.0-6 (the log shows the issue for 1:1.30.0-7).

Comment 12 Thomas Haller 2021-07-21 08:10:23 UTC
fixed upstream by MR#935.

Comment 16 yangfei 2021-10-18 03:31:39 UTC
Customer updated:

We upgraded to NetworkManager-1.32.10-2.el8.x86_64.rpm on RHEL 8.3; this package already includes the fix for bug 1979192.
[root@localhost ~]# rpm -qi NetworkManager-1.32.10-2.el8.x86_64.rpm --changelog|grep 1979192
warning: NetworkManager-1.32.10-2.el8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
-core: fix adding stale local routes when address changes (rh #1979192)
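
Besides grepping the changelog, an installed version can be roughly compared against the "Fixed In Version" field of this bug. The sketch below is a simplification and not rpm's own version comparison algorithm: it only looks at the dotted version and the leading number of the release, and it cannot know about backports into older builds. The function names are made up for illustration.

```python
import re

FIXED_IN = "1.32.6-1.el8"  # from the "Fixed In Version" field of this bug

def numeric_key(version_release):
    """Turn e.g. '1.32.10-2.el8' into the tuple (1, 32, 10, 2)."""
    version, _, release = version_release.partition("-")
    parts = [int(p) for p in version.split(".")]
    match = re.match(r"\d+", release)
    parts.append(int(match.group()) if match else 0)
    return tuple(parts)

def has_fix(installed, fixed_in=FIXED_IN):
    """Return True if the installed version-release is at least fixed_in."""
    return numeric_key(installed) >= numeric_key(fixed_in)

print(has_fix("1.32.10-2.el8"))  # True
print(has_fix("1.30.0-7.el8"))   # False
```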

Then we tested again and found that the IP address still changed after unplugging and replugging the cable, but pinging the previous IP address now failed. We want to know whether this behavior is expected.

We expected the IP address to remain unchanged after unplugging and replugging the cable.

Comment 18 Beniamino Galvani 2021-10-19 09:42:28 UTC
I see two issues here. One is that after plugging the cable back, the
DHCP server takes 40 seconds to reply. The other thing is that the
DHCP server assigns a different IP address. Those issues don't seem
related to the client, but to the infrastructure (DHCP server, etc).

Comment 20 errata-xmlrpc 2021-11-09 19:30:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: NetworkManager security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4361

