Bug 1979192

Summary: NetworkManager configures wrong, spurious "local" route for IP address after DHCP address change
Product: Red Hat Enterprise Linux 8
Reporter: yangfei <feyang>
Component: NetworkManager
Assignee: Thomas Haller <thaller>
Status: CLOSED ERRATA
QA Contact: Desktop QE <desktop-qa-list>
Severity: high
Priority: high
Version: 8.4
CC: atragler, bgalvani, djasa, ferferna, fge, lrintel, rkhan, shuali, sukulkar, thaller, till, vbenes, weihao.bj
Target Milestone: beta
Keywords: Triaged
Hardware: x86_64
OS: Linux
Fixed In Version: NetworkManager-1.32.6-1.el8
Last Closed: 2021-11-09 19:30:32 UTC
Type: Bug

Description yangfei 2021-07-05 08:16:09 UTC
Description of problem:

After the cable was unplugged and replugged 10 times, the DHCP server assigned a new IP address, but the previous IP address was still reachable. On the OS side, "ip addr" shows only one IP address.

Version-Release number of selected component (if applicable):

kernel-4.18.0-305.el8.x86_64
NetworkManager-1.30.0-7.el8.x86_64

How reproducible:

[root@dell-per640-test ~]# uname -r
4.18.0-305.el8.x86_64
[root@dell-per640-test ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.4 (Ootpa)

[root@dell-per640-test ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens2f0 
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
NAME=ens2f0
DEVICE=ens2f0
ONBOOT=yes


[root@dell-per640-test ~]# systemctl status NetworkManager
● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-07-01 23:49:54 EDT; 8min ago
     Docs: man:NetworkManager(8)
 Main PID: 1874 (NetworkManager)
    Tasks: 3 (limit: 820752)
   Memory: 13.1M
   CGroup: /system.slice/NetworkManager.service
           └─1874 /usr/sbin/NetworkManager --no-daemon


[root@dell-per640-test ~]# ethtool -i ens2f0
driver: mlx5_core
version: 5.0-0
firmware-version: 14.28.1300 (MT_2420110004)
expansion-rom-version: 
bus-info: 0000:5e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

Steps to Reproduce:
1. Initially, the IP address obtained from the DHCP server was 10.73.179.137; ping it from a client.
2. After unplugging and replugging the cable 10 times, the IP address changed to 10.73.179.138:
ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether e4:1d:2d:c0:85:a2 brd ff:ff:ff:ff:ff:ff
    inet 10.73.179.138/23 brd 10.73.179.255 scope global dynamic noprefixroute ens2f0
       valid_lft 42841sec preferred_lft 42841sec

Actual results:

Pings to both 10.73.179.137 and 10.73.179.138 succeed, and SSH to either address reaches the same machine.

In the customer's testing, repeatedly unplugging and replugging the cable makes the IP address change many times, and all of the previously assigned IP addresses remain reachable by ping and connections.

Expected results:

Only the IP address shown by "ip addr" should be reachable.
 
Additional info:

The customer tried "ignore-carrier=no" as described in the KCS article below, and the issue did not reproduce after unplugging and replugging the cable 20 times.

https://access.redhat.com/solutions/894763

The NetworkManager.conf man page states: "Additionally, it will allow any active connection (whether static or dynamic) to remain active on the device when carrier is lost."

Comment 1 Thomas Haller 2021-07-05 15:51:51 UTC
hi,

When you are in that situation, what is the output of the following commands:

ip addr
ip -6 addr
ip -4 route show table all
ip -6 route show table all
ip -4 rule
ip -6 rule
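When this state has to be captured more than once (for example "before" and "after" a cable pull), a small script keeps the six outputs together. The following is a minimal Python sketch, not a tool shipped with NetworkManager: `collect_diagnostics` and `DIAGNOSTIC_COMMANDS` are hypothetical names, and it assumes Python 3.7+ with the iproute2 `ip` binary on `PATH`.

```python
import shutil
import subprocess
from typing import Dict, List, Sequence

# The commands requested above, as argument vectors (no shell involved).
DIAGNOSTIC_COMMANDS: List[List[str]] = [
    ["ip", "addr"],
    ["ip", "-6", "addr"],
    ["ip", "-4", "route", "show", "table", "all"],
    ["ip", "-6", "route", "show", "table", "all"],
    ["ip", "-4", "rule"],
    ["ip", "-6", "rule"],
]


def collect_diagnostics(
    commands: Sequence[Sequence[str]] = DIAGNOSTIC_COMMANDS,
) -> Dict[str, str]:
    """Run each command and return {command line: stdout + stderr}.

    A missing binary or a non-zero exit status is recorded in the text
    instead of raising, so one failing command does not hide the others.
    """
    results: Dict[str, str] = {}
    for argv in commands:
        key = " ".join(argv)
        if shutil.which(argv[0]) is None:
            results[key] = f"<{argv[0]} not found on PATH>"
            continue
        proc = subprocess.run(list(argv), capture_output=True, text=True, check=False)
        text = proc.stdout + proc.stderr
        if proc.returncode != 0:
            text += f"<exit status {proc.returncode}>"
        results[key] = text
    return results


if __name__ == "__main__":
    for cmd, output in collect_diagnostics().items():
        print(f"### {cmd}")
        print(output)
```

Run as a script, it prints each command followed by its output; the dictionary form makes it easy to save "before" and "after" snapshots side by side.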




Also, if you reproduce the issue again, please first enable `level=TRACE` logging, then reproduce and provide the complete logs. See https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/main/contrib/fedora/rpm/NetworkManager.conf#L27
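The linked example configuration enables this in the `[logging]` section (`level=TRACE`, together with `domains=ALL`). As a hedged sketch, the Python below renders such a setting and writes it as a separate drop-in file. The file name `95-trace-logging.conf` under `/etc/NetworkManager/conf.d/` and the helpers `render_logging_dropin` and `install_dropin` are assumptions for illustration. NetworkManager reads `conf.d/*.conf` in addition to `NetworkManager.conf`, and it should be restarted (for example with `systemctl restart NetworkManager`) so that TRACE logging is active from the start.

```python
from pathlib import Path
from typing import Optional

# Hypothetical drop-in name; any *.conf file in conf.d is read by NetworkManager.
DEFAULT_DROPIN = Path("/etc/NetworkManager/conf.d/95-trace-logging.conf")


def render_logging_dropin(level: str = "TRACE", domains: str = "ALL") -> str:
    """Return keyfile text for a [logging] section with the given level and domains."""
    return f"[logging]\nlevel={level}\ndomains={domains}\n"


def install_dropin(path: Path = DEFAULT_DROPIN, text: Optional[str] = None) -> Path:
    """Write the drop-in (root is needed for the default path) and return its path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_logging_dropin() if text is None else text)
    return path
```

Deleting the drop-in and restarting again turns TRACE logging off. The level can also be changed at runtime with `nmcli general logging level TRACE domains ALL`, without writing a file, although messages from before that point are then logged at the old level.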

thank you

Comment 2 yangfei 2021-07-06 03:00:29 UTC
For the results of the "ip" commands, please check the attachment ip.zip.

Comment 5 Thomas Haller 2021-07-12 08:51:10 UTC
Side note: the log file from comment 4 does not contain debug logs. https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/main/contrib/fedora/rpm/NetworkManager.conf#L27 explains how to get them. Also, don't merely attach /var/log/messages; use `journalctl -b 0` instead.


The ip commands from comment 3 are good, but it's not clear to me what "before" and "after" mean. Also, comment 0 talks about 10.73.179.137, while comment 3 shows entirely different IP addresses (100.x.y.z, which are not even in the RFC 1918 private address ranges).



`after/ip4_route.log` is interesting. We see some leftover local routes that shouldn't be there.
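Leftover local routes explain the symptom in comment 0: on Linux, routes of type `local` (normally in the `local` table) mark destinations that are delivered to the host itself, so a stale `local` route for a previous DHCP address can keep that address answering pings and SSH even though `ip addr` no longer lists it. The Python sketch below flags such routes by comparing the `local` entries in `ip route show table all` output with the `inet`/`inet6` addresses in `ip addr` output. The helper names are hypothetical, and the sample output is constructed for illustration from the addresses in comment 0; it is not copied from the attachments.

```python
import ipaddress
from typing import List


def configured_addresses(ip_addr_output: str) -> list:
    """Return address/prefix entries from the 'inet'/'inet6' lines of `ip addr` output."""
    found = []
    for line in ip_addr_output.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] in ("inet", "inet6"):
            found.append(ipaddress.ip_interface(tokens[1]))
    return found


def stale_local_routes(ip_addr_output: str, ip_route_output: str) -> List[str]:
    """Return `local` routes whose destination matches no configured address.

    A host route (no prefix) must equal a configured address; a prefix
    route such as 127.0.0.0/8 must equal the network of one.
    """
    ifaces = configured_addresses(ip_addr_output)
    hosts = {iface.ip for iface in ifaces}
    networks = {iface.network for iface in ifaces}
    stale = []
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or tokens[0] != "local":
            continue  # only routes of type "local" are checked here
        dest = tokens[1]
        if "/" in dest:
            known = ipaddress.ip_network(dest, strict=False) in networks
        else:
            known = ipaddress.ip_address(dest) in hosts
        if not known:
            stale.append(line.strip())
    return stale


# Constructed sample: .137 still has a local route, but only .138 is configured.
SAMPLE_ADDR = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
2: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.73.179.138/23 brd 10.73.179.255 scope global dynamic noprefixroute ens2f0
"""
SAMPLE_ROUTES = """\
local 10.73.179.137 dev ens2f0 table local proto kernel scope host src 10.73.179.137
local 10.73.179.138 dev ens2f0 table local proto kernel scope host src 10.73.179.138
broadcast 10.73.179.255 dev ens2f0 table local proto kernel scope link src 10.73.179.138
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
"""

if __name__ == "__main__":
    for route in stale_local_routes(SAMPLE_ADDR, SAMPLE_ROUTES):
        print("stale:", route)
```

Applied to two saved outputs from the same moment, an empty result means every `local` route is backed by an address that `ip addr` still shows.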


Would it be possible to reproduce this again and provide the complete `journalctl -b 0` output, with NetworkManager logging at `level=TRACE`?

Comment 6 yangfei 2021-07-12 08:54:11 UTC
"before" means before unplugging the cable.
"after" means after unplugging and then replugging the cable.

Comment 7 Thomas Haller 2021-07-12 08:57:15 UTC
Is it possible to get a full log?

Comment 8 yangfei 2021-07-12 09:06:21 UTC
Let me clarify the environment: the attached data are from the customer side, while comment #0 was from my test environment, so let's focus on the customer's data.

I'll ask the customer to reproduce this again and provide the complete `journalctl -b 0` output with NetworkManager `level=TRACE` logging enabled.

Comment 9 yangfei 2021-07-12 09:41:22 UTC
Hello Thomas, 

Were the logs under sos_commands/logs useful to you?

Comment 11 Thomas Haller 2021-07-12 15:01:58 UTC
(In reply to Thomas Haller from comment #10)

At timestamp 1625538296.7987, we see that the route 100.2.34.50/32 is wrongly re-added.

As such, this is similar to bug 1907661, which, however, was supposed to be fixed in 1:1.30.0-6 (the log shows the issue with 1:1.30.0-7).
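The version remark matters because the affected build (1:1.30.0-7) is newer than the build expected to carry the earlier fix (1:1.30.0-6), so that fix did not cover this case. The sketch below is a simplified comparison in the spirit of RPM's `rpmvercmp`: epoch first, then version, then release, each split into numeric and alphabetic segments. It deliberately ignores the `~` and `^` rules and is only meant for simple strings like these; `compare_evr` is a hypothetical helper, and real package decisions should rely on `rpm` itself or `rpmdev-vercmp`. Epoch 1 is assumed for the "Fixed In Version" build, matching the `1:` prefix above.

```python
import re
from typing import List, Tuple, Union

Segment = Union[int, str]


def _segments(part: str) -> List[Segment]:
    """Split on non-alphanumerics into digit runs (as int) and letter runs."""
    return [int(s) if s.isdigit() else s for s in re.findall(r"[0-9]+|[A-Za-z]+", part)]


def _compare_part(a: str, b: str) -> int:
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if type(x) is not type(y):
            # As in rpm, a numeric segment counts as newer than an alphabetic one.
            return 1 if isinstance(x, int) else -1
        if x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the string with segments left over is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def _split_evr(evr: str) -> Tuple[int, str, str]:
    """Split 'epoch:version-release'; a missing epoch counts as 0."""
    epoch, sep, rest = evr.partition(":")
    if not sep:
        epoch, rest = "0", evr
    if "-" in rest:
        version, release = rest.rsplit("-", 1)
    else:
        version, release = rest, ""
    return int(epoch), version, release


def compare_evr(a: str, b: str) -> int:
    """Return 1 if a is newer than b, -1 if older, 0 if equal."""
    ea, va, ra = _split_evr(a)
    eb, vb, rb = _split_evr(b)
    if ea != eb:
        return 1 if ea > eb else -1
    return _compare_part(va, vb) or _compare_part(ra, rb)
```

With this helper, `compare_evr("1:1.30.0-7", "1:1.30.0-6")` is 1, while the 1.30.0-7 build compares as older than 1:1.32.6-1.el8 and the customer's 1.32.10-2.el8 build from comment 16 compares as newer.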

Comment 12 Thomas Haller 2021-07-21 08:10:23 UTC
Fixed upstream by MR#935.

Comment 16 yangfei 2021-10-18 03:31:39 UTC
Customer update:

We upgraded to NetworkManager-1.32.10-2.el8.x86_64.rpm on RHEL 8.3; this package already includes the fix for bug 1979192.
[root@localhost ~]# rpm -qi NetworkManager-1.32.10-2.el8.x86_64.rpm --changelog|grep 1979192
warning: NetworkManager-1.32.10-2.el8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
-core: fix adding stale local routes when address changes (rh #1979192)
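The `--changelog | grep` check above answers whether one package mentions one bug. A small Python sketch can instead extract every bug reference from changelog text such as the output above, so several IDs can be checked at once. It assumes the `(rh #NNNN)` convention visible in that changelog line; `referenced_bugs` and `fixes_bug` are hypothetical helpers.

```python
import re
from typing import Set

# Matches references such as "(rh #1979192)"; the "rh #" prefix is the
# convention seen in the changelog line above and is assumed to hold elsewhere.
_RH_BUG = re.compile(r"\brh\s*#(\d+)")


def referenced_bugs(changelog_text: str) -> Set[int]:
    """Return all Red Hat bug numbers referenced as 'rh #NNNN' in the text."""
    return {int(num) for num in _RH_BUG.findall(changelog_text)}


def fixes_bug(changelog_text: str, bug_id: int) -> bool:
    """True if the changelog text references the given bug number."""
    return bug_id in referenced_bugs(changelog_text)
```

A changelog entry only shows that a fix for the bug was included; as the rest of this thread shows, it does not guarantee that the observed behavior on a given network matches the user's expectations.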

Then we tested again and found that the IP address changed after unplugging and replugging the cable, but pinging the previous IP address now fails. We want to know whether this behavior is expected.

We expected the IP address to remain unchanged after unplugging and replugging the cable.

Comment 18 Beniamino Galvani 2021-10-19 09:42:28 UTC
I see two issues here. One is that after the cable is plugged back in,
the DHCP server takes 40 seconds to reply. The other is that the DHCP
server assigns a different IP address. These issues don't seem related
to the client, but rather to the infrastructure (DHCP server, etc.).

Comment 20 errata-xmlrpc 2021-11-09 19:30:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: NetworkManager security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4361