Bug 2004212

Summary: IPv6 Default gateway deleted from routing table
Product: Red Hat Enterprise Linux 8 Reporter: Roni <reliezer>
Component: NetworkManager Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA QA Contact: David Jaša <djasa>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 8.4 CC: amusil, atragler, bgalvani, djasa, fge, lrintel, mburman, mperina, rkhan, sukulkar, thaller, till, vbenes
Target Milestone: rc Keywords: AutomationBlocker, Regression, Triaged, ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: NetworkManager-1.32.10-3.el8
Doc Text:
Story Points: ---
Clone Of:
: 2006408 2007264 Environment:
Last Closed: 2021-11-09 19:30:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2006408, 2007264    
Attachments:
Description Flags
Reproducer none

Description Roni 2021-09-14 17:44:43 UTC
Created attachment 1823069
NM logs + tcpdump

Description of problem:
The IPv6 default gateway route is deleted from the routing table.


Version-Release number of selected component (if applicable):
NetworkManager 1.30.0-10
RHEL 8.4, kernel 4.18.0-305.19.1.el8_4.x86_64 #1 SMP Tue Sep 7 07:07:31 EDT 2021 x86_64 x86_64

How reproducible:
100%

Steps to Reproduce:
1. Use RHEL 8.4.
2. Define a network connection with IPv6=Auto mode
   (other settings: ipv6.addr-gen-mode: eui64, ipv4=DHCP).
3. Run 'ip -6 route' and verify that the IPv6 default gateway exists,
   e.g.:
   default via fe80::9ecc:aaaa:bbbb:cccc dev eno1 proto ra metric 100 pref medium
4. Wait a few hours.
5. Run 'ip -6 route' again and check whether the IPv6 default gateway still exists.
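
The check in steps 3 and 5 could be automated along these lines. This is a minimal sketch, not part of the original report: the function name is hypothetical, and it assumes that `ip -6 route` prints a default route as a line starting with the token "default", as in the example in step 3.

```python
def has_ipv6_default_route(ip_route_output: str) -> bool:
    """Return True if any line of `ip -6 route` output is a default route."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        # Assumption: a default route line begins with the literal token "default".
        if fields and fields[0] == "default":
            return True
    return False


# Sample line taken from step 3 of the report.
sample = "default via fe80::9ecc:aaaa:bbbb:cccc dev eno1 proto ra metric 100 pref medium"
print(has_ipv6_default_route(sample))
```

On a live host the input could come from `subprocess.run(["ip", "-6", "route"], capture_output=True, text=True).stdout`, sampled once after bringing the connection up and again a few hours later.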


Actual results:
The IPv6 default gateway route line no longer exists.


Expected results:
The IPv6 default gateway route line should still exist.


Additional info:
- The problem reproduces on a RHEL 8.4 host that was installed
  with oVirt VDSM.
  The management NIC is connected to a network bridge (ovirtmgmt),
  and the route is assigned to this bridge.

- The attachment includes NM logs and a tcpdump capture.
  The capture was taken after the problem had occurred.
  It shows that the router is sending RA packets,
  although this host did not generate a solicitation packet.

Comment 1 Thomas Haller 2021-09-15 06:20:23 UTC
Used NetworkManager version:

  <info>  [1631550128.5105] NetworkManager (version 1.30.0-10.el8_4) is starting... (for the first time)





The disappearance of the IPv6 default route ("gateway") is not the real problem. The route is deleted when its lifetime of 1800 seconds expires, because no Router Advertisement arrived to refresh it.

  <debug> [1631550131.1103] ndisc[0x5614dad28ac0,"ovirtmgmt"]:   gateway fe80::9ecc:8301:b756:4b60 pref medium exp 1800.000
  ...
  <debug> [1631551931.1185] platform: (ovirtmgmt) ip6-route: delete type unicast ::/0 via fe80::9ecc:8301:b756:4b60 dev 9 metric 42
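
As a rough check of that timing, the two timestamps quoted above can be subtracted and compared with the advertised lifetime. The sketch below assumes both bracketed values are seconds on the same clock; the variable names are illustrative only.

```python
from decimal import Decimal

# Values copied from the two debug lines quoted above.
learned_at = Decimal("1631550131.1103")  # ndisc line: gateway ... exp 1800.000
deleted_at = Decimal("1631551931.1185")  # platform line: ip6-route: delete ... ::/0
lifetime = Decimal("1800.000")           # advertised lifetime in seconds

elapsed = deleted_at - learned_at
overshoot = elapsed - lifetime

print(elapsed)    # seconds between learning the gateway and deleting the route
print(overshoot)  # how long after the lifetime ended the deletion was logged
```

The difference is 1800.0082 seconds, which is consistent with the route being removed on expiry rather than by some other event.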


We also see:

  <warn>  [1631553850.0992] ndisc[0x5614dad28ac0,"ovirtmgmt"]: solicit: failure sending router solicitation: Network is unreachable (101)

which shows that NetworkManager is unable to send router solicitations. Possibly it is also unable to receive router advertisements...



That is probably due to 

  <debug> [1631550139.2935] platform: (ovirtmgmt) ip6-route: delete type unicast table 255 ff00::/8 via :: dev 9 metric 256 mss 0 rt-src rt-boot
  ...
  <info>  [1631550139.2944] audit: op="device-reapply" interface="ovirtmgmt" ifindex=9 pid=3196 uid=0 result="success"


That is, during a reapply we (wrongly) delete the ff00::/8 route, which is very bad.


Such a route is added automatically by the kernel and must not be deleted.
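
To look for the same symptom in another log, a small filter over the debug lines may help. This is only a sketch: the helper name is hypothetical, and the matching rules (the substring "ip6-route: delete", the destination as a whitespace-separated token, the bracketed timestamp) are inferred from the lines quoted in this comment, not from any documented log format.

```python
import re

# Assumption: each log line carries a bracketed timestamp such as [1631550139.2935].
TIMESTAMP_RE = re.compile(r"\[(\d+\.\d+)\]")


def find_route_deletions(log_lines, dest):
    """Return timestamps of 'ip6-route: delete' lines whose tokens include dest."""
    timestamps = []
    for line in log_lines:
        if "ip6-route: delete" not in line:
            continue
        if dest not in line.split():
            continue
        match = TIMESTAMP_RE.search(line)
        timestamps.append(match.group(1) if match else None)
    return timestamps


# Lines copied from this comment.
log = [
    '<debug> [1631550139.2935] platform: (ovirtmgmt) ip6-route: delete type unicast table 255 ff00::/8 via :: dev 9 metric 256 mss 0 rt-src rt-boot',
    '<info>  [1631550139.2944] audit: op="device-reapply" interface="ovirtmgmt" ifindex=9 pid=3196 uid=0 result="success"',
    '<debug> [1631551931.1185] platform: (ovirtmgmt) ip6-route: delete type unicast ::/0 via fe80::9ecc:8301:b756:4b60 dev 9 metric 42',
]
print(find_route_deletions(log, "ff00::/8"))
```

Any hit for ff00::/8 close to a "device-reapply" audit line would point at the same reapply path.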


Sidenote: more recent kernels would configure the route

    multicast ff00::/8 dev x table local proto kernel scope global metric 256 pref medium

instead of

   unicast ff00::/8 dev x table local proto boot scope global metric 256 pref medium

which is due to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ceed9038b2783d14e0422bdc6fd04f70580efb4c and https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a826b04303a40d52439aa141035fca5654ccaccd.
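
Because the route can appear in either form depending on the kernel, a presence check should not depend on the route type or protocol. The sketch below assumes route lines shaped like the two shown in this sidenote; the function name is hypothetical, and how the lines are obtained (for example from the local routing table) is left to the caller.

```python
def has_multicast_prefix_route(route_lines, ifname):
    """Return True if some line describes an ff00::/8 route on the given device."""
    for line in route_lines:
        tokens = line.split()
        if "ff00::/8" not in tokens or "dev" not in tokens:
            continue
        dev_index = tokens.index("dev")
        # The token after "dev" is taken to be the interface name.
        if dev_index + 1 < len(tokens) and tokens[dev_index + 1] == ifname:
            return True
    return False


# The two forms quoted in the sidenote, with "x" standing in for the device.
newer_kernel = "multicast ff00::/8 dev x table local proto kernel scope global metric 256 pref medium"
older_kernel = "unicast ff00::/8 dev x table local proto boot scope global metric 256 pref medium"

print(has_multicast_prefix_route([newer_kernel], "x"))
print(has_multicast_prefix_route([older_kernel], "x"))
```

If the check fails for an interface that uses IPv6 autoconfiguration, the route was probably removed as described above.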



This problem is similar to [1], [2].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1907661#c29
[2] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/557644f5e03a77b3ebe09ceba672217959cf3bdc

Maybe the solution should be similar to that. On the other hand, the solution for [1] is not good, because we try to anticipate what the kernel does by generating "dependent routes", only to ignore them later. As a first step, a similar solution should probably be implemented, but the "next" branch will solve the problem differently.

Comment 23 errata-xmlrpc 2021-11-09 19:30:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: NetworkManager security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4361