Bug 2033991 - DHCP routes missing after lease acquired
Summary: DHCP routes missing after lease acquired
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 36
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Beniamino Galvani
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-12-19 11:23 UTC by François Rigault
Modified: 2023-05-25 19:27 UTC (History)
12 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-05-25 19:27:30 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
freedesktop.org Gitlab NetworkManager NetworkManager merge_requests 1049 0 None opened core: set force-commit flag for generated routes 2021-12-20 04:19:05 UTC

Description François Rigault 2021-12-19 11:23:00 UTC
Description of problem:
after the lease expires, my VM running rawhide manages to retrieve an IP from DHCP however no route is installed.


Version-Release number of selected component (if applicable):
Fedora-Cloud-Base-Rawhide-20211217.n.1.x86_64.qcow2
NetworkManager-1.36.0-0.3.fc36.x86_64
dhcp-client-4.4.2-17.P1.fc36.x86_64


How reproducible:
discovered the problem by suspending the hypervisor, then after waking up seeing the VMs not getting network back. I manage to reproduce with a little bit of set-up:


Steps to Reproduce (using libvirt and latest rawhide VM)
1. on hypervisor, ensure the dhcp lease is set to 2min (alternatively, prepare to wait for 1h)
    $ sudo grep dhcp-range /var/lib/libvirt/dnsmasq/default.conf
    dhcp-range=10.224.122.11,10.224.122.100,255.255.255.0,2m
2. boot a VM, check its ip address and route is fine:
    $ ip a show dev eth0
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 52:54:00:9e:82:6c brd ff:ff:ff:ff:ff:ff
        altname enp1s0
        inet 10.224.122.84/24 brd 10.224.122.255 scope global dynamic noprefixroute eth0
           valid_lft 97sec preferred_lft 97sec
        inet6 fe80::5054:ff:fe9e:826c/64 scope link 
           valid_lft forever preferred_lft forever
    $ ip route
    default via 10.224.122.1 dev eth0 proto dhcp metric 100 
    10.224.122.0/24 dev eth0 proto kernel scope link src 10.224.122.84 metric 100 
3. on hypervisor, pause the dnsmasq process for 2 minutes
    $ sudo pkill -STOP -u libvirt-dnsmasq dnsmasq
    $ sleep 120
    $ sudo pkill -CONT -u libvirt-dnsmasq dnsmasq
4. on VM, check again the ip and route


Actual results:
    $ ip a show dev eth0
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 52:54:00:9e:82:6c brd ff:ff:ff:ff:ff:ff
        altname enp1s0
        inet 10.224.122.84/24 brd 10.224.122.255 scope global dynamic noprefixroute eth0
           valid_lft 73sec preferred_lft 73sec
        inet6 fe80::5054:ff:fe9e:826c/64 scope link 
           valid_lft forever preferred_lft forever
    $ ip route
    <empty>


Expected results:
ip route should be properly set


Additional info:
it "Recovers" by restarting NetworkManager:
logs (actual results above with ip address but not route was taken at 11:13:27)

Dec 19 11:10:00 raw NetworkManager[1367]: <info>  [1639912200.4280] device (eth0): Activation: starting connection 'System eth0' (5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e0>
Dec 19 11:10:00 raw NetworkManager[1367]: <info>  [1639912200.4285] device (eth0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Dec 19 11:10:00 raw NetworkManager[1367]: <info>  [1639912200.4306] manager: NetworkManager state is now CONNECTING
Dec 19 11:10:00 raw NetworkManager[1367]: <info>  [1639912200.4320] device (eth0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Dec 19 11:10:00 raw NetworkManager[1367]: <info>  [1639912200.4346] device (eth0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Dec 19 11:10:00 raw NetworkManager[1367]: <info>  [1639912200.4363] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
Dec 19 11:10:44 raw NetworkManager[1367]: <info>  [1639912244.2055] dhcp4 (eth0): state changed new lease, address=10.224.122.84
Dec 19 11:10:44 raw NetworkManager[1367]: <info>  [1639912244.2074] policy: set 'System eth0' (eth0) as default for IPv4 routing and DNS
Dec 19 11:10:44 raw NetworkManager[1367]: <info>  [1639912244.2279] device (eth0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Dec 19 11:10:44 raw NetworkManager[1367]: <info>  [1639912244.2555] device (eth0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Dec 19 11:10:44 raw NetworkManager[1367]: <info>  [1639912244.2560] device (eth0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Dec 19 11:10:44 raw NetworkManager[1367]: <info>  [1639912244.2575] manager: NetworkManager state is now CONNECTED_SITE
Dec 19 11:10:44 raw NetworkManager[1367]: <info>  [1639912244.2589] device (eth0): Activation: successful, device activated.
Dec 19 11:10:44 raw NetworkManager[1367]: <info>  [1639912244.2601] manager: NetworkManager state is now CONNECTED_GLOBAL
Dec 19 11:12:33 raw NetworkManager[1367]: <info>  [1639912353.0510] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
Dec 19 11:12:33 raw NetworkManager[1367]: <info>  [1639912353.0513] dhcp4 (eth0): state changed no lease
Dec 19 11:12:43 raw NetworkManager[1367]: <info>  [1639912363.3796] dhcp4 (eth0): state changed new lease, address=10.224.122.84
Dec 19 11:13:31 raw NetworkManager[1367]: <info>  [1639912411.5304] caught SIGTERM, shutting down normally.
Dec 19 11:13:31 raw NetworkManager[1367]: <info>  [1639912411.5329] dhcp4 (eth0): canceled DHCP transaction
Dec 19 11:13:31 raw NetworkManager[1367]: <info>  [1639912411.5329] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
Dec 19 11:13:31 raw NetworkManager[1367]: <info>  [1639912411.5329] dhcp4 (eth0): state changed no lease
Dec 19 11:13:31 raw systemd[1]: Stopping Network Manager...


I believe the policy is missing (policy: set 'System eth0' (eth0) as default for IPv4 routing and DNS)

Comment 1 François Rigault 2021-12-19 12:26:13 UTC
with traces activated, we see there was a network unreachable while adding a route:

Dec 19 12:18:00 raw NetworkManager[1310]: <debug> [1639916280.4896] platform: (eth0) address: adding or updating IPv4 address: 10.224.122.84/24 brd 10.224.122.255 lft 116sec pref 116sec lifetime 295-0[116,116] dev 2 flags noprefixroute src unknown
Dec 19 12:18:00 raw NetworkManager[1310]: <trace> [1639916280.4921] platform-linux: event-notification: RTM_NEWADDR, flags 0, seq 28: 10.224.122.84/24 brd 10.224.122.255 lft 116sec pref 116sec lifetime 295-294[117,117] dev 2 flags noprefixroute src kernel
Dec 19 12:18:00 raw NetworkManager[1310]: <debug> [1639916280.4922] platform: (eth0) signal: address 4   added: 10.224.122.84/24 brd 10.224.122.255 lft 116sec pref 116sec lifetime 295-294[117,117] dev 2 flags noprefixroute src kernel
Dec 19 12:18:00 raw NetworkManager[1310]: <debug> [1639916280.4923] l3cfg[30ba27dff5063a0e,ifindex=2]: obj-state: appeared in platform: [05979bb96a16e910, ip4-address, 10.224.122.84/24 brd* 10.224.122.255 lft 116sec pref 116sec lifetime 295-291[120,120] dev 2 src dhcp force-commit], nm-configured, in-platform
Dec 19 12:18:00 raw NetworkManager[1310]: <trace> [1639916280.4923] l3cfg[30ba27dff5063a0e,ifindex=2]: emit signal (platform-change, obj-type=ip4-address, change=added, obj=10.224.122.84/24 brd 10.224.122.255 lft 116sec pref 116sec lifetime 295-294[117,117] dev 2 flags noprefixroute src kernel)
Dec 19 12:18:00 raw NetworkManager[1310]: <trace> [1639916280.4924] platform-linux: event-notification: RTM_NEWROUTE, flags excl,create, seq 0: type local table 255 10.224.122.84/32 via 0.0.0.0 dev 2 metric 0 mss 0 rt-src rt-kernel scope host pref-src 10.224.122.84
Dec 19 12:18:00 raw NetworkManager[1310]: <debug> [1639916280.4925] platform: (eth0) signal: route   4   added: type local table 255 10.224.122.84/32 via 0.0.0.0 dev 2 metric 0 mss 0 rt-src rt-kernel scope host pref-src 10.224.122.84
Dec 19 12:18:00 raw NetworkManager[1310]: <trace> [1639916280.4925] l3cfg[30ba27dff5063a0e,ifindex=2]: emit signal (platform-change, obj-type=ip4-route, change=added, obj=type local table 255 10.224.122.84/32 via 0.0.0.0 dev 2 metric 0 mss 0 rt-src rt-kernel scope host pref-src 10.224.122.84)
Dec 19 12:18:00 raw NetworkManager[1310]: <trace> [1639916280.4926] platform-linux: event-notification: RTM_NEWROUTE, flags excl,create, seq 0: ignore
Dec 19 12:18:00 raw NetworkManager[1310]: <debug> [1639916280.4927] platform-linux: do-add-ip4-address[2: 10.224.122.84/24]: success
Dec 19 12:18:00 raw NetworkManager[1310]: <debug> [1639916280.4928] platform: (eth0) route: append     IPv4 route: type unicast table 0 0.0.0.0/0 via 10.224.122.1 dev 2 metric 100 mss 0 rt-src dhcp force-commit
Dec 19 12:18:00 raw NetworkManager[1310]: <debug> [1639916280.4931] platform-linux: netlink: recvmsg: error message from kernel: Network is unreachable (101) "Nexthop has invalid gateway" for request 29
Dec 19 12:18:00 raw NetworkManager[1310]: <debug> [1639916280.4932] platform-linux: do-add-ip4-route[type unicast 0.0.0.0/0 via 10.224.122.1 dev 2 metric 100 mss 0 rt-src rt-dhcp scope global force-commit]: failure 101 (Network is unreachable - Nexthop has invalid gateway)
Dec 19 12:18:00 raw NetworkManager[1310]: <debug> [1639916280.4933] platform: (eth0) route-sync: ignore failure to add IPv4 route: type unicast table 0 0.0.0.0/0 via 10.224.122.1 dev 2 metric 100 mss 0 rt-src dhcp force-commit: Network is unreachable
Dec 19 12:18:00 raw NetworkManager[1310]: <trace> [1639916280.4948] l3cfg[30ba27dff5063a0e,ifindex=2]: committing IPv6 configuration (update)

Comment 2 Beniamino Galvani 2021-12-20 04:19:06 UTC
This should be fixed by:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1049

The loss of prefix route causes a failure when adding again the default route because the next-hop is not directly reachable.

Comment 3 Ben Cotton 2022-02-08 21:20:52 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 36 development cycle.
Changing version to 36.

Comment 4 Ben Cotton 2023-04-25 18:23:07 UTC
This message is a reminder that Fedora Linux 36 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '36'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 36 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 5 Ludek Smid 2023-05-25 19:27:30 UTC
Fedora Linux 36 entered end-of-life (EOL) status on 2023-05-16.

Fedora Linux 36 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.