RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 2105088 - NetworkManager didn't automatically renew the lease on the VLAN interface configured using nmstate
Summary: NetworkManager didn't automatically renew the lease on the VLAN interface con...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.4
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: rc
: ---
Assignee: NetworkManager Development Team
QA Contact: Vladimir Benes
URL:
Whiteboard:
: 2106903 (view as bug list)
Depends On:
Blocks: 2109411 2109412 2110000 2113992 2113993 2113994 2113997 2113998 2113999
TreeView+ depends on / blocked
 
Reported: 2022-07-07 21:05 UTC by Mridul Markandey
Modified: 2023-02-21 12:55 UTC (History)
18 users (show)

Fixed In Version: NetworkManager-1.39.10-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2109411 2109412 2110000 (view as bug list)
Environment:
Last Closed: 2022-11-08 10:10:38 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments (Terms of Use)
traces from journalctl from Jul 13 (original ticket open date) (2.81 MB, text/plain)
2022-07-15 23:01 UTC, Darren Carpenter
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker NMT-317 0 None None None 2023-02-21 12:55:25 UTC
Red Hat Issue Tracker RHELPLAN-127336 0 None None None 2022-07-08 19:22:26 UTC
Red Hat Product Errata RHBA-2022:7680 0 None None None 2022-11-08 10:12:04 UTC
freedesktop.org Gitlab NetworkManager NetworkManager-ci merge_requests 1121 0 None opened ipv4: add test for DHCP NAK during renewal 2022-07-25 13:11:48 UTC
freedesktop.org Gitlab NetworkManager NetworkManager issues 993 0 None closed dhcp4 fails with "selecting lease failed: -131" 2022-07-25 13:11:48 UTC

Comment 9 Darren Carpenter 2022-07-15 22:59:24 UTC
Adding <trace>'s from journalctl/NetworkManager from the original date of the ticket, Jul 13.
Customer preferred to not reboot the node to enable the machine-config, if more details are needed ~ please let us know.

Comment 10 Darren Carpenter 2022-07-15 23:01:27 UTC
Created attachment 1897581 [details]
traces from journalctl from Jul 13 (original ticket open date)

Comment 12 Darren Carpenter 2022-07-16 11:39:24 UTC
Apologies, never gave a problem statement.

This is an openshift env. with storage IPs being delivered via dhcp. (Ideally these would be static)
When the lease expired for storage, it was unable to negotiate a new IP.

--

From what I can tell on worker-18, it receives 'something' from the server, but ultimately cannot connect;

dmesg/worker-18

11674:[9115102.767156] NFS: Server wrote zero bytes, expected 4096.
11678:[9115257.189423] nfs: server 69.100.109.84 not responding, still trying
11679:[9115257.192668] nfs: server 69.100.109.84 not responding, still trying
11680:[9115264.357663] nfs: server 69.100.109.84 not responding, timed out
11681:[9115264.365017] nfs: server 69.100.109.85 not responding, timed out
11683:[9115282.789355] nfs: server 69.100.109.85 not responding, still trying
11684:[9115282.789364] nfs: server 69.100.109.85 not responding, still trying

Journalctl seems to echo the above, but the piece we locked in on was the 'not an infinite DHCP4 lease, ignoring';

journalctl/worker-18

3822730:Jul 13 16:18:53 worker-18 nm-dispatcher[1882468]: NM resolv-prepender: Checking for nameservers in /var/run/NetworkManager/resolv.conf
3822733:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <warn>  [1657729133.4581] dhcp4 (ens2f0): request timed out
3822734:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info>  [1657729133.4582] dhcp4 (ens2f0): state changed unknown -> timeout
3822735:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info>  [1657729133.4582] device (ens2f0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
3822736:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <warn>  [1657729133.4597] device (ens2f0): Activation: failed for connection 'Wired connection 8'
3822737:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info>  [1657729133.4601] device (ens2f0): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
3822738:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info>  [1657729133.5015] dhcp4 (ens2f0): canceled DHCP transaction
3822739:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info>  [1657729133.5015] dhcp4 (ens2f0): state changed timeout -> done
3822740:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info>  [1657729133.5076] policy: auto-activating connection 'Wired connection 8' (bb09809e-1756-3375-9845-1c2cd5600b56)
3822741:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info>  [1657729133.5088] device (ens2f0): Activation: starting connection 'Wired connection 8' (bb09809e-1756-3375-9845-1c2cd5600b56)
3822742:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info>  [1657729133.5089] device (ens2f0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
3822743:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info>  [1657729133.5094] device (ens2f0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
3822744:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info>  [1657729133.5100] device (ens2f0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
3822745:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info>  [1657729133.5104] dhcp4 (ens2f0): activation: beginning transaction (timeout in 45 seconds)
3822819:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: NM resolv-prepender: Prepending 'nameserver 69.100.108.159' to /etc/resolv.conf (other nameservers from /var/run/NetworkManager/resolv.conf)
3822824:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: + echo 'Not an infinite DHCP4 lease. Ignoring.'
3822825:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: Not an infinite DHCP4 lease. Ignoring.
3822828:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: + echo 'Not a DHCP6 address. Ignoring.'
3822829:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: Not a DHCP6 address. Ignoring.
3822831:Jul 13 16:18:54 worker-18 root[1882843]: NM mdns-hostname triggered by dhcp4-change.
3822832:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: <13>Jul 13 16:18:54 root: NM mdns-hostname triggered by dhcp4-change.
3822969:Jul 13 16:19:02 worker-18 kernel: nfs: server 69.100.109.84 not responding, timed out
3822970:Jul 13 16:19:02 worker-18 kernel: nfs: server 69.100.109.85 not responding, timed out
3822972:Jul 13 16:19:04 worker-18 systemd[1]: NetworkManager-dispatcher.service: Succeeded.
3822973:Jul 13 16:19:04 worker-18 systemd[1]: NetworkManager-dispatcher.service: Consumed 470ms CPU time
3823029:Jul 13 16:19:10 worker-18 kernel: nfs: server 69.100.109.84 not responding, still trying
3823035:Jul 13 16:19:13 worker-18 kernel: nfs: server 69.100.109.85 not responding, timed out
3823161:Jul 13 16:19:18 worker-18 kernel: nfs: server 69.100.109.84 not responding, timed out
3823190:Jul 13 16:19:23 worker-18 kernel: nfs: server 69.100.109.85 not responding, timed out
3823214:Jul 13 16:19:28 worker-18 kernel: nfs: server 69.100.109.84 not responding, still trying
3823215:Jul 13 16:19:28 worker-18 kernel: nfs: server 69.100.109.84 not responding, timed out
3823242:Jul 13 16:19:33 worker-18 kernel: nfs: server 69.100.109.85 not responding, timed out


Lastly, we see;

<warn>  [1657729182.9473] dhcp4 (bond0.284): selecting lease failed: -131

Which led me to [1], which appears to show that NetworkManagers' code at this step leads to;

return -ENOTRECOVERABLE;

----------

If you need any further details/have questions, please let me know.

[1] gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/993

Comment 13 Darren Carpenter 2022-07-18 16:18:40 UTC
Hi All,

I know the trace+details we're only recently added, but just checking in to see if any progress has been made. Customer is putting a little bit of pressure on a resolution for this one. 

Thanks in advance ~ and sorry to bug.

Darren Carpenter

Comment 14 Beniamino Galvani 2022-07-19 14:49:05 UTC
The attached log doesn't show any "selecting lease failed" error as in the original report, and it looks like the lease is never lost. Is it the right log?

Comment 18 Victor Pickard 2022-07-20 13:36:57 UTC
*** Bug 2106903 has been marked as a duplicate of this bug. ***

Comment 28 Lars Kellogg-Stedman 2022-07-22 14:23:39 UTC
Ben (et al): While we wait for the fix to land in appropriate channels, are there any suggested workarounds for this problem?

Comment 30 Ben Nemec 2022-07-22 17:28:43 UTC
(In reply to Lars Kellogg-Stedman from comment #28)
> Ben (et al): While we wait for the fix to land in appropriate channels, are
> there any suggested workarounds for this problem?

I don't really know what the circumstances are that cause this so I can't suggest a way to avoid it. I suppose you could write a service to periodically check the interfaces and if they lose their address then force them to retry. That could still result in connectivity breaks in the meantime though. :-/

Comment 32 Beniamino Galvani 2022-07-25 13:11:48 UTC
Added links to upstream issue and NM-CI merge request to add test.

Comment 44 errata-xmlrpc 2022-11-08 10:10:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7680


Note You need to log in before you can comment on or make changes to this bug.