Bug 2105088
| Summary: | NetworkManager didn't automatically renew the lease on the VLAN interface configured using nmstate | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Mridul Markandey <mmarkand> | ||||
| Component: | NetworkManager | Assignee: | NetworkManager Development Team <nm-team> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Vladimir Benes <vbenes> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 8.4 | CC: | ajuarez, bgalvani, bnemec, bzvonar, dacarpen, ealcaniz, gdiotte, lars, lrintel, nm-team, pemensik, rkhan, sfaye, sukulkar, till, vbenes, xzhou, yjoseph | ||||
| Target Milestone: | rc | Keywords: | FastFix, Triaged, ZStream | ||||
| Target Release: | --- | Flags: | pm-rhel:
mirror+
|
||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | NetworkManager-1.39.10-1.el8 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 2109411 2109412 2110000 (view as bug list) | Environment: | |||||
| Last Closed: | 2022-11-08 10:10:38 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 2109411, 2109412, 2110000, 2113992, 2113993, 2113994, 2113997, 2113998, 2113999 | ||||||
| Attachments: |
|
||||||
|
Comment 9
Darren Carpenter
2022-07-15 22:59:24 UTC
Created attachment 1897581 [details]
traces from journalctl from Jul 13 (original ticket open date)
Apologies, never gave a problem statement. This is an openshift env. with storage IPs being delivered via dhcp. (Ideally these would be static) When the lease expired for storage, it was unable to negotiate a new IP. -- From what I can tell on worker-18, it receives 'something' from the server, but ultimately cannot connect; dmesg/worker-18 11674:[9115102.767156] NFS: Server wrote zero bytes, expected 4096. 11678:[9115257.189423] nfs: server 69.100.109.84 not responding, still trying 11679:[9115257.192668] nfs: server 69.100.109.84 not responding, still trying 11680:[9115264.357663] nfs: server 69.100.109.84 not responding, timed out 11681:[9115264.365017] nfs: server 69.100.109.85 not responding, timed out 11683:[9115282.789355] nfs: server 69.100.109.85 not responding, still trying 11684:[9115282.789364] nfs: server 69.100.109.85 not responding, still trying Journalctl seems to echo the above, but the piece we locked in on was the 'not an infinite DHCP4 lease, ignoring'; journalctl/worker-18 3822730:Jul 13 16:18:53 worker-18 nm-dispatcher[1882468]: NM resolv-prepender: Checking for nameservers in /var/run/NetworkManager/resolv.conf 3822733:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <warn> [1657729133.4581] dhcp4 (ens2f0): request timed out 3822734:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.4582] dhcp4 (ens2f0): state changed unknown -> timeout 3822735:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.4582] device (ens2f0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed') 3822736:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <warn> [1657729133.4597] device (ens2f0): Activation: failed for connection 'Wired connection 8' 3822737:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.4601] device (ens2f0): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed') 3822738:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5015] dhcp4 (ens2f0): canceled DHCP transaction 3822739:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5015] dhcp4 (ens2f0): state changed timeout -> done 3822740:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5076] policy: auto-activating connection 'Wired connection 8' (bb09809e-1756-3375-9845-1c2cd5600b56) 3822741:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5088] device (ens2f0): Activation: starting connection 'Wired connection 8' (bb09809e-1756-3375-9845-1c2cd5600b56) 3822742:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5089] device (ens2f0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') 3822743:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5094] device (ens2f0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') 3822744:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5100] device (ens2f0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed') 3822745:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5104] dhcp4 (ens2f0): activation: beginning transaction (timeout in 45 seconds) 3822819:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: NM resolv-prepender: Prepending 'nameserver 69.100.108.159' to /etc/resolv.conf (other nameservers from /var/run/NetworkManager/resolv.conf) 3822824:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: + echo 'Not an infinite DHCP4 lease. Ignoring.' 3822825:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: Not an infinite DHCP4 lease. Ignoring. 3822828:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: + echo 'Not a DHCP6 address. Ignoring.' 3822829:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: Not a DHCP6 address. Ignoring. 3822831:Jul 13 16:18:54 worker-18 root[1882843]: NM mdns-hostname triggered by dhcp4-change. 3822832:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: <13>Jul 13 16:18:54 root: NM mdns-hostname triggered by dhcp4-change. 3822969:Jul 13 16:19:02 worker-18 kernel: nfs: server 69.100.109.84 not responding, timed out 3822970:Jul 13 16:19:02 worker-18 kernel: nfs: server 69.100.109.85 not responding, timed out 3822972:Jul 13 16:19:04 worker-18 systemd[1]: NetworkManager-dispatcher.service: Succeeded. 3822973:Jul 13 16:19:04 worker-18 systemd[1]: NetworkManager-dispatcher.service: Consumed 470ms CPU time 3823029:Jul 13 16:19:10 worker-18 kernel: nfs: server 69.100.109.84 not responding, still trying 3823035:Jul 13 16:19:13 worker-18 kernel: nfs: server 69.100.109.85 not responding, timed out 3823161:Jul 13 16:19:18 worker-18 kernel: nfs: server 69.100.109.84 not responding, timed out 3823190:Jul 13 16:19:23 worker-18 kernel: nfs: server 69.100.109.85 not responding, timed out 3823214:Jul 13 16:19:28 worker-18 kernel: nfs: server 69.100.109.84 not responding, still trying 3823215:Jul 13 16:19:28 worker-18 kernel: nfs: server 69.100.109.84 not responding, timed out 3823242:Jul 13 16:19:33 worker-18 kernel: nfs: server 69.100.109.85 not responding, timed out Lastly, we see; <warn> [1657729182.9473] dhcp4 (bond0.284): selecting lease failed: -131 Which led me to [1], which appears to show that NetworkManagers' code at this step leads to; return -ENOTRECOVERABLE; ---------- If you need any further details/have questions, please let me know. [1] gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/993 Hi All, I know the trace+details we're only recently added, but just checking in to see if any progress has been made. Customer is putting a little bit of pressure on a resolution for this one. Thanks in advance ~ and sorry to bug. Darren Carpenter The attached log doesn't show any "selecting lease failed" error as in the original report, and it looks like the lease is never lost. Is it the right log? *** Bug 2106903 has been marked as a duplicate of this bug. *** Ben (et al): While we wait for the fix to land in appropriate channels, are there any suggested workarounds for this problem? (In reply to Lars Kellogg-Stedman from comment #28) > Ben (et al): While we wait for the fix to land in appropriate channels, are > there any suggested workarounds for this problem? I don't really know what the circumstances are that cause this so I can't suggest a way to avoid it. I suppose you could write a service to periodically check the interfaces and if they lose their address then force them to retry. That could still result in connectivity breaks in the meantime though. :-/ Added links to upstream issue and NM-CI merge request to add test. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:7680 |