My apologies y'all, I've been putting notes in the wrong case I believe. Here is details/attachment with the networkmanager trace from the 13th; ---- This is an openshift env. with storage IPs being delivered via dhcp. (Ideally these would be static) When the lease expired for storage, it was unable to negotiate a new IP. -- From what I can tell on worker-18, it receives 'something' from the server, but ultimately cannot connect; dmesg/worker-18 11674:[9115102.767156] NFS: Server wrote zero bytes, expected 4096. 11678:[9115257.189423] nfs: server 69.100.109.84 not responding, still trying 11679:[9115257.192668] nfs: server 69.100.109.84 not responding, still trying 11680:[9115264.357663] nfs: server 69.100.109.84 not responding, timed out 11681:[9115264.365017] nfs: server 69.100.109.85 not responding, timed out 11683:[9115282.789355] nfs: server 69.100.109.85 not responding, still trying 11684:[9115282.789364] nfs: server 69.100.109.85 not responding, still trying Journalctl seems to echo the above, but the piece we locked in on was the 'not an infinite DHCP4 lease, ignoring'; journalctl/worker-18 3822730:Jul 13 16:18:53 worker-18 nm-dispatcher[1882468]: NM resolv-prepender: Checking for nameservers in /var/run/NetworkManager/resolv.conf 3822733:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <warn> [1657729133.4581] dhcp4 (ens2f0): request timed out 3822734:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.4582] dhcp4 (ens2f0): state changed unknown -> timeout 3822735:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.4582] device (ens2f0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed') 3822736:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <warn> [1657729133.4597] device (ens2f0): Activation: failed for connection 'Wired connection 8' 3822737:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.4601] device (ens2f0): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed') 3822738:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5015] dhcp4 (ens2f0): canceled DHCP transaction 3822739:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5015] dhcp4 (ens2f0): state changed timeout -> done 3822740:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5076] policy: auto-activating connection 'Wired connection 8' (bb09809e-1756-3375-9845-1c2cd5600b56) 3822741:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5088] device (ens2f0): Activation: starting connection 'Wired connection 8' (bb09809e-1756-3375-9845-1c2cd5600b56) 3822742:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5089] device (ens2f0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') 3822743:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5094] device (ens2f0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') 3822744:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5100] device (ens2f0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed') 3822745:Jul 13 16:18:53 worker-18 NetworkManager[4760]: <info> [1657729133.5104] dhcp4 (ens2f0): activation: beginning transaction (timeout in 45 seconds) 3822819:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: NM resolv-prepender: Prepending 'nameserver 69.100.108.159' to /etc/resolv.conf (other nameservers from /var/run/NetworkManager/resolv.conf) 3822824:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: + echo 'Not an infinite DHCP4 lease. Ignoring.' 3822825:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: Not an infinite DHCP4 lease. Ignoring. 3822828:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: + echo 'Not a DHCP6 address. Ignoring.' 3822829:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: Not a DHCP6 address. Ignoring. 3822831:Jul 13 16:18:54 worker-18 root[1882843]: NM mdns-hostname triggered by dhcp4-change. 3822832:Jul 13 16:18:54 worker-18 nm-dispatcher[1882468]: <13>Jul 13 16:18:54 root: NM mdns-hostname triggered by dhcp4-change. 3822969:Jul 13 16:19:02 worker-18 kernel: nfs: server 69.100.109.84 not responding, timed out 3822970:Jul 13 16:19:02 worker-18 kernel: nfs: server 69.100.109.85 not responding, timed out 3822972:Jul 13 16:19:04 worker-18 systemd[1]: NetworkManager-dispatcher.service: Succeeded. 3822973:Jul 13 16:19:04 worker-18 systemd[1]: NetworkManager-dispatcher.service: Consumed 470ms CPU time 3823029:Jul 13 16:19:10 worker-18 kernel: nfs: server 69.100.109.84 not responding, still trying 3823035:Jul 13 16:19:13 worker-18 kernel: nfs: server 69.100.109.85 not responding, timed out 3823161:Jul 13 16:19:18 worker-18 kernel: nfs: server 69.100.109.84 not responding, timed out 3823190:Jul 13 16:19:23 worker-18 kernel: nfs: server 69.100.109.85 not responding, timed out 3823214:Jul 13 16:19:28 worker-18 kernel: nfs: server 69.100.109.84 not responding, still trying 3823215:Jul 13 16:19:28 worker-18 kernel: nfs: server 69.100.109.84 not responding, timed out 3823242:Jul 13 16:19:33 worker-18 kernel: nfs: server 69.100.109.85 not responding, timed out Lastly, we see; <warn> [1657729182.9473] dhcp4 (bond0.284): selecting lease failed: -131 Which led me to [1], which appears to show that NetworkManagers' code at this step leads to; return -ENOTRECOVERABLE; ---------- If you need any further details/have questions, please let me know. [1] gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/993
Created attachment 1897959 [details] networkmanager trace from the affected host from Jul 13