`systemctl reload NetworkManager` is like sending a SIGHUP signal. It reloads the configuration from disk and also triggers a new DNS update. It does not, however, run dispatcher scripts. Dispatcher scripts run on various state changes during activation, and SIGHUP does not change that.

NetworkManager writes out DNS configuration (such as /etc/resolv.conf) at unpredictable moments, whenever it thinks it is necessary. For example, when a new DHCP lease is received.

You cannot configure NetworkManager to write /etc/resolv.conf while also writing it with a dispatcher script. That does not work. If you want to manage /etc/resolv.conf yourself (e.g. with a dispatcher script), then tell NetworkManager not to write /etc/resolv.conf as well. See the `dns=` and `rc-manager=` settings in `man NetworkManager.conf`.

> Network name resolution for internal addresses fails on the Node after the upgrade works as expected.

It's not clear how that could have worked reliably before the update. The dispatcher script and NetworkManager seem to fight over managing /etc/resolv.conf, and that does not work. Maybe it worked before because of a lucky race, or because the configuration was significantly different.

Also, which software versions were used before the update? And what does the configuration look like?

> Above, we can see that "systemctl restart NetworkManager" is fixing

In most cases, `systemctl restart NetworkManager` is not the right solution for fixing anything. Nor is it clear that it would solve the race. The solution is not to have two components fight over /etc/resolv.conf.
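For illustration, a minimal sketch of a drop-in that tells NetworkManager to leave /etc/resolv.conf alone, so that only one component (for example the dispatcher script) writes it. The file name is hypothetical; the `dns=` and `rc-manager=` keys are the ones documented in `man NetworkManager.conf`. Please check the man page of the installed version before relying on this.

```ini
# /etc/NetworkManager/conf.d/90-dns-unmanaged.conf  (hypothetical file name)
[main]
# NetworkManager does not modify resolv.conf at all.
# According to the man page, this implies rc-manager=unmanaged.
dns=none

# Alternative (assumption: you still want NetworkManager to track DNS
# internally but never write the file). Use instead of dns=none:
# rc-manager=unmanaged
```

Since a reload re-reads the configuration from disk, `systemctl reload NetworkManager` should be enough to pick up such a change, though that is worth verifying on the affected version.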
> We have observed the change in behaviour when going from NetworkManager-1.18.4-3.el7.x86_64 to NetworkManager-1.18.8-1.el7.x86_64.

Please attach two complete syslog outputs, one showing the working case (rhel-7.8) and one showing the non-working case (rhel-7.9). Also, make sure debug logging is enabled in NetworkManager (level=TRACE), but don't filter the logs down to only NetworkManager messages. See https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28 for hints about logging.
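A sketch of what enabling trace logging could look like, assuming a drop-in under /etc/NetworkManager/conf.d/ (the file name is hypothetical). The `[logging]` section with its `level` and `domains` keys is described in `man NetworkManager.conf`; the file linked above has further hints.

```ini
# /etc/NetworkManager/conf.d/95-debug-logging.conf  (hypothetical file name)
[logging]
# Most verbose log level, as requested above.
level=TRACE
# Log all domains, not only a subset.
domains=ALL
```

The setting should be in place before the issue is reproduced, so that the logs cover the whole sequence from the start.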
Current findings when individually updating packages indicate that NetworkManager is only partially involved, as only updating NetworkManager (NetworkManager-1.18.4-3.el7.x86_64 to NetworkManager-1.18.8-1.el7.x86_64) does NOT reproduce the issue. The `cloud-init` package (cloud-init-18.5-6.el7_8.5.x86_64 -> cloud-init-19.4-7.el7.x86_64) seems to be the root cause for this issue.
(In reply to Simon Krenger from comment #12)
> The `cloud-init` package (cloud-init-18.5-6.el7_8.5.x86_64 ->
> cloud-init-19.4-7.el7.x86_64) seems to be the root cause for this issue.

probably due to bug 1748015
[root@wj311osp1116bmaster-etcd-nfs-1 ~]# oc get nodes -o wide
NAME                                  STATUS   ROLES     AGE   VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                      KERNEL-VERSION           CONTAINER-RUNTIME
wj311osp1116bmaster-etcd-nfs-1        Ready    master    15m   v1.11.0+d4cacc0   10.0.150.215   <none>        Red Hat Enterprise Linux Server 7.8 (Maipo)   3.10.0-1127.el7.x86_64   docker://1.13.1
wj311osp1116bnode-1                   Ready    compute   11m   v1.11.0+d4cacc0   10.0.151.160   <none>        Red Hat Enterprise Linux Server 7.8 (Maipo)   3.10.0-1127.el7.x86_64   docker://1.13.1
wj311osp1116bnode-registry-router-1   Ready    <none>    11m   v1.11.0+d4cacc0   10.0.151.80    <none>        Red Hat Enterprise Linux Server 7.8 (Maipo)   3.10.0-1127.el7.x86_64   docker://1.13.1

[root@wj311osp1116bmaster-etcd-nfs-1 ~]# oc version
oc v3.11.318
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://wj311osp1116bmaster-etcd-nfs-1:8443
openshift v3.11.318
kubernetes v1.11.0+d4cacc0

[root@wj311osp1116bmaster-etcd-nfs-1 ~]# cat /etc/NetworkManager/conf.d/99-origin.conf
[main]
dns=none

#### before upgrade

[root@wj311osp1116bmaster-etcd-nfs-1 ~]# oc -n openshift-monitoring rsh cluster-monitoring-operator-576c6b8b55-sz8cw
sh-4.2$ nslookup
> kubernetes.default.svc.cluster.local
Server:         10.0.151.160
Address:        10.0.151.160#53

Name:   kubernetes.default.svc.cluster.local
Address: 172.30.0.1
>
sh-4.2$ exit

[root@wj311osp1116bnode-1 ~]# rpm -qa|grep -i -E "kernel|networkmanager|cloud-init|redhat-release-server"
cloud-init-18.5-6.el7.x86_64
kernel-3.10.0-1127.el7.x86_64
kernel-tools-libs-3.10.0-1127.el7.x86_64
NetworkManager-1.18.4-3.el7.x86_64
NetworkManager-team-1.18.4-3.el7.x86_64
NetworkManager-tui-1.18.4-3.el7.x86_64
kernel-tools-3.10.0-1127.el7.x86_64
NetworkManager-config-server-1.18.4-3.el7.noarch
redhat-release-server-7.8-2.el7.x86_64
NetworkManager-libnm-1.18.4-3.el7.x86_64

#### After upgrade

[root@wj311osp1116bnode-1 ~]# rpm -qa|grep -i -E "kernel|networkmanager|cloud-init|redhat-release-server"
NetworkManager-libnm-1.18.8-2.el7_9.x86_64
kernel-3.10.0-1127.el7.x86_64
NetworkManager-team-1.18.8-2.el7_9.x86_64
NetworkManager-1.18.8-2.el7_9.x86_64
kernel-3.10.0-1160.6.1.el7.x86_64
cloud-init-19.4-7.el7_9.2.x86_64
NetworkManager-config-server-1.18.8-2.el7_9.noarch
redhat-release-server-7.9-5.el7_9.x86_64
kernel-tools-libs-3.10.0-1160.6.1.el7.x86_64
kernel-tools-3.10.0-1160.6.1.el7.x86_64
NetworkManager-tui-1.18.8-2.el7_9.x86_64

[root@wj311osp1116bmaster-etcd-nfs-1 ~]# oc get nodes -o wide
NAME                                  STATUS   ROLES     AGE   VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                      KERNEL-VERSION               CONTAINER-RUNTIME
wj311osp1116bmaster-etcd-nfs-1        Ready    master    32m   v1.11.0+d4cacc0   10.0.150.215   <none>        Red Hat Enterprise Linux Server 7.8 (Maipo)   3.10.0-1127.el7.x86_64       docker://1.13.1
wj311osp1116bnode-1                   Ready    compute   28m   v1.11.0+d4cacc0   10.0.151.160   <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)   3.10.0-1160.6.1.el7.x86_64   docker://1.13.1
wj311osp1116bnode-registry-router-1   Ready    <none>    28m   v1.11.0+d4cacc0   10.0.151.80    <none>        Red Hat Enterprise Linux Server 7.8 (Maipo)   3.10.0-1127.el7.x86_64       docker://1.13.1

[root@wj311osp1116bmaster-etcd-nfs-1 ~]# oc -n openshift-monitoring rsh cluster-monitoring-operator-576c6b8b55-sz8cw
sh-4.2$ nslookup
> kubernetes.default.svc.cluster.local
Server:         10.0.151.160
Address:        10.0.151.160#53

Name:   kubernetes.default.svc.cluster.local
Address: 172.30.0.1
>
sh-4.2$ exit
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 3.11.318 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5107