Bug 1888962
| Summary: | Name resolution not working due to 99-origin-dns.sh not being executed reliably after upgrading to RHEL 7.9 | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Simon Krenger <skrenger> |
| Component: | Installer | Assignee: | Russell Teague <rteague> |
| Installer sub component: | openshift-ansible | QA Contact: | weiwei jiang <wjiang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | acardace, andbartl, aos-bugs, atragler, bgalvani, bleanhar, bparry, dornelas, jgreguske, jokerman, jortialc, lrintel, mahmad, mstaeble, nm-team, rkhan, rsandu, sreber, sthakare, sukulkar, thaller, till |
| Version: | 3.11.0 | Keywords: | Regression |
| Target Milestone: | --- | | |
| Target Release: | 3.11.z | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-18 14:09:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Comment 5
Thomas Haller
2020-10-19 08:13:01 UTC
> We have observed the change in behaviour when going from NetworkManager-1.18.4-3.el7.x86_64 to NetworkManager-1.18.8-1.el7.x86_64.

Please attach two complete syslog outputs that show working (rhel-7.8) and non-working (rhel-7.9). Also, ensure to have debug logging in NetworkManager enabled (level=TRACE), but don't filter the logs to only contain NetworkManager logs. See https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28 for hints about logging.

Current findings when individually updating packages indicate that NetworkManager is only partially involved, as only updating NetworkManager (NetworkManager-1.18.4-3.el7.x86_64 to NetworkManager-1.18.8-1.el7.x86_64) does NOT reproduce the issue. The `cloud-init` package (cloud-init-18.5-6.el7_8.5.x86_64 -> cloud-init-19.4-7.el7.x86_64) seems to be the root cause for this issue.

(In reply to Simon Krenger from comment #12)
> The `cloud-init` package (cloud-init-18.5-6.el7_8.5.x86_64 ->
> cloud-init-19.4-7.el7.x86_64) seems to be the root cause for this issue.

probably due to bug 1748015

    [root@wj311osp1116bmaster-etcd-nfs-1 ~]# oc get nodes -o wide
    NAME                                  STATUS  ROLES    AGE  VERSION          INTERNAL-IP   EXTERNAL-IP  OS-IMAGE                                     KERNEL-VERSION          CONTAINER-RUNTIME
    wj311osp1116bmaster-etcd-nfs-1        Ready   master   15m  v1.11.0+d4cacc0  10.0.150.215  <none>       Red Hat Enterprise Linux Server 7.8 (Maipo)  3.10.0-1127.el7.x86_64  docker://1.13.1
    wj311osp1116bnode-1                   Ready   compute  11m  v1.11.0+d4cacc0  10.0.151.160  <none>       Red Hat Enterprise Linux Server 7.8 (Maipo)  3.10.0-1127.el7.x86_64  docker://1.13.1
    wj311osp1116bnode-registry-router-1   Ready   <none>   11m  v1.11.0+d4cacc0  10.0.151.80   <none>       Red Hat Enterprise Linux Server 7.8 (Maipo)  3.10.0-1127.el7.x86_64  docker://1.13.1

    [root@wj311osp1116bmaster-etcd-nfs-1 ~]# oc version
    oc v3.11.318
    kubernetes v1.11.0+d4cacc0
    features: Basic-Auth GSSAPI Kerberos SPNEGO

    Server https://wj311osp1116bmaster-etcd-nfs-1:8443
    openshift v3.11.318
    kubernetes v1.11.0+d4cacc0

    [root@wj311osp1116bmaster-etcd-nfs-1 ~]# cat /etc/NetworkManager/conf.d/99-origin.conf
    [main]
    dns=none

    #### before upgrade
    [root@wj311osp1116bmaster-etcd-nfs-1 ~]# oc -n openshift-monitoring rsh cluster-monitoring-operator-576c6b8b55-sz8cw
    sh-4.2$ nslookup
    > kubernetes.default.svc.cluster.local
    Server: 10.0.151.160
    Address: 10.0.151.160#53

    Name: kubernetes.default.svc.cluster.local
    Address: 172.30.0.1
    >
    sh-4.2$ exit

    [root@wj311osp1116bnode-1 ~]# rpm -qa|grep -i -E "kernel|networkmanager|cloud-init|redhat-release-server"
    cloud-init-18.5-6.el7.x86_64
    kernel-3.10.0-1127.el7.x86_64
    kernel-tools-libs-3.10.0-1127.el7.x86_64
    NetworkManager-1.18.4-3.el7.x86_64
    NetworkManager-team-1.18.4-3.el7.x86_64
    NetworkManager-tui-1.18.4-3.el7.x86_64
    kernel-tools-3.10.0-1127.el7.x86_64
    NetworkManager-config-server-1.18.4-3.el7.noarch
    redhat-release-server-7.8-2.el7.x86_64
    NetworkManager-libnm-1.18.4-3.el7.x86_64

    #### After upgrade
    [root@wj311osp1116bnode-1 ~]# rpm -qa|grep -i -E "kernel|networkmanager|cloud-init|redhat-release-server"
    NetworkManager-libnm-1.18.8-2.el7_9.x86_64
    kernel-3.10.0-1127.el7.x86_64
    NetworkManager-team-1.18.8-2.el7_9.x86_64
    NetworkManager-1.18.8-2.el7_9.x86_64
    kernel-3.10.0-1160.6.1.el7.x86_64
    cloud-init-19.4-7.el7_9.2.x86_64
    NetworkManager-config-server-1.18.8-2.el7_9.noarch
    redhat-release-server-7.9-5.el7_9.x86_64
    kernel-tools-libs-3.10.0-1160.6.1.el7.x86_64
    kernel-tools-3.10.0-1160.6.1.el7.x86_64
    NetworkManager-tui-1.18.8-2.el7_9.x86_64

    [root@wj311osp1116bmaster-etcd-nfs-1 ~]# oc get nodes -o wide
    NAME                                  STATUS  ROLES    AGE  VERSION          INTERNAL-IP   EXTERNAL-IP  OS-IMAGE                                     KERNEL-VERSION              CONTAINER-RUNTIME
    wj311osp1116bmaster-etcd-nfs-1        Ready   master   32m  v1.11.0+d4cacc0  10.0.150.215  <none>       Red Hat Enterprise Linux Server 7.8 (Maipo)  3.10.0-1127.el7.x86_64      docker://1.13.1
    wj311osp1116bnode-1                   Ready   compute  28m  v1.11.0+d4cacc0  10.0.151.160  <none>       Red Hat Enterprise Linux Server 7.9 (Maipo)  3.10.0-1160.6.1.el7.x86_64  docker://1.13.1
    wj311osp1116bnode-registry-router-1   Ready   <none>   28m  v1.11.0+d4cacc0  10.0.151.80   <none>       Red Hat Enterprise Linux Server 7.8 (Maipo)  3.10.0-1127.el7.x86_64      docker://1.13.1

    [root@wj311osp1116bmaster-etcd-nfs-1 ~]# oc -n openshift-monitoring rsh cluster-monitoring-operator-576c6b8b55-sz8cw
    sh-4.2$ nslookup
    > kubernetes.default.svc.cluster.local
    Server: 10.0.151.160
    Address: 10.0.151.160#53

    Name: kubernetes.default.svc.cluster.local
    Address: 172.30.0.1
    >
    sh-4.2$ exit

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.318 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5107
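
For anyone reproducing this, the debug logging requested in comment 5 could be enabled with a NetworkManager drop-in along the following lines. This is only a sketch: the file name is an assumption chosen for illustration, `level=TRACE` is the value named in the comment, and `domains=ALL` follows the commented-out example in the linked NetworkManager.conf.

```ini
# /etc/NetworkManager/conf.d/95-debug-logging.conf
# (hypothetical file name; any *.conf drop-in in conf.d should be read)
[logging]
# Most verbose log level, as requested in comment 5.
level=TRACE
# Log all domains, matching the upstream example configuration.
domains=ALL
```

NetworkManager needs a restart to pick this up from the start of its run. When collecting the logs, the full output (for example `journalctl -b`) is what was asked for, rather than output filtered with `journalctl -u NetworkManager`.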