Bug 2087021
| Summary: | configure-ovs.sh fails, blocking new RHEL node from being scaled up on cluster without manual reboot | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Paul Webster <pauwebst> | |
| Component: | Networking | Assignee: | Periyasamy Palanisamy <pepalani> | |
| Networking sub component: | ovn-kubernetes | QA Contact: | Ross Brattain <rbrattai> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | medium | |||
| Priority: | medium | CC: | ffernand, jcaamano, rbrattai, vpickard | |
| Version: | 4.9 | |||
| Target Milestone: | --- | |||
| Target Release: | 4.9.z | |||
| Hardware: | Unspecified | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| Clones: | 2088519 | Environment: | ||
| Last Closed: | 2022-08-09 14:00:58 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 2088519 | |||
| Bug Blocks: | ||||
Scale-up succeeded with https://github.com/openshift/machine-config-operator/pull/3188

4.9.0-0.ci.test-2022-06-14-141650-ci-ln-rl16bwt-latest

o49v23-xq6ss-rhel-0 Ready worker 57m v1.22.8+f34b40c 172.31.249.178 172.31.249.178 Red Hat Enterprise Linux 8.4 (Ootpa) 4.18.0-372.9.1.el8.x86_64 cri-o://1.22.5-3.rhaos4.9.gitb6d3a87.el8
o49v23-xq6ss-rhel-1 Ready worker 57m v1.22.8+f34b40c 172.31.249.122 172.31.249.122 Red Hat Enterprise Linux 8.4 (Ootpa) 4.18.0-372.9.1.el8.x86_64 cri-o://1.22.5-3.rhaos4.9.gitb6d3a87.el8

o49v23-xq6ss-rhel-1 configure-ovs.sh[1357]: + ip route show
o49v23-xq6ss-rhel-1 configure-ovs.sh[1890]: default via 172.31.248.1 dev br-ex proto dhcp metric 49
o49v23-xq6ss-rhel-1 configure-ovs.sh[1890]: 172.31.248.0/23 dev br-ex proto kernel scope link src 172.31.249.122 metric 49
o49v23-xq6ss-rhel-1 configure-ovs.sh[1357]: + ip -6 route show
o49v23-xq6ss-rhel-1 configure-ovs.sh[1891]: ::1 dev lo proto kernel metric 256 pref medium
o49v23-xq6ss-rhel-1 configure-ovs.sh[1891]: fe80::/64 dev br-ex proto kernel metric 1024 pref medium
o49v23-xq6ss-rhel-1 configure-ovs.sh[1357]: + exit 0
o49v23-xq6ss-rhel-1 systemd[1]: ovs-configuration.service: Succeeded.
o49v23-xq6ss-rhel-1 systemd[1]: Started Configures OVS with proper host networking configuration.
o49v23-xq6ss-rhel-1 systemd[1]: ovs-configuration.service: Consumed 1.533s CPU time
o49v23-xq6ss-rhel-0 configure-ovs.sh[1361]: + ip route show
o49v23-xq6ss-rhel-0 configure-ovs.sh[1894]: default via 172.31.248.1 dev br-ex proto dhcp metric 49
o49v23-xq6ss-rhel-0 configure-ovs.sh[1894]: 172.31.248.0/23 dev br-ex proto kernel scope link src 172.31.249.178 metric 49
o49v23-xq6ss-rhel-0 configure-ovs.sh[1361]: + ip -6 route show
o49v23-xq6ss-rhel-0 configure-ovs.sh[1895]: ::1 dev lo proto kernel metric 256 pref medium
o49v23-xq6ss-rhel-0 configure-ovs.sh[1895]: fe80::/64 dev br-ex proto kernel metric 1024 pref medium
o49v23-xq6ss-rhel-0 configure-ovs.sh[1361]: + exit 0
o49v23-xq6ss-rhel-0 systemd[1]: ovs-configuration.service: Succeeded.
o49v23-xq6ss-rhel-0 systemd[1]: Started Configures OVS with proper host networking configuration.
o49v23-xq6ss-rhel-0 systemd[1]: ovs-configuration.service: Consumed 1.321s CPU time

Logs with basename:

o49v23-xq6ss-rhel-1 configure-ovs.sh[1830]: ++ basename /etc/NetworkManager/systemConnectionsMerged/br-ex
o49v23-xq6ss-rhel-1 configure-ovs.sh[1357]: + file=br-ex
o49v23-xq6ss-rhel-1 configure-ovs.sh[1833]: ++ basename /etc/NetworkManager/systemConnectionsMerged/br-ex.nmconnection
o49v23-xq6ss-rhel-1 configure-ovs.sh[1357]: + file=br-ex.nmconnection
o49v23-xq6ss-rhel-1 configure-ovs.sh[1838]: ++ basename /etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex
o49v23-xq6ss-rhel-1 configure-ovs.sh[1357]: + file=ovs-if-br-ex
o49v23-xq6ss-rhel-1 configure-ovs.sh[1842]: ++ basename /etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection
o49v23-xq6ss-rhel-1 configure-ovs.sh[1357]: + file=ovs-if-br-ex.nmconnection
o49v23-xq6ss-rhel-1 configure-ovs.sh[1847]: ++ basename /etc/NetworkManager/systemConnectionsMerged/ovs-port-br-ex
o49v23-xq6ss-rhel-1 configure-ovs.sh[1357]: + file=ovs-port-br-ex
o49v23-xq6ss-rhel-1 configure-ovs.sh[1850]: ++ basename /etc/NetworkManager/systemConnectionsMerged/ovs-port-br-ex.nmconnection
o49v23-xq6ss-rhel-1 configure-ovs.sh[1357]: + file=ovs-port-br-ex.nmconnection
o49v23-xq6ss-rhel-1 configure-ovs.sh[1853]: ++ basename /etc/NetworkManager/systemConnectionsMerged/ovs-if-phys0
o49v23-xq6ss-rhel-1 configure-ovs.sh[1357]: + file=ovs-if-phys0
o49v23-xq6ss-rhel-1 configure-ovs.sh[1855]: ++ basename /etc/NetworkManager/systemConnectionsMerged/ovs-if-phys0.nmconnection
o49v23-xq6ss-rhel-1 configure-ovs.sh[1357]: + file=ovs-if-phys0.nmconnection
o49v23-xq6ss-rhel-1 configure-ovs.sh[1859]: ++ basename /etc/NetworkManager/systemConnectionsMerged/ovs-port-phys0
o49v23-xq6ss-rhel-1 configure-ovs.sh[1357]: + file=ovs-port-phys0
o49v23-xq6ss-rhel-1 configure-ovs.sh[1862]: ++ basename /etc/NetworkManager/systemConnectionsMerged/ovs-port-phys0.nmconnection
o49v23-xq6ss-rhel-1 configure-ovs.sh[1357]: + file=ovs-port-phys0.nmconnection

Re-tested with https://github.com/openshift/machine-config-operator/pull/3254 for BZ 2108538.

vSphere UPI RHCOS active_backup fail_over_mac=0

/etc/NetworkManager/systemConnectionsMerged/ens192 test .nmconnection
/etc/NetworkManager/systemConnectionsMerged/ens224 test .nmconnection
/etc/NetworkManager/systemConnectionsMerged/ens256 test .nmconnection

libvirt IPI RHCOS DHCP active_backup fail_over_mac=0

/etc/NetworkManager/systemConnectionsMerged/bond0 test .nmconnection
/etc/NetworkManager/systemConnectionsMerged/enp5s0 test .nmconnection
/etc/NetworkManager/systemConnectionsMerged/enp6s0 test .nmconnection

PR 3254 (BZ 2108538) is needed to make sure the bond0 MAC matches the br-ex MAC on vSphere.

Spaces in file names work. Spaces in NetworkManager connection IDs do not work; that depends on BZ 2104386.

Rebooting after a link failure is also risky; that depends on BZ 2103899.

Correction to the last comment: "vSphere UPI RHCOS active_backup fail_over_mac=0" should be "RHEL8 vSphere DHCP active_backup fail_over_mac=0".

I should also note that we now set autoconnect-priority=99 in the slave .nmconnection files. See BZ 2055433 comment 1 and BZ 2089943 comment 9.

Verified.
PR 3254 is in 4.9.0-0.nightly-2022-07-26-141848.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.9.45 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5879
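The re-test comments report that keyfile names containing spaces now work and that port ("slave") profiles now get autoconnect-priority=99. Below is a hypothetical Bash check for the second point. It is a sketch and not code from configure-ovs.sh or PR 3254. It assumes NetworkManager keyfile format, in which a port profile names its controller with a `master=` line; newer NetworkManager releases may write `controller=` instead. The function name `check_slave_priority` is illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical check (not from configure-ovs.sh): print port ("slave")
# keyfiles in a directory that lack autoconnect-priority=99.
# Assumption: keyfile format, where a port profile carries master=
# (or controller= on newer NetworkManager) in its [connection] section.
check_slave_priority() {
  local dir=$1 f status=0
  for f in "$dir"/*.nmconnection; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    # Only port profiles are of interest; skip bonds, bridges, etc.
    grep -Eq '^(master|controller)=' "$f" || continue
    if ! grep -q '^autoconnect-priority=99$' "$f"; then
      # "$f" is quoted everywhere, so names with spaces stay one argument.
      printf 'missing autoconnect-priority=99: %s\n' "$(basename "$f")"
      status=1
    fi
  done
  return "$status"
}

# Example (as root on a node):
#   check_slave_priority /etc/NetworkManager/systemConnectionsMerged || echo "check failed"
```

The glob plus `[ -e "$f" ]` guard is used instead of parsing `ls` output. This keeps each file name, spaces included, as a single word.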
Description of problem:
While attempting to add a new RHEL 8.5 node to an existing OCP 4.9.28 cluster using the scaleup Ansible playbook, the node failed to report Ready. The debug status of the kubelet service collected by Ansible shows:

E0504 11:46:35.931390 9709 kubelet.go:2360] \"Container runtime network not ready\" networkReady=\"NetworkReady=false reason :NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?\""

Rebooting the node resolved the issue.

Logs from a sosreport taken before the reboot indicate that the configure-ovs.sh script run by the ovs-configuration service failed to configure the network:

May 04 11:35:24 XXXXXXXX configure-ovs.sh[3668]: + for file in "${files[@]}"
May 04 11:35:24 XXXXXXXX configure-ovs.sh[3668]: ++ basename /etc/NetworkManager/systemConnectionsMerged/pteam0 slave 1-slave-ovs-clone.nmconnection
May 04 11:35:24 XXXXXXXX configure-ovs.sh[3668]: basename: extra operand ‘1-slave-ovs-clone.nmconnection’
May 04 11:35:24 XXXXXXXX configure-ovs.sh[3668]: Try 'basename --help' for more information.
May 04 11:35:24 XXXXXXXX configure-ovs.sh[3668]: + file=
May 04 11:35:24 XXXXXXXX systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=1/FAILURE
May 04 11:35:24 XXXXXXXX systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
May 04 11:35:24 XXXXXXXX systemd[1]: Failed to start Configures OVS with proper host networking configuration.
May 04 11:35:24 XXXXXXXX systemd[1]: ovs-configuration.service: Consumed 1.783s CPU time

Version-Release number of selected component (if applicable):
Red Hat OpenShift Container Platform 4.9.28
Red Hat Enterprise Linux 8.5

How reproducible:
Unknown

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
The initial configure-ovs.sh run succeeds and sets up the host network configuration so that the node can be added to the cluster.

Additional info:
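The sosreport trace shows `basename` receiving one file path as three separate words. Bash produces exactly that when an unquoted variable holding a name with spaces undergoes word splitting. Below is a minimal Bash sketch of that failure mode and of quoting as the fix. It is not the actual configure-ovs.sh code, and the helpers `broken` and `fixed` are hypothetical. The first path is copied from the log; `ens192 test.nmconnection` is a made-up two-word name. The comments describe GNU coreutils `basename`.

```shell
#!/usr/bin/env bash
# Path copied from the sosreport log.
path='/etc/NetworkManager/systemConnectionsMerged/pteam0 slave 1-slave-ovs-clone.nmconnection'

# Hypothetical helpers; not taken from configure-ovs.sh.
broken() { basename $1; }    # unquoted: word splitting (and globbing) apply
fixed()  { basename "$1"; }  # quoted: the path stays one operand

fixed "$path"
# prints: pteam0 slave 1-slave-ovs-clone.nmconnection

broken "$path" || echo "broken exited with status $?"
# GNU basename reports "extra operand" on stderr, then this prints:
# broken exited with status 1

# With exactly two words, GNU basename treats the second as a SUFFIX
# and silently returns a wrong name instead of failing.
broken '/etc/NetworkManager/systemConnectionsMerged/ens192 test.nmconnection'
# prints: ens192
```

The two-word case is the more dangerous one because nothing fails. With three or more words the error is at least visible, as in this report, where the service exited right after `+ file=`. Quoting the expansion keeps each array element as one argument. The verification logs above are consistent with this: each `basename` call receives exactly one path.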