Bug 2011747
| Summary: | Dual-stack with separate NICs per stack is crashing OVN-K8s during installation | | |
|---|---|---|---|
| Product: | Red Hat Advanced Cluster Management for Kubernetes | Reporter: | Mat Kowalski <mko> |
| Component: | Infrastructure Operator | Assignee: | Mat Kowalski <mko> |
| Status: | CLOSED DUPLICATE | QA Contact: | bjacot |
| Severity: | unspecified | Docs Contact: | Derek <dcadzow> |
| Priority: | unspecified | | |
| Version: | rhacm-2.4 | CC: | ccrum, trwest, yfirst |
| Target Milestone: | --- | | |
| Target Release: | rhacm-2.5 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-10-13 11:43:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2011502 | | |
| Bug Blocks: | | | |
As decided in the Triage Management Meeting, we will not solve this by adding a validator but by adding a note in the UI and/or docs stating the limitation explicitly, e.g.

"""
Please note that for dual-stack installations you must provide a network interface with both IPv4 and IPv6 addresses. Using separate network interfaces for IPv4 and for IPv6 is not supported.
"""

*** This bug has been marked as a duplicate of bug 2011502 ***
+++ Problem

The following setup causes OVN-K8s to crash during installation of OCP using AI:

* dual-stack cluster
* separate network interfaces per stack, i.e.
  * 1st NIC with IPv4 only
  * 2nd NIC with IPv6 only

The installation times out with

* 2 out of 3 nodes timing out in "Configuring"
* bootstrap node timing out in "Waiting for control plane"

+++ Errors

Nodes show the following message

```
message: 'container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?'
```

but please note this is misleading - this error message only means that cluster-network-operator has not finished its work yet and does not indicate a real issue. The real issue shows up in the following

```
[root@rdu-infra-edge-01 tmp]# oc get co network
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
network             False       True          False      30m

[root@rdu-infra-edge-01 tmp]# oc -n openshift-ovn-kubernetes get pods
NAME                   READY   STATUS             RESTARTS   AGE
ovnkube-master-ksz6s   6/6     Running            6          30m
ovnkube-master-lmmhx   6/6     Running            3          30m
ovnkube-node-fjjx5     3/4     CrashLoopBackOff   10         30m
ovnkube-node-kqppf     3/4     CrashLoopBackOff   10         30m
```

+++ References

There is a BZ open against OVN-K8s - https://bugzilla.redhat.com/show_bug.cgi?id=2011502. Please note that at this moment there is no official OCP documentation mentioning this limitation.

+++ Potential solutions

1) Add a validator in AI ensuring that for a dual-stack installation every host has a NIC holding both IPv4 and IPv6 addresses
2) Add a note in the documentation stating this limitation explicitly

As much as (1) seems like an obvious choice, it would stop users from using AI to deploy OCP clusters with a CNI of their choice. This is currently possible with a bit of manual tuning, e.g. https://cloudcult.dev/cilium-installation-openshift-assisted-installer. A strict validator would block use cases like this. Given that this limitation is purely CNI-related, (1) is not an obvious choice.
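For illustration only, the check that the proposed validator in (1) would perform can be sketched as below. This is not the actual AI validation code; the function name and the interface-to-addresses mapping are hypothetical shapes chosen just to show the rule "at least one NIC must carry both address families":

```python
import ipaddress

def has_dual_stack_nic(host_interfaces):
    """Hypothetical check: return True if any single interface on the
    host carries both an IPv4 and an IPv6 address, which is the only
    dual-stack layout OVN-K8s supports per this bug."""
    for addrs in host_interfaces.values():
        parsed = [ipaddress.ip_interface(a) for a in addrs]
        has_v4 = any(ip.version == 4 for ip in parsed)
        has_v6 = any(ip.version == 6 for ip in parsed)
        if has_v4 and has_v6:
            return True
    return False

# The unsupported layout from this bug: one IPv4-only NIC plus one
# IPv6-only NIC (addresses are documentation-range examples).
split_stacks = {
    "eth0": ["192.0.2.10/24"],
    "eth1": ["2001:db8::10/64"],
}

# The supported layout: a single NIC holding both address families.
single_nic = {
    "eth0": ["192.0.2.10/24", "2001:db8::10/64"],
}
```

Under this sketch, `split_stacks` would fail validation while `single_nic` would pass, matching the limitation described above.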