Bug 2011747 - Dual-stack with separate NICs per stack is crashing OVN-K8s during installation
Summary: Dual-stack with separate NICs per stack is crashing OVN-K8s during installation
Keywords:
Status: CLOSED DUPLICATE of bug 2011502
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Infrastructure Operator
Version: rhacm-2.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: rhacm-2.5
Assignee: Mat Kowalski
QA Contact: bjacot
Docs Contact: Derek
URL:
Whiteboard:
Depends On: 2011502
Blocks:
Reported: 2021-10-07 09:11 UTC by Mat Kowalski
Modified: 2021-10-13 11:43 UTC (History)
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-13 11:43:40 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
Github: open-cluster-management backlog issue 16944 (last updated 2021-10-07 13:18:23 UTC)

Description Mat Kowalski 2021-10-07 09:11:09 UTC
+++ Problem

The following setup causes OVN-K8s to crash during installation of OCP via the Assisted Installer (AI):

* dual-stack cluster
* separate network interfaces per stack, i.e.
    * 1st NIC with IPv4-only
    * 2nd NIC with IPv6-only
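
For illustration, a host network configuration matching this setup might look like the following NMState-style fragment (interface names and addressing modes here are hypothetical, not taken from the affected cluster):

```yaml
interfaces:
  - name: eno1          # 1st NIC: IPv4 only
    type: ethernet
    state: up
    ipv4:
      enabled: true
      dhcp: true
    ipv6:
      enabled: false
  - name: eno2          # 2nd NIC: IPv6 only
    type: ethernet
    state: up
    ipv4:
      enabled: false
    ipv6:
      enabled: true
      dhcp: true
      autoconf: true
```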

The installation times out with:
* 2 out of 3 nodes timing out in "Configuring"
* bootstrap node timing out in "Waiting for control plane"

+++ Errors

Nodes show the following message

```
message: 'container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?'
```

but please note this is misleading: the message only means that cluster-network-operator has not finished its work yet and does not indicate a real issue. The actual failure shows up in the following:

```
[root@rdu-infra-edge-01 tmp]# oc get co network
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
network             False       True          False      30m
[root@rdu-infra-edge-01 tmp]# oc -n openshift-ovn-kubernetes get pods
NAME                   READY   STATUS             RESTARTS   AGE
ovnkube-master-ksz6s   6/6     Running            6          30m
ovnkube-master-lmmhx   6/6     Running            3          30m
ovnkube-node-fjjx5     3/4     CrashLoopBackOff   10         30m
ovnkube-node-kqppf     3/4     CrashLoopBackOff   10         30m
```

+++ References

There is a BZ open against OVN-K8s: https://bugzilla.redhat.com/show_bug.cgi?id=2011502. Please note that at this moment no official OCP documentation mentions this limitation.

+++ Potential solutions

1) Add a validator in AI ensuring that for dual-stack installations every host has a NIC holding both an IPv4 and an IPv6 address
2) Add a note in the documentation stating this limitation explicitly

As much as (1) seems like the obvious choice, it would stop users from using AI to deploy OCP clusters with a CNI of their choice. That is possible today with a bit of manual tuning, e.g. https://cloudcult.dev/cilium-installation-openshift-assisted-installer, and a strict validator would block such use cases. Given that this limitation is purely CNI-related, (1) is not an obvious choice after all.
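
For reference, the check proposed in (1) could be sketched roughly as below. This is a hypothetical illustration only: the function name and the input shape are invented here and do not reflect the real assisted-service data model (which is Go), but it shows the intended rule, namely that at least one NIC must carry both address families.

```python
# Hypothetical sketch of the dual-stack validator proposed in (1):
# a host passes only if at least one NIC holds both an IPv4 and an
# IPv6 address. Input shape (dict of NIC name -> CIDR strings) is
# an assumption for illustration purposes.
import ipaddress

def has_dual_stack_nic(nics):
    """nics: mapping of NIC name -> list of 'address/prefix' strings."""
    for addrs in nics.values():
        # Collect the IP versions (4 and/or 6) present on this NIC.
        families = {ipaddress.ip_address(a.split("/")[0]).version for a in addrs}
        if {4, 6} <= families:
            return True
    return False

# A host with separate NICs per stack fails the check...
separate = {"eno1": ["192.0.2.10/24"], "eno2": ["2001:db8::10/64"]}
# ...while a host with one NIC holding both addresses passes.
combined = {"eno1": ["192.0.2.10/24", "2001:db8::10/64"]}
print(has_dual_stack_nic(separate))   # False
print(has_dual_stack_nic(combined))   # True
```

As discussed above, shipping such a validator would also reject hosts destined for clusters using a different CNI, which is why the documentation-only route was preferred.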

Comment 2 Mat Kowalski 2021-10-07 11:13:50 UTC
As decided in the Triage Management Meeting, we will not solve this by adding a validator, but by adding a note in the UI and/or docs stating the limitation explicitly, e.g.

"""
Please note that for dual-stack installations you must provide a network interface holding both IPv4 and IPv6 addresses. Using separate network interfaces for IPv4 and IPv6 is not supported.
"""

Comment 3 Mat Kowalski 2021-10-13 11:43:40 UTC

*** This bug has been marked as a duplicate of bug 2011502 ***

