Bug 2011747

Summary: Dual-stack with separate NICs per stack is crashing OVN-K8s during installation
Product: Red Hat Advanced Cluster Management for Kubernetes
Reporter: Mat Kowalski <mko>
Component: Infrastructure Operator
Assignee: Mat Kowalski <mko>
Status: CLOSED DUPLICATE
QA Contact: bjacot
Severity: unspecified
Docs Contact: Derek <dcadzow>
Priority: unspecified
Version: rhacm-2.4
CC: ccrum, trwest, yfirst
Target Milestone: ---
Target Release: rhacm-2.5
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-10-13 11:43:40 UTC
Type: Bug
Bug Depends On: 2011502    

Description Mat Kowalski 2021-10-07 09:11:09 UTC
+++ Problem

The following setup causes OVN-K8s to crash during installation of OCP using the Assisted Installer (AI):

* dual-stack cluster
* separate network interfaces per stack, i.e.
    * 1st NIC with IPv4-only
    * 2nd NIC with IPv6-only

The installation times out with
* 2 out of 3 nodes timing out in "Configuring"
* bootstrap node timing out in "Waiting for control plane"

+++ Errors

The nodes report the following message:

```
message: 'container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?'
```

but please note this is misleading: the error message only means that the cluster-network-operator has not finished its work yet, and it does not indicate the real issue. That one comes from the following:

```
[root@rdu-infra-edge-01 tmp]# oc get co network
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
network             False       True          False      30m
[root@rdu-infra-edge-01 tmp]# oc -n openshift-ovn-kubernetes get pods
NAME                   READY   STATUS             RESTARTS   AGE
ovnkube-master-ksz6s   6/6     Running            6          30m
ovnkube-master-lmmhx   6/6     Running            3          30m
ovnkube-node-fjjx5     3/4     CrashLoopBackOff   10         30m
ovnkube-node-kqppf     3/4     CrashLoopBackOff   10         30m
```

+++ References

There is a BZ open against OVN-K8s: https://bugzilla.redhat.com/show_bug.cgi?id=2011502. Please note that at this moment there is no official OCP documentation mentioning this limitation.

+++ Potential solutions

1) Add a validator in AI ensuring that, for a dual-stack installation, every host has a NIC holding both an IPv4 and an IPv6 address
2) Add a note in the documentation stating this limitation explicitly

As much as (1) seems like the obvious choice, it would stop users from using AI to deploy OCP clusters with a CNI of their choice. That is currently possible with a bit of manual tuning, e.g. https://cloudcult.dev/cilium-installation-openshift-assisted-installer, and a strict validator would block use cases like this. Given that this limitation is purely CNI-related, (1) is not an obvious choice after all.
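For reference, the check behind proposed solution (1) essentially reduces to "does at least one NIC on the host carry both address families". A minimal sketch of that logic follows; the `has_dual_stack_nic` function and the interface data shapes are hypothetical illustrations, not the actual assisted-service validator or inventory model:

```python
# Hypothetical sketch of the per-host check behind proposed solution (1):
# a host passes only if at least one NIC carries both an IPv4 and an
# IPv6 address. Interface names and data shapes are assumptions.
import ipaddress

def has_dual_stack_nic(interfaces):
    """interfaces: mapping of NIC name -> list of CIDR address strings."""
    for addrs in interfaces.values():
        families = {ipaddress.ip_address(a.split('/')[0]).version for a in addrs}
        if {4, 6} <= families:
            return True
    return False

# The failing layout from this bug: one IPv4-only NIC, one IPv6-only NIC.
split_stacks = {
    "ens3": ["192.0.2.10/24"],
    "ens4": ["2001:db8::10/64"],
}
# A working layout: a single NIC holds both address families.
combined = {
    "ens3": ["192.0.2.10/24", "2001:db8::10/64"],
}
print(has_dual_stack_nic(split_stacks))  # False
print(has_dual_stack_nic(combined))      # True
```

The sketch also illustrates why the validator is contentious: it would reject the split-stack layout for every CNI, even though only OVN-K8s is known to be affected.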

Comment 2 Mat Kowalski 2021-10-07 11:13:50 UTC
As decided in the Triage Management Meeting, we will not solve it by adding a validator but by adding a note in the UI and/or DOCS stating the limitation explicitly, e.g.

"""
Please note that for dual-stack installations you must provide a network interface holding both IPv4 and IPv6 addresses. Using separate network interfaces for IPv4 and for IPv6 is not supported.
"""

Comment 3 Mat Kowalski 2021-10-13 11:43:40 UTC

*** This bug has been marked as a duplicate of bug 2011502 ***