Bug 2011747

Summary: Dual-stack with separate NICs per stack is crashing OVN-K8s during installation
Product: Red Hat Advanced Cluster Management for Kubernetes
Reporter: Mat Kowalski <mko>
Component: Infrastructure Operator
Assignee: Mat Kowalski <mko>
Status: CLOSED DUPLICATE
QA Contact: bjacot
Severity: unspecified
Docs Contact: Derek <dcadzow>
Priority: unspecified
Version: rhacm-2.4
CC: ccrum, trwest, yfirst
Target Milestone: ---
Target Release: rhacm-2.5
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-10-13 11:43:40 UTC
Type: Bug
Bug Depends On: 2011502    

Description Mat Kowalski 2021-10-07 09:11:09 UTC
+++ Problem

The following setup causes OVN-K8s to crash during installation of OCP using the Assisted Installer (AI):

* dual-stack cluster
* separate network interfaces per stack, i.e.
    * 1st NIC with IPv4-only
    * 2nd NIC with IPv6-only

The installation times out with
* 2 out of 3 nodes timing out in "Configuring"
* bootstrap node timing out in "Waiting for control plane"

+++ Errors

The nodes report the following message:

```
message: 'container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?'
```

but please note this is misleading: the error message only means that the cluster-network-operator has not finished its work yet, and it does not indicate the real issue. That one comes from the following:

```
[root@rdu-infra-edge-01 tmp]# oc get co network
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
network             False       True          False      30m
[root@rdu-infra-edge-01 tmp]# oc -n openshift-ovn-kubernetes get pods
NAME                   READY   STATUS             RESTARTS   AGE
ovnkube-master-ksz6s   6/6     Running            6          30m
ovnkube-master-lmmhx   6/6     Running            3          30m
ovnkube-node-fjjx5     3/4     CrashLoopBackOff   10         30m
ovnkube-node-kqppf     3/4     CrashLoopBackOff   10         30m
```

+++ References

There is a BZ open against OVN-K8s: https://bugzilla.redhat.com/show_bug.cgi?id=2011502. Please note that at this moment there is no official OCP documentation mentioning this limitation.

+++ Potential solutions

1) Add a validator in AI ensuring that, for a dual-stack installation, every host has a NIC holding both an IPv4 and an IPv6 address
2) Add a note in the documentation stating this limitation explicitly

As much as (1) seems like the obvious choice, it would stop users from using AI to deploy OCP clusters with a CNI of their choice. That is currently possible with a bit of manual tuning, e.g. https://cloudcult.dev/cilium-installation-openshift-assisted-installer, and a strict validator would block use cases like this. Given that this limitation is purely CNI-related, (1) is not an obvious choice after all.
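For reference, the check behind proposed solution (1) essentially reduces to "does at least one NIC on the host carry both address families". A minimal sketch of that logic follows; the `has_dual_stack_nic` function and the interface data shapes are hypothetical illustrations, not the actual assisted-service validator or inventory model:

```python
# Hypothetical sketch of the per-host check behind proposed solution (1):
# a host passes only if at least one NIC carries both an IPv4 and an
# IPv6 address. Interface names and data shapes are assumptions.
import ipaddress

def has_dual_stack_nic(interfaces):
    """interfaces: mapping of NIC name -> list of CIDR address strings."""
    for addrs in interfaces.values():
        families = {ipaddress.ip_address(a.split('/')[0]).version for a in addrs}
        if {4, 6} <= families:
            return True
    return False

# The failing layout from this bug: one IPv4-only NIC, one IPv6-only NIC.
split_stacks = {
    "ens3": ["192.0.2.10/24"],
    "ens4": ["2001:db8::10/64"],
}
# A working layout: a single NIC holds both address families.
combined = {
    "ens3": ["192.0.2.10/24", "2001:db8::10/64"],
}
print(has_dual_stack_nic(split_stacks))  # False
print(has_dual_stack_nic(combined))      # True
```

The sketch also illustrates why the validator is contentious: it would reject the split-stack layout for every CNI, even though only OVN-K8s is known to be affected.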

Comment 2 Mat Kowalski 2021-10-07 11:13:50 UTC
As decided in the Triage Management Meeting, we will not solve it by adding a validator but by adding a note in the UI and/or DOCS stating the limitation explicitly, e.g.

"""
Please note that for dual-stack installations you must provide a network interface holding both IPv4 and IPv6 addresses. Using separate network interfaces for IPv4 and for IPv6 is not supported.
"""

Comment 3 Mat Kowalski 2021-10-13 11:43:40 UTC

*** This bug has been marked as a duplicate of bug 2011502 ***