Description of problem:

With remote worker nodes on infrastructure relying on openshift-kni-infra, the "render-config-coredns" init container of the coredns static pod fails: it is unable to render the CoreDNS configuration needed to resolve the cluster's api-int address. This prevents the remote node from joining the cluster.

Version-Release number of selected component (if applicable):

Likely all versions are affected; we discovered the issue on 4.6.16.

How reproducible:

Always

Steps to Reproduce:
1. Deploy a cluster using openshift-kni-infra with the API VIP on a different subnet than the remote worker node.
2. Add the remote worker node.
3. Inspect the render-config-coredns init container on the new node.

Actual results:

# oc logs -n openshift-kni-infra coredns-f19-h07-000-r640 -c render-config-coredns
Error: No interface nor address found for the given VIPs
Usage:
  runtimecfg render [path to kubeconfig] [paths to render]...
          If there is one single path and it is a directory, it renders the .tmpl files in it [flags]

Flags:
      --api-port uint16          Port where the OpenShift API listens at (default 6443)
      --api-vip ip               Virtual IP Address to reach the OpenShift API
  -c, --cluster-config string    Path to cluster-config ConfigMap to retrieve ControlPlane info
      --dns-vip ip               Virtual IP Address to reach an OpenShift node resolving DNS server
  -h, --help                     help for render
      --ingress-vip ip           Virtual IP Address to reach the OpenShift Ingress Routers
      --lb-port uint16           Port where the API HAProxy LB will listen at (default 9445)
  -o, --out-dir string           Directory where the templates will be rendered
  -r, --resolvconf-path string   Optional path to a resolv.conf file to use to get upstream DNS servers (default "/etc/resolv.conf")
      --stat-port uint16         Port where the HAProxy stats API will listen at (default 50000)
      --verbose                  Display extra information about the rendering

time="2021-03-09T20:53:42Z" level=fatal msg="Error executing runtimecfg: No interface nor address found for the given VIPs"

Expected results:

The init container renders the CoreDNS configuration and the remote worker node joins the cluster.

Additional info:

As a workaround, you can overwrite the existing static pod configuration with a new coredns static pod that omits the rendering script and carries a specific configuration to resolve your api-int address.
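The report does not include the workaround's configuration itself. As a minimal illustrative sketch only (the file path, cluster name, and API IP below are hypothetical assumptions, not taken from this report), the replacement pod's CoreDNS config could pin api-int to a reachable address with the hosts plugin and forward everything else upstream:

```
# Corefile for the replacement coredns static pod (illustrative)
. {
    errors
    hosts {
        # Hypothetical API address and cluster domain -- substitute your own.
        192.0.2.10 api-int.example-cluster.example.com
        fallthrough
    }
    # Everything else goes to the node's normal upstream resolvers.
    forward . /etc/resolv.conf
    cache 30
}
```

The static pod manifest in /etc/kubernetes/manifests would then run the coredns image pointed at this Corefile, with the render-config-coredns init container removed.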
This issue affects a remote worker node (RWN) deployment when the VIP is in a different subnet than the RWN. It may also be made obsolete by the following changes, assuming they are introduced in 4.8: https://github.com/openshift/machine-config-operator/pull/2374 https://github.com/openshift/machine-config-operator/pull/2410
This is because 4.6 did not support deployment across multiple subnets. It will be fixed by the backport in https://github.com/openshift/baremetal-runtimecfg/pull/133, although this bug is not strictly a duplicate of that one.
The PR for this has merged so it should be fixed now.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.42 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3008