Description of problem:
Some images, like Fedora CoreOS, have already dropped platform-python from the available executables. In such cases, the NetworkManager dispatcher script that we use to place the correct nameserver in /etc/resolv.conf fails and prevents successful clustering. This is especially important in deployments with multiple interfaces, where it also affects the ability to set the listening IP address for CRI-O and Kubelet.

Version-Release number of selected component (if applicable):
4.5

How reproducible:
100%

Steps to Reproduce:
1. Trigger a deployment with an image missing platform-python (it can be virt-edited away)

Actual results:
Clustering fails.

Expected results:
Clustering succeeds:
/etc/resolv.conf on the worker nodes has the first nameserver pointing to the same worker node's control plane address.
sudo systemctl status crio.service shows the control plane address in the main process command line.
sudo systemctl status kubelet.service shows the control plane address in the main process command line.

Additional info:
For some time, the Fedora CoreOS branch of the Machine Config Operator has had to carry a divergent patch to address this issue.
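For context, the script only failed because it shelled out to platform-python; the same prepend can be done with plain shell. The following is an illustrative sketch only, not the actual MCO script: the path, the handled actions, and especially the NODEIP discovery (hardcoded here) are assumptions.

#!/bin/bash
# Hypothetical /etc/NetworkManager/dispatcher.d/30-resolv-prepender sketch.
# NetworkManager invokes dispatcher scripts as: <script> <interface> <action>.
set -eu

IFACE=$1
ACTION=$2

case "$ACTION" in
up|dhcp4-change|dhcp6-change)
    # In the real script the node address is discovered dynamically; it is
    # hardcoded here only to keep the sketch self-contained (the value
    # matches the verification output further down in this bug).
    NODEIP="fd2e:6f44:5dd8::11b"
    TMP=$(mktemp)
    {
        echo "# Generated by KNI resolv prepender NM dispatcher script"
        echo "nameserver $NODEIP"
        # Keep the existing entries, minus the old header and any stale
        # copy of our own nameserver line.
        grep -vE "^(# Generated by KNI|nameserver $NODEIP)" /etc/resolv.conf || true
    } > "$TMP"
    mv -f "$TMP" /etc/resolv.conf
    ;;
esac

Everything in the sketch is shell built-ins plus coreutils, so it keeps working on images that ship without platform-python.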
Verified on the "latest" build, 27/04.

Host: ocp-edge10.lab.eng.tlv2.redhat.com

IP addresses are correct for all the nodes. Output for master-0-1:

[core@master-0-1 ~]$ cat /etc/resolv.conf
# Generated by KNI resolv prepender NM dispatcher script
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver fd2e:6f44:5dd8::11b
nameserver fe80::5054:ff:fe4d:b69a%enp5s0
nameserver fd2e:6f44:5dd8::1

[core@master-0-1 ~]$ sudo systemctl status crio.service
● crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/crio.service.d
           └─10-default-env.conf, 20-nodenet.conf, 20-stream-address.conf
   Active: active (running) since Mon 2020-04-27 14:06:32 UTC; 1h 36min ago
     Docs: https://github.com/cri-o/cri-o
 Main PID: 1672 (crio)
    Tasks: 39
   Memory: 5.7G
      CPU: 11min 57ms
   CGroup: /system.slice/crio.service
           └─1672 /usr/bin/crio --stream-address=fd2e:6f44:5dd8::11b --enable-metrics=true --metrics-port=9537

Apr 27 15:43:09 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:09.161967173Z" level=info msg="Exec'd >
Apr 27 15:43:10 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:10.270719676Z" level=info msg="Exec'd >
Apr 27 15:43:10 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:10.720100037Z" level=info msg="Exec'd >
Apr 27 15:43:10 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:10.929528046Z" level=info msg="Exec'd >
Apr 27 15:43:11 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:11.244429400Z" level=info msg="Exec'd >
Apr 27 15:43:13 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:13.825194755Z" level=info msg="Exec'd >
Apr 27 15:43:14 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:14.911572656Z" level=info msg="Exec'd >
Apr 27 15:43:15 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:15.250500734Z" level=info msg="Exec'd >
Apr 27 15:43:15 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:15.936711382Z" level=info msg="Exec'd >
Apr 27 15:43:16 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:16.252798765Z" level=info msg="Exec'd >
node-b5wpt/ovs-daemons" id=4fef4118-f2d6-4823-81b3-6069aa0f7c0c name=/runtime.v1alpha2.RuntimeService/ExecSync
lines 1-24/24 (END)

[core@master-0-1 ~]$ sudo systemctl status kubelet.service
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-default-env.conf, 20-nodenet.conf
   Active: active (running) since Mon 2020-04-27 14:06:32 UTC; 1h 40min ago
  Process: 1716 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)
  Process: 1714 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
 Main PID: 1718 (kubelet)
    Tasks: 43
   Memory: 277.4M
      CPU: 9min 45.632s
   CGroup: /system.slice/kubelet.service
           └─1718 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=fd2e:6f44:5dd8::11b --address=fd2e:6f44:5dd8::11b --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --v=3

Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: --check-interval duration   Time between monitor checks (default 6s)
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: -h, --help                  help for monitor
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: --lb-port uint16            Port where the API HAProxy LB will listen at (default 7443)
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: --stat-port uint16          Port where the HAProxy stats API will listen at (default 50000)
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: time="2020-04-27T14:07:31Z" level=fatal msg="Failed due to write unix @->/var/run/haproxy/haproxy-master.sock: write: broken pipe"
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: ,StartedAt:2020-04-27 14:06:43 +0000 UTC,FinishedAt:2020-04-27 14:07:31 +0000 UTC,ContainerID:cri-o://206bff632cfdf99a1f6696c464463b6904ce17460602338b7638005d5d>
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: I0427 15:49:20.016075    1718 volume_manager.go:372] Waiting for volumes to attach and mount for pod "haproxy-master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com_op>
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: I0427 15:49:20.016162    1718 volume_manager.go:403] All volumes are attached and mounted for pod "haproxy-master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com_opens>
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: I0427 15:49:20.016675    1718 kuberuntime_manager.go:650] computePodActions got {KillPod:false CreateSandbox:false SandboxID:c3112bfc493fe1b87c858c48e5e20a10d4a>
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: I0427 15:49:20.260268    1718 prober.go:133] Readiness probe for "ovnkube-node-mjnlh_openshift-ovn-kubernetes(ab9529fe-d9cc-4765-a3e7-e2dfc4567aa5):ovnkube-node>
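The three checks above can also be run in one shot; a rough sketch, assuming the output formats shown in this comment (NODEIP is master-0-1's address from above):

# Consistency check (sketch): the same control-plane address must appear
# in resolv.conf and on the crio/kubelet command lines.
NODEIP="fd2e:6f44:5dd8::11b"
grep -q "nameserver $NODEIP" /etc/resolv.conf && echo "resolv.conf: OK"
pgrep -af /usr/bin/crio | grep -q -- "--stream-address=$NODEIP" && echo "crio: OK"
pgrep -af kubelet | grep -q -- "--node-ip=$NODEIP" && echo "kubelet: OK"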
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409