Bug 1826211 - Virtual IP and multiple interface do not work in images without python
Summary: Virtual IP and multiple interface do not work in images without python
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.5.0
Assignee: Antoni Segura Puimedon
QA Contact: Aleksandra Malykhin
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-21 08:54 UTC by Antoni Segura Puimedon
Modified: 2020-07-13 17:29 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The functionality that finds node addresses in the control plane and configures them for the container runtime and Kubelet was implemented in Python. Some platforms use an image that does not come with Python.
Consequence: Deployments on such platforms (OKD and FCOS) would fail to manage the addresses correctly.
Fix: Replace the Python scripts with a binary implementation that is run from the runtime configuration container.
Result: All platforms can run Virtual IP management regardless of whether they provide Python.
Clone Of:
Environment:
Last Closed: 2020-07-13 17:29:34 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 1659 | 0 | None | closed | Bug 1826211: drop python usage for node-ip functionality | 2020-11-16 17:33:10 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:29:46 UTC

Description Antoni Segura Puimedon 2020-04-21 08:54:47 UTC
Description of problem:
Some images, such as Fedora CoreOS, have already dropped platform-python from the available executables. In such cases, the NetworkManager dispatcher script that we use to place the correct nameserver in /etc/resolv.conf fails and prevents successful clustering. This is especially important in deployments with multiple interfaces.
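
For readers unfamiliar with what the dispatcher script has to achieve, the following is a minimal sketch of prepending a nameserver entry to resolv.conf content. It is not the code shipped by the Machine Config Operator; the function name prepend_nameserver is hypothetical, and Python is used here only for illustration (the fix for this bug moves away from Python on the node).

```python
def prepend_nameserver(resolv_conf: str, node_ip: str) -> str:
    """Return resolv.conf text with node_ip as the first nameserver.

    Hypothetical helper for illustration; not the actual dispatcher script.
    """
    entry = f"nameserver {node_ip}"
    lines = resolv_conf.splitlines()
    # Drop an existing entry for the same address so it is not listed twice.
    kept = [ln for ln in lines if ln.strip() != entry]
    # Insert before the first remaining nameserver line, or append if none.
    for i, ln in enumerate(kept):
        if ln.strip().startswith("nameserver"):
            kept.insert(i, entry)
            break
    else:
        kept.append(entry)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    current = (
        "search ocp-edge-cluster-0.qe.lab.redhat.com\n"
        "nameserver fd2e:6f44:5dd8::1\n"
    )
    print(prepend_nameserver(current, "fd2e:6f44:5dd8::11b"), end="")
```

The helper works on text rather than on the file itself, so running it twice with the same address yields the same result.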

In multiple-interface environments, it also affects the ability to set the listening IP address for CRI-O and Kubelet.
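
As an illustration of the multiple-interface case, here is a minimal sketch of one plausible selection rule: pick the node address whose subnet contains the Virtual IP. This rule is an assumption made for the example, not a description of the shipped implementation. The function name pick_node_ip, the VIP fd2e:6f44:5dd8::5, the IPv4 address, and the /64 and /24 prefix lengths are all hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def pick_node_ip(addresses: Iterable[str], vip: str) -> Optional[str]:
    """Return the first address whose subnet contains the VIP, else None.

    `addresses` are interface addresses in CIDR form, e.g. "10.0.0.7/24".
    Assumed selection rule for illustration only.
    """
    vip_addr = ipaddress.ip_address(vip)
    for cidr in addresses:
        candidate = ipaddress.ip_interface(cidr)
        # Skip addresses of the other IP family and link-local addresses.
        if candidate.version != vip_addr.version or candidate.ip.is_link_local:
            continue
        if vip_addr in candidate.network:
            return str(candidate.ip)
    return None


if __name__ == "__main__":
    node_addresses = [
        "192.168.123.20/24",           # hypothetical second interface
        "fe80::5054:ff:fe4d:b69a/64",  # link-local, never selected
        "fd2e:6f44:5dd8::11b/64",      # prefix length is an assumption
    ]
    print(pick_node_ip(node_addresses, "fd2e:6f44:5dd8::5"))
```

Returning None when no interface matches lets the caller decide whether to fail the deployment or fall back to another address.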

Version-Release number of selected component (if applicable): 4.5


How reproducible: 100%


Steps to Reproduce:
1. Trigger a deployment with an image missing platform-python (it can be removed with virt-edit)

Actual results:
Clustering fails

Expected results:
Clustering succeeds. /etc/resolv.conf on the worker nodes has the first nameserver pointing to the same worker node's control plane address.

sudo systemctl status crio.service shows the control plane address in the main process command line.

sudo systemctl status kubelet.service shows the control plane address in the main process command line.
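
The three checks above could be automated roughly as follows. This is a sketch under assumptions: the helper names first_nameserver and flag_value are hypothetical, and the flags queried (--stream-address for CRI-O, --node-ip for Kubelet) are the ones visible in the verification output in this report. Collecting the resolv.conf text and command lines from the node is left out.

```python
import shlex
from typing import Optional


def first_nameserver(resolv_conf: str) -> Optional[str]:
    """Return the address on the first nameserver line, or None."""
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            return fields[1]
    return None


def flag_value(command_line: str, flag: str) -> Optional[str]:
    """Return the value of a --flag=value argument, or None if absent."""
    for arg in shlex.split(command_line):
        name, sep, value = arg.partition("=")
        if sep and name == flag:
            return value
    return None


if __name__ == "__main__":
    node_ip = "fd2e:6f44:5dd8::11b"
    resolv = "search example.test\nnameserver fd2e:6f44:5dd8::11b\n"
    crio = "/usr/bin/crio --stream-address=fd2e:6f44:5dd8::11b --enable-metrics=true"
    kubelet = "kubelet --node-ip=fd2e:6f44:5dd8::11b --v=3"
    print(first_nameserver(resolv) == node_ip)
    print(flag_value(crio, "--stream-address") == node_ip)
    print(flag_value(kubelet, "--node-ip") == node_ip)
```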

Additional info: For some time, the Fedora CoreOS branch of Machine Config Operator has had to carry a divergent patch to address this issue.

Comment 3 Aleksandra Malykhin 2020-04-27 15:53:17 UTC
Verified on the "latest" build 27/04
Host: ocp-edge10.lab.eng.tlv2.redhat.com

IP addresses are correct for all the nodes.




Output for master-0-1:

[core@master-0-1 ~]$ cat /etc/resolv.conf 
# Generated by KNI resolv prepender NM dispatcher script
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver fd2e:6f44:5dd8::11b
nameserver fe80::5054:ff:fe4d:b69a%enp5s0
nameserver fd2e:6f44:5dd8::1
[core@master-0-1 ~]$ sudo systemctl status crio.service
● crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/crio.service.d
           └─10-default-env.conf, 20-nodenet.conf, 20-stream-address.conf
   Active: active (running) since Mon 2020-04-27 14:06:32 UTC; 1h 36min ago
     Docs: https://github.com/cri-o/cri-o
 Main PID: 1672 (crio)
    Tasks: 39
   Memory: 5.7G
      CPU: 11min 57ms
   CGroup: /system.slice/crio.service
           └─1672 /usr/bin/crio --stream-address=fd2e:6f44:5dd8::11b --enable-metrics=true --metrics-port=9537

Apr 27 15:43:09 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:09.161967173Z" level=info msg="Exec'd >
Apr 27 15:43:10 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:10.270719676Z" level=info msg="Exec'd >
Apr 27 15:43:10 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:10.720100037Z" level=info msg="Exec'd >
Apr 27 15:43:10 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:10.929528046Z" level=info msg="Exec'd >
Apr 27 15:43:11 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:11.244429400Z" level=info msg="Exec'd >
Apr 27 15:43:13 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:13.825194755Z" level=info msg="Exec'd >
Apr 27 15:43:14 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:14.911572656Z" level=info msg="Exec'd >
Apr 27 15:43:15 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:15.250500734Z" level=info msg="Exec'd >
Apr 27 15:43:15 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:15.936711382Z" level=info msg="Exec'd >
Apr 27 15:43:16 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com crio[1672]: time="2020-04-27 15:43:16.252798765Z" level=info msg="Exec'd >
node-b5wpt/ovs-daemons" id=4fef4118-f2d6-4823-81b3-6069aa0f7c0c name=/runtime.v1alpha2.RuntimeService/ExecSync
lines 1-24/24 (END)

[core@master-0-1 ~]$ sudo systemctl status kubelet.service
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-default-env.conf, 20-nodenet.conf
   Active: active (running) since Mon 2020-04-27 14:06:32 UTC; 1h 40min ago
  Process: 1716 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)
  Process: 1714 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
 Main PID: 1718 (kubelet)
    Tasks: 43
   Memory: 277.4M
      CPU: 9min 45.632s
   CGroup: /system.slice/kubelet.service
           └─1718 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=fd2e:6f44:5dd8::11b --address=fd2e:6f44:5dd8::11b --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --v=3

Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]:       --check-interval duration   Time between monitor checks (default 6s)
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]:   -h, --help                      help for monitor
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]:       --lb-port uint16            Port where the API HAProxy LB will listen at (default 7443)
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]:       --stat-port uint16          Port where the HAProxy stats API will listen at (default 50000)
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: time="2020-04-27T14:07:31Z" level=fatal msg="Failed due to write unix @->/var/run/haproxy/haproxy-master.sock: write: broken pipe"
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: ,StartedAt:2020-04-27 14:06:43 +0000 UTC,FinishedAt:2020-04-27 14:07:31 +0000 UTC,ContainerID:cri-o://206bff632cfdf99a1f6696c464463b6904ce17460602338b7638005d5d>
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: I0427 15:49:20.016075    1718 volume_manager.go:372] Waiting for volumes to attach and mount for pod "haproxy-master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com_op>
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: I0427 15:49:20.016162    1718 volume_manager.go:403] All volumes are attached and mounted for pod "haproxy-master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com_opens>
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: I0427 15:49:20.016675    1718 kuberuntime_manager.go:650] computePodActions got {KillPod:false CreateSandbox:false SandboxID:c3112bfc493fe1b87c858c48e5e20a10d4a>
Apr 27 15:49:20 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com hyperkube[1718]: I0427 15:49:20.260268    1718 prober.go:133] Readiness probe for "ovnkube-node-mjnlh_openshift-ovn-kubernetes(ab9529fe-d9cc-4765-a3e7-e2dfc4567aa5):ovnkube-node>

Comment 4 errata-xmlrpc 2020-07-13 17:29:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

