Description of problem:

If I apply a MachineConfig to the worker pool, wait for it to be ready, and then apply a second one, the node never gets back to the Ready state.

Version-Release number of selected component (if applicable):

```
oc version
Client Version: v4.2.0
Server Version: 4.4.0-0.nightly-2020-01-20-081123
Kubernetes Version: v1.17.0
```

How reproducible:

Really often, if not always.

Steps to Reproduce:

1. Apply any MachineConfig, e.g.:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: samplemc
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8,hello
        filesystem: root
        mode: 420
        path: /tmp/foo.txt
```

and wait for the MachineConfigPool to get back to the Updated state:

```
oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-a377cf240a65d7a8fa30619f6c37fb90   True      False      False      1              1                   1                      0                      47m
worker   rendered-worker-f866ee1180b333e57dde24e794b876d5   True      False      False      1              1                   1                      0                      47m
```

2. Apply another one:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: foo
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8,hello
        filesystem: root
        mode: 420
        path: /tmp/foo1.txt
```

Actual results:

The first node that is rebooted never becomes Ready again, and the MCP is stuck in Updating.

Expected results:

All the nodes get back to Ready and the MachineConfigPool becomes Updated.
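The "wait for Updated" step above can be checked mechanically by looking at the UPDATED column of `oc get mcp`. The sketch below parses a sample line copied from the output above rather than querying a live cluster; on a real cluster you would feed it from `oc get mcp worker --no-headers` instead. The sample line and the column position are the only assumptions:

```shell
# Decide from one line of `oc get mcp` output whether the worker pool
# reports UPDATED=True. The line below is sample output from this report;
# replace it with `$(oc get mcp worker --no-headers)` on a live cluster.
mcp_line='worker   rendered-worker-f866ee1180b333e57dde24e794b876d5   True   False   False   1   1   1   0   47m'

# UPDATED is the third whitespace-separated column.
updated=$(printf '%s\n' "$mcp_line" | awk '{print $3}')

if [ "$updated" = "True" ]; then
  echo "worker pool is Updated"
else
  echo "worker pool is still Updating (or Degraded)"
fi
```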
Additional info:

The node is not Ready because of:

```
Ready   False   Mon, 20 Jan 2020 14:30:08 +0100   Mon, 20 Jan 2020 13:34:56 +0100   KubeletNotReady   runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
```

The sdn pod on that node is not able to start because it cannot find the OVS socket:

```
oc -n openshift-sdn get pod sdn-gxlgp -o yaml
...
    lastState:
      terminated:
        containerID: cri-o://0998aef659e81d9ae0a641fe5ada78c564a1048a2d1b1965a4082f769c170d3f
        exitCode: 255
        finishedAt: "2020-01-20T13:11:10Z"
        message: |
          21186 healthcheck.go:42] waiting for OVS to start: dial unix /var/run/openvswitch/db.sock: connect: no such file or directory
          I0120 13:11:00.076144   21186 healthcheck.go:42] waiting for OVS to start: dial unix /var/run/openvswitch/db.sock: connect: no such file or directory
          I0120 13:11:01.075847   21186 healthcheck.go:42] waiting for OVS to start: dial unix /var/run/openvswitch/db.sock: connect: no such file or directory
          [....]
          F0120 13:11:10.076349   21186 cmd.go:111] Failed to start sdn: node SDN setup failed: timed out waiting for the condition
```

This is because the ovs pod is not starting (and I think this is the root cause):

```
oc get pods -n openshift-sdn ovs-cjrzd
NAME        READY   STATUS   RESTARTS   AGE
ovs-cjrzd   0/1     Error    1          100m
```

which in turn is because of:

```
Warning   FailedCreatePodSandBox   51m (x3 over 51m)       kubelet, test1-5f65q-worker-0-sq48t   (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = error reserving pod name k8s_ovs-cjrzd_openshift-sdn_046bb2b4-5ac7-48b4-9348-f79de18866d1_1 for id eab93df02f5d960d3e182fc719660cf30b736d06d7b0b6803bd07a887d250e4b: name is reserved
Normal    SandboxChanged           2m47s (x235 over 52m)   kubelet, test1-5f65q-worker-0-sq48t   Pod sandbox changed, it will be killed and re-created.
```
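The check the sdn healthcheck performs while logging "waiting for OVS to start" amounts to probing the OVS control socket. A minimal sketch of that probe, using the socket path from the log above (running this anywhere without openvswitch prints the "waiting" branch):

```shell
# Probe for the OVS unix socket the sdn pod dials on startup.
# Path taken from the healthcheck log in this report.
sock=/var/run/openvswitch/db.sock

if [ -S "$sock" ]; then
  msg="OVS socket present at $sock"
else
  msg="waiting for OVS to start: $sock not found"
fi
echo "$msg"
```

Running this directly on the stuck node (e.g. via `oc debug node/...`) would confirm whether ovsdb-server ever came up.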
Is this the same as https://bugzilla.redhat.com/show_bug.cgi?id=1792749?
*** This bug has been marked as a duplicate of bug 1792749 ***