Description of problem:
This is a rare but very real bug I noticed while testing new functionality that creates MachineConfigs. The node becomes unschedulable after applying a MachineConfig, the MCP never updates, and the MCD reports:

failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-16-000734   True        False         6h37m   Cluster version is 4.6.0-0.nightly-2020-09-16-000734

How reproducible:
Rare. I have noticed these issues in the past, but this is the first time I caught it in time, before making further changes to the cluster.

Steps to Reproduce:
1. Create a MachineConfig for a custom MCP that causes a node reboot (a sketch of such a manifest is included after this report).

Actual results:
On rare occasions (roughly one in 20 attempts) the node becomes unschedulable and the custom MCP never updates. The MCD reports:

failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout

Expected results:
The node remains schedulable and the MCP updates.

Additional info:
Other pods on the affected node show the same issue as the MCD pod. The must-gather below was taken "after" I had made the node schedulable again and deleted the MCD pod.
http://file.rdu.redhat.com/jmencak/bugzilla/2020-09-19/must-gather.local.7417114535902373082.tar.xz
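For reference, a minimal sketch of the kind of custom-pool MachineConfig described in step 1. The report does not include the actual manifests used in the test, so the pool name (worker-custom), MachineConfig name, and file path below are placeholders, not the real ones; any MachineConfig targeting a custom pool that writes a file will trigger a drain and reboot of the pool's nodes.

# Custom MachineConfigPool that selects nodes labeled worker-custom
# and picks up MachineConfigs with role "worker" or "worker-custom".
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-custom
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-custom]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-custom: ""
---
# MachineConfig for the custom pool; writing a file under /etc makes
# the MCD drain the node, apply the config, and reboot it.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-custom-test
  labels:
    machineconfiguration.openshift.io/role: worker-custom
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
        - path: /etc/test-reboot-trigger
          mode: 0644
          contents:
            source: data:,reboot%20trigger

To reproduce with a sketch like this, label one worker node into the pool (oc label node <node-name> node-role.kubernetes.io/worker-custom=) and apply the manifests with oc apply -f; the node should drain, reboot, and rejoin, except on the rare occasions described above.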
*** This bug has been marked as a duplicate of bug 1874696 ***