Description of problem:
Pods are stuck in ContainerCreating with the following errors (from `oc get events`):
1. failed to create pod network sandbox
2. netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
Version-Release number of selected component (if applicable): 3.11.272 and 3.11.232
How reproducible: Unknown at this stage. It seems to occur on worker nodes that run many applications and have been running for a long time.
Steps to Reproduce (uncertain):
1. Install OCP 3.11 with Kuryr and CRI-O on OSP 13.
2. Load the cluster with applications, then deploy more applications.
Actual results: New applications are stuck in ContainerCreating.
Expected results: New applications are created and running.
Additional info: The problem goes away after draining each node, running the following commands on it, and then uncordoning it:
sudo systemctl disable crio
sudo systemctl disable atomic-openshift-node.service
sudo rm -fr /var/lib/containers/*
sudo systemctl enable crio
sudo systemctl enable atomic-openshift-node.service
sudo systemctl start atomic-openshift-node.service
sudo systemctl start crio
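The per-node workaround above can be scripted so that it is applied identically on every node. The following is a minimal, untested sketch under these assumptions: `oc` is logged in with cluster-admin rights, each node name is reachable over SSH, and the SSH user has passwordless sudo. The `run` and `recover_node` helpers and the `DRY_RUN` variable are hypothetical names. Unlike the commands above, the sketch stops both services before removing `/var/lib/containers/*` (`systemctl disable` alone does not stop a running service) and starts crio before atomic-openshift-node; both changes are our assumption, not part of the reported workaround.

```shell
#!/usr/bin/env bash
# Sketch: apply the per-node workaround from this report to each node in turn.
# Assumptions (not from the report): `oc` has cluster-admin rights, node names
# are reachable over SSH, and the SSH user has passwordless sudo.
# Set DRY_RUN=1 to print the commands instead of executing them.
set -e

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Drain one node, reset its CRI-O container storage, then uncordon it.
recover_node() {
  local node="$1"
  run oc adm drain "$node" --ignore-daemonsets --delete-local-data
  # Stop the node service first so it does not try to start pods while CRI-O is down.
  run ssh "$node" sudo systemctl stop atomic-openshift-node.service
  run ssh "$node" sudo systemctl stop crio
  # Quoted so the glob expands on the node, not on the machine running this script.
  run ssh "$node" "sudo rm -rf /var/lib/containers/*"
  run ssh "$node" sudo systemctl start crio
  run ssh "$node" sudo systemctl start atomic-openshift-node.service
  run oc adm uncordon "$node"
}

# Process the nodes given on the command line one at a time.
for node in "$@"; do
  recover_node "$node"
done
```

For example, with the script saved as a hypothetical `recover-nodes.sh`, running `DRY_RUN=1 ./recover-nodes.sh <node-name>` prints the seven commands it would run for that node without executing any of them. Processing nodes sequentially keeps at most one node out of scheduling at a time.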
We suspect the Kuryr CNI. The kuryr-controller allocates the ports in OpenStack and annotates the pods with the new IPs, but kuryr-cni fails to attach the network to the pods.
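To tell which side of that hypothesis fails for a given stuck pod, one can check whether kuryr-controller has written its VIF annotation on it. The sketch below assumes the annotation key is `openstack.org/kuryr-vif` (the key used by upstream kuryr-kubernetes; verify it against the deployed version), and `vif_annotation`/`check_pod_vif` are hypothetical helper names. It is only a heuristic: an annotation present on a pod still in ContainerCreating points at kuryr-cni, while a missing one points at kuryr-controller.

```shell
#!/usr/bin/env bash
# Sketch: for a pod stuck in ContainerCreating, report whether kuryr-controller
# has annotated it. Assumption: the annotation key is openstack.org/kuryr-vif.
set -e

# Print the pod's Kuryr VIF annotation (empty if it is not set).
vif_annotation() {
  local ns="$1" pod="$2"
  oc get pod "$pod" -n "$ns" \
    -o jsonpath='{.metadata.annotations.openstack\.org/kuryr-vif}'
}

# Heuristic: annotation present -> controller did its part, suspect kuryr-cni;
# annotation missing -> suspect kuryr-controller.
check_pod_vif() {
  local ns="$1" pod="$2" vif
  if ! vif="$(vif_annotation "$ns" "$pod")"; then
    echo "$ns/$pod: could not read pod" >&2
    return 1
  fi
  if [ -n "$vif" ]; then
    echo "$ns/$pod: VIF annotation present; suspect kuryr-cni"
  else
    echo "$ns/$pod: no VIF annotation; suspect kuryr-controller"
  fi
}

# Usage: check-vif.sh NAMESPACE POD
if [ "$#" -eq 2 ]; then
  check_pod_vif "$1" "$2"
fi
```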