Bug 1889946 - Pod stuck in ContainerCreating due to error "failed to create pod network sandbox" and "netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input"
Keywords:
Status: ASSIGNED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Michał Dulko
QA Contact: GenadiC
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-21 03:30 UTC by Mohammad
Modified: 2020-11-26 16:17 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Description Mohammad 2020-10-21 03:30:15 UTC
Description of problem:

Pod stuck in ContainerCreating due to error (from `oc get events`):

1- failed to create pod network sandbox
2- netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
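
To find the affected pods, the event stream can be filtered for the two messages above. This is a minimal sketch, assuming the output of `oc get events --all-namespaces` is piped in; the function name `filter_sandbox_errors` is hypothetical and not part of any OpenShift tooling:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only event lines that carry either of the two
# error messages quoted in this report. Reads event text on stdin.
filter_sandbox_errors() {
    grep -E 'failed to create pod network sandbox|netplugin failed but error parsing its diagnostic message'
}

# Usage (assumes a logged-in `oc` client):
#   oc get events --all-namespaces | filter_sandbox_errors
```

The filter matches on message text only, so it should be unaffected by the column layout of the events output.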


Version-Release number of selected component (if applicable): 3.11.272 and 3.11.232

How reproducible:

Unknown at this stage. It seems to appear on worker nodes that run many applications and have been up for a long time.

Steps to Reproduce (uncertain):
1. Install OCP 3.11 with Kuryr on OSP 13 with CRI-O
2. Put a load on the cluster (applications) and then deploy more applications


Actual results:

New applications are stuck in ContainerCreating.

Expected results:

New applications are created and running.

Additional info: The problem goes away after draining each affected node, performing the steps below on it, and then uncordoning the node:

sudo systemctl disable crio
sudo systemctl disable atomic-openshift-node.service
sudo reboot
sudo rm -fr /var/lib/containers/*
sudo systemctl enable crio
sudo systemctl enable atomic-openshift-node.service
sudo systemctl start atomic-openshift-node.service
sudo systemctl start crio
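
For reference, the full per-node sequence, including the drain and uncordon, can be sketched as below. The helper only prints the commands and does not run them, because the reboot in the middle interrupts any single script. The function name `print_node_reset_plan` is hypothetical, and the drain flags (`--ignore-daemonsets --delete-local-data`) are an assumption; the exact flags used were not recorded here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the per-node workaround in order.
# Argument: the node name as known to `oc`.
print_node_reset_plan() {
    local node="$1"
    # Assumption: drain flags are illustrative, not taken from this report.
    echo "oc adm drain ${node} --ignore-daemonsets --delete-local-data"
    # The lines below are the steps from this report, run on the node itself.
    cat <<'EOF'
sudo systemctl disable crio
sudo systemctl disable atomic-openshift-node.service
sudo reboot
sudo rm -fr /var/lib/containers/*
sudo systemctl enable crio
sudo systemctl enable atomic-openshift-node.service
sudo systemctl start atomic-openshift-node.service
sudo systemctl start crio
EOF
    echo "oc adm uncordon ${node}"
}

# Usage: print_node_reset_plan worker-1
```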

We think it might be related to the Kuryr CNI. The Kuryr controller allocates the ports in OpenStack and annotates the pods with the new IPs, but kuryr-cni is unable to attach the network to the pods.

