Bug 1889946 - Pod stuck in ContainerCreating due to error "failed to create pod network sandbox" and "netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input"
Summary: Pod stuck in ContainerCreating due to error "failed to create pod network san...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Michał Dulko
QA Contact: GenadiC
URL:
Whiteboard:
Depends On: 1917441
Blocks:
Reported: 2020-10-21 03:30 UTC by Mohammad
Modified: 2024-03-25 16:46 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-16 10:51:12 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift kuryr-kubernetes pull 420 | 0 | None | closed | Bug 1889946: Bunch of CNI fixes related to cri-o | 2021-02-15 17:06:41 UTC
Red Hat Product Errata | RHBA-2021:0274 | 0 | None | None | None | 2021-02-03 18:40:16 UTC

Description Mohammad 2020-10-21 03:30:15 UTC
Description of problem:

Pods are stuck in ContainerCreating with the following errors (from `oc get events`):

1. failed to create pod network sandbox
2. netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
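
The empty `""` in the second message suggests that the CNI plugin exited with a failure but wrote nothing to stdout, so the caller had no JSON error object to decode. A minimal Python sketch of that failure mode follows. It is only an analogy: the real check is done by the Go CNI library inside the container runtime, `describe_plugin_failure` is a hypothetical name, and the parse-error text Python produces differs from Go's "unexpected end of JSON input".

```python
import json


def describe_plugin_failure(stdout_text):
    """Report a failed plugin call the way a CNI caller roughly would.

    Hypothetical helper for illustration; not part of any real library.
    """
    try:
        diagnostic = json.loads(stdout_text)
    except json.JSONDecodeError as exc:
        # Empty (or otherwise invalid) stdout: only the parse error is left.
        return 'netplugin failed but error parsing its diagnostic message "%s": %s' % (
            stdout_text,
            exc,
        )
    if not isinstance(diagnostic, dict):
        return "netplugin failed: unexpected diagnostic %r" % (diagnostic,)
    # A well-formed CNI error object carries a human-readable "msg" field.
    return "netplugin failed: %s" % diagnostic.get("msg", "unknown error")
```

With an empty string the function falls into the parse-error branch, which mirrors the event seen here; with a well-formed error object it would surface the plugin's own message instead.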


Version-Release number of selected component (if applicable): 3.11.272 and 3.11.232

How reproducible:

Unknown at this stage. It seems to appear on worker nodes that run many applications and have been up for a long time.

Steps to Reproduce (uncertain):
1. Install OCP 3.11 with Kuryr and CRI-O on OSP 13
2. Load the cluster with applications, then deploy more applications


Actual results:

New applications are stuck in ContainerCreating.

Expected results:

New applications are created and running.

Additional info: The problem goes away after draining each node, running the steps below on it, and then uncordoning the node:

sudo systemctl disable crio
sudo systemctl disable atomic-openshift-node.service
sudo reboot
sudo rm -fr /var/lib/containers/*
sudo systemctl enable crio
sudo systemctl enable atomic-openshift-node.service
sudo systemctl start atomic-openshift-node.service
sudo systemctl start crio
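
The whole workaround, including the drain and uncordon around the on-node steps, can be sketched as an ordered plan. This is a sketch under assumptions: the `oc adm drain` flags `--ignore-daemonsets` and `--delete-local-data` are not stated in this report, `recovery_plan` and the "admin host" label are hypothetical names, and the on-node commands are copied in the reporter's order. The function only builds the list; it does not run anything.

```python
# Commands run on the node before it reboots (taken from the report).
PRE_REBOOT = [
    "sudo systemctl disable crio",
    "sudo systemctl disable atomic-openshift-node.service",
    "sudo reboot",
]

# Commands run on the node once it is back up (taken from the report).
POST_REBOOT = [
    "sudo rm -fr /var/lib/containers/*",
    "sudo systemctl enable crio",
    "sudo systemctl enable atomic-openshift-node.service",
    "sudo systemctl start atomic-openshift-node.service",
    "sudo systemctl start crio",
]


def recovery_plan(node):
    """Return (where, command) pairs for one node, in execution order."""
    # Drain flags are an assumption; adjust them to the cluster's needs.
    plan = [("admin host", "oc adm drain %s --ignore-daemonsets --delete-local-data" % node)]
    plan += [(node, command) for command in PRE_REBOOT]
    plan += [(node, command) for command in POST_REBOOT]
    plan.append(("admin host", "oc adm uncordon %s" % node))
    return plan


if __name__ == "__main__":
    for where, command in recovery_plan("node-1"):  # "node-1" is a placeholder
        print("[%s] %s" % (where, command))
```

Because the reboot interrupts the session, the plan is kept as data to be followed by an operator rather than executed in a single script.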

We suspect the Kuryr CNI. The kuryr-controller allocates the ports on OpenStack and annotates the pods with the new IPs, but kuryr-cni is unable to attach the network to the pods.
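
One way to check this suspicion is to look at a stuck pod's JSON (for example from `oc get pod <name> -o json`) and see whether the controller's annotation is present while the pod is still not running. The sketch below assumes the annotation key is `openstack.org/kuryr-vif`; that key is not stated in this report, so it is a parameter, and `classify_pod` is a hypothetical helper name.

```python
def classify_pod(pod, vif_annotation="openstack.org/kuryr-vif"):
    """Classify a pod dict parsed from `oc get pod <name> -o json`.

    The default annotation key is an assumption, not taken from this report.
    """
    annotations = pod.get("metadata", {}).get("annotations") or {}
    phase = pod.get("status", {}).get("phase")
    if phase == "Running":
        return "running"
    if vif_annotation in annotations:
        # The controller did its part; the node-side attach is the suspect.
        return "annotated-but-not-running"
    # No annotation yet: the controller has not (yet) handled this pod.
    return "not-annotated"
```

A pod reported as `annotated-but-not-running` would be consistent with the behavior described above, where the port exists but the CNI side never completes.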

Comment 15 Itzik Brown 2021-01-29 00:05:45 UTC
Ran the Tempest tests on v3.11.380 and all passed (with Docker, not CRI-O).

Comment 19 errata-xmlrpc 2021-02-03 18:40:07 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.380 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0274

Comment 20 Itzik Brown 2021-02-15 17:14:26 UTC
When updating from v3.11.346 to v3.11.386 I got the following:

(shiftstack) [stack@undercloud-0 ~]$ oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
demo-68dbc445d-8dt5m       1/1       Running   0          7h
demo-68dbc445d-cw8p5       1/1       Running   0          7h
demo-68dbc445d-nrfxt       0/1       Error     0          7h
docker-registry-1-cm2wk    1/1       Running   0          8h
registry-console-1-h2lv9   0/1       Error     0          8h
router-1-8mkt2             1/1       Running   0          8h
router-1-9mtbp             1/1       Running   0          8h
router-1-bkcjf             1/1       Running   0          8h

and in the kuryr namespace:
(shiftstack) [stack@undercloud-0 ~]$ oc get pods -n kuryr
NAME                                READY     STATUS             RESTARTS   AGE
kuryr-cni-ds-4g78t                  1/2       CrashLoopBackOff   21         1h
kuryr-cni-ds-565df                  2/2       Running            0          8h
kuryr-cni-ds-7gm75                  1/2       CrashLoopBackOff   19         1h
kuryr-cni-ds-j4nrl                  2/2       Running            0          8h
kuryr-cni-ds-jqt4j                  1/2       CrashLoopBackOff   23         1h
kuryr-cni-ds-l99xw                  2/2       Running            0          8h
kuryr-cni-ds-n5n8h                  2/2       Running            0          8h
kuryr-cni-ds-q9fr7                  2/2       Running            0          8h
kuryr-controller-74c988b946-tldhv   0/1       Running            21         1h
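
Listings like the two above can be filtered mechanically for pods that are not fully ready. The following is a minimal sketch that assumes the default `oc get pods` column order (NAME, READY, STATUS, RESTARTS, AGE); `unhealthy_pods` is a hypothetical helper name.

```python
def unhealthy_pods(table_text):
    """Return (name, ready, status) for pods that are not fully ready.

    Expects the plain-text output of `oc get pods`, header line included.
    """
    lines = [line for line in table_text.splitlines() if line.strip()]
    unhealthy = []
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, total = ready.split("/")
        # Flag pods in a bad state or with containers that are not ready.
        if status != "Running" or ready_count != total:
            unhealthy.append((name, ready, status))
    return unhealthy
```

Note that a pod can show `Running` while still being unready, as with `kuryr-controller-74c988b946-tldhv` at `0/1`, which is why the READY column is checked as well as STATUS.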

Comment 22 Itzik Brown 2021-02-16 10:51:12 UTC
Opened a bug: https://bugzilla.redhat.com/show_bug.cgi?id=1929170

Comment 23 Red Hat Bugzilla 2023-09-15 00:50:00 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

