Description of problem:
CI runs have exposed i/o timeouts (specifically on GCE). When these timeouts are hit, the affected network setup/teardown calls can block for over 4 minutes. To address this, the OCICNI calls made from CRI-O should carry a context so that they can be cancelled appropriately.

Example log messages:
```
25812:Apr 17 20:25:40.955016 ci-op-fk9tx-m-0.c.openshift-gce-devel-ci.internal crio[1361]: 2020-04-17T20:25:40Z [error] Multus: error unsetting the networks status: SetNetworkStatus: failed to query the pod console-58bbd4c4db-jvbkm in out of cluster comm: Get https://[api-int.ci-op-6gj7wwlt-2aad9.origin-ci-int-gce.dev.openshift.com]:6443/api/v1/namespaces/openshift-console/pods/console-58bbd4c4db-jvbkm: dial tcp 10.0.0.2:6443: i/o timeout
```
And:
```
Apr 17 20:25:09.810897 ci-op-fk9tx-m-0.c.openshift-gce-devel-ci.internal crio[1361]: 2020-04-17T20:25:09Z [error] error in getting result from AddNetwork: CNI request failed with status 400: 'Get https://api-int.ci-op-6gj7wwlt-2aad9.origin-ci-int-gce.dev.openshift.com:6443/api/v1/namespaces/openshift-kube-scheduler/pods/installer-4-ci-op-fk9tx-m-0.c.openshift-gce-devel-ci.internal: dial tcp 10.0.0.2:6443: i/o timeout
```

Discovered while troubleshooting this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1785399

How reproducible:
Unknown

Fix:
Mrunal Patel has suggested that "all calls to netPlugin from https://github.com/cri-o/cri-o/blob/release-1.17/server/sandbox_network.go should have context in ocicni". A sketch of this direction follows the list below.

Examples where ocicni is called:
* https://github.com/cri-o/cri-o/blob/release-1.17/server/sandbox_network.go#L44
* https://github.com/cri-o/cri-o/blob/release-1.17/server/sandbox_network.go#L53
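For illustration only, here is a minimal Go sketch of that direction. `fakePlugin` and `SetUpPodCtx` are hypothetical stand-ins (the real method names are defined in ocicni, not here); the point is that a deadline-bearing context lets a hung network call return promptly instead of blocking sandbox creation for minutes:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fakePlugin stands in for the ocicni network plugin; SetUpPodCtx is a
// hypothetical context-aware variant of a pod network setup call.
type fakePlugin struct{}

func (fakePlugin) SetUpPodCtx(ctx context.Context, namespace, name string) error {
	select {
	case <-time.After(10 * time.Minute): // simulated i/o hang (e.g. dial timeout)
		return nil
	case <-ctx.Done():
		return ctx.Err() // unblocks promptly once the context is cancelled
	}
}

func main() {
	// Bound the CNI call with a deadline so a hung plugin is cancelled
	// instead of stalling pod sandbox creation.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	err := fakePlugin{}.SetUpPodCtx(ctx, "openshift-console", "console-58bbd4c4db-jvbkm")
	fmt.Println("SetUpPodCtx returned:", err) // prints context.DeadlineExceeded
}
```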
dcbw submitted this PR to OCICNI @ https://github.com/cri-o/ocicni/pull/72
The latest 1.18.5 build and the latest OpenShift 4.5 nightly both have the ocicni updates in them:
* Brew build: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1185618
* RHCOS 4.5 release contents: https://releases-rhcos-art.cloud.privileged.psi.redhat.com/contents.html?stream=releases%2Frhcos-4.5&release=45.82.202005061429-0
Tested and verified in 4.5.0-0.nightly-2020-05-11-114800 on a GCE cluster:

```
[weliang@weliang verification-tests]$ oc debug node/welian-p6qgb-w-a-86g2f.c.openshift-qe.internal
Starting pod/welian-p6qgb-w-a-86g2fcopenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.32.2
If you don't see a command prompt, try pressing enter.
chroot /host
sh-4.4# rpm -qa | grep "cri-o"
cri-o-1.18.0-17.dev.rhaos4.5.gitdea34b9.el8.x86_64
sh-4.4# journalctl -u crio | grep "failed to destroy network for pod sandbox"
sh-4.4# journalctl -u crio | grep "stopping network on cleanup"
sh-4.4# journalctl -u crio | grep "Multus: error"
sh-4.4# journalctl -u crio | grep "CNI request failed"
```

None of the journalctl greps returned any matches, so the error messages are no longer present.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409