Bug 1828104

Summary:	OCICNI methods should have a context when called from CRI-O
Product:	OpenShift Container Platform	Reporter:	Mrunal Patel <mpatel>
Component:	Networking	Assignee:	mcambria <mcambria>
Networking sub component:	multus	QA Contact:	huirwang
Status:	CLOSED ERRATA	Docs Contact:
Severity:	urgent
Priority:	urgent	CC:	bbennett, bparees, dosmith, huirwang, jdelft, lmohanty, mcambria, mpatel, scuppett, sttts, vlaad, weliang, wking, wzheng, xtian, zyu, zzhao
Version:	4.3.z
Target Milestone:	---
Target Release:	4.3.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:	1826821	Environment:
Last Closed:	2020-04-30 01:28:30 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1826075, 1826821
Bug Blocks:	1826329

Description Mrunal Patel 2020-04-27 00:03:40 UTC

+++ This bug was initially created as a clone of Bug #1826821 +++

+++ This bug was initially created as a clone of Bug #1826075 +++

Description of problem: CI runs (specifically on GCE) have exposed i/o timeouts, as shown in the log messages below. When these timeouts occur, the affected calls can take over 4 minutes to return. To address this, OCICNI calls made from CRI-O should take a context so that they can be cancelled through it.

Example log messages:

```
Apr 17 20:25:40.955016 ci-op-fk9tx-m-0.c.openshift-gce-devel-ci.internal crio[1361]: 2020-04-17T20:25:40Z [error] Multus: error unsetting the networks status: SetNetworkStatus: failed to query the pod console-58bbd4c4db-jvbkm in out of cluster comm: Get https://[api-int.ci-op-6gj7wwlt-2aad9.origin-ci-int-gce.dev.openshift.com]:6443/api/v1/namespaces/openshift-console/pods/console-58bbd4c4db-jvbkm: dial tcp 10.0.0.2:6443: i/o timeout
```

And:

```
Apr 17 20:25:09.810897 ci-op-fk9tx-m-0.c.openshift-gce-devel-ci.internal crio[1361]: 2020-04-17T20:25:09Z [error] error in getting result from AddNetwork: CNI request failed with status 400: 'Get https://api-int.ci-op-6gj7wwlt-2aad9.origin-ci-int-gce.dev.openshift.com:6443/api/v1/namespaces/openshift-kube-scheduler/pods/installer-4-ci-op-fk9tx-m-0.c.openshift-gce-devel-ci.internal: dial tcp 10.0.0.2:6443: i/o timeout
```

Discovered while troubleshooting this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1785399

How reproducible: (Unknown)

Fix: Mrunal Patel has suggested that "all calls to netPlugin from https://github.com/cri-o/cri-o/blob/release-1.17/server/sandbox_network.go should have context in ocicni"


Examples of where ocicni is called:

* https://github.com/cri-o/cri-o/blob/release-1.17/server/sandbox_network.go#L44
* https://github.com/cri-o/cri-o/blob/release-1.17/server/sandbox_network.go#L53

--- Additional comment from Mrunal Patel on 2020-04-23 15:38:02 UTC ---

https://github.com/cri-o/cri-o/pull/3632 has been opened against master and will be cherry-picked for 4.4.

--- Additional comment from mcambria on 2020-04-23 18:20:14 UTC ---

PRs:

master: https://github.com/cri-o/cri-o/pull/3632

release 1.17: https://github.com/cri-o/cri-o/pull/3644

--- Additional comment from Mrunal Patel on 2020-04-24 21:10:25 UTC ---

This has now been folded into https://github.com/cri-o/cri-o/pull/3659.

Comment 11 errata-xmlrpc 2020-04-30 01:28:30 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1529