Bug 1826821 - OCICNI methods should accept a context from CRI-O
Summary: OCICNI methods should accept a context from CRI-O
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: mcambria@redhat.com
QA Contact: Weibin Liang
URL:
Whiteboard:
Depends On: 1826075
Blocks: 1826329 1828104
Reported: 2020-04-22 15:33 UTC by Ben Bennett
Modified: 2020-05-04 11:50 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1826075
Clones: 1828104
Environment:
Last Closed: 2020-05-04 11:50:00 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github cri-o cri-o pull 3644 0 None closed [Release 1.17] Pass context supplied to cri-o calls to ocicni 2020-05-12 06:15:36 UTC
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-04 11:50:32 UTC

Description Ben Bennett 2020-04-22 15:33:35 UTC
+++ This bug was initially created as a clone of Bug #1826075 +++

Description of problem: CI runs have exposed i/o timeouts (specifically on GCE). When these timeouts occur, the affected operation can take over 4 minutes. To address this, OCICNI calls made from CRI-O should accept a context so that they can be cancelled through that context.

Example log messages:

```
Apr 17 20:25:40.955016 ci-op-fk9tx-m-0.c.openshift-gce-devel-ci.internal crio[1361]: 2020-04-17T20:25:40Z [error] Multus: error unsetting the networks status: SetNetworkStatus: failed to query the pod console-58bbd4c4db-jvbkm in out of cluster comm: Get https://[api-int.ci-op-6gj7wwlt-2aad9.origin-ci-int-gce.dev.openshift.com]:6443/api/v1/namespaces/openshift-console/pods/console-58bbd4c4db-jvbkm: dial tcp 10.0.0.2:6443: i/o timeout
```

And:

```
Apr 17 20:25:09.810897 ci-op-fk9tx-m-0.c.openshift-gce-devel-ci.internal crio[1361]: 2020-04-17T20:25:09Z [error] error in getting result from AddNetwork: CNI request failed with status 400: 'Get https://api-int.ci-op-6gj7wwlt-2aad9.origin-ci-int-gce.dev.openshift.com:6443/api/v1/namespaces/openshift-kube-scheduler/pods/installer-4-ci-op-fk9tx-m-0.c.openshift-gce-devel-ci.internal: dial tcp 10.0.0.2:6443: i/o timeout
```

Discovered while people were troubleshooting this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1785399

How reproducible: (Unknown)

Fix: Mrunal Patel has suggested that "all calls to netPlugin from https://github.com/cri-o/cri-o/blob/release-1.17/server/sandbox_network.go should have context in ocicni"


Examples where ocicni is called:

* https://github.com/cri-o/cri-o/blob/release-1.17/server/sandbox_network.go#L44
* https://github.com/cri-o/cri-o/blob/release-1.17/server/sandbox_network.go#L53

Comment 1 Mrunal Patel 2020-04-23 15:38:02 UTC
https://github.com/cri-o/cri-o/pull/3632 has been opened against master and will be cherry-picked for 4.4.

Comment 6 zhaozhanqi 2020-04-26 05:41:40 UTC
Verified this bug on 4.4.0-0.nightly-2020-04-25-191512 with cri-o://1.17.4-6.dev.rhaos4.4.gitb5c490c.el8 on a GCP cluster.

Checked the crio logs; none of the network errors above were found.

sh-4.4# journalctl -u crio | grep "Network" | grep "error"
sh-4.4#

Comment 8 errata-xmlrpc 2020-05-04 11:50:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

