1753801 – 4.2 to 4.2 upgrade caused crio to error with 'error loading cached network config'

Bug 1753801 - 4.2 to 4.2 upgrade caused crio to error with 'error loading cached network config'

Summary: 4.2 to 4.2 upgrade caused crio to error with 'error loading cached network co...

Keywords:
Status:	CLOSED DUPLICATE of bug 1754638
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.2.0
Assignee:	Dan Williams
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2019-09-19 22:36 UTC by Ryan Phillips
Modified:	2019-09-23 20:14 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-09-23 20:14:43 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Ryan Phillips 2019-09-19 22:36:27 UTC

Description of problem:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_insights-operator/16/pull-ci-openshift-insights-operator-master-e2e-aws-upgrade/95

The CI job failed to upgrade from 4.2 to 4.2. The master log contains the following log after a scheduled upgrade reboot:

  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319337165Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("9e0ae156a44978547cb3c3bf6d33c7c44ff8c3c4be356beaccf6f13a94b95a9b"), and interface name ("eth0")"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319396157Z" level=error msg="CNI network "" not found"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319433055Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("30df20c96d9adccc9e7f7efb539d92a2101a6d7352eda0536e527b32b890a876"), and interface name ("eth0")"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319441490Z" level=error msg="CNI network "" not found"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319462315Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("91b3f497ab5d74252e5234ea277567025f5ba12dd329996a5eea5069842fd9c1"), and interface name ("eth0")"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319468883Z" level=error msg="CNI network "" not found"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319482090Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("c0be228bcc3c33d311ac40fb5610423675e993098bdde18a01606891f7fb73c3"), and interface name ("eth0")"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319488175Z" level=error msg="CNI network "" not found"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319501846Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("3c6bf7ab68dd4f44470e8fd6b358477f11875b96aa2ee551cfffb1429b5fe792"), and interface name ("eth0")"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319507912Z" level=error msg="CNI network "" not found"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319521584Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("324de78280f28ffc42b7c7dfba7baa021c2aae3319233a158a28211c222939a1"), and interface name ("eth0")"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319527648Z" level=error msg="CNI network "" not found"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319541724Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("621ca69dde4fc9740c7c05d399ab8cc546026705ad3434da5b594773f385f8a7"), and interface name ("eth0")"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319548496Z" level=error msg="CNI network "" not found"
  Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319561372Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("74f0a825d32367b73339983c421c9729e467b5b47ad9f1eb15e57d9d1e9e6dd8"), and interface name ("eth0")"

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. CI build

Actual results:


Expected results:


Additional info:

Comment 1 Mrunal Patel 2019-09-19 22:50:39 UTC

The errors appear to be from the code that was merged yesterday to get in ocicni fixes: https://github.com/cri-o/cri-o/pull/2800.

Comment 2 Ben Bennett 2019-09-20 14:18:32 UTC

So, those errors are more like warnings.  They are only really errors when the state was expected to be present, and the latest change added the new state, so when upgrading to the latest there would be nothing to read, and the old behavior is expected.

But that doesn't explain why node ip-10-0-128-148.ec2.internal didn't become ready.

Comment 3 Dan Winship 2019-09-20 17:21:47 UTC

(In reply to Ben Bennett from comment #2)
> But that doesn't explain why node ip-10-0-128-148.ec2.internal didn't become
> ready.

I think it does; you were looking at the sdn-c9g8w logs, but note that that pod started a few minutes *after* e2e-aws-upgrade declared defeat:

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_insights-operator/16/pull-ci-openshift-insights-operator-master-e2e-aws-upgrade/95/build-log.txt:

    Sep 19 18:05:16.404: INFO: Unexpected error occurred: Cluster did not complete upgrade: timed out waiting for the condition: Cluster operator kube-controller-manager is reporting a failure: NodeControllerDegraded: The master node(s) "ip-10-0-128-148.ec2.internal" not ready

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_insights-operator/16/pull-ci-openshift-insights-operator-master-e2e-aws-upgrade/95/artifacts/e2e-aws-upgrade/pods/openshift-sdn_sdn-c9g8w_sdn.log:

    I0919 18:08:44.009567   97623 node.go:145] Initializing SDN node of type "redhat/openshift-ovs-networkpolicy" with configured hostname "ip-10-0-128-148.ec2.internal" (IP "10.0.128.148")

So the failure there is irrelevant; it happened while the cluster was in the process of being shut down. I think the cri-o bug is the only bug here.

Comment 4 Ben Bennett 2019-09-20 17:55:01 UTC

The bug Dan is referring to is https://bugzilla.redhat.com/show_bug.cgi?id=1753988

*** This bug has been marked as a duplicate of bug 1753988 ***

Comment 5 Ryan Phillips 2019-09-21 00:35:13 UTC

I think I found the issue: https://github.com/cri-o/ocicni/pull/62

Comment 7 Casey Callendrello 2019-09-23 13:09:46 UTC


*** This bug has been marked as a duplicate of bug 1753988 ***

Comment 8 Casey Callendrello 2019-09-23 13:10:15 UTC


*** This bug has been marked as a duplicate of bug 1754434 ***

Comment 9 Casey Callendrello 2019-09-23 15:24:15 UTC

Unmarking as duplicate. Sorry for the spam.

Comment 11 Ben Bennett 2019-09-23 20:14:43 UTC

The networking on the node failed to start.  Dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1754638

*** This bug has been marked as a duplicate of bug 1754638 ***

Note You need to log in before you can comment on or make changes to this bug.