Description of problem:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_insights-operator/16/pull-ci-openshift-insights-operator-master-e2e-aws-upgrade/95

The CI job failed to upgrade from 4.2 to 4.2. The master log contains the following errors after the scheduled upgrade reboot:

Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319337165Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("9e0ae156a44978547cb3c3bf6d33c7c44ff8c3c4be356beaccf6f13a94b95a9b"), and interface name ("eth0")"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319396157Z" level=error msg="CNI network "" not found"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319433055Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("30df20c96d9adccc9e7f7efb539d92a2101a6d7352eda0536e527b32b890a876"), and interface name ("eth0")"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319441490Z" level=error msg="CNI network "" not found"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319462315Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("91b3f497ab5d74252e5234ea277567025f5ba12dd329996a5eea5069842fd9c1"), and interface name ("eth0")"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319468883Z" level=error msg="CNI network "" not found"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319482090Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("c0be228bcc3c33d311ac40fb5610423675e993098bdde18a01606891f7fb73c3"), and interface name ("eth0")"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319488175Z" level=error msg="CNI network "" not found"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319501846Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("3c6bf7ab68dd4f44470e8fd6b358477f11875b96aa2ee551cfffb1429b5fe792"), and interface name ("eth0")"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319507912Z" level=error msg="CNI network "" not found"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319521584Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("324de78280f28ffc42b7c7dfba7baa021c2aae3319233a158a28211c222939a1"), and interface name ("eth0")"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319527648Z" level=error msg="CNI network "" not found"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319541724Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("621ca69dde4fc9740c7c05d399ab8cc546026705ad3434da5b594773f385f8a7"), and interface name ("eth0")"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319548496Z" level=error msg="CNI network "" not found"
Sep 19 17:18:09 ip-10-0-153-74 crio[1088]: time="2019-09-19 17:18:09.319561372Z" level=error msg="error loading cached network config: cache file path requires network name (""), container ID ("74f0a825d32367b73339983c421c9729e467b5b47ad9f1eb15e57d9d1e9e6dd8"), and interface name ("eth0")"

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. CI build

Actual results:

Expected results:

Additional info:
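For readers parsing the error text above: it suggests ocicni builds a per-attachment cache file path from three components (network name, container ID, interface name) and refuses to load when any of them is empty; in these logs the network name is "". A minimal sketch of that shape, with hypothetical function names and an assumed cache root, not the actual ocicni code:

package main

import (
	"fmt"
	"path/filepath"
)

// Assumed cache root, for illustration only.
const cacheDir = "/var/lib/cni/cache"

// getCacheFilePath (hypothetical) refuses to build a cache path when any
// component is empty, matching the shape of the logged error, where the
// network name is "" but the container ID and interface name are known.
func getCacheFilePath(netName, containerID, ifName string) (string, error) {
	if netName == "" || containerID == "" || ifName == "" {
		return "", fmt.Errorf("cache file path requires network name (%q), container ID (%q), and interface name (%q)",
			netName, containerID, ifName)
	}
	return filepath.Join(cacheDir, netName, containerID+"-"+ifName), nil
}

func main() {
	// An empty network name reproduces the logged error message.
	if _, err := getCacheFilePath("", "9e0ae156a449", "eth0"); err != nil {
		fmt.Println("error loading cached network config:", err)
	}
}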
The errors appear to come from the code that was merged yesterday to pull in the ocicni fixes: https://github.com/cri-o/cri-o/pull/2800.
So those errors are really more like warnings. They only indicate a real problem when the cached state was expected to be present; since the latest change is what introduced that state, a node upgrading to the new code has nothing to read yet, and falling back to the old behavior is expected (sketched below). But that doesn't explain why node ip-10-0-128-148.ec2.internal didn't become ready.
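To make the "warnings, not errors" point concrete, here is a minimal sketch of the fallback being described, with assumed names and paths rather than the real cri-o/ocicni code: if no cached config exists for a sandbox created before the upgrade, log the error and carry on with the configured default network.

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Assumed cache root, for illustration only.
const cacheDir = "/var/lib/cni/cache"

// loadCachedNetworkName (hypothetical) reads the per-container cache entry
// that newer ocicni writes; sandboxes created before the upgrade have none.
func loadCachedNetworkName(containerID string) (string, error) {
	data, err := os.ReadFile(filepath.Join(cacheDir, containerID))
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// networkForSandbox logs the cache miss but falls back to the default
// network, i.e. the pre-cache behavior the comment above describes.
func networkForSandbox(containerID, defaultNet string) string {
	name, err := loadCachedNetworkName(containerID)
	if err != nil {
		fmt.Printf("error loading cached network config: %v\n", err)
		return defaultNet
	}
	return name
}

func main() {
	fmt.Println(networkForSandbox("9e0ae156a449", "openshift-sdn"))
}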
(In reply to Ben Bennett from comment #2)
> But that doesn't explain why node ip-10-0-128-148.ec2.internal didn't
> become ready.

I think it does; you were looking at the sdn-c9g8w logs, but note that that pod started a few minutes *after* e2e-aws-upgrade declared defeat:

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_insights-operator/16/pull-ci-openshift-insights-operator-master-e2e-aws-upgrade/95/build-log.txt:

Sep 19 18:05:16.404: INFO: Unexpected error occurred: Cluster did not complete upgrade: timed out waiting for the condition: Cluster operator kube-controller-manager is reporting a failure: NodeControllerDegraded: The master node(s) "ip-10-0-128-148.ec2.internal" not ready

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_insights-operator/16/pull-ci-openshift-insights-operator-master-e2e-aws-upgrade/95/artifacts/e2e-aws-upgrade/pods/openshift-sdn_sdn-c9g8w_sdn.log:

I0919 18:08:44.009567 97623 node.go:145] Initializing SDN node of type "redhat/openshift-ovs-networkpolicy" with configured hostname "ip-10-0-128-148.ec2.internal" (IP "10.0.128.148")

So the failure there is irrelevant; it happened while the cluster was in the process of being shut down. I think the cri-o bug is the only bug here.
The bug Dan is referring to is https://bugzilla.redhat.com/show_bug.cgi?id=1753988

*** This bug has been marked as a duplicate of bug 1753988 ***
I think I found the issue: https://github.com/cri-o/ocicni/pull/62
*** This bug has been marked as a duplicate of bug 1754434 ***
Unmarking as duplicate. Sorry for the spam.
The networking on the node failed to start. Dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1754638

*** This bug has been marked as a duplicate of bug 1754638 ***