Bug 1753801
| Summary: | 4.2 to 4.2 upgrade caused crio to error with 'error loading cached network config' | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ryan Phillips <rphillips> |
| Component: | Networking | Assignee: | Dan Williams <dcbw> |
| Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | aos-bugs, bbennett, ccoleman, cdc, danw, dcbw, jokerman, mcambria, mpatel, yinzhou |
| Version: | 4.2.0 | Keywords: | Reopened |
| Target Milestone: | --- | | |
| Target Release: | 4.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-09-23 20:14:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Ryan Phillips
2019-09-19 22:36:27 UTC
The errors appear to be from the code that was merged yesterday to get in the ocicni fixes: https://github.com/cri-o/cri-o/pull/2800. Those errors are really more like warnings: they are only true errors when the cached state was expected to be present. The latest change added the new state, so when upgrading to the latest there is nothing to read yet, and the old behavior is expected.

But that doesn't explain why node ip-10-0-128-148.ec2.internal didn't become ready.

(In reply to Ben Bennett from comment #2)
> But that doesn't explain why node ip-10-0-128-148.ec2.internal didn't become
> ready.

I think it does; you were looking at the sdn-c9g8w logs, but note that that pod started a few minutes *after* e2e-aws-upgrade declared defeat.

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_insights-operator/16/pull-ci-openshift-insights-operator-master-e2e-aws-upgrade/95/build-log.txt:

Sep 19 18:05:16.404: INFO: Unexpected error occurred: Cluster did not complete upgrade: timed out waiting for the condition: Cluster operator kube-controller-manager is reporting a failure: NodeControllerDegraded: The master node(s) "ip-10-0-128-148.ec2.internal" not ready

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_insights-operator/16/pull-ci-openshift-insights-operator-master-e2e-aws-upgrade/95/artifacts/e2e-aws-upgrade/pods/openshift-sdn_sdn-c9g8w_sdn.log:

I0919 18:08:44.009567 97623 node.go:145] Initializing SDN node of type "redhat/openshift-ovs-networkpolicy" with configured hostname "ip-10-0-128-148.ec2.internal" (IP "10.0.128.148")

So the failure there is irrelevant; it happened while the cluster was in the process of being shut down. I think the cri-o bug is the only bug here.
The bug Dan is referring to is https://bugzilla.redhat.com/show_bug.cgi?id=1753988

*** This bug has been marked as a duplicate of bug 1753988 ***

I think I found the issue: https://github.com/cri-o/ocicni/pull/62

*** This bug has been marked as a duplicate of bug 1753988 ***

*** This bug has been marked as a duplicate of bug 1754434 ***

Unmarking as duplicate. Sorry for the spam. The networking on the node failed to start.

Dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1754638

*** This bug has been marked as a duplicate of bug 1754638 ***