Bug 1693247
| Summary: | [upgrade] upgrade failed due to the "failed to create pod network sandbox" "netplugin failed but error parsing its diagnostic message" | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jian Zhang <jiazha> |
| Component: | Networking | Assignee: | Casey Callendrello <cdc> |
| Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | low | | |
| Priority: | low | CC: | aos-bugs, bbennett, cdc, chezhang, dyan, jfan, jokerman, mmccomas, scolange, sponnaga, zitang |
| Version: | 4.1.0 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-04-18 13:02:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jian Zhang
2019-03-27 11:36:23 UTC
Unfortunately, this cluster has been removed. I will launch a new cluster.

Can you provide the logs from the sdn pods (masters, nodes, and ovs)? Thanks. I have a hunch the problem may be due to cert rotation causing trouble, but I could be wrong.

@rvokal identified that it is probably a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1692408

Ben,
> Can you provide the logs from the sdn pods (masters, nodes, and ovs)? Thanks.
I'm sorry, as described in comment 1, the cluster had been removed, and not all nodes' sdn logs were preserved, only one:

mac:~ jianzhang$ oc logs sdn-gd8zz -n openshift-sdn
Error from server: Get https://ip-172-31-159-215.us-east-2.compute.internal:10250/containerLogs/openshift-sdn/sdn-gd8zz/sdn: x509: certificate has expired or is not yet valid

I will try to reproduce this, but it doesn't always happen.

Seth,
> Is there a cluster where this is currently happening that I can look at?
Sorry, as I described in comment 1, the affected cluster had been removed. Also, this issue doesn't always happen. I'm trying to reproduce it, but I was blocked because no upgrade graph is available; details in comment 6.

Just to be clear, the reason for the upgrade failure was not the kubelet serving cert becoming invalid; it is this:

FailedCreatePodSandBox 2h kubelet, ip-172-31-159-215.us-east-2.compute.internal Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_machine-config-controller-864b594976-gg85k_openshift-machine-config-operator_ceeb5a39-505d-11e9-b9b3-024f9c8d2a44_0(93d212c81bbff94bd33d7b0b4d7022dc1cb911c81706fe2c10cd3443828a260d): netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input

Sending to Network to see if they have encountered this before.

Hm, interesting. Marking this as low until it's reproduced. If this happens again, please wake the bug.

It looks like we found the issue. Furthermore, danw has submitted a libcni change that makes the error message more interesting. Closing as dupe.

*** This bug has been marked as a duplicate of bug 1700504 ***

Casey,
Ok, thanks!
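For anyone landing here from a search on the error text: the `"": unexpected end of JSON input` part suggests the network plugin exited with a failure but wrote nothing to stdout, so the caller tried to decode an empty string as a JSON error result. Below is a minimal Go sketch of that situation. It is not libcni's actual code; `pluginError` and `describePluginFailure` are hypothetical names, and the struct only assumes the `code`/`msg`/`details` fields that the CNI spec describes for error results.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pluginError is a hypothetical stand-in for a CNI-style error result.
// The field names follow the CNI spec's error object (code, msg, details).
type pluginError struct {
	Code    uint   `json:"code"`
	Msg     string `json:"msg"`
	Details string `json:"details,omitempty"`
}

// describePluginFailure is a hypothetical helper: given whatever a failed
// plugin wrote to stdout, it tries to decode a structured error and falls
// back to a generic message when the output is not valid JSON.
func describePluginFailure(stdout []byte) string {
	var pe pluginError
	if err := json.Unmarshal(stdout, &pe); err != nil {
		// Empty stdout ends up here: encoding/json reports
		// "unexpected end of JSON input" for zero-length input.
		return fmt.Sprintf("netplugin failed but error parsing its diagnostic message %q: %v",
			string(stdout), err)
	}
	return fmt.Sprintf("netplugin failed: %s", pe.Msg)
}

func main() {
	// Plugin died without printing anything.
	fmt.Println(describePluginFailure([]byte("")))
	// Plugin printed a structured error (made-up example payload).
	fmt.Println(describePluginFailure([]byte(`{"code": 100, "msg": "example failure"}`)))
}
```

With empty input, the first call reproduces the wording seen in the event above; the second call shows what a caller could report when the plugin does emit a structured error. The message therefore says little about the root cause on its own, only that the plugin produced no diagnostic output.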