Bug 1695554
| Summary: | [upgrade] upgrade to 4.0.0-0.nightly-2019-04-02-133735 failed due to the installer pods errors | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jian Zhang <jiazha> |
| Component: | Networking | Assignee: | Casey Callendrello <cdc> |
| Status: | CLOSED ERRATA | QA Contact: | Meng Bo <bmeng> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.1.0 | CC: | aos-bugs, bandrade, chezhang, danw, dcbw, dosmith, dyan, fpan, jfan, jokerman, mmccomas, scolange, vlaad, zitang |
| Target Milestone: | --- | Keywords: | BetaBlocker |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:47:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jian Zhang
2019-04-03 10:38:04 UTC
`F0403 09:26:00.475326 1 cmd.go:89] failed to get self-reference: Get https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods/installer-9-ip-10-0-154-74.ap-south-1.compute.internal: dial tcp 172.30.0.1:443: connect: no route to host`

This looks like a networking error. The installer pod logged its error at 09:26:00. The sdn pod on that node first observed the existence of the default/kubernetes service only 2 seconds before that:
I0403 09:25:58.208561 4167 service.go:319] Adding new service port "default/kubernetes:https" at 172.30.0.1:443/TCP
Given that a "no route to host" error would take a few seconds of network failure to trigger, it's likely that the installer pod had started its attempt to connect to 172.30.0.1 before that point (and it's not clear from the logs exactly when sdn/kube-proxy actually created the iptables rules for the service anyway; the iptables proxier would wait until at least 1 second after the last of the burst of initial updates).
So it looks like the installer-pod got "no route to host" because it tried to connect to the service before openshift-sdn had fully set up service IPs. (Something that is not covered by any node readiness condition, I believe.)
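To make the timing concrete: the "wait until at least 1 second after the last of the burst" behaviour mentioned above is a standard burst-coalescing pattern. The following is a minimal, hypothetical Go sketch of such a sync loop (illustration only, not kube-proxy's actual implementation, which also enforces minimum and maximum sync intervals); it shows why a service Add event does not immediately translate into iptables rules:

```go
package main

import (
	"fmt"
	"time"
)

// coalesceSync models the behaviour described above: each service update
// restarts a short quiet period, and the expensive iptables rewrite only
// happens once no further updates have arrived for quietPeriod. A burst of
// updates therefore produces a single sync, at least quietPeriod after the
// last update in the burst.
func coalesceSync(updates <-chan string, quietPeriod time.Duration, sync func(pending []string)) {
	var pending []string
	var quiet <-chan time.Time // nil until the first update arrives
	for {
		select {
		case u, ok := <-updates:
			if !ok {
				return
			}
			pending = append(pending, u)
			quiet = time.After(quietPeriod) // restart the quiet period
		case <-quiet:
			sync(pending)
			pending = nil
			quiet = nil
		}
	}
}

func main() {
	updates := make(chan string)
	go coalesceSync(updates, time.Second, func(pending []string) {
		fmt.Printf("writing iptables rules for %d service update(s)\n", len(pending))
	})

	// A burst of initial updates, e.g. "default/kubernetes:https" at 172.30.0.1:443/TCP.
	updates <- "default/kubernetes:https"
	updates <- "openshift-dns/dns-default:dns"

	time.Sleep(2 * time.Second) // the single sync fires ~1s after the last update
	close(updates)
}
```

Under this model, a pod that connects to the service IP during the quiet period sees the same failure as if the rules did not exist at all.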
One weird thing is that the sdn was actually started 20 seconds before this point, and in the logs we see:
W0403 09:25:42.835196 4167 node.go:399] Could not reattach pod 'openshift-machine-api/cluster-autoscaler-operator-848f5ff955-p7v7j' to SDN: error on port veth0f212092: "could not open network device veth0f212092 (No such device)"
I0403 09:25:57.803708 4167 node.go:350] openshift-sdn network plugin registering startup
Note the mysterious 15-second delay between the two messages. I'm not sure where that's coming from, but if it hadn't been there, then sdn would have had more than enough time to get the service IPs set up before the installer pod ran...
> W0403 09:25:42.835196 4167 node.go:399] Could not reattach pod 'openshift-machine-api/cluster-autoscaler-operator-848f5ff955-p7v7j' to SDN: error on port veth0f212092: "could not open network device veth0f212092 (No such device)"
(just to be clear, that error is irrelevant to this bug; the node was rebooted, so we wouldn't expect to be able to reattach the pods that were running before it was rebooted. openshift-sdn just isn't smart enough currently to distinguish that from the case where it should have been able to reattach the pod)
Ugh, ok, it's because we're killing the pods after failing to reattach them, and crio ends up doing a CNI DELETE for each one, at which point multus needs to query the apiserver to figure out whether the pod has any additional networks to also delete, but that times out because 172.30.0.1 doesn't point to anything yet.

Ugh, interesting. Multus should absolutely be checkpointing this - especially since the apiserver state could have changed after the pod was created. Doug, is there any way to configure multus to checkpoint added networks to disk, or will that require a code change?

As a workaround, can we ensure the kubelet is NodeNotReady until services have been synced? It will require a bit of yucky plumbing in the proxy, but should otherwise be doable.

CC Doug and Feng. Doug, Feng, multus should not rely on the apiserver being up to do a DELETE. If there's no way to get rid of that requirement, it should be connecting to the "real" apiserver as opposed to the service IP. You can get that information from the operator, since we plumb it through the environment.

(In reply to Casey Callendrello from comment #8)
> As a workaround, can we ensure the kubelet is NodeNotReady until services
> have been synced? It will require a bit of yucky plumbing in the proxy, but
> should otherwise be doable.

Well, the bigger problem is that, as things were written before, we weren't even starting the proxy until after we had finished cleaning up the old pods. origin #37414 fixes that. But after talking some more, dcbw and I decided we should probably fix up multus to use the real apiserver too, though that PR isn't written yet.

This should be fixed by either of https://github.com/openshift/cluster-network-operator/pull/139 or https://github.com/openshift/origin/pull/22471. (And twice as fixed with both of them!) The effect of either fix is that there's no longer a long pause in SDN startup, so service IPs should now be ready before any other pods try to access them.

Tested on 4.0.0-0.nightly-2019-04-10-182914.
The sdn node now finishes starting up in about 10 seconds:
# oc logs sdn-7c5bm | grep node.go
I0411 06:37:33.798980 41886 node.go:148] Initializing SDN node of type "redhat/openshift-ovs-networkpolicy" with configured hostname "ip-10-0-149-3.ap-northeast-1.compute.internal" (IP ""), iptables sync period "30s"
I0411 06:37:33.803646 41886 node.go:267] Starting openshift-sdn network plugin
I0411 06:37:43.323230 41886 node.go:326] Starting openshift-sdn pod manager
I0411 06:37:43.324888 41886 node.go:334] OVS bridge has been recreated. Will reattach 5 existing pods...
I0411 06:37:43.332196 41886 node.go:389] Interface veth91f19b45 for pod 'openshift-image-registry/node-ca-kxt2n' no longer exists
I0411 06:37:43.336999 41886 node.go:389] Interface vethf4e807fa for pod 'openshift-monitoring/alertmanager-main-0' no longer exists
I0411 06:37:43.341730 41886 node.go:389] Interface veth3a59281d for pod 'openshift-dns/dns-default-qqs45' no longer exists
I0411 06:37:43.346474 41886 node.go:389] Interface vethbf232f67 for pod 'openshift-monitoring/prometheus-adapter-56c84b6556-tqkdm' no longer exists
I0411 06:37:43.351180 41886 node.go:389] Interface veth6ba7c3de for pod 'openshift-monitoring/grafana-56879d5757-swp68' no longer exists
I0411 06:37:43.351206 41886 node.go:350] openshift-sdn network plugin registering startup
I0411 06:37:43.351346 41886 node.go:368] openshift-sdn network plugin ready
And multus now uses the internal load balancer alias to reach the API, so it no longer requires kube-proxy to have started first (see the sketch after the kubeconfig excerpt):
[root@ip-10-0-130-51 ~]# cat /etc/kubernetes/cni/net.d/multus.d/multus.kubeconfig
# Kubeconfig file for Multus CNI plugin.
apiVersion: v1
kind: Config
clusters:
- name: local
cluster:
server: https://api.qe-bmeng-ap-411.qe.devcluster.openshift.com:6443
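For illustration, here is a minimal client-go sketch of the lookup a CNI DEL triggers, and why the `server:` URL in that kubeconfig decides whether the request depends on kube-proxy having programmed the service IP. This is not multus's actual code; it assumes client-go v0.18+ call signatures, the standard multus network-selection annotation `k8s.v1.cni.cncf.io/networks`, and reuses the namespace/pod name from the logs above purely as an example:

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the same kubeconfig multus uses. If "server:" is the
	// service IP (https://172.30.0.1:443), this request only works once
	// kube-proxy has written the iptables rules for default/kubernetes; if it
	// is the load-balanced API hostname shown above, plain routing is enough.
	cfg, err := clientcmd.BuildConfigFromFlags("",
		"/etc/kubernetes/cni/net.d/multus.d/multus.kubeconfig")
	if err != nil {
		panic(err)
	}
	cfg.Timeout = 5 * time.Second

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// On CNI DEL, multus needs the pod object to find any additional networks
	// to tear down; with an unreachable server this is where errors like
	// "dial tcp 172.30.0.1:443: connect: no route to host" surface.
	pod, err := client.CoreV1().Pods("openshift-machine-api").Get(context.TODO(),
		"cluster-autoscaler-operator-848f5ff955-p7v7j", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("additional networks:", pod.Annotations["k8s.v1.cni.cncf.io/networks"])
}
```

Checkpointing the added networks to disk at ADD time, as suggested in the comments above, would avoid this apiserver round-trip on DEL entirely.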
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758