Bug 1695554
| Summary: | [upgrade] upgrade to 4.0.0-0.nightly-2019-04-02-133735 failed due to the installer pods errors | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jian Zhang <jiazha> |
| Component: | Networking | Assignee: | Casey Callendrello <cdc> |
| Status: | CLOSED ERRATA | QA Contact: | Meng Bo <bmeng> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.1.0 | CC: | aos-bugs, bandrade, chezhang, danw, dcbw, dosmith, dyan, fpan, jfan, jokerman, mmccomas, scolange, vlaad, zitang |
| Target Milestone: | --- | Keywords: | BetaBlocker |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:47:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jian Zhang
2019-04-03 10:38:04 UTC
`F0403 09:26:00.475326 1 cmd.go:89] failed to get self-reference: Get https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods/installer-9-ip-10-0-154-74.ap-south-1.compute.internal: dial tcp 172.30.0.1:443: connect: no route to host`

This looks like a networking error. The installer pod logged its error at 09:26:00. The sdn pod on that node first observed the existence of the default/kubernetes service only 2 seconds before that:
I0403 09:25:58.208561 4167 service.go:319] Adding new service port "default/kubernetes:https" at 172.30.0.1:443/TCP
Given that a "no route to host" error would take a few seconds of network failure to trigger, it's likely that the installer pod had started its attempt to connect to 172.30.0.1 before that point (and it's not clear from the logs exactly when sdn/kube-proxy actually created the iptables rules for the service anyway; the iptables proxier would wait until at least 1 second after the last of the burst of initial updates).
So it looks like the installer-pod got "no route to host" because it tried to connect to the service before openshift-sdn had fully set up service IPs. (Something that is not covered by any node readiness condition, I believe.)
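To make the timing concrete: the "wait until at least 1 second after the last of the burst" behaviour mentioned above is a standard burst-coalescing pattern. The following is a minimal, hypothetical Go sketch of such a sync loop (illustration only, not kube-proxy's actual implementation, which also enforces minimum and maximum sync intervals); it shows why a service Add event does not immediately translate into iptables rules:

```go
package main

import (
	"fmt"
	"time"
)

// coalesceSync models the behaviour described above: each service update
// restarts a short quiet period, and the expensive iptables rewrite only
// happens once no further updates have arrived for quietPeriod. A burst of
// updates therefore produces a single sync, at least quietPeriod after the
// last update in the burst.
func coalesceSync(updates <-chan string, quietPeriod time.Duration, sync func(pending []string)) {
	var pending []string
	var quiet <-chan time.Time // nil until the first update arrives
	for {
		select {
		case u, ok := <-updates:
			if !ok {
				return
			}
			pending = append(pending, u)
			quiet = time.After(quietPeriod) // restart the quiet period
		case <-quiet:
			sync(pending)
			pending = nil
			quiet = nil
		}
	}
}

func main() {
	updates := make(chan string)
	go coalesceSync(updates, time.Second, func(pending []string) {
		fmt.Printf("writing iptables rules for %d service update(s)\n", len(pending))
	})

	// A burst of initial updates, e.g. "default/kubernetes:https" at 172.30.0.1:443/TCP.
	updates <- "default/kubernetes:https"
	updates <- "openshift-dns/dns-default:dns"

	time.Sleep(2 * time.Second) // the single sync fires ~1s after the last update
	close(updates)
}
```

Under this model, a pod that connects to the service IP during the quiet period sees the same failure as if the rules did not exist at all.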
One weird thing is that the sdn was actually started 20 seconds before this point, and in the logs we see:
W0403 09:25:42.835196 4167 node.go:399] Could not reattach pod 'openshift-machine-api/cluster-autoscaler-operator-848f5ff955-p7v7j' to SDN: error on port veth0f212092: "could not open network device veth0f212092 (No such device)"
I0403 09:25:57.803708 4167 node.go:350] openshift-sdn network plugin registering startup
Note the mysterious 15-second delay between the two messages. I'm not sure where that's coming from, but if it hadn't been there, then sdn would have had more than enough time to get the service IPs set up before the installer pod ran...
> W0403 09:25:42.835196 4167 node.go:399] Could not reattach pod 'openshift-machine-api/cluster-autoscaler-operator-848f5ff955-p7v7j' to SDN: error on port veth0f212092: "could not open network device veth0f212092 (No such device)"
(just to be clear, that error is irrelevant to this bug; the node was rebooted, so we wouldn't expect to be able to reattach the pods that were running before it was rebooted. openshift-sdn just isn't smart enough currently to distinguish that from the case where it should have been able to reattach the pod)
Ugh, ok, it's because we're killing the pods after failing to reattach them, and crio ends up doing a CNI DELETE for each one, at which point multus needs to query the apiserver to figure out whether the pod has any additional networks to also delete, but that times out because 172.30.0.1 doesn't point to anything yet.

Ugh, interesting. Multus should absolutely be checkpointing this - especially since the apiserver state could have changed after the pod was created. Doug, is there any way to configure multus to checkpoint added networks to disk, or will that require a code change?

As a workaround, can we ensure the kubelet is NodeNotReady until services have been synced? It will require a bit of yucky plumbing in the proxy, but should otherwise be doable.

CC Doug and Feng. Doug, Feng, multus should not rely on the apiserver being up to do a DELETE. If there's no way to get rid of that requirement, it should be connecting to the "real" apiserver as opposed to the service IP. You can get that information from the operator, since we plumb it through the environment.

(In reply to Casey Callendrello from comment #8)
> As a workaround, can we ensure the kubelet is NodeNotReady until services
> have been synced? It will require a bit of yucky plumbing in the proxy, but
> should otherwise be doable.

Well, the bigger problem is that, as things were written before, we weren't even starting the proxy until after we had finished cleaning up the old pods. origin #37414 fixes that. But after talking some more, dcbw and I decided we should probably fix up multus to use the real apiserver too, though that PR isn't written yet.

This should be fixed by either of https://github.com/openshift/cluster-network-operator/pull/139 or https://github.com/openshift/origin/pull/22471. (And twice as fixed with both of them!) The effect of either fix is that there's no longer a long pause in SDN startup, so service IPs should now be ready before any other pods try to access them.

Tested on 4.0.0-0.nightly-2019-04-10-182914.
The sdn node now finishes starting up in about 10 seconds:
# oc logs sdn-7c5bm | grep node.go
I0411 06:37:33.798980 41886 node.go:148] Initializing SDN node of type "redhat/openshift-ovs-networkpolicy" with configured hostname "ip-10-0-149-3.ap-northeast-1.compute.internal" (IP ""), iptables sync period "30s"
I0411 06:37:33.803646 41886 node.go:267] Starting openshift-sdn network plugin
I0411 06:37:43.323230 41886 node.go:326] Starting openshift-sdn pod manager
I0411 06:37:43.324888 41886 node.go:334] OVS bridge has been recreated. Will reattach 5 existing pods...
I0411 06:37:43.332196 41886 node.go:389] Interface veth91f19b45 for pod 'openshift-image-registry/node-ca-kxt2n' no longer exists
I0411 06:37:43.336999 41886 node.go:389] Interface vethf4e807fa for pod 'openshift-monitoring/alertmanager-main-0' no longer exists
I0411 06:37:43.341730 41886 node.go:389] Interface veth3a59281d for pod 'openshift-dns/dns-default-qqs45' no longer exists
I0411 06:37:43.346474 41886 node.go:389] Interface vethbf232f67 for pod 'openshift-monitoring/prometheus-adapter-56c84b6556-tqkdm' no longer exists
I0411 06:37:43.351180 41886 node.go:389] Interface veth6ba7c3de for pod 'openshift-monitoring/grafana-56879d5757-swp68' no longer exists
I0411 06:37:43.351206 41886 node.go:350] openshift-sdn network plugin registering startup
I0411 06:37:43.351346 41886 node.go:368] openshift-sdn network plugin ready
And multus now uses the internal load balancer alias to reach the API, so it no longer requires kube-proxy to have started first (see the sketch after the kubeconfig excerpt):
[root@ip-10-0-130-51 ~]# cat /etc/kubernetes/cni/net.d/multus.d/multus.kubeconfig
# Kubeconfig file for Multus CNI plugin.
apiVersion: v1
kind: Config
clusters:
- name: local
cluster:
server: https://api.qe-bmeng-ap-411.qe.devcluster.openshift.com:6443
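For illustration, here is a minimal client-go sketch of the lookup a CNI DEL triggers, and why the `server:` URL in that kubeconfig decides whether the request depends on kube-proxy having programmed the service IP. This is not multus's actual code; it assumes client-go v0.18+ call signatures, the standard multus network-selection annotation `k8s.v1.cni.cncf.io/networks`, and reuses the namespace/pod name from the logs above purely as an example:

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the same kubeconfig multus uses. If "server:" is the
	// service IP (https://172.30.0.1:443), this request only works once
	// kube-proxy has written the iptables rules for default/kubernetes; if it
	// is the load-balanced API hostname shown above, plain routing is enough.
	cfg, err := clientcmd.BuildConfigFromFlags("",
		"/etc/kubernetes/cni/net.d/multus.d/multus.kubeconfig")
	if err != nil {
		panic(err)
	}
	cfg.Timeout = 5 * time.Second

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// On CNI DEL, multus needs the pod object to find any additional networks
	// to tear down; with an unreachable server this is where errors like
	// "dial tcp 172.30.0.1:443: connect: no route to host" surface.
	pod, err := client.CoreV1().Pods("openshift-machine-api").Get(context.TODO(),
		"cluster-autoscaler-operator-848f5ff955-p7v7j", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("additional networks:", pod.Annotations["k8s.v1.cni.cncf.io/networks"])
}
```

Checkpointing the added networks to disk at ADD time, as suggested in the comments above, would avoid this apiserver round-trip on DEL entirely.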
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758