Bug 1791117 - Network disruption of openshift-apiserver during 4.4-4.4 upgrade
Summary: Network disruption of openshift-apiserver during 4.4-4.4 upgrade
Keywords:
Status: CLOSED DUPLICATE of bug 1785457
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Aniket Bhat
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-01-14 21:58 UTC by Clayton Coleman
Modified: 2023-09-14 05:49 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-10 13:21:50 UTC
Target Upstream Version:
Embargoed:



Description Clayton Coleman 2020-01-14 21:58:14 UTC
An e2e test that verifies both the Kubernetes and OpenShift apiservers remain available during upgrade failed in a 4.4-to-4.4 upgrade run.

A particular snippet that seemed to highlight the reason for the failure was:

Jan 12 20:58:46.230 I ns/openshift-sdn daemonset/sdn-controller Deleted pod: sdn-controller-lkj6v
Jan 12 20:58:46.230 I ns/openshift-sdn pod/sdn-7d6g9 Pulling image "registry.svc.ci.openshift.org/ci-op-jbtg7jjb/stable@sha256:f8de726661ce92ee52c4de8498a9f2868a4569b7ae62e59442d09ccbb78302b5"
Jan 12 20:58:46.364 W ns/openshift-controller-manager pod/controller-manager-g9tkl network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Jan 12 20:58:48.379 W ns/openshift-controller-manager pod/controller-manager-g9tkl network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network (2 times)
Jan 12 20:58:48.387 W ns/openshift-machine-api pod/cluster-autoscaler-operator-748f454f48-xlbsk network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Jan 12 20:58:48.701 W ns/openshift-operator-lifecycle-manager pod/catalog-operator-86488444c-v4h5q Readiness probe failed: Get http://10.129.0.46:8080/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (2 times)
Jan 12 20:58:49.097 W ns/openshift-apiserver pod/apiserver-zg25k Readiness probe failed: Get https://10.129.0.43:8443/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (2 times)
Jan 12 20:58:49.374 W ns/openshift-cluster-node-tuning-operator pod/cluster-node-tuning-operator-5c859c6585-kb6ph network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Jan 12 20:58:49.742 W node/ip-10-0-157-152.ec2.internal condition Ready changed
Jan 12 20:58:49.745 I node/ip-10-0-157-152.ec2.internal Node ip-10-0-157-152.ec2.internal status is now: NodeReady (2 times)
Jan 12 20:58:49.882 I ns/openshift-machine-api machine/ci-op-jbtg7jjb-77109-dx8t6-worker-us-east-1b-tn6s4 Updated machine ci-op-jbtg7jjb-77109-dx8t6-worker-us-east-1b-tn6s4 (3 times)
Jan 12 20:58:50.366 W ns/openshift-ingress-operator pod/ingress-operator-8c8c9579c-hph6g network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Jan 12 20:58:50.373 W ns/openshift-controller-manager pod/controller-manager-g9tkl network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network (3 times)
Jan 12 20:58:50.381 W ns/openshift-machine-api pod/cluster-autoscaler-operator-748f454f48-xlbsk network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network (2 times)

This sequence suggests:

1. openshift-sdn on a node is updated
2. 8-12 seconds later, openshift-apiserver (which runs on the pod network) fails its readiness checks and is taken out of rotation

At first glance this would be a very serious bug: upgrading openshift-sdn should not disrupt pods on the pod network.

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/14098
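The availability check described above effectively polls the apiservers throughout the upgrade and flags any window in which requests fail. A minimal sketch of that bookkeeping (a hypothetical helper for illustration, not the actual origin test code): given ordered (timestamp, success) samples, collapse consecutive failures into disruption windows.

```python
from datetime import datetime

def disruption_windows(samples):
    """Collapse consecutive failed availability samples into
    (start, end) disruption windows.

    samples: list of (datetime, bool) tuples ordered by time;
    True means the apiserver answered the poll, False means it did not.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                   # a disruption begins
        elif ok and start is not None:
            windows.append((start, ts))  # it ends at the first success
            start = None
    if start is not None:
        windows.append((start, samples[-1][0]))  # still down at the end
    return windows

# Illustrative samples mirroring the log above: openshift-apiserver
# readiness drops ~8-12 seconds after the sdn pod on its node restarts.
fmt = "%H:%M:%S"
samples = [
    (datetime.strptime("20:58:40", fmt), True),
    (datetime.strptime("20:58:45", fmt), True),
    (datetime.strptime("20:58:49", fmt), False),  # probe timeouts begin
    (datetime.strptime("20:58:53", fmt), False),
    (datetime.strptime("20:58:57", fmt), True),   # back in rotation
]
for start, end in disruption_windows(samples):
    print(f"disrupted for {(end - start).seconds}s "
          f"({start.strftime(fmt)} -> {end.strftime(fmt)})")
```

Any nonzero window during an sdn rollout is what the test treats as a failure, which is why the 8-12 second gap above trips it.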

Comment 1 Clayton Coleman 2020-01-15 04:05:23 UTC
May be related to bug 1791162, but not sure.

Comment 4 Ben Bennett 2020-03-10 13:21:50 UTC

*** This bug has been marked as a duplicate of bug 1785457 ***

Comment 5 Red Hat Bugzilla 2023-09-14 05:49:53 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

