Bug 1791117 - Network disruption of openshift-apiserver during 4.4-4.4 upgrade
Summary: Network disruption of openshift-apiserver during 4.4-4.4 upgrade
Keywords:
Status: CLOSED DUPLICATE of bug 1785457
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Aniket Bhat
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-01-14 21:58 UTC by Clayton Coleman
Modified: 2023-09-14 05:49 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-10 13:21:50 UTC
Target Upstream Version:
Embargoed:



Description Clayton Coleman 2020-01-14 21:58:14 UTC
An e2e test that verifies both the Kubernetes and OpenShift apiservers remain available during upgrade failed in a 4.4-to-4.4 upgrade run.

A particular snippet that seemed to highlight the reason for the failure was:

Jan 12 20:58:46.230 I ns/openshift-sdn daemonset/sdn-controller Deleted pod: sdn-controller-lkj6v
Jan 12 20:58:46.230 I ns/openshift-sdn pod/sdn-7d6g9 Pulling image "registry.svc.ci.openshift.org/ci-op-jbtg7jjb/stable@sha256:f8de726661ce92ee52c4de8498a9f2868a4569b7ae62e59442d09ccbb78302b5"
Jan 12 20:58:46.364 W ns/openshift-controller-manager pod/controller-manager-g9tkl network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Jan 12 20:58:48.379 W ns/openshift-controller-manager pod/controller-manager-g9tkl network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network (2 times)
Jan 12 20:58:48.387 W ns/openshift-machine-api pod/cluster-autoscaler-operator-748f454f48-xlbsk network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Jan 12 20:58:48.701 W ns/openshift-operator-lifecycle-manager pod/catalog-operator-86488444c-v4h5q Readiness probe failed: Get http://10.129.0.46:8080/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (2 times)
Jan 12 20:58:49.097 W ns/openshift-apiserver pod/apiserver-zg25k Readiness probe failed: Get https://10.129.0.43:8443/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (2 times)
Jan 12 20:58:49.374 W ns/openshift-cluster-node-tuning-operator pod/cluster-node-tuning-operator-5c859c6585-kb6ph network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Jan 12 20:58:49.742 W node/ip-10-0-157-152.ec2.internal condition Ready changed
Jan 12 20:58:49.745 I node/ip-10-0-157-152.ec2.internal Node ip-10-0-157-152.ec2.internal status is now: NodeReady (2 times)
Jan 12 20:58:49.882 I ns/openshift-machine-api machine/ci-op-jbtg7jjb-77109-dx8t6-worker-us-east-1b-tn6s4 Updated machine ci-op-jbtg7jjb-77109-dx8t6-worker-us-east-1b-tn6s4 (3 times)
Jan 12 20:58:50.366 W ns/openshift-ingress-operator pod/ingress-operator-8c8c9579c-hph6g network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Jan 12 20:58:50.373 W ns/openshift-controller-manager pod/controller-manager-g9tkl network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network (3 times)
Jan 12 20:58:50.381 W ns/openshift-machine-api pod/cluster-autoscaler-operator-748f454f48-xlbsk network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network (2 times)

This sequence suggests:

1. openshift-sdn on a node is updated
2. 8-12 seconds later, openshift-apiserver (which runs on the pod network) fails its readiness checks and is taken out of rotation

At first glance this would be a very serious bug: upgrading openshift-sdn should not disrupt pods on the pod network.

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/14098
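The availability check described above effectively polls the apiservers throughout the upgrade and flags any window in which requests fail. A minimal sketch of that bookkeeping (a hypothetical helper for illustration, not the actual origin test code): given ordered (timestamp, success) samples, collapse consecutive failures into disruption windows.

```python
from datetime import datetime

def disruption_windows(samples):
    """Collapse consecutive failed availability samples into
    (start, end) disruption windows.

    samples: list of (datetime, bool) tuples ordered by time;
    True means the apiserver answered the poll, False means it did not.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                   # a disruption begins
        elif ok and start is not None:
            windows.append((start, ts))  # it ends at the first success
            start = None
    if start is not None:
        windows.append((start, samples[-1][0]))  # still down at the end
    return windows

# Illustrative samples mirroring the log above: openshift-apiserver
# readiness drops ~8-12 seconds after the sdn pod on its node restarts.
fmt = "%H:%M:%S"
samples = [
    (datetime.strptime("20:58:40", fmt), True),
    (datetime.strptime("20:58:45", fmt), True),
    (datetime.strptime("20:58:49", fmt), False),  # probe timeouts begin
    (datetime.strptime("20:58:53", fmt), False),
    (datetime.strptime("20:58:57", fmt), True),   # back in rotation
]
for start, end in disruption_windows(samples):
    print(f"disrupted for {(end - start).seconds}s "
          f"({start.strftime(fmt)} -> {end.strftime(fmt)})")
```

Any nonzero window during an sdn rollout is what the test treats as a failure, which is why the 8-12 second gap above trips it.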

Comment 1 Clayton Coleman 2020-01-15 04:05:23 UTC
May be related to bug 1791162, but not sure.

Comment 4 Ben Bennett 2020-03-10 13:21:50 UTC

*** This bug has been marked as a duplicate of bug 1785457 ***

Comment 5 Red Hat Bugzilla 2023-09-14 05:49:53 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

