test: [sig-api-machinery] Kubernetes APIs remain available for new connections
is failing frequently in CI, see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-api-machinery%5C%5D+Kubernetes+APIs+remain+available+for+new+connections

Sample failure:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.6-stable-to-4.7-ci/1354023402309947392

During the upgrade, connectivity to the cluster appears to be lost. As a result, the tests that require an uninterrupted connection to the cluster fail.

---
API "kubernetes-api-available-new-connections" was unreachable during disruption for at least 16s of 1h0m4s (0%), this is currently sufficient to pass the test/job but not considered completely correct:

Jan 26 12:06:29.017 E kube-apiserver-new-connection kube-apiserver-new-connection started failing: Get "https://api.ci-op-bip2nktq-71cad.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/default": dial tcp 54.147.14.33:6443: connect: connection refused
Jan 26 12:06:29.983 E kube-apiserver-new-connection kube-apiserver-new-connection is not responding to GET requests
Jan 26 12:06:30.038 I kube-apiserver-new-connection kube-apiserver-new-connection started responding to GET requests
---

Since the issue is with keeping connectivity with the cluster, all the sig-api-machinery tests and a few more (added in the environment field) fail:

[sig-api-machinery] Kubernetes APIs remain available for new connections
[sig-api-machinery] Kubernetes APIs remain available with reused connections
[sig-api-machinery] OAuth APIs remain available for new connections
[sig-api-machinery] OAuth APIs remain available with reused connections
[sig-api-machinery] OpenShift APIs remain available for new connections
[sig-api-machinery] OpenShift APIs remain available with reused connections

Assigning to the OpenShift Update Service component as a starting point for further investigation. The only information I was able to gather is that the issue happens when updating the cluster, so feel free to reassign to a different component if needed.
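For anyone triaging similar output, here is a rough sketch of how the disruption excerpt quoted in this report can be turned into outage windows. It is not the tooling the CI job uses. The helper names (parse_ts, outage_windows) and ASSUMED_YEAR are hypothetical, and the year is an assumption because the log lines do not carry one.

```python
from datetime import datetime, timedelta

# Assumption: the log lines carry no year, so one is supplied for parsing only.
ASSUMED_YEAR = 2021

# The three log lines quoted in this report.
LOG = """\
Jan 26 12:06:29.017 E kube-apiserver-new-connection kube-apiserver-new-connection started failing: Get "https://api.ci-op-bip2nktq-71cad.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/default": dial tcp 54.147.14.33:6443: connect: connection refused
Jan 26 12:06:29.983 E kube-apiserver-new-connection kube-apiserver-new-connection is not responding to GET requests
Jan 26 12:06:30.038 I kube-apiserver-new-connection kube-apiserver-new-connection started responding to GET requests
"""


def parse_ts(line):
    """Parse the leading 'Mon DD HH:MM:SS.mmm' timestamp of a log line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(
        f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f"
    )


def outage_windows(log):
    """Pair each 'started failing' line with the next 'started responding' line."""
    windows = []
    start = None
    for line in log.splitlines():
        if not line.strip():
            continue
        if "started failing" in line and start is None:
            start = parse_ts(line)
        elif "started responding" in line and start is not None:
            windows.append((start, parse_ts(line)))
            start = None
    return windows


windows = outage_windows(LOG)
total = sum((end - begin for begin, end in windows), timedelta())
print(f"{len(windows)} window(s), {total.total_seconds():.3f}s unreachable")
```

The quoted excerpt covers only about one second of the 16s reported for the whole run, so the full build log would be needed to account for the rest.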
This looks like the 4.7 version of bug #1845411. It seems to be a known issue, so I'm moving it to the kube-apiserver component.
I don't see why this is urgent. No customer escalation, not blocking every CI run.
Per the discussion on Slack (https://coreos.slack.com/archives/CB48XQ4KZ/p1611761584134700), I'm assigning this to the node team.

We suspect that MCO triggers a reboot and doesn't wait for kubelet to finish all running processes. For example, in https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.6-stable-to-4.7-ci/1354295675558301696:

T0: At 06:45:46: machine-config initiating reboot: Node will reboot into config rendered-master 854a2b589ebed29fbad70d67f2e243c1
T1: At 06:45:46: Stopped Kubernetes Kubelet on ci-op-z52cbzhi-6d7cd-pz2jw-master-0
T2: At 06:45:58: systemd-shutdown was sending SIGTERM to remaining processes...
T3: At 06:45:58: kube-apiserver-ci-op-z52cbzhi-6d7cd-pz2jw-master-0: Received signal to terminate, becoming unready, but keeping serving (TerminationStart event)
T4: At 06:47:08: kube-apiserver-ci-op-z52cbzhi-6d7cd-pz2jw-master-0: The minimal shutdown duration of 1m10s finished (TerminationMinimalShutdownDurationFinished event)
T5: At 06:47:08: kube-apiserver-ci-op-z52cbzhi-6d7cd-pz2jw-master-0: Server has stopped listening (TerminationStoppedServing event)

T5 is the last event reported from that API server. After T5 the server might wait up to 60s for all requests to complete, and then it fires the TerminationGracefulTerminationFinished event. The ci-op-z52cbzhi-6d7cd-pz2jw-master-0-termination file (audit-logs) suggests the server was forcefully shut down: no TerminationGracefulTerminationFinished event was reported.

It seems that MCO must wait for kubelet so that all processes finish, and only then start tearing down other things (like network or volumes).
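To make the timing argument easier to check, here is a small sketch that recomputes the gaps between the T0-T5 events from the timestamps in this comment. The EVENTS table is transcribed by hand from the timeline; the helper name at() and the GRACEFUL_WINDOW constant are hypothetical, the latter taken from the "up to 60s" figure in this comment.

```python
from datetime import datetime, timedelta

# (label, wall-clock time, short description), transcribed from the timeline.
EVENTS = [
    ("T0", "06:45:46", "machine-config initiating reboot"),
    ("T1", "06:45:46", "Stopped Kubernetes Kubelet"),
    ("T2", "06:45:58", "systemd-shutdown sending SIGTERM"),
    ("T3", "06:45:58", "TerminationStart"),
    ("T4", "06:47:08", "TerminationMinimalShutdownDurationFinished"),
    ("T5", "06:47:08", "TerminationStoppedServing"),
]

# Assumption: the graceful drain window is the "up to 60s" mentioned above.
GRACEFUL_WINDOW = timedelta(seconds=60)


def at(clock):
    """Parse an HH:MM:SS wall-clock time; the date is irrelevant for deltas."""
    return datetime.strptime(clock, "%H:%M:%S")


times = {label: at(clock) for label, clock, _ in EVENTS}

# Time between kubelet stopping and systemd sending SIGTERM.
kubelet_to_sigterm = times["T2"] - times["T1"]

# Time the API server kept serving after TerminationStart.
min_shutdown = times["T4"] - times["T3"]

# Latest moment a TerminationGracefulTerminationFinished event could be expected.
latest_graceful_finish = times["T5"] + GRACEFUL_WINDOW

print(kubelet_to_sigterm, min_shutdown, latest_graceful_finish.time())
# prints: 0:00:12 0:01:10 06:48:08
```

The T3 to T4 gap matches the 1m10s minimal shutdown duration, and the absence of any event by 06:48:08 is consistent with the server being killed before it could finish draining.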
Not a 4.7 blocker.
I'm seeing the following tests fail consistently on the release-openshift-origin-installer-e2e-aws-upgrade-4.3-to-4.4-to-4.6-to-4.7-ci job:

[sig-api-machinery] Kubernetes APIs remain available for new connections
[sig-api-machinery] Kubernetes APIs remain available with reused connections
[sig-api-machinery] OAuth APIs remain available for new connections
[sig-api-machinery] OAuth APIs remain available with reused connections
[sig-api-machinery] OpenShift APIs remain available for new connections
[sig-api-machinery] OpenShift APIs remain available with reused connections

See <https://search.ci.openshift.org/?search=%5C%5Bsig-api-machinery%5C%5D+OAuth+APIs+remain+available+for+new+connections&maxAge=168h&context=1&type=bug%2Bjunit&name=release-openshift-origin-installer-e2e-aws-upgrade-4.3-to-4.4-to-4.6-to-4.7-ci&maxMatches=5&maxBytes=20971520&groupBy=job>.
*** This bug has been marked as a duplicate of bug 1928946 ***