Created attachment 1864996 [details]
Events showing both pods scheduled on same node

While debugging a disruption test failure, "[sig-imageregistry] Image registry remains available using new connections", we noticed that the two ingress pods were scheduled on the same node. A couple of example job runs:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade/1500434000819261440
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade/1500433998298484736

See the attached screenshot of the events showing that the two pods were scheduled on the same node.

Ingress has an anti-affinity rule that should prevent this:

https://github.com/openshift/cluster-ingress-operator/blob/5040f65551851b3ee284f0803bfdd1c64631c4c6/pkg/operator/controller/ingress/deployment.go#L337-L357

But somehow the pods ended up on the same node.
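For anyone reproducing this, one way to see what anti-affinity rule actually landed on the router deployment in a live cluster (this is just a generic inspection command, not something from the CI artifacts) is:

$ oc -n openshift-ingress get deployment/router-default -o json \
    | jq '.spec.template.spec.affinity.podAntiAffinity'

Comparing that output against the rule built in the deployment.go lines linked above would show what topology key and label selector were in effect when the two pods were scheduled.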
Ken, I cannot find the events from your screenshot in the linked CI job.
Yeah, the screenshot doesn't match, but the events show the bug pretty clearly: `router-default-79dfc95ff7-wtzl6` and `router-default-79dfc95ff7-f96fj` in the linked run.
Details from the events, with the pods David points out in comment 2 both getting scheduled to the same node in the same second:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade/1500434000819261440/artifacts/e2e-aws-upgrade/gather-extra/artifacts/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-ingress" and (.reason == "Scheduled" or .reason == "Killing")) | .metadata.creationTimestamp + " " + (.count | tostring) + " " + .involvedObject.name + " " + .reason + ": " + .message' | sort
2022-03-06T11:50:02Z null router-default-79dfc95ff7-f96fj Scheduled: Successfully assigned openshift-ingress/router-default-79dfc95ff7-f96fj to ip-10-0-129-93.us-west-1.compute.internal
2022-03-06T11:50:02Z null router-default-79dfc95ff7-wtzl6 Scheduled: Successfully assigned openshift-ingress/router-default-79dfc95ff7-wtzl6 to ip-10-0-129-93.us-west-1.compute.internal
2022-03-06T12:26:05Z null router-default-79dfc95ff7-27b2v Scheduled: Successfully assigned openshift-ingress/router-default-79dfc95ff7-27b2v to ip-10-0-225-12.us-west-1.compute.internal
2022-03-06T12:26:06Z 1 router-default-79dfc95ff7-wtzl6 Killing: Stopping container router
2022-03-06T12:26:15Z 1 router-default-79dfc95ff7-f96fj Killing: Stopping container router
2022-03-06T12:26:15Z null router-default-79dfc95ff7-ltwrt Scheduled: Successfully assigned openshift-ingress/router-default-79dfc95ff7-ltwrt to ip-10-0-155-172.us-west-1.compute.internal
2022-03-06T12:29:40Z 1 router-default-79dfc95ff7-27b2v Killing: Stopping container router
2022-03-06T12:29:40Z null router-default-79dfc95ff7-t6d8g Scheduled: Successfully assigned openshift-ingress/router-default-79dfc95ff7-t6d8g to ip-10-0-129-93.us-west-1.compute.internal
2022-03-06T12:33:05Z null router-default-79dfc95ff7-59cgm Scheduled: Successfully assigned openshift-ingress/router-default-79dfc95ff7-59cgm to ip-10-0-225-12.us-west-1.compute.internal
2022-03-06T12:33:08Z 2 router-default-79dfc95ff7-ltwrt Killing: Stopping container router
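A variant of the same query that just counts Scheduled events per node makes collisions quicker to spot (this is a convenience reworking of the command above, not output from the CI artifacts; counts above one are only suggestive, since pods are legitimately rescheduled during the upgrade, and the timestamps in the full listing are what confirm the same-second collision):

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade/1500434000819261440/artifacts/e2e-aws-upgrade/gather-extra/artifacts/events.json \
    | jq -r '.items[] | select(.metadata.namespace == "openshift-ingress" and .reason == "Scheduled") | .message' \
    | awk '{print $NF}' | sort | uniq -c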
Moving back to ASSIGNED, per [1], openshift/kubernetes#1210 is a debugging aid and not a fix. [1]: https://github.com/openshift/kubernetes/pull/1210#issuecomment-1068235121
"The openshift-etcd pods should be scheduled on different nodes" appears to be failing 8% of the time on metal OVN. This means that etcd quorum is not protected by the PDB. https://search.ci.openshift.org/?search=The+openshift-etcd+pods+should+be+scheduled+on+different+nodes&maxAge=168h&context=0&type=junit&name=4.11.*metal.*ovn&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
*** Bug 2080471 has been marked as a duplicate of this bug. ***
Hello Ravi, I tried to verify the issue by checking the link at [1]. The only time I see it passing was at [2], i.e., about 44 hours ago; after that I see it failing with the error at [3]. Any idea if we have a bug tracking this? I think we should wait until that issue is fixed. WDYS?

[1] https://search.ci.openshift.org/?search=The+openshift-etcd+pods+should+be+scheduled+on+different+nodes&maxAge=168h&context=0&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
[2] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-aws-ovn-upgrade/1529051880641007616
[3] blob:https://prow.ci.openshift.org/47301f28-6c2f-4a64-9815-031b18b036ab

Thanks,
kasturi
Hi Kasturi, there is an issue with the build controller SA, which should be exempted from pod security; Standa has opened a PR that should solve the problem. I looked at the CI search again, and the failures seem unrelated to the symptom we usually see.
Ravi, yes, agreed. But can we wait until we have the test passing at least a couple of times before the bug is moved to the verified state?
Sure. We should wait until we have a clear signal. No point in rushing to close this BZ.
Hello Ravi, I tried to verify the bug again, but this time I am not sure why it failed. I do see the messages below when checking the logs at [1]; could you please help take a look? Thanks!

{Passed 2 times, failed 0 times, skipped 0 times: we require at least 3 attempts to have a chance at success
  name: '[sig-scheduling][Early] The openshift-etcd pods should be scheduled on different nodes [Suite:openshift/conformance/parallel]'
  testsuitename: openshift-tests-upgrade
  summary: 'Passed 2 times, failed 0 times, skipped 0 times: we require at least 3 attempts to have a chance at success'
  passes:
  - jobrunid: "1536907377092071424"
    humanurl: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/openshift-origin-27244-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade/1536907377092071424
    gcsartifacturl: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/openshift-origin-27244-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade/1536907377092071424/artifacts
  - jobrunid: "1536907379621236736"
    humanurl: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/openshift-origin-27244-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade/1536907379621236736
    gcsartifacturl: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/openshift-origin-27244-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade/1536907379621236736/artifacts
  failures: []
  skips: []
}

[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregator-periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade/1536907382129430528

Thanks,
kasturi
Moving this back to the ASSIGNED state because, when I looked at the CI logs below, I still see that the pods got scheduled onto the same node.

[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure-modern/1535199601215148032
[2] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure-modern/1535286255833583616
Those are just launch jobs run by cluster-bot, and when I looked at the failed jobs they appear to be upgrades from 4.10, which does not include the fix. So, moving back to `ON_QA`.

xref: https://coreos.slack.com/archives/C01CQA76KMX/p1655314005818299
Looking at [1], I do see failures, but they are not related to the error originally reported in this bug; based on my observations in the logs, they appear to be environment-related. Based on that, I am moving the bug to VERIFIED.

[1] https://search.ci.openshift.org/?search=The+openshift-etcd+pods+should+be+scheduled+on+different+nodes&maxAge=168h&context=0&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069