Bug 2071139

Summary: Ingress pods scheduled on the same node
Product: OpenShift Container Platform
Reporter: Miciah Dashiel Butler Masters <mmasters>
Component: Networking
Assignee: Miciah Dashiel Butler Masters <mmasters>
Networking sub component: router
QA Contact: Arvind iyengar <aiyengar>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: deads, hongli
Version: 4.11
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: An issue with the scheduler can cause it to ignore pod anti-affinity rules when scheduling pods.
Consequence: Two router pod replicas for the same generation of the same IngressController could be scheduled to the same node, increasing the risk of disruption to ingress during cluster upgrades or node outages.
Fix: Logic was added to the ingress operator to evict misscheduled router pods.
Result: Router pods are properly spread across multiple nodes to reduce disruption during upgrades and increase resilience to node outages.
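The eviction fix described in the Doc Text can be illustrated with a minimal Go sketch. This is not the ingress operator's actual code: the Pod type, its TemplateHash field (used here as the "generation" key), and the podsToEvict helper are hypothetical. The sketch groups scheduled router pods by (generation, node) and, for any group with more than one pod, keeps one and selects the rest for eviction so the deployment can recreate them elsewhere.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, trimmed-down view of a router pod; the real
// operator works with Kubernetes API objects instead.
type Pod struct {
	Name     string
	NodeName string
	// TemplateHash identifies the generation the pod belongs to; using it
	// as the generation key is an assumption of this sketch.
	TemplateHash string
}

// podsToEvict returns the names of pods that share a node with another pod
// of the same generation. For each colocated group it keeps the pod whose
// name sorts first and selects the others for eviction.
func podsToEvict(pods []Pod) []string {
	type key struct{ hash, node string }
	groups := map[key][]string{}
	for _, p := range pods {
		if p.NodeName == "" {
			continue // not scheduled yet, so it cannot be colocated
		}
		k := key{p.TemplateHash, p.NodeName}
		groups[k] = append(groups[k], p.Name)
	}
	var evict []string
	for _, names := range groups {
		if len(names) < 2 {
			continue
		}
		sort.Strings(names)
		evict = append(evict, names[1:]...)
	}
	sort.Strings(evict) // deterministic output
	return evict
}

func main() {
	pods := []Pod{
		{Name: "router-default-abc-1", NodeName: "worker-a", TemplateHash: "abc"},
		{Name: "router-default-abc-2", NodeName: "worker-a", TemplateHash: "abc"},
		{Name: "router-default-xyz-1", NodeName: "worker-a", TemplateHash: "xyz"},
		{Name: "router-default-abc-3", NodeName: "worker-b", TemplateHash: "abc"},
	}
	fmt.Println(podsToEvict(pods)) // [router-default-abc-2]
}
```

Pods from a different generation (xyz above) are left alone even when they share a node, since the problem in this report concerns replicas of the same generation only.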
Last Closed: 2022-08-10 11:03:06 UTC

Description Miciah Dashiel Butler Masters 2022-04-01 23:43:03 UTC
This bug was initially created as a copy of Bug #2062459

I am copying this bug because David Eads has implemented a short-term workaround for it in the ingress operator.  This new bug tracks the short-term workaround while the old bug tracks the longer-term scheduler fix.  


Description of problem:

Two router pod replicas for the same generation of the same IngressController can be scheduled to the same node despite pod anti-affinity rules that should prevent colocating them.  


OpenShift release version:

At least 4.11 and 4.10.  


Cluster Platform:

The scheduling issue affects router pods for at least AWS, Azure, and GCP.  It most likely affects all cloud platforms.  I don't know whether the scheduling issue affects router pods for on-premises platforms, which use the "HostNetwork" endpoint publishing strategy by default.  


How reproducible:

See bug 2062459.


Steps to Reproduce (in detail):

The issue has been observed to cause a significant number of CI job failures.  The cause appears to be a race condition.  As far as I know, we do not have a reliable reproducer.


Actual results:

Pods for the same generation of the same IngressController are sometimes scheduled to the same node; see bug 2062459.


Expected results:

These pods should always be spread across nodes.


Impact of the problem:

Failure to spread router pods out across nodes increases the impact of rolling updates of nodes or outages of individual nodes or availability zones.  


Additional info:

See bug 2062459 for example CI failures.  

The router pod anti-affinity rule is defined here: https://github.com/openshift/cluster-ingress-operator/blob/5040f65551851b3ee284f0803bfdd1c64631c4c6/pkg/operator/controller/ingress/deployment.go#L337-L357

This anti-affinity rule is only added when using the "LoadBalancerService" endpoint publishing strategy.  By default, cloud platforms (Alibaba, AWS, Azure, GCP, IBM Cloud, and Power VS) use "LoadBalancerService" while other platforms use "HostNetwork".  
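The strategy-dependent rule can be modeled with a small Go sketch. It is not the code in deployment.go: the AntiAffinityTerm type, the antiAffinityFor helper, and the label key example.com/router-generation are hypothetical. Only the two strategy names and the standard kubernetes.io/hostname topology key are real. The model adds a term only for "LoadBalancerService" and treats any node that already runs a same-generation router pod as off-limits.

```go
package main

import "fmt"

// EndpointPublishingStrategy mirrors the strategy names used in this report.
type EndpointPublishingStrategy string

const (
	LoadBalancerService EndpointPublishingStrategy = "LoadBalancerService"
	HostNetwork         EndpointPublishingStrategy = "HostNetwork"
)

// generationLabel is a hypothetical label key identifying the
// IngressController generation a router pod belongs to.
const generationLabel = "example.com/router-generation"

// AntiAffinityTerm is a simplified, hypothetical stand-in for a required pod
// anti-affinity term: the pod must not land in a topology domain (here, a
// node) that already runs a pod whose labels match Selector.
type AntiAffinityTerm struct {
	TopologyKey string
	Selector    map[string]string
}

// antiAffinityFor returns the term to add for the given strategy, or nil
// when no anti-affinity is added.
func antiAffinityFor(strategy EndpointPublishingStrategy, generation string) *AntiAffinityTerm {
	if strategy != LoadBalancerService {
		// HostNetwork: host port conflicts already keep router pods apart.
		return nil
	}
	return &AntiAffinityTerm{
		TopologyKey: "kubernetes.io/hostname",
		Selector:    map[string]string{generationLabel: generation},
	}
}

// violates reports whether placing a pod on a node whose existing pods carry
// the given label sets would break the term. A nil term never conflicts.
func (t *AntiAffinityTerm) violates(podsOnNode []map[string]string) bool {
	if t == nil {
		return false
	}
	for _, labels := range podsOnNode {
		matches := true
		for k, v := range t.Selector {
			if labels[k] != v {
				matches = false
				break
			}
		}
		if matches {
			return true
		}
	}
	return false
}

func main() {
	onNode := []map[string]string{{generationLabel: "abc"}}
	fmt.Println(antiAffinityFor(LoadBalancerService, "abc").violates(onNode)) // true
	fmt.Println(antiAffinityFor(LoadBalancerService, "xyz").violates(onNode)) // false
	fmt.Println(antiAffinityFor(HostNetwork, "abc").violates(onNode))         // false
}
```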

With "HostNetwork", router pods use the host network, which prevents them from being colocated: every router pod requires ports 80, 443, and 1936 on the host, so the scheduler already refuses to place two router pods on the same node, even without pod anti-affinity.  I am not aware of any scheduler issue involving host port conflicts.
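The host port argument can be checked with a similarly simplified Go sketch. It assumes a node's used host ports are tracked as a set of integers and ignores protocol and host IP, which the real scheduler also takes into account; fitsHostPorts and place are hypothetical helpers. Once one router pod holds ports 80, 443, and 1936 on a node, a second router pod no longer fits there.

```go
package main

import "fmt"

// routerHostPorts are the ports this report says every router pod requires.
var routerHostPorts = []int{80, 443, 1936}

// fitsHostPorts reports whether a pod requesting the given host ports can be
// placed on a node whose in-use host ports are recorded in used.
// Simplification: protocol and host IP are ignored.
func fitsHostPorts(requested []int, used map[int]bool) bool {
	for _, p := range requested {
		if used[p] {
			return false
		}
	}
	return true
}

// place records the requested ports as in use on the node.
func place(requested []int, used map[int]bool) {
	for _, p := range requested {
		used[p] = true
	}
}

func main() {
	node := map[int]bool{}
	fmt.Println(fitsHostPorts(routerHostPorts, node)) // true: node is empty
	place(routerHostPorts, node)
	fmt.Println(fitsHostPorts(routerHostPorts, node)) // false: 80, 443, 1936 taken
}
```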

Comment 5 Arvind iyengar 2022-05-09 02:53:07 UTC
No further failures have been noted in recent runs of the "[sig-scheduling][Early] The HAProxy router pods should be scheduled on different nodes" test. Marking this as "verified":
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-ci-4.11-e2e-aws-upgrade

Comment 8 errata-xmlrpc 2022-08-10 11:03:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069