Bug 1975379 - Console pods are scheduled on single master node
Summary: Console pods are scheduled on single master node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Management Console
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Jakub Hadvig
QA Contact: Yanping Zhang
URL:
Whiteboard:
Depends On:
Blocks: 2003639
Reported: 2021-06-23 14:34 UTC by Apurva Nisal
Modified: 2023-09-15 01:10 UTC (History)
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Both console deployments used a soft requirement for their anti-affinity rules. Consequence: Console pods could be scheduled on a single master node. Fix: Use a hard requirement for the anti-affinity rules on both console deployments, with the hostname as the topology key when scheduling the pods. Result: Console pods are scheduled on different master nodes.
Clone Of:
Environment:
Last Closed: 2021-10-18 17:36:28 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift console-operator pull 560 | 0 | None | open | Bug 1975379: Use hard requirement for anti-affinity rules on both console's deployments | 2021-06-24 16:47:29 UTC
Github | openshift console-operator pull 566 | 0 | None | open | Bug 1975379: Have timezone as soft requirement for pod antiaffinity | 2021-07-15 16:15:45 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:36:42 UTC

Description Apurva Nisal 2021-06-23 14:34:25 UTC
Description of problem:
Console pods are scheduled on a single master node.

 oc get pods -owide
NAME                        READY   STATUS    RESTARTS   AGE    IP            NODE                                                   NOMINATED NODE   READINESS GATES
console-6558bcb9f9-7cnjk    1/1     Running   1          3d1h   10.129.0.38   master-1.abc.com   <none>           <none>
console-6558bcb9f9-fwpzf    1/1     Running   0          3d1h   10.129.0.46   master-1.abc.com   <none>           <none>
downloads-84f554976-9nwr2   1/1     Running   0          3d1h   10.131.0.11   worker-2.abc.com   <none>           <none>
downloads-84f554976-wl655   1/1     Running   0          3d1h   10.129.2.7    worker-0.abc.com   <none>           <none>


oc get nodes
NAME               STATUS   ROLES    AGE    VERSION
master-0.abc.com   Ready    master   3d1h   v1.20.0+df9c838
master-1.abc.com   Ready    master   3d1h   v1.20.0+df9c838
master-2.abc.com   Ready    master   3d1h   v1.20.0+df9c838
worker-0.abc.com   Ready    worker   3d1h   v1.20.0+df9c838
worker-1.abc.com   Ready    worker   3d1h   v1.20.0+df9c838
worker-2.abc.com   Ready    worker   3d1h   v1.20.0+df9c838


Actual results:
Console pods are scheduled on a single master node.

Expected results:
Console pods should be scheduled on different master nodes.

Comment 2 Samuel Padgett 2021-06-23 14:59:11 UTC
We have anti-affinity rules set, but we're using `preferredDuringSchedulingIgnoredDuringExecution`, which is the soft requirement, rather than `requiredDuringSchedulingIgnoredDuringExecution`, which is the hard requirement.
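
For readers comparing the two forms, here is a minimal sketch of how the soft and hard variants differ in shape inside a pod spec's `affinity` stanza. It only builds plain dictionaries and is not the console-operator's actual code. The helper name `anti_affinity`, the `component: ui` label selector, and the weight of 100 are assumptions for illustration; `kubernetes.io/hostname` is the standard node label for a hostname topology key.

```python
import json


def anti_affinity(labels, hard, topology_key="kubernetes.io/hostname", weight=100):
    """Build a podAntiAffinity stanza as a plain dict (hypothetical helper)."""
    # A pod affinity term: which pods to avoid, and the domain to avoid them in.
    term = {
        "labelSelector": {
            "matchExpressions": [
                {"key": key, "operator": "In", "values": [value]}
                for key, value in sorted(labels.items())
            ]
        },
        "topologyKey": topology_key,
    }
    if hard:
        # Hard requirement: the scheduler refuses to co-locate matching pods.
        rules = {"requiredDuringSchedulingIgnoredDuringExecution": [term]}
    else:
        # Soft requirement: the term is wrapped with a weight and is only a
        # preference, so matching pods may still land on the same node.
        rules = {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": weight, "podAffinityTerm": term}
            ]
        }
    return {"podAntiAffinity": rules}


# "component: ui" is an assumed label selector, used here only as an example.
soft = anti_affinity({"component": "ui"}, hard=False)
hard = anti_affinity({"component": "ui"}, hard=True)
print(json.dumps(hard, indent=2))
```

Note the structural difference: the soft form nests the term under `podAffinityTerm` alongside a `weight`, while the hard form lists the term directly.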

Comment 3 Jakub Hadvig 2021-07-02 16:57:19 UTC
Still valid. PR up and in merge process.

Comment 5 Ronald 2021-07-09 12:21:53 UTC
Please backport the PR to release 4.7 as well

Thx,
Ronald

Comment 7 Samuel Padgett 2021-07-15 13:06:45 UTC
Reopening as this breaks OpenStack deployments.

Comment 9 Yanping Zhang 2021-07-20 09:35:21 UTC
Checked on an OCP 4.9 cluster with payload 4.9.0-0.nightly-2021-07-19-140945.
Checked the console and downloads deployment YAML: the anti-affinity rule is "requiredDuringSchedulingIgnoredDuringExecution", and console pods are scheduled on different master nodes.
# oc get node |grep master
ip-10-0-158-130.us-east-2.compute.internal   Ready    master   9h    v1.21.1+8268f88
ip-10-0-160-245.us-east-2.compute.internal   Ready    master   9h    v1.21.1+8268f88
ip-10-0-196-96.us-east-2.compute.internal    Ready    master   9h    v1.21.1+8268f88
# oc get pod -n openshift-console -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
console-66946dc647-2vp26     1/1     Running   0          9h    10.128.0.36   ip-10-0-158-130.us-east-2.compute.internal   <none>           <none>
console-66946dc647-wpg64     1/1     Running   0          9h    10.130.0.35   ip-10-0-160-245.us-east-2.compute.internal   <none>           <none>
downloads-7d9df5cb76-5fsmr   1/1     Running   0          9h    10.130.0.28   ip-10-0-160-245.us-east-2.compute.internal   <none>           <none>
downloads-7d9df5cb76-x44nc   1/1     Running   0          9h    10.128.0.23   ip-10-0-158-130.us-east-2.compute.internal   <none>           <none>
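
The same check can be scripted. The following is a rough sketch, not part of the original verification: it parses pasted `oc get pod -o wide` output and reports any deployment whose pods share a node. The function name `colocated` is hypothetical, and the parsing assumes the column layout shown in this bug (node name in the seventh whitespace-separated field, pod names of the form `<deployment>-<hash>-<suffix>`).

```python
from collections import defaultdict

# Output from the original report (before the fix).
BEFORE = """\
NAME                        READY   STATUS    RESTARTS   AGE    IP            NODE               NOMINATED NODE   READINESS GATES
console-6558bcb9f9-7cnjk    1/1     Running   1          3d1h   10.129.0.38   master-1.abc.com   <none>           <none>
console-6558bcb9f9-fwpzf    1/1     Running   0          3d1h   10.129.0.46   master-1.abc.com   <none>           <none>
downloads-84f554976-9nwr2   1/1     Running   0          3d1h   10.131.0.11   worker-2.abc.com   <none>           <none>
downloads-84f554976-wl655   1/1     Running   0          3d1h   10.129.2.7    worker-0.abc.com   <none>           <none>
"""

# Output from the verification on the 4.9 nightly (after the fix).
AFTER = """\
NAME                         READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
console-66946dc647-2vp26     1/1     Running   0          9h    10.128.0.36   ip-10-0-158-130.us-east-2.compute.internal   <none>           <none>
console-66946dc647-wpg64     1/1     Running   0          9h    10.130.0.35   ip-10-0-160-245.us-east-2.compute.internal   <none>           <none>
downloads-7d9df5cb76-5fsmr   1/1     Running   0          9h    10.130.0.28   ip-10-0-160-245.us-east-2.compute.internal   <none>           <none>
downloads-7d9df5cb76-x44nc   1/1     Running   0          9h    10.128.0.23   ip-10-0-158-130.us-east-2.compute.internal   <none>           <none>
"""


def colocated(table):
    """Return {deployment: [nodes]} for deployments with pods sharing a node."""
    placement = defaultdict(list)
    for line in table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 7:
            continue  # blank or truncated line
        # Assumed naming: strip the replica-set hash and the pod suffix.
        deployment = fields[0].rsplit("-", 2)[0]
        placement[deployment].append(fields[6])  # NODE column
    return {d: nodes for d, nodes in placement.items() if len(set(nodes)) < len(nodes)}


print(colocated(BEFORE))
print(colocated(AFTER))
```

On the output from the original report this flags only `console`; on the output above it flags nothing.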

Comment 10 Ronald 2021-07-28 11:08:35 UTC
Hi Guys,


Can it be backported to release 4.7 as well?

Thx,
Ronald

Comment 14 errata-xmlrpc 2021-10-18 17:36:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

Comment 15 Red Hat Bugzilla 2023-09-15 01:10:28 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.

