Bug 1975379 - Console pods are scheduled on single master node [NEEDINFO]
Summary: Console pods are scheduled on single master node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Management Console
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Jakub Hadvig
QA Contact: Yanping Zhang
URL:
Whiteboard:
Depends On:
Blocks: 2003639
 
Reported: 2021-06-23 14:34 UTC by Apurva Nisal
Modified: 2021-10-18 17:36 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The anti-affinity rules on both console deployments used a soft requirement. Consequence: Console pods could be scheduled on a single master node. Fix: Use a hard requirement for the anti-affinity rules on both console deployments, with the hostname as the topology key when scheduling the pods. Result: Console pods are scheduled on different master nodes.
Clone Of:
Environment:
Last Closed: 2021-10-18 17:36:28 UTC
Target Upstream Version:
mmohan: needinfo? (jhadvig)




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift console-operator pull 560 | 0 | None | open | Bug 1975379: Use hard requirement for anti-affinity rules on both console's deployments | 2021-06-24 16:47:29 UTC
Github | openshift console-operator pull 566 | 0 | None | open | Bug 1975379: Have timezone as soft requirement for pod antiaffinity | 2021-07-15 16:15:45 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:36:42 UTC

Description Apurva Nisal 2021-06-23 14:34:25 UTC
Description of problem:
Console pods are scheduled on a single master node.

oc get pods -o wide
NAME                        READY   STATUS    RESTARTS   AGE    IP            NODE               NOMINATED NODE   READINESS GATES
console-6558bcb9f9-7cnjk    1/1     Running   1          3d1h   10.129.0.38   master-1.abc.com   <none>           <none>
console-6558bcb9f9-fwpzf    1/1     Running   0          3d1h   10.129.0.46   master-1.abc.com   <none>           <none>
downloads-84f554976-9nwr2   1/1     Running   0          3d1h   10.131.0.11   worker-2.abc.com   <none>           <none>
downloads-84f554976-wl655   1/1     Running   0          3d1h   10.129.2.7    worker-0.abc.com   <none>           <none>


oc get nodes
NAME               STATUS   ROLES    AGE    VERSION
master-0.abc.com   Ready    master   3d1h   v1.20.0+df9c838
master-1.abc.com   Ready    master   3d1h   v1.20.0+df9c838
master-2.abc.com   Ready    master   3d1h   v1.20.0+df9c838
worker-0.abc.com   Ready    worker   3d1h   v1.20.0+df9c838
worker-1.abc.com   Ready    worker   3d1h   v1.20.0+df9c838
worker-2.abc.com   Ready    worker   3d1h   v1.20.0+df9c838


Actual results:
Console pods are scheduled on a single master node.

Expected results:
Console pods should be scheduled on different master nodes.

Comment 2 Samuel Padgett 2021-06-23 14:59:11 UTC
We have anti-affinity rules set, but we're using `preferredDuringSchedulingIgnoredDuringExecution` which is the soft requirement rather than `requiredDuringSchedulingIgnoredDuringExecution` which is the hard requirement.
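
To make the difference between the two rule types concrete, here is a minimal sketch of both stanzas written as Python dicts that mirror the Kubernetes `affinity` schema. The field names are the standard Kubernetes ones and `kubernetes.io/hostname` is the well-known hostname label. The label selector, the weight, and the helper `anti_affinity_kind` are assumptions made for illustration and are not taken from the console-operator source.

```python
# Illustrative only: the selector and weight below are assumptions,
# not values copied from the console-operator manifests.
SELECTOR = {"matchLabels": {"app": "console"}}  # hypothetical label selector

# Soft rule: the scheduler prefers to spread pods but may still co-locate them.
soft = {
    "podAntiAffinity": {
        "preferredDuringSchedulingIgnoredDuringExecution": [
            {
                "weight": 100,  # assumed weight
                "podAffinityTerm": {
                    "labelSelector": SELECTOR,
                    "topologyKey": "kubernetes.io/hostname",
                },
            }
        ]
    }
}

# Hard rule: a pod is not scheduled onto a node that already runs a matching pod.
hard = {
    "podAntiAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": [
            {
                "labelSelector": SELECTOR,
                "topologyKey": "kubernetes.io/hostname",
            }
        ]
    }
}


def anti_affinity_kind(affinity):
    """Return 'hard', 'soft', or 'none' for a pod spec's affinity stanza."""
    rules = (affinity or {}).get("podAntiAffinity", {})
    if rules.get("requiredDuringSchedulingIgnoredDuringExecution"):
        return "hard"
    if rules.get("preferredDuringSchedulingIgnoredDuringExecution"):
        return "soft"
    return "none"


print(anti_affinity_kind(soft), anti_affinity_kind(hard))
```

The same helper could be pointed at the `spec.template.spec.affinity` field of a deployment exported as JSON to see which kind of rule a cluster is currently running with.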

Comment 3 Jakub Hadvig 2021-07-02 16:57:19 UTC
Still valid. PR up and in merge process.

Comment 5 Ronald 2021-07-09 12:21:53 UTC
Please backport the PR to release 4.7 as well

Thx,
Ronald

Comment 7 Samuel Padgett 2021-07-15 13:06:45 UTC
Reopening as this breaks OpenStack deployments.

Comment 9 Yanping Zhang 2021-07-20 09:35:21 UTC
Checked on an OCP 4.9 cluster with payload 4.9.0-0.nightly-2021-07-19-140945.
Checked the console/downloads deployment YAML: the anti-affinity rule is "requiredDuringSchedulingIgnoredDuringExecution", and the console pods are scheduled on different master nodes.
# oc get node |grep master
ip-10-0-158-130.us-east-2.compute.internal   Ready    master   9h    v1.21.1+8268f88
ip-10-0-160-245.us-east-2.compute.internal   Ready    master   9h    v1.21.1+8268f88
ip-10-0-196-96.us-east-2.compute.internal    Ready    master   9h    v1.21.1+8268f88
# oc get pod -n openshift-console -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
console-66946dc647-2vp26     1/1     Running   0          9h    10.128.0.36   ip-10-0-158-130.us-east-2.compute.internal   <none>           <none>
console-66946dc647-wpg64     1/1     Running   0          9h    10.130.0.35   ip-10-0-160-245.us-east-2.compute.internal   <none>           <none>
downloads-7d9df5cb76-5fsmr   1/1     Running   0          9h    10.130.0.28   ip-10-0-160-245.us-east-2.compute.internal   <none>           <none>
downloads-7d9df5cb76-x44nc   1/1     Running   0          9h    10.128.0.23   ip-10-0-158-130.us-east-2.compute.internal   <none>           <none>
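
The placement check above can also be scripted. The following is a rough sketch, not part of the fix: it parses pasted `oc get pods -o wide` text and reports any workload with more than one pod on the same node. The function name `colocated_pods` is hypothetical, and the parsing assumes the column layout shown in this bug (NODE as the seventh whitespace-separated field) and pod names of the form `<deployment>-<hash>-<suffix>`.

```python
from collections import defaultdict


def colocated_pods(wide_output):
    """Map (workload, node) -> pod names for workloads with >1 pod on one node.

    Assumes the column layout seen in this bug: NODE is the 7th field.
    Output whose RESTARTS column contains spaces would need different parsing.
    """
    placement = defaultdict(list)
    lines = wide_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 7:
            continue
        pod, node = fields[0], fields[6]
        # Strip the ReplicaSet hash and pod suffix to get the deployment name.
        workload = pod.rsplit("-", 2)[0]
        placement[(workload, node)].append(pod)
    return {key: pods for key, pods in placement.items() if len(pods) > 1}


# Output from the original report (4.7 cluster).
REPORTED = """\
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6558bcb9f9-7cnjk 1/1 Running 1 3d1h 10.129.0.38 master-1.abc.com <none> <none>
console-6558bcb9f9-fwpzf 1/1 Running 0 3d1h 10.129.0.46 master-1.abc.com <none> <none>
downloads-84f554976-9nwr2 1/1 Running 0 3d1h 10.131.0.11 worker-2.abc.com <none> <none>
downloads-84f554976-wl655 1/1 Running 0 3d1h 10.129.2.7 worker-0.abc.com <none> <none>
"""

# Output from the 4.9 nightly verification.
VERIFIED = """\
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-66946dc647-2vp26 1/1 Running 0 9h 10.128.0.36 ip-10-0-158-130.us-east-2.compute.internal <none> <none>
console-66946dc647-wpg64 1/1 Running 0 9h 10.130.0.35 ip-10-0-160-245.us-east-2.compute.internal <none> <none>
downloads-7d9df5cb76-5fsmr 1/1 Running 0 9h 10.130.0.28 ip-10-0-160-245.us-east-2.compute.internal <none> <none>
downloads-7d9df5cb76-x44nc 1/1 Running 0 9h 10.128.0.23 ip-10-0-158-130.us-east-2.compute.internal <none> <none>
"""

print(colocated_pods(REPORTED))
print(colocated_pods(VERIFIED))
```

With the two samples from this bug, the reported output flags both console pods on master-1.abc.com, while the verified output yields an empty result.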

Comment 10 Ronald 2021-07-28 11:08:35 UTC
Hi Guys,


Can it be backported to release 4.7 as well?

Thx,
Ronald

Comment 14 errata-xmlrpc 2021-10-18 17:36:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

