Bug 1945387 - Image Registry deployment should have 2 replicas and hard anti-affinity rules
Summary: Image Registry deployment should have 2 replicas and hard anti-affinity rules
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Ricardo Maraschini
QA Contact: XiuJuan Wang
URL:
Whiteboard:
Depends On:
Blocks: 1973693 1986486
 
Reported: 2021-03-31 18:39 UTC by ravig
Modified: 2023-09-18 00:25 UTC
CC: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Image registry pods were being scheduled on the same node.
Consequence: If the node hosting the pods had problems, the registry became unavailable for a while.
Fix: When there are only two replicas, the pods are now required (requiredDuringSchedulingIgnoredDuringExecution) to be scheduled on different nodes. If the number of replicas is higher than 2, they are preferred (preferredDuringSchedulingIgnoredDuringExecution) to run on different nodes, but this is not enforced. The fix also adds maxUnavailable (1) and maxSurge (1) rules when the number of replicas is 2.
Result: Image registry pods are fairly distributed among the nodes, allowing nodes to fail without making the registry unavailable.
Clone Of:
Environment:
Last Closed: 2021-07-27 22:57:00 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift cluster-image-registry-operator pull 681 (open): Bug 1945387: Setting required pod anti-affinity rules, last updated 2021-04-29 09:58:36 UTC
Red Hat Knowledge Base (Solution) 5397921, last updated 2021-09-22 15:36:03 UTC
Red Hat Product Errata RHSA-2021:2438, last updated 2021-07-27 22:57:28 UTC

Description ravig 2021-03-31 18:39:37 UTC
Description of problem:
Currently, all the image registry deployment replicas can land on a single node, which creates a single point of failure. To make the registry highly available, the operator should set the pod anti-affinity to `requiredDuringSchedulingIgnoredDuringExecution` instead of `preferredDuringSchedulingIgnoredDuringExecution`.
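
For illustration, a minimal sketch (Kubernetes manifest YAML) of the hard anti-affinity this bug asks for on the registry deployment; the label selector, namespace, and topology key are taken from the pod spec quoted in comment 6 below, and the spec the operator actually generates may differ in detail:

affinity:
  podAntiAffinity:
    # Hard rule: with this in place, two registry pods cannot share a node.
    # Selector and namespace assumed from the affinity JSON in comment 6.
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          docker-registry: default
      namespaces:
      - openshift-image-registry
      topologyKey: kubernetes.io/hostname

With a hard rule like this, the scheduler leaves the second replica Pending on a single-node cluster, which is the SNO behavior observed later in comment 5.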

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 5 XiuJuan Wang 2021-05-14 03:46:13 UTC
On a baremetal SNO cluster the default image registry has replicas=1. After raising replicas to 2, I checked whether the image registry pods were running.
The two new pods cannot run because they do not match the pod anti-affinity rules.

$oc get pods 
NAME                                               READY   STATUS      RESTARTS   AGE
cluster-image-registry-operator-7879c9b8bb-bwp8d   1/1     Running     5          17h
image-pruner-27015840-hcmhx                        0/1     Completed   0          3h43m
image-registry-5f5cbb89c4-qz7mv                    0/1     Pending     0          2m27s
image-registry-5f5cbb89c4-rg7rv                    0/1     Pending     0          2m27s
image-registry-66dd4f45fb-ljj8n                    1/1     Running     0          17h
node-ca-k4nxf                                      1/1     Running     0          17h


$oc describe pods image-registry-5f5cbb89c4-qz7mv

Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  5s (x5 over 2m20s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity rules, 1 node(s) didn't match pod anti-affinity rules.

Verified on 4.8.0-0.nightly-2021-05-13-002125 cluster.

Comment 6 XiuJuan Wang 2021-05-14 06:03:03 UTC
On a cluster with multiple nodes (3 masters, 3 workers), when replicas is set to 3, the 3 image registry pods are sometimes all scheduled on the same node, since the pod anti-affinity follows the preferredDuringSchedulingIgnoredDuringExecution rule.

Should we consider this scenario?

$oc patch config.image cluster -p '{"spec":{"replicas":3}}' --type=merge

$oc get pod -o wide
NAME                                               READY   STATUS    RESTARTS   AGE    IP            NODE                                        NOMINATED NODE   READINESS GATES
cluster-image-registry-operator-5b597c45bc-2msvq   1/1     Running   0          46m    10.129.0.68   ip-10-0-56-47.us-east-2.compute.internal    <none>           <none>
image-registry-77d5f86878-8f74x                    1/1     Running   0          60s    10.128.2.29   ip-10-0-77-236.us-east-2.compute.internal   <none>           <none>
image-registry-77d5f86878-gz2jh                    1/1     Running   0          81s    10.128.2.27   ip-10-0-77-236.us-east-2.compute.internal   <none>           <none>
image-registry-77d5f86878-w4pq6                    1/1     Running   0          70s    10.128.2.28   ip-10-0-77-236.us-east-2.compute.internal   <none>           <none>

$oc get pods image-registry-77d5f86878-4jtgc -o json  | jq -r '[.spec.affinity]'
[
  {
    "podAntiAffinity": {
      "preferredDuringSchedulingIgnoredDuringExecution": [
        {
          "podAffinityTerm": {
            "labelSelector": {
              "matchLabels": {
                "docker-registry": "default"
              }
            },
            "namespaces": [
              "openshift-image-registry"
            ],
            "topologyKey": "kubernetes.io/hostname"
          },
          "weight": 100
        }
      ]
    }
  }
]

Comment 7 Ricardo Maraschini 2021-05-14 07:35:45 UTC
When we have exactly two replicas we require (requiredDuringSchedulingIgnoredDuringExecution) the pods to be scheduled on different nodes. If the number of replicas is higher than 2, we prefer (preferredDuringSchedulingIgnoredDuringExecution) that they run on different nodes but do not enforce it. This fix also adds maxUnavailable (1) and maxSurge (1) rules when the number of replicas is 2.
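
For reference, the maxUnavailable/maxSurge values mentioned above map onto the standard Deployment rolling-update strategy fields; a minimal YAML sketch follows (the exact spec the operator generates may differ):

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1  # at most one of the two registry pods may be down during a rollout
    maxSurge: 1        # one extra pod may be created temporarily during a rollout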

Comment 8 XiuJuan Wang 2021-05-14 09:35:39 UTC
Ricardo, thank you for explaining that.

Comment 10 Ganesh Gore 2021-06-17 17:56:20 UTC
Hi,

Will this be backported to 4.7?

I have a customer who is looking for similar flexibility to switch to hard anti-affinity rules.

Currently, the rule is set by default to "preferredDuringSchedulingIgnoredDuringExecution"; however, the customer would like to change it to "requiredDuringSchedulingIgnoredDuringExecution" so that the pods must be scheduled on different nodes when the condition matches.

The customer is running OCP 4.7.

Let me know if I need to open a separate bug for 4.7.

Thanks & Regards,
Ganesh Gore

Comment 12 errata-xmlrpc 2021-07-27 22:57:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 14 Red Hat Bugzilla 2023-09-18 00:25:32 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

