Description of problem:
Currently, all the image registry deployment replicas can land on a single node, which creates a single point of failure. To be highly available, the deployment should set the pod anti-affinity to `requiredDuringSchedulingIgnoredDuringExecution` instead of `preferredDuringSchedulingIgnoredDuringExecution`.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
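For reference, a minimal sketch of the anti-affinity this report is asking for, expressed as a hard rule (the labelSelector and namespace are copied from the current preferred rule shown in the comments below; the exact manifest generated by the operator may differ):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    # Hard rule: the scheduler must place registry pods on distinct nodes.
    - labelSelector:
        matchLabels:
          docker-registry: default
      namespaces:
      - openshift-image-registry
      topologyKey: kubernetes.io/hostname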
On a baremetal SNO cluster, the default image registry has replicas=1. After raising replicas to 2 and checking the image registry pods, the two new pods cannot run because they do not satisfy the pod anti-affinity rules.

$oc get pods
NAME                                               READY   STATUS      RESTARTS   AGE
cluster-image-registry-operator-7879c9b8bb-bwp8d   1/1     Running     5          17h
image-pruner-27015840-hcmhx                        0/1     Completed   0          3h43m
image-registry-5f5cbb89c4-qz7mv                    0/1     Pending     0          2m27s
image-registry-5f5cbb89c4-rg7rv                    0/1     Pending     0          2m27s
image-registry-66dd4f45fb-ljj8n                    1/1     Running     0          17h
node-ca-k4nxf                                      1/1     Running     0          17h

$oc describe pods image-registry-5f5cbb89c4-qz7mv
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  5s (x5 over 2m20s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity rules, 1 node(s) didn't match pod anti-affinity rules.

Verified on 4.8.0-0.nightly-2021-05-13-002125 cluster.
On a cluster with multiple nodes (3 masters, 3 workers), when replicas is set to 3, the three image registry pods sometimes all schedule onto the same node, since the pod anti-affinity follows preferredDuringSchedulingIgnoredDuringExecution rules. Should we consider this scenario?

$oc patch config.image cluster -p '{"spec":{"replicas":3}}' --type=merge

$oc get pod -o wide
NAME                                               READY   STATUS    RESTARTS   AGE   IP            NODE                                        NOMINATED NODE   READINESS GATES
cluster-image-registry-operator-5b597c45bc-2msvq   1/1     Running   0          46m   10.129.0.68   ip-10-0-56-47.us-east-2.compute.internal    <none>           <none>
image-registry-77d5f86878-8f74x                    1/1     Running   0          60s   10.128.2.29   ip-10-0-77-236.us-east-2.compute.internal   <none>           <none>
image-registry-77d5f86878-gz2jh                    1/1     Running   0          81s   10.128.2.27   ip-10-0-77-236.us-east-2.compute.internal   <none>           <none>
image-registry-77d5f86878-w4pq6                    1/1     Running   0          70s   10.128.2.28   ip-10-0-77-236.us-east-2.compute.internal   <none>           <none>

$oc get pods image-registry-77d5f86878-4jtgc -o json | jq -r '[.spec.affinity]'
[
  {
    "podAntiAffinity": {
      "preferredDuringSchedulingIgnoredDuringExecution": [
        {
          "podAffinityTerm": {
            "labelSelector": {
              "matchLabels": {
                "docker-registry": "default"
              }
            },
            "namespaces": [
              "openshift-image-registry"
            ],
            "topologyKey": "kubernetes.io/hostname"
          },
          "weight": 100
        }
      ]
    }
  }
]
When we have two replicas, we require (requiredDuringSchedulingIgnoredDuringExecution) the pods to be scheduled on different nodes. If the number of replicas is higher than 2, we prefer (preferredDuringSchedulingIgnoredDuringExecution) them to run on different nodes but do not enforce it. This fix also adds maxUnavailable (1) and maxSurge (1) rules when the number of replicas is 2.
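To make that concrete, here is a rough sketch of the relevant fragment of the registry Deployment when replicas is 2, assuming standard Deployment API fields (the manifest actually rendered by the operator may differ in detail):

spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one replica may be unavailable during a rollout
      maxSurge: 1         # at most one extra replica may be created during a rollout
  template:
    spec:
      affinity:
        podAntiAffinity:
          # With exactly 2 replicas the rule is hard: each pod must land on a different node.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                docker-registry: default
            namespaces:
            - openshift-image-registry
            topologyKey: kubernetes.io/hostname

With more than 2 replicas the same term would instead appear under preferredDuringSchedulingIgnoredDuringExecution with a weight, as shown in the jq output above.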
Ricardo, thank you for explaining that.
Hi,

Will this be backported to 4.7? I have a customer who is looking for similar flexibility to switch to hard anti-affinity rules. Currently it is set by default to "preferredDuringSchedulingIgnoredDuringExecution"; however, the customer would like to change it to "requiredDuringSchedulingIgnoredDuringExecution" so that the pods must be scheduled on different nodes when the condition matches. The customer is running OCP 4.7.

Let me know if I need to open a separate bug for 4.7.

Thanks & Regards,
Ganesh Gore
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days