Bug 1945387
Summary: | Image Registry deployment should have 2 replicas and hard anti-affinity rules | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | ravig <rgudimet>
Component: | Image Registry | Assignee: | Ricardo Maraschini <rmarasch>
Status: | CLOSED ERRATA | QA Contact: | XiuJuan Wang <xiuwang>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 4.8 | CC: | aos-bugs, gagore, mrobson, oarribas, obulatov, rmarasch, wewang, wking, xiuwang
Target Milestone: | --- | Keywords: | TestCaseNeeded
Target Release: | 4.8.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: |
Cause: Image registry pods were being scheduled on the same node.
Consequence: If the node where the pods were scheduled had problems, the registry became unavailable for a while.
Fix: When we have only two replicas, we now require (requiredDuringSchedulingIgnoredDuringExecution) the pods to be scheduled on different nodes. If the number of replicas is higher than 2, we prefer (preferredDuringSchedulingIgnoredDuringExecution) that they run on different nodes but do not enforce it. This fix also added maxUnavailable (1) and maxSurge (1) rules for when the number of replicas is 2 (a minimal manifest sketch follows the header table below).
Result: Image registry pods are fairly distributed among the nodes, allowing a node to fail without making the registry unavailable.
|
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2021-07-27 22:57:00 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1973693, 1986486 | |
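As referenced in the Doc Text above, here is a minimal sketch of the scheduling constraints described for the two-replica case. It is a hypothetical Deployment fragment: the label selector and namespace are taken from the affinity JSON quoted in the comments below, and the manifest the operator actually generates may differ in detail.

```yaml
# Hypothetical fragment of the image-registry Deployment with replicas: 2.
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # added by the fix when replicas == 2
      maxSurge: 1         # added by the fix when replicas == 2
  template:
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule for replicas == 2: each pod must land on a different node.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                docker-registry: default
            namespaces:
            - openshift-image-registry
            topologyKey: kubernetes.io/hostname
```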
Description (ravig, 2021-03-31 18:39:37 UTC)
On a bare-metal SNO cluster, the default image registry has replicas=1. After raising replicas to 2 and checking the image registry pods, the two new pods cannot run because they do not match the pod anti-affinity rules.

```
$ oc get pods
NAME                                               READY   STATUS      RESTARTS   AGE
cluster-image-registry-operator-7879c9b8bb-bwp8d   1/1     Running     5          17h
image-pruner-27015840-hcmhx                        0/1     Completed   0          3h43m
image-registry-5f5cbb89c4-qz7mv                    0/1     Pending     0          2m27s
image-registry-5f5cbb89c4-rg7rv                    0/1     Pending     0          2m27s
image-registry-66dd4f45fb-ljj8n                    1/1     Running     0          17h
node-ca-k4nxf                                      1/1     Running     0          17h

$ oc describe pods image-registry-5f5cbb89c4-qz7mv
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  5s (x5 over 2m20s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity rules, 1 node(s) didn't match pod anti-affinity rules.
```

Verified on a 4.8.0-0.nightly-2021-05-13-002125 cluster.

On a multi-node cluster (3 masters, 3 workers), when replicas is set to 3, the 3 image registry pods sometimes schedule on the same node, since the pod anti-affinity follows preferredDuringSchedulingIgnoredDuringExecution rules. Should we consider this scenario?

```
$ oc patch config.image cluster -p '{"spec":{"replicas":3}}' --type=merge

$ oc get pod -o wide
NAME                                               READY   STATUS    RESTARTS   AGE   IP            NODE                                        NOMINATED NODE   READINESS GATES
cluster-image-registry-operator-5b597c45bc-2msvq   1/1     Running   0          46m   10.129.0.68   ip-10-0-56-47.us-east-2.compute.internal    <none>           <none>
image-registry-77d5f86878-8f74x                    1/1     Running   0          60s   10.128.2.29   ip-10-0-77-236.us-east-2.compute.internal   <none>           <none>
image-registry-77d5f86878-gz2jh                    1/1     Running   0          81s   10.128.2.27   ip-10-0-77-236.us-east-2.compute.internal   <none>           <none>
image-registry-77d5f86878-w4pq6                    1/1     Running   0          70s   10.128.2.28   ip-10-0-77-236.us-east-2.compute.internal   <none>           <none>

$ oc get pods image-registry-77d5f86878-4jtgc -o json | jq -r '[.spec.affinity]'
[
  {
    "podAntiAffinity": {
      "preferredDuringSchedulingIgnoredDuringExecution": [
        {
          "podAffinityTerm": {
            "labelSelector": {
              "matchLabels": {
                "docker-registry": "default"
              }
            },
            "namespaces": [
              "openshift-image-registry"
            ],
            "topologyKey": "kubernetes.io/hostname"
          },
          "weight": 100
        }
      ]
    }
  }
]
```

When we have two replicas, we start to require (requiredDuringSchedulingIgnoredDuringExecution) the pods to be scheduled on different nodes. If the number of replicas is higher than 2, we prefer (preferredDuringSchedulingIgnoredDuringExecution) that they run on different nodes but do not enforce it. This fix also added maxUnavailable (1) and maxSurge (1) rules for when the number of replicas is 2.

Ricardo, thank you for explaining that.

Hi,

Will this be backported to 4.7? I have a customer who is looking for similar flexibility to change the hard anti-affinity rules. Currently the default is "preferredDuringSchedulingIgnoredDuringExecution"; however, the customer would like to change it to "requiredDuringSchedulingIgnoredDuringExecution" so that the pods must get scheduled on different nodes if the condition matches. The customer is running OCP 4.7. Let me know if I need to open a separate bug for 4.7.

Thanks & Regards,
Ganesh Gore

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:2438

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.
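For contrast with the required rule sketched after the header table, here is the preferred variant that the affinity JSON quoted above corresponds to, rewritten as a hypothetical manifest fragment for the replicas > 2 case; field values follow that JSON output, not a confirmed operator manifest.

```yaml
# Hypothetical fragment of the image-registry Deployment with replicas > 2.
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          # Soft rule for replicas > 2: spreading across nodes is preferred
          # but not enforced, so pods can still be co-located (as seen in
          # the replicas=3 verification comment above).
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  docker-registry: default
              namespaces:
              - openshift-image-registry
              topologyKey: kubernetes.io/hostname
```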