Description of problem:
If the number of router replicas equals the number of worker nodes, the newly deployed router pod is always stuck in "Pending" status after the ingresscontroller is updated.

Version-Release number of selected component (if applicable):
4.0.0-0.nightly-2019-03-15-063749

How reproducible:
100%

Steps to Reproduce:
1. Install a 4.0 cluster.
2. If there are 3 worker nodes, update the ingresscontroller and set the replicas to 3 (the default is 2):
$ oc -n openshift-ingress-operator patch ingresscontrollers.operator.openshift.io default -p '{"spec":{"replicas":3}}' --type=merge
3. Wait until all 3 router pods are running, then edit the ingresscontroller and add "spec.namespaceSelector" to trigger the deployment of a new router pod.

Actual results:
The new router pod is stuck in "Pending" status; the events show this is related to the anti-affinity rules.

$ oc get pod -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE
router-default-7cb654b489-bbz79   0/1     Pending   0          20m
router-default-f56566446-2x5z2    1/1     Running   0          28m
router-default-f56566446-89bjl    1/1     Running   0          22m
router-default-f56566446-ft9h5    1/1     Running   0          28m

$ oc get rs -n openshift-ingress
NAME                        DESIRED   CURRENT   READY   AGE
router-default-7cb654b489   1         1         0       10m
router-default-f56566446    3         3         3       131m

$ oc -n openshift-ingress describe pod router-default-7cb654b489-bbz79
Name:               router-default-7cb654b489-bbz79
Namespace:          openshift-ingress
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               <none>
Labels:             app=router
                    ingress.openshift.io/component=ingress-controller
                    pod-template-hash=7cb654b489
                    router=router-default
Annotations:        openshift.io/scc: restricted
Status:             Pending
IP:
Controlled By:      ReplicaSet/router-default-7cb654b489
<---snip--->
Node-Selectors:  beta.kubernetes.io/os=linux
                 node-role.kubernetes.io/worker=
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  66s (x4 over 67s)  default-scheduler  0/6 nodes are available: 3 node(s) didn't match node selector, 3 node(s) didn't match pod affinity/anti-affinity, 3 node(s) didn't satisfy existing pods anti-affinity rules.

Expected results:
The newly deployed router pod should be running.

Additional info:
This issue does not occur if the number of router replicas is less than the number of worker nodes.
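The FailedScheduling event follows directly from a required anti-affinity rule: each worker node can host at most one router pod, so a rolling update that surges to replicas+1 pods has nowhere to place the extra pod once replicas already equals the worker count. A minimal sketch of that arithmetic (illustrative only, not operator code):

```python
def placeable(pods: int, workers: int) -> int:
    """With a *required* hostname anti-affinity rule, each worker node
    can run at most one router pod, so placement is capped at the
    number of worker nodes."""
    return min(pods, workers)

workers = 3
replicas = 3
surge = replicas + 1                       # rolling update adds one extra pod
pending = surge - placeable(surge, workers)
print(pending)                             # -> 1: the pod stuck in Pending above
```

With 2 replicas on 3 workers the surge pod still fits (placeable(3, 3) == 3), which matches the note that the issue only appears when replicas equals the worker count.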
Anti-Affinity was updated to make the rules preferred rather than required, allowing surge pods to schedule during a deployment.
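A preferred (soft) anti-affinity rule of the kind described above would look roughly like the following pod-spec fragment. This is a sketch, not the operator's actual manifest; the field names follow the standard Kubernetes pod-spec schema, the label values are taken from the pod labels shown in this report, and the weight is an assumption:

```yaml
# Preferred (soft) anti-affinity: the scheduler still tries to spread
# router pods across nodes, but a surge pod may land on a node that
# already runs a router pod instead of staying Pending.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100                        # assumed weight, for illustration
      podAffinityTerm:
        topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: router                  # labels as shown in the
            router: router-default       # pod description above
```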
Will verify with the next nightly build, which contains the fix.
Verified with 4.0.0-0.nightly-2019-03-23-222829; the issue has been fixed.

$ oc get pod -n openshift-ingress -o wide
NAME                              READY   STATUS    RESTARTS   AGE    IP            NODE                             NOMINATED NODE
router-default-65dc774d97-6hw5z   1/1     Running   0          69s    10.129.2.12   ip-172-31-134-125.ec2.internal   <none>
router-default-65dc774d97-b8wh2   1/1     Running   0          92s    10.131.0.12   ip-172-31-151-75.ec2.internal    <none>
router-default-65dc774d97-mkvmm   1/1     Running   0          51s    10.128.2.10   ip-172-31-162-21.ec2.internal    <none>

$ oc get rs -n openshift-ingress
NAME                        DESIRED   CURRENT   READY   AGE
router-default-65dc774d97   3         3         3       4m46s
router-default-86695d48b    0         0         0       59m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758