Bug 1689779 - new deployed router pod always in pending status when updating ingresscontroller
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Dan Mace
QA Contact: Hongan Li
Depends On:
Reported: 2019-03-18 06:00 UTC by Hongan Li
Modified: 2022-08-04 22:20 UTC

Fixed In Version:
Doc Text:
Clone Of:
Last Closed: 2019-06-04 10:46:01 UTC
Target Upstream Version:


System                  ID                                           Private  Priority  Status  Summary                                                  Last Updated
Github                  openshift cluster-ingress-operator pull 167  0        None      closed  Bug 1687940: deployment: fix scope of pod anti-affinity  2021-01-11 23:51:53 UTC
Red Hat Product Errata  RHBA-2019:0758                               0        None      None    None                                                     2019-06-04 10:46:06 UTC

Description Hongan Li 2019-03-18 06:00:38 UTC
Description of problem:
If the number of router replicas equals the number of worker nodes, then after the ingresscontroller is updated, the newly deployed router pod stays in "Pending" status indefinitely.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install a 4.0 cluster.
2. If the cluster has 3 worker nodes, update the ingresscontroller and set the replicas to 3 (the default is 2):
$ oc -n openshift-ingress-operator patch ingresscontrollers.operator.openshift.io default -p '{"spec":{"replicas":3}}' --type=merge
3. Wait until all 3 router pods are running, then edit the ingresscontroller and add "spec.namespaceSelector" to trigger the deployment of a new router pod.

Actual results:
The new router pod is in "Pending" status; the events show that this is caused by the anti-affinity rules.

$ oc get pod -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE
router-default-7cb654b489-bbz79   0/1     Pending   0          20m
router-default-f56566446-2x5z2    1/1     Running   0          28m
router-default-f56566446-89bjl    1/1     Running   0          22m
router-default-f56566446-ft9h5    1/1     Running   0          28m

$ oc get rs -n openshift-ingress
NAME                        DESIRED   CURRENT   READY   AGE
router-default-7cb654b489   1         1         0       10m
router-default-f56566446    3         3         3       131m

$ oc -n openshift-ingress describe pod router-default-7cb654b489-bbz79
Name:               router-default-7cb654b489-bbz79
Namespace:          openshift-ingress
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               <none>
Labels:             app=router
Annotations:        openshift.io/scc: restricted
Status:             Pending
Controlled By:      ReplicaSet/router-default-7cb654b489


Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  66s (x4 over 67s)  default-scheduler  0/6 nodes are available: 3 node(s) didn't match node selector, 3 node(s) didn't match pod affinity/anti-affinity, 3 node(s) didn't satisfy existing pods anti-affinity rules.

Expected results:
The newly deployed router pod should be running.

Additional info:
The issue does not occur if the number of router replicas is less than the number of worker nodes.

Comment 1 Dan Mace 2019-03-21 12:53:57 UTC
Anti-Affinity was updated to make the rules preferred rather than required, allowing surge pods to schedule during a deployment.
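
The difference between the two rule types can be illustrated with a toy model. This is only a sketch, not the real kube-scheduler, which scores nodes by weight rather than taking the first match. The names pick_node and surge_pod_status and the worker-N node names are hypothetical and do not come from the operator. The model assumes one surge pod is created during the rollout, as in the ReplicaSet output above (3 old pods plus 1 new pod).

```python
from typing import Dict, List, Optional


def pick_node(nodes: List[str], pods_on: Dict[str, int], required: bool) -> Optional[str]:
    """Return a node for one router pod, or None if the pod would stay Pending."""
    free = [n for n in nodes if pods_on[n] == 0]
    if free:
        return free[0]  # a node without a router pod satisfies either rule type
    if required:
        return None  # hard rule: every node already hosts a router pod
    # soft rule: co-location is allowed, so fall back to the least-loaded node
    return min(nodes, key=lambda n: pods_on[n])


def surge_pod_status(workers: int, replicas: int, required: bool) -> str:
    """Place `replicas` existing pods, then report the status of one surge pod."""
    nodes = [f"worker-{i}" for i in range(workers)]  # hypothetical node names
    pods_on = {n: 0 for n in nodes}
    for _ in range(replicas):
        node = pick_node(nodes, pods_on, required)
        if node is not None:
            pods_on[node] += 1
    surge = pick_node(nodes, pods_on, required)
    return "Running" if surge is not None else "Pending"


if __name__ == "__main__":
    print(surge_pod_status(3, 3, required=True))   # Pending: the reported bug
    print(surge_pod_status(3, 2, required=True))   # Running: replicas < workers
    print(surge_pod_status(3, 3, required=False))  # Running: preferred rule
```

With a required rule and as many replicas as workers, the surge pod has no eligible node. With a preferred rule it can share a node for the duration of the rollout.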

Comment 3 Hongan Li 2019-03-22 05:51:15 UTC
Will verify with the next nightly build that contains the fix.

Comment 4 Hongan Li 2019-03-25 01:30:49 UTC
Verified with 4.0.0-0.nightly-2019-03-23-222829; the issue has been fixed.

$ oc get pod -n openshift-ingress -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP            NODE                             NOMINATED NODE
router-default-65dc774d97-6hw5z   1/1     Running   0          69s   ip-172-31-134-125.ec2.internal   <none>
router-default-65dc774d97-b8wh2   1/1     Running   0          92s   ip-172-31-151-75.ec2.internal    <none>
router-default-65dc774d97-mkvmm   1/1     Running   0          51s   ip-172-31-162-21.ec2.internal    <none>

$ oc get rs -n openshift-ingress
NAME                        DESIRED   CURRENT   READY   AGE
router-default-65dc774d97   3         3         3       4m46s
router-default-86695d48b    0         0         0       59m

Comment 6 errata-xmlrpc 2019-06-04 10:46:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

