Bug 1689779 - new deployed router pod always in pending status when updating ingresscontroller
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Dan Mace
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-18 06:00 UTC by Hongan Li
Modified: 2019-06-04 10:46 UTC (History)
1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:46:01 UTC
Target Upstream Version:


Attachments


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:46:06 UTC
Github openshift cluster-ingress-operator pull 167 None None None 2019-03-21 12:53:57 UTC

Description Hongan Li 2019-03-18 06:00:38 UTC
Description of problem:
If the number of router replicas equals the number of worker nodes, the newly deployed router pod is always stuck in "Pending" status after updating the ingresscontroller.


Version-Release number of selected component (if applicable):
4.0.0-0.nightly-2019-03-15-063749

How reproducible:
100%

Steps to Reproduce:
1. install 4.0 cluster
2. if the cluster has 3 worker nodes, update the ingresscontroller and set the replicas to 3 (default is 2).
$ oc -n openshift-ingress-operator patch ingresscontrollers.operator.openshift.io default -p '{"spec":{"replicas":3}}' --type=merge
 
3. wait until all 3 router pods are running, then edit the ingresscontroller and add "spec.namespaceSelector" to trigger deployment of a new router pod
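The step-3 change can be expressed as an ingresscontroller spec fragment like the following sketch (the selector label "environment: shard" is only an illustrative placeholder; any namespaceSelector change triggers a new rollout):

```yaml
# Hypothetical spec fragment for the default ingresscontroller in
# openshift-ingress-operator; the label key/value are examples only.
spec:
  namespaceSelector:
    matchLabels:
      environment: shard
```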


Actual results:
the new router pod is stuck in "Pending" status; the events show the failure is related to the anti-affinity rules.

$ oc get pod -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE
router-default-7cb654b489-bbz79   0/1     Pending   0          20m
router-default-f56566446-2x5z2    1/1     Running   0          28m
router-default-f56566446-89bjl    1/1     Running   0          22m
router-default-f56566446-ft9h5    1/1     Running   0          28m

$ oc get rs -n openshift-ingress
NAME                        DESIRED   CURRENT   READY   AGE
router-default-7cb654b489   1         1         0       10m
router-default-f56566446    3         3         3       131m

$ oc -n openshift-ingress describe pod router-default-7cb654b489-bbz79
Name:               router-default-7cb654b489-bbz79
Namespace:          openshift-ingress
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               <none>
Labels:             app=router
                    ingress.openshift.io/component=ingress-controller
                    pod-template-hash=7cb654b489
                    router=router-default
Annotations:        openshift.io/scc: restricted
Status:             Pending
IP:                 
Controlled By:      ReplicaSet/router-default-7cb654b489

<---snip--->

Node-Selectors:  beta.kubernetes.io/os=linux
                 node-role.kubernetes.io/worker=
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  66s (x4 over 67s)  default-scheduler  0/6 nodes are available: 3 node(s) didn't match node selector, 3 node(s) didn't match pod affinity/anti-affinity, 3 node(s) didn't satisfy existing pods anti-affinity rules.



Expected results:
the newly deployed router pod should be running

Additional info:
This issue does not occur if the number of router replicas is less than the number of worker nodes.

Comment 1 Dan Mace 2019-03-21 12:53:57 UTC
Anti-Affinity was updated to make the rules preferred rather than required, allowing surge pods to schedule during a deployment.
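For context, the change described above corresponds roughly to the following difference in the router deployment's pod anti-affinity (a sketch, not the exact operator code; the `router: router-default` label matches the pod labels shown in the description):

```yaml
# Before: required anti-affinity. During a rolling update, the surge
# pod cannot schedule because every matching worker node already runs
# a router pod, so it stays Pending.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          router: router-default

# After: preferred anti-affinity. The scheduler still spreads router
# pods across nodes when possible, but a surge pod may temporarily
# share a node during a rolling update.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            router: router-default
```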

Comment 3 Hongan Li 2019-03-22 05:51:15 UTC
Will verify with the next nightly build, which contains the fix.

Comment 4 Hongan Li 2019-03-25 01:30:49 UTC
Verified with 4.0.0-0.nightly-2019-03-23-222829; the issue has been fixed.


$ oc get pod -n openshift-ingress -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP            NODE                             NOMINATED NODE
router-default-65dc774d97-6hw5z   1/1     Running   0          69s   10.129.2.12   ip-172-31-134-125.ec2.internal   <none>
router-default-65dc774d97-b8wh2   1/1     Running   0          92s   10.131.0.12   ip-172-31-151-75.ec2.internal    <none>
router-default-65dc774d97-mkvmm   1/1     Running   0          51s   10.128.2.10   ip-172-31-162-21.ec2.internal    <none>

$ oc get rs -n openshift-ingress
NAME                        DESIRED   CURRENT   READY   AGE
router-default-65dc774d97   3         3         3       4m46s
router-default-86695d48b    0         0         0       59m

Comment 6 errata-xmlrpc 2019-06-04 10:46:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

