Description of problem:
Rolling update of router-default deployment is not possible
Customer has a cluster with 3 infra nodes and the
router-default deployment is scaled to 3.
Attempts to redeploy the router pods as part of a rolling
update fail because the new pods are unschedulable.
The deployment cannot handle the redeploy because no node has
the required host ports free, but this always worked on 3.11.
Please also note that the router is using the
endpointPublishingStrategy type: HostNetwork
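For reference, the effective strategy is reported on the ingresscontroller; a minimal sketch of the relevant output (field names per the operator.openshift.io/v1 API):

$ oc -n openshift-ingress-operator get ingresscontroller/default -o yaml
...
status:
  endpointPublishingStrategy:
    type: HostNetwork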
Version-Release number of selected component (if applicable):
OCP 4 HTB
How reproducible: 100%
Steps to Reproduce:
1. Set up a cluster with 3 infra nodes, the router deployment scaled
to 3 replicas, and node placement limited to the infra nodes
2. Observe the 3 router pods run on the 3 infra nodes.
3. Make a change which requires a rolling restart of the router pods
Actual results:
The new instance of the router pod cannot complete deployment because
the scheduler cannot find a node to place it on.
Expected results:
The new pods are deployed successfully.
This worked fine in 3.11.
This looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1689779, but that bug has been fixed.
Confirmed there is no issue with the latest 4.0.0-0.nightly-2019-04-18-170158 build on AWS, but the difference is the `endpointPublishingStrategy` setting in the ingresscontroller; it might be a port conflict when using HostNetwork during a rolling update.
Just got a test env on bare metal which uses `HostNetwork` and can reproduce this issue:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-04-18-170158   True        False         52m     Cluster version is 4.0.0-0.nightly-2019-04-18-170158
$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
dell-r730-063.dsal.lab.eng.rdu2.redhat.com   Ready    master   70m   v1.13.4+d4ce02c1d
dell-r730-064.dsal.lab.eng.rdu2.redhat.com   Ready    master   70m   v1.13.4+d4ce02c1d
dell-r730-065.dsal.lab.eng.rdu2.redhat.com   Ready    master   70m   v1.13.4+d4ce02c1d
dell-r730-066.dsal.lab.eng.rdu2.redhat.com   Ready    worker   70m   v1.13.4+d4ce02c1d
dell-r730-067.dsal.lab.eng.rdu2.redhat.com   Ready    worker   70m   v1.13.4+d4ce02c1d
$ oc -n openshift-ingress get rs
NAME                        DESIRED   CURRENT   READY   AGE
router-default-69dc5c9b8c   2         2         2       59m
router-default-6d77f7444f   1         1         0       6m21s
$ oc -n openshift-ingress get pod
NAME                              READY   STATUS    RESTARTS   AGE
router-default-69dc5c9b8c-wcqqw   1/1     Running   0          59m
router-default-69dc5c9b8c-xvxh6   1/1     Running   0          59m
router-default-6d77f7444f-wndvc   0/1     Pending   0          6m32s
$ oc -n openshift-ingress describe pod router-default-6d77f7444f-wndvc
Type      Reason             Age                    From                Message
----      ------             ----                   ----                -------
Warning   FailedScheduling   48s (x25 over 2m54s)   default-scheduler   0/5 nodes are available: 2 node(s) didn't have free ports for the requested pod ports, 3 node(s) didn't match node selector.
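For context (an assumption based on the default router configuration, not shown in the paste above): with hostNetwork: true the router binds ports 80, 443, and 1936 directly on the node, so each node can run at most one router pod, which is why both worker nodes report no free ports. The setting can be confirmed on a running pod:

$ oc -n openshift-ingress get pod router-default-69dc5c9b8c-wcqqw -o yaml | grep hostNetwork
  hostNetwork: true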
In the HostNetwork setup, the router containers bind static ports on the host interface. Therefore, additional nodes are required to surge a rolling deployment. This was also the case in 3.x given a router with the same rolling update parameters; the primary difference is that the end user can control those parameters in 3.x.
In 4.x, the default rolling deployment parameters are max surge 25% and max unavailable 25%. The absolute value computed from the proportional max unavailable percentage is rounded down (floor). Given a worker node pool of 3, this means the effective max unavailable value is zero.
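Concretely, the default strategy amounts to the following sketch (the arithmetic follows the Kubernetes rounding rules: max surge rounds up, max unavailable rounds down):

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%        # ceil(3 * 0.25) = 1 extra pod may be created
    maxUnavailable: 25%  # floor(3 * 0.25) = 0 pods may be taken down

With 3 replicas pinned to 3 nodes, the single surge pod needs a node with free host ports, but no old pod may be stopped to free one, so the rollout deadlocks.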
Given the default install topology (3 workers), the host network constraint, and the rolling update parameters (which are immutable), the only way to execute the in-place rolling upgrade in this case would be to add more workers.
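On installer-provisioned cloud clusters (not applicable to the bare-metal env above), one way to add a worker is to scale a machineset; <machineset-name> below is a placeholder:

$ oc -n openshift-machine-api get machinesets
$ oc -n openshift-machine-api scale machineset <machineset-name> --replicas=2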
For now, this can be a documentation issue. Going forward, we can consider things like:
1. Changing our default rolling update parameters
2. Exposing the rolling update parameters through the configuration API
I did a little more digging here and found the underlying difference from our 3.x setup.
In 3.x we set surge to 0, which triggers a fencepost condition in the Kubernetes deployment controller that sets a floor of 1 for unavailability even when the spec value is proportional.
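For illustration, the 3.x-style strategy looks roughly like this (a sketch; the bump-to-1 behavior is the fencepost handling in the deployment controller, which refuses to let surge and unavailability both resolve to zero):

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 25%  # floor(3 * 0.25) = 0, but with surge 0 the controller
                         # raises the resolved value to 1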
We are going to consider doing the same. I'm going to keep this bug open while we evaluate our defaults.
We're going to fix this by making the deployment strategy dynamic with https://github.com/openshift/cluster-ingress-operator/pull/219.
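Presumably (my reading of the plan above, not a quote of the PR), the dynamic strategy boils down to picking the parameters per publishing strategy, e.g.:

# HostNetwork: no node can hold a surge pod, so trade surge for availability
rollingUpdate:
  maxSurge: 0
  maxUnavailable: 25%  # resolved 0 is raised to 1 by the fencepost handling

# LoadBalancerService: surge as before
rollingUpdate:
  maxSurge: 25%
  maxUnavailable: 25%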
Verified with 4.1.0-0.nightly-2019-05-04-210601 on vSphere; the issue has been fixed.
$ oc get deployment/router-default -n openshift-ingress -o yaml
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.