Description of problem:
Rolling update of the router-default deployment is not possible. The customer has a cluster with 3 infra nodes, and the router-default deployment is scaled to 3 replicas. Attempts to redeploy the router pods as part of a rolling update fail because the new pods are unschedulable: there is no node with a free host port available. This always worked on 3.11. Please also note that the router is using endpointPublishingStrategy type: HostNetwork.

Version-Release number of selected component (if applicable):
OCP 4 HTB

How reproducible:
100%

Steps to Reproduce:
1. Set up a cluster with 3 infra nodes, the router pod scaled to 3 replicas, and node placement limited to the infra nodes.
2. Observe the 3 router pods running on the 3 infra nodes.
3. Make a change that requires a rolling restart of the router pods.

Actual results:
The new router pod cannot complete deployment because the scheduler cannot find a node to place it on.

Expected results:
The new pods are deployed successfully.

Additional info:
This worked fine in 3.11.
This looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1689779, but that bug has already been fixed.
Confirmed there is no issue with the latest 4.0.0-0.nightly-2019-04-18-170158 build on AWS, but the difference is the `endpointPublishingStrategy` setting in the ingresscontroller; it might be a host port conflict during the rolling update when using HostNetwork.

---AWS---
  endpointPublishingStrategy:
    type: LoadBalancerService

---Customer---
  endpointPublishingStrategy:
    type: HostNetwork
Just got a test env on Bare Metal which is using `HostNetwork` and can reproduce this issue:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-04-18-170158   True        False         52m     Cluster version is 4.0.0-0.nightly-2019-04-18-170158

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
dell-r730-063.dsal.lab.eng.rdu2.redhat.com   Ready    master   70m   v1.13.4+d4ce02c1d
dell-r730-064.dsal.lab.eng.rdu2.redhat.com   Ready    master   70m   v1.13.4+d4ce02c1d
dell-r730-065.dsal.lab.eng.rdu2.redhat.com   Ready    master   70m   v1.13.4+d4ce02c1d
dell-r730-066.dsal.lab.eng.rdu2.redhat.com   Ready    worker   70m   v1.13.4+d4ce02c1d
dell-r730-067.dsal.lab.eng.rdu2.redhat.com   Ready    worker   70m   v1.13.4+d4ce02c1d

$ oc -n openshift-ingress get rs
NAME                        DESIRED   CURRENT   READY   AGE
router-default-69dc5c9b8c   2         2         2       59m
router-default-6d77f7444f   1         1         0       6m21s

$ oc -n openshift-ingress get pod
NAME                              READY   STATUS    RESTARTS   AGE
router-default-69dc5c9b8c-wcqqw   1/1     Running   0          59m
router-default-69dc5c9b8c-xvxh6   1/1     Running   0          59m
router-default-6d77f7444f-wndvc   0/1     Pending   0          6m32s

$ oc -n openshift-ingress describe pod router-default-6d77f7444f-wndvc
<---snip--->
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  48s (x25 over 2m54s)  default-scheduler  0/5 nodes are available: 2 node(s) didn't have free ports for the requested pod ports, 3 node(s) didn't match node selector.
In the HostNetwork setup, the router containers use static ports on the host interface, so additional nodes are required to surge during a rolling deployment. This is also the case in 3.x given a router with the same rolling update parameters; the primary difference is that the end-user can control those parameters in 3.x.

In 4.x, the default rolling deployment parameters are max surge 25% and max unavailable 25%. The absolute value of the proportional max unavailable percentage is rounded down using a floor function [1]. Given a worker node pool of 3, this means the effective max unavailable value is zero. Given the default install topology (3 workers), the host network constraint, and the rolling update parameters (which are immutable), the only way to execute the in-place rolling upgrade in this case would be to add more workers.

For now, this can be a documentation issue. Going forward, we can consider things like:

1. Changing our default rolling update parameters
2. Exposing the rolling update parameters through the configuration API

[1] https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment
I did a little more digging here and found the underlying difference from our 3.x setup:

https://github.com/openshift/origin/blob/master/pkg/apps/util/util.go#L397
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/deployment/util/deployment_util.go#L880

In 3.x we set surge to 0, which triggers the fencepost condition to set a floor of 1 for unavailability even when the spec value is proportional. We are going to consider doing the same. I'm going to keep this bug open while we evaluate our defaults.
We're going to fix this by making the deployment strategy dynamic with https://github.com/openshift/cluster-ingress-operator/pull/219.
Verified with 4.1.0-0.nightly-2019-05-04-210601 on vSphere; the issue has been fixed.

$ oc get deployment/router-default -n openshift-ingress -o yaml
<---snip--->
spec:
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 25%
    type: RollingUpdate
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758