Bug 1701392
| Summary: | [OCP4 Beta] Rolling update of router-default deployment is not possible | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Stuart Auchterlonie <sauchter> |
| Component: | Networking | Assignee: | Dan Mace <dmace> |
| Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
| Status: | CLOSED ERRATA | Severity: | high |
| Priority: | urgent | CC: | aos-bugs, bbennett, florin-alexandru.peter, jokerman, mmccomas |
| Version: | 4.1.0 | Keywords: | BetaBlocker |
| Target Release: | 4.1.0 | Hardware: | Unspecified |
| OS: | Unspecified | Type: | Bug |
| Last Closed: | 2019-06-04 10:47:47 UTC | | |
Description (Stuart Auchterlonie, 2019-04-18 20:35:44 UTC)
Looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1689779, but that one has been fixed. Confirmed that there is no issue with the latest 4.0.0-0.nightly-2019-04-18-170158 build on AWS, but the difference is the `endpointPublishingStrategy` setting in the ingresscontroller; it might be a port conflict when using HostNetwork during a rolling update.

AWS:

```
endpointPublishingStrategy:
  type: LoadBalancerService
```

Customer:

```
endpointPublishingStrategy:
  type: HostNetwork
```

Just got a test environment on bare metal that uses `HostNetwork` and can reproduce this issue:

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-04-18-170158   True        False         52m     Cluster version is 4.0.0-0.nightly-2019-04-18-170158

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
dell-r730-063.dsal.lab.eng.rdu2.redhat.com   Ready    master   70m   v1.13.4+d4ce02c1d
dell-r730-064.dsal.lab.eng.rdu2.redhat.com   Ready    master   70m   v1.13.4+d4ce02c1d
dell-r730-065.dsal.lab.eng.rdu2.redhat.com   Ready    master   70m   v1.13.4+d4ce02c1d
dell-r730-066.dsal.lab.eng.rdu2.redhat.com   Ready    worker   70m   v1.13.4+d4ce02c1d
dell-r730-067.dsal.lab.eng.rdu2.redhat.com   Ready    worker   70m   v1.13.4+d4ce02c1d

$ oc -n openshift-ingress get rs
NAME                        DESIRED   CURRENT   READY   AGE
router-default-69dc5c9b8c   2         2         2       59m
router-default-6d77f7444f   1         1         0       6m21s

$ oc -n openshift-ingress get pod
NAME                              READY   STATUS    RESTARTS   AGE
router-default-69dc5c9b8c-wcqqw   1/1     Running   0          59m
router-default-69dc5c9b8c-xvxh6   1/1     Running   0          59m
router-default-6d77f7444f-wndvc   0/1     Pending   0          6m32s

$ oc -n openshift-ingress describe pod router-default-6d77f7444f-wndvc
<---snip--->
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  48s (x25 over 2m54s) default-scheduler  0/5 nodes are available: 2 node(s) didn't have free ports for the requested pod ports, 3 node(s) didn't match node selector.
```

In the HostNetwork setup, the router containers use a static port on the host interface.
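The FailedScheduling event above follows directly from the host-port constraint: with HostNetwork, every router pod binds the same host ports, so a node can run at most one router pod at a time. A minimal sketch of why the surge pod has nowhere to go (the function name and counting are illustrative, not taken from the scheduler source):

```python
def free_router_nodes(worker_nodes: int, running_router_pods: int) -> int:
    # With hostNetwork, each running router pod occupies the host ports of
    # exactly one worker node, so only unoccupied workers can accept the
    # new ReplicaSet's pod.
    return max(worker_nodes - running_router_pods, 0)

# The cluster above: 2 workers, both already running an old router pod,
# so the surge pod from the new ReplicaSet stays Pending.
print(free_router_nodes(2, 2))  # 0
```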
Therefore, additional nodes are required to surge a rolling deployment. This is also the case in 3.x for a router with the same rolling update parameters; the primary difference is that the end user can control those parameters in 3.x. In 4.x, the default rolling deployment parameters are a max surge of 25% and a max unavailable of 25%. The absolute value of the proportional max unavailable percentage is rounded down using a floor function [1]. Given a worker node pool of 3, this means the max unavailable value is zero. Given the default install topology (3 workers), the host network constraint, and the rolling update parameters (which are immutable), the only way to execute the in-place rolling upgrade in this case is to add more workers.

For now, this can be a documentation issue. Going forward, we can consider things like:

1. Changing our default rolling update parameters
2. Exposing the rolling update parameters through the configuration API

[1] https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment

I did a little more digging here and found the underlying difference from our 3.x setup:

- https://github.com/openshift/origin/blob/master/pkg/apps/util/util.go#L397
- https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/deployment/util/deployment_util.go#L880

In 3.x we set surge to 0, which triggers a fencepost condition that sets a floor of 1 for unavailability even when the spec value is proportional. We are going to consider doing the same. I'm going to keep this bug open while we evaluate our defaults.

We're going to fix this by making the deployment strategy dynamic with https://github.com/openshift/cluster-ingress-operator/pull/219.

Verified with 4.1.0-0.nightly-2019-05-04-210601 on vSphere; the issue has been fixed.
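The rounding and fencepost behavior described above can be sketched in a few lines. This is a simplified model of how Kubernetes resolves percentage rolling-update parameters (maxSurge rounds up, maxUnavailable rounds down, and if both resolve to 0 the rollout could never progress, so maxUnavailable is bumped to 1); the function name is ours, not the upstream API:

```python
import math

def resolve_rolling_update(replicas: int, max_surge: float,
                           max_unavailable: float) -> tuple[int, int]:
    # maxSurge rounds up, maxUnavailable rounds down (Kubernetes convention).
    surge = math.ceil(replicas * max_surge)
    unavailable = math.floor(replicas * max_unavailable)
    # Fencepost condition: if both resolve to 0 the rollout can never make
    # progress, so maxUnavailable is forced to at least 1.
    if surge == 0 and unavailable == 0:
        unavailable = 1
    return surge, unavailable

# Default 4.x parameters with 2 router replicas: surge=1, unavailable=0,
# so the rollout must schedule an extra pod first, which is impossible
# with HostNetwork when every worker already runs a router.
print(resolve_rolling_update(2, 0.25, 0.25))  # (1, 0)

# The 3.x-style fix (surge forced to 0) trips the fencepost, so one old
# pod can be taken down before the replacement is scheduled.
print(resolve_rolling_update(2, 0, 0.25))     # (0, 1)
```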
```
$ oc get deployment/router-default -n openshift-ingress -o yaml
<---snip--->
spec:
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 25%
    type: RollingUpdate
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758