Bug 1959194

Summary: Ingress controller should use minReadySeconds because otherwise it is disrupted during deployment updates
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Networking
Assignee: Clayton Coleman <ccoleman>
Networking sub component: router
QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
CC: aos-bugs, mmasters
Version: 4.8   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-07-27 23:07:53 UTC
Type: Bug

Description Clayton Coleman 2021-05-10 21:45:51 UTC
Deployments with replicas=2 and maxUnavailable!=0 have a subtle behavior - the moment the deployment controller sees that the new pod is ready, it deletes the old pod. That delete propagates fast - faster than a load balancer might see it.
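
A toy timeline model can make this race concrete. It is a minimal sketch, not the deployment controller's real algorithm, and it assumes: each new pod replaces exactly one old pod; the old pod is deleted once its replacement has been Ready for `min_ready_seconds` (0 models the behavior described above) and leaves rotation immediately; and the load balancer only starts sending to a new pod `lb_add_delay` seconds after it turns Ready. The function names and the 15s delay are hypothetical.

```python
# Toy model (hypothetical, not the real deployment controller) of a rolling
# update behind an external load balancer. Assumptions:
#   * each new pod replaces exactly one old pod;
#   * the old pod is deleted once its replacement has been Ready for
#     min_ready_seconds, and leaves rotation immediately;
#   * the load balancer starts sending to a new pod lb_add_delay seconds
#     after that pod becomes Ready.


def backends_in_rotation(t, new_ready_times, min_ready_seconds, lb_add_delay):
    """Count pods the load balancer can send traffic to at time t (seconds)."""
    count = 0
    for ready_at in new_ready_times:
        # The old pod paired with this new pod serves until it is deleted.
        if t < ready_at + min_ready_seconds:
            count += 1
        # The new pod serves only once the load balancer has picked it up.
        if t >= ready_at + lb_add_delay:
            count += 1
    return count


def min_backends(new_ready_times, min_ready_seconds, lb_add_delay, step=0.5):
    """Smallest backend count over the rollout, sampled every `step` seconds."""
    end = max(new_ready_times) + max(min_ready_seconds, lb_add_delay) + 1
    samples = int(end / step) + 1
    return min(
        backends_in_rotation(i * step, new_ready_times, min_ready_seconds, lb_add_delay)
        for i in range(samples)
    )


# replicas=2, both new pods turn Ready at t=0, LB needs 15s to notice them.
print(min_backends([0, 0], min_ready_seconds=0, lb_add_delay=15))   # 0: outage window
print(min_backends([0, 0], min_ready_seconds=30, lb_add_delay=15))  # 2: no gap
```

With `min_ready_seconds=0` there is a window in which no pod is reachable through the load balancer; once the wait is longer than the load balancer's delay, the old pods keep serving until the new ones are in rotation.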

So if you had a LB that was checking for readiness, in the default config you'd potentially be at risk of having the old pod removed before the new pod was fully in rotation. By default we will recommend 30s to bring ingress / api in and out of rotation (i.e. set (healthy/unhealthy threshold + 1) * interval to be < 30s), so by setting minReadySeconds to 30 we ensure consistency there. Experimentally, in the wild it takes about 30s for kube-proxy events to reach all nodes even under heavy iptables contention, so 30s works well for simply waiting long enough to ensure all nodes see the update when the endpoints change.
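
The timing budget in the parenthetical above is simple arithmetic. A minimal sketch, assuming the `(threshold + 1) * interval` rule of thumb applies to both the healthy and the unhealthy threshold; the function names and the example interval/threshold values are hypothetical, not defaults of any particular load balancer.

```python
# Rule of thumb from this bug: a load balancer needs roughly
# (threshold + 1) * interval seconds to move a target in or out of rotation,
# and that should stay under the 30s minReadySeconds window.

MIN_READY_SECONDS = 30  # value used for router-default in this bug


def rotation_time(interval_s, threshold):
    """Approximate seconds for a health check to flip a target's state."""
    return (threshold + 1) * interval_s


def fits_min_ready(interval_s, healthy_threshold, unhealthy_threshold,
                   min_ready_seconds=MIN_READY_SECONDS):
    """True if both directions (in and out of rotation) finish inside the window."""
    worst = max(rotation_time(interval_s, healthy_threshold),
                rotation_time(interval_s, unhealthy_threshold))
    return worst < min_ready_seconds


# Illustrative settings only, not any cloud provider's defaults.
print(fits_min_ready(interval_s=5, healthy_threshold=2, unhealthy_threshold=2))   # True (15s)
print(fits_min_ready(interval_s=10, healthy_threshold=2, unhealthy_threshold=3))  # False (40s)
```

The comparison is strict, matching "< 30s": a configuration that needs exactly 30s does not fit.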

The ingress controller is the only component that must make this change at this time, but any future component exposed through a service load balancer should follow in its footsteps. kube-apiserver currently mitigates a bug in AWS load balancers by waiting significantly longer; that is not necessary here because kube-proxy routes requests from other nodes (with https://github.com/openshift/cluster-ingress-operator/pull/609 going into 4.8), so any node behind the LB can still send to the right target.

Comment 2 Hongan Li 2021-05-24 08:00:34 UTC
Verified with 4.8.0-0.nightly-2021-05-21-233425 and it passed.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-21-233425   True        False         5h18m   Cluster version is 4.8.0-0.nightly-2021-05-21-233425

$ oc -n openshift-ingress get deploy/router-default -oyaml
<---snip--->
spec:
  minReadySeconds: 30
  progressDeadlineSeconds: 600
  replicas: 2
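
The same field can be checked from a script instead of by reading the YAML. A minimal sketch, assuming the deployment is fetched with `-o json` rather than `-oyaml` so that only the Python standard library is needed; `min_ready_seconds` is a hypothetical helper, and the sample string is a trimmed copy of the fields shown above, not captured output.

```python
import json


def min_ready_seconds(deployment_json):
    """Read spec.minReadySeconds from `oc get deploy/<name> -o json` output.

    Kubernetes treats an unset minReadySeconds as 0, so that is the fallback.
    """
    spec = json.loads(deployment_json).get("spec", {})
    return spec.get("minReadySeconds", 0)


# Trimmed stand-in for the output of:
#   oc -n openshift-ingress get deploy/router-default -o json
sample = '{"spec": {"minReadySeconds": 30, "progressDeadlineSeconds": 600, "replicas": 2}}'
print(min_ready_seconds(sample))          # 30
print(min_ready_seconds('{"spec": {}}'))  # 0
```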

Checked pod status during the deployment update:
$ oc -n openshift-ingress get pod 
NAME                             READY   STATUS        RESTARTS   AGE
router-default-6467bf666-kjrmm   1/1     Running       0          29s
router-default-6467bf666-tbw7m   1/1     Running       0          29s
router-default-c4cdc666d-cc64l   0/1     Terminating   0          12m
router-default-c4cdc666d-fgzgq   1/1     Running       0          12m
...

$ oc -n openshift-ingress get pod 
NAME                             READY   STATUS        RESTARTS   AGE
router-default-6467bf666-kjrmm   1/1     Running       0          36s
router-default-6467bf666-tbw7m   1/1     Running       0          36s
router-default-c4cdc666d-cc64l   0/1     Terminating   0          12m
router-default-c4cdc666d-fgzgq   1/1     Terminating   0          12m
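
In these snapshots the last old pod switches to Terminating between the 29s and 36s marks, consistent with `minReadySeconds: 30`. A minimal sketch for counting serving pods in such snapshots, assuming a pod shown as Terminating no longer receives traffic; `serving_pods` is a hypothetical helper that parses the default `oc get pod` table columns.

```python
# Assumption: a pod shown as Terminating is already out of the Service
# endpoints, so only fully Ready pods in the Running state count as serving.


def serving_pods(oc_get_pod_output):
    """Return names of Ready, Running pods from `oc get pod` table output."""
    serving = []
    for line in oc_get_pod_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        if len(fields) < 3:
            continue  # skip placeholder lines such as "..."
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, total = (int(n) for n in ready.split("/"))
        if ready_count == total and status == "Running":
            serving.append(name)
    return serving


# The two snapshots from the verification above.
AT_29S = """\
NAME                             READY   STATUS        RESTARTS   AGE
router-default-6467bf666-kjrmm   1/1     Running       0          29s
router-default-6467bf666-tbw7m   1/1     Running       0          29s
router-default-c4cdc666d-cc64l   0/1     Terminating   0          12m
router-default-c4cdc666d-fgzgq   1/1     Running       0          12m
"""
AT_36S = """\
NAME                             READY   STATUS        RESTARTS   AGE
router-default-6467bf666-kjrmm   1/1     Running       0          36s
router-default-6467bf666-tbw7m   1/1     Running       0          36s
router-default-c4cdc666d-cc64l   0/1     Terminating   0          12m
router-default-c4cdc666d-fgzgq   1/1     Terminating   0          12m
"""

print(len(serving_pods(AT_29S)))  # 3
print(len(serving_pods(AT_36S)))  # 2
```

At no point in either snapshot does the count of serving pods drop below `replicas: 2`.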

Comment 5 errata-xmlrpc 2021-07-27 23:07:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438