Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1943845

Summary:	Router pods should have startup probes configured
Product:	OpenShift Container Platform	Reporter:	Miciah Dashiel Butler Masters <mmasters>
Component:	Networking	Assignee:	Miciah Dashiel Butler Masters <mmasters>
Networking sub component:	router	QA Contact:	Arvind iyengar <aiyengar>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	medium
Priority:	medium	CC:	aiyengar, aos-bugs
Version:	4.8
Target Milestone:	---
Target Release:	4.8.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-07-27 22:56:00 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1996897

Description Miciah Dashiel Butler Masters 2021-03-27 20:10:40 UTC

Description of problem:

Router pods should specify a startup probe for the router container. Without one, the kubelet starts running liveness probes as soon as the liveness probe's initial delay of 10 seconds has elapsed. If the router takes a long time to complete its initial synchronization (for example, because the cluster has an extremely large number of routes or endpoints), failing liveness probes can cause the kubelet to restart the container before that synchronization has even finished.

The deployment should specify a startup probe that allows a generous amount of time (for example, 2 minutes) for the router to start up when initial synchronization is slow, while still letting the kubelet begin liveness and readiness probes promptly when initial synchronization is fast.
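The failure mode described above can be sketched as a small discrete-time model. This is a simplified illustration under stated assumptions, not the kubelet's actual implementation: the router's health endpoints are assumed to fail until initial synchronization completes and to succeed afterwards, probes are assumed to fire exactly on schedule with no timeouts, and the liveness probe's period and failure threshold are assumed to be the Kubernetes defaults (10 seconds and 3), because this report gives only its 10-second initial delay. The names `Probe` and `restart_time` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    # Defaults mirror the Kubernetes API defaults for these probe fields.
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    failure_threshold: int = 3


def restart_time(sync_seconds: int, liveness: Probe,
                 startup: Optional[Probe] = None,
                 horizon: int = 3600) -> Optional[int]:
    """Return the second at which the kubelet restarts the container, or None.

    Simplified model (assumption): the health endpoints fail until
    `sync_seconds` after the container starts and succeed from then on;
    probes fire exactly on schedule, with no jitter and no timeouts.
    """
    liveness_enabled_at = 0
    if startup is not None:
        failures = 0
        t = startup.initial_delay_seconds
        while True:
            if t > horizon:
                return None  # no decision within the simulated horizon
            if t >= sync_seconds:
                liveness_enabled_at = t  # startup probe succeeded
                break
            failures += 1
            if failures >= startup.failure_threshold:
                return t  # startup probe exhausted its failure budget
            t += startup.period_seconds
    # Liveness probes keep their own schedule but only count once enabled.
    failures = 0
    t = liveness.initial_delay_seconds
    while t <= horizon:
        if t >= liveness_enabled_at:
            if t >= sync_seconds:
                return None  # router is healthy and stays healthy in this model
            failures += 1
            if failures >= liveness.failure_threshold:
                return t  # liveness probe triggers a restart
        t += liveness.period_seconds
    return None


# Hypothetical liveness settings: the 10 s initial delay is from this report;
# the period and failure threshold are assumed Kubernetes defaults.
liveness = Probe(initial_delay_seconds=10)
startup = Probe(period_seconds=1, failure_threshold=120)  # proposed probe

print(restart_time(90, liveness))            # 30: restarted mid-sync
print(restart_time(90, liveness, startup))   # None: sync finishes in time
print(restart_time(200, liveness, startup))  # 119: startup budget exhausted
```

In this model, a router that needs 90 seconds to synchronize is restarted at about 30 seconds without a startup probe, while the proposed probe holds off liveness checking until the router is ready. A router that needs longer than about 2 minutes is still restarted, now by the startup probe.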


Version-Release number of selected component (if applicable):

Startup probes graduated to beta in Kubernetes 1.18 (OpenShift 4.5) and to stable in Kubernetes 1.20 (OpenShift 4.7). See <https://github.com/kubernetes/enhancements/blob/c1cec820b3b3d0fa18dede73107a2cbb43e27e33/keps/sig-node/950-liveness-probe-holdoff/README.md#implementation-history>.


How reproducible:

100%.


Steps to Reproduce:

1. Check the default router deployment's definition:

    oc -n openshift-ingress get deployments/router-default -o yaml


Actual results:

No startup probe is defined.


Expected results:

A startup probe should be defined on the "router" container:

          startupProbe:
            failureThreshold: 120
            httpGet:
              path: /healthz/ready
              port: 1936
            periodSeconds: 1

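The probe above gives the router roughly failureThreshold * periodSeconds = 120 * 1 s = 120 s to pass its first readiness check, which matches the 2-minute figure suggested earlier. The Kubernetes documentation uses the same failureThreshold * periodSeconds rule of thumb for the maximum startup time. The helper below (the name `startup_budget_seconds` is hypothetical) only spells out that arithmetic and ignores probe timeouts and scheduling jitter.

```python
def startup_budget_seconds(failure_threshold: int, period_seconds: int,
                           initial_delay_seconds: int = 0) -> int:
    """Approximate time the kubelet allows a container to pass its startup probe.

    Rule of thumb from the Kubernetes documentation
    (failureThreshold * periodSeconds), plus any initial delay.
    """
    return initial_delay_seconds + failure_threshold * period_seconds


# Values from the expected startupProbe; initialDelaySeconds is unset (0).
budget = startup_budget_seconds(failure_threshold=120, period_seconds=1)
print(budget, "seconds =", budget / 60, "minutes")  # 120 seconds = 2.0 minutes
```

Because periodSeconds is 1, a router that synchronizes quickly passes the startup probe within about a second, after which the kubelet begins liveness and readiness probes.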
Comment 2 Arvind iyengar 2021-04-06 05:12:39 UTC

Verified in the "4.8.0-0.nightly-2021-04-05-174735" release. With this payload, the router deployment now includes the startup probe:
-------

oc get clusterversion              
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-05-174735   True        False         48m     Cluster version is 4.8.0-0.nightly-2021-04-05-174735


oc -n openshift-ingress get deployments/router-default -o yaml
        startupProbe:
          failureThreshold: 120
          httpGet:
            path: /healthz/ready
            port: 1936
            scheme: HTTP
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
-------
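The verified probe contains fields (scheme, successThreshold, timeoutSeconds) that the expected YAML omits. Their values match the Kubernetes API defaults for probes, so the output is consistent with the expected probe; the report does not say whether the API server or the ingress operator filled them in. A minimal check of that comparison, using a hypothetical `with_probe_defaults` helper and hand-copied dictionaries rather than live cluster output:

```python
# Kubernetes API defaults for probe fields that the expected YAML omits.
# (initialDelaySeconds defaults to 0 and is omitted from YAML output.)
PROBE_DEFAULTS = {"periodSeconds": 10, "timeoutSeconds": 1,
                  "successThreshold": 1, "failureThreshold": 3}
HTTP_GET_DEFAULTS = {"scheme": "HTTP"}


def with_probe_defaults(probe: dict) -> dict:
    """Return a copy of `probe` with omitted fields set to the API defaults."""
    merged = {**PROBE_DEFAULTS, **probe}
    if "httpGet" in merged:
        merged["httpGet"] = {**HTTP_GET_DEFAULTS, **merged["httpGet"]}
    return merged


# Hand-copied from "Expected results" and from the verified output above.
expected = {"failureThreshold": 120,
            "httpGet": {"path": "/healthz/ready", "port": 1936},
            "periodSeconds": 1}
verified = {"failureThreshold": 120,
            "httpGet": {"path": "/healthz/ready", "port": 1936,
                        "scheme": "HTTP"},
            "periodSeconds": 1, "successThreshold": 1, "timeoutSeconds": 1}
print(with_probe_defaults(expected) == verified)  # True
```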

Comment 5 errata-xmlrpc 2021-07-27 22:56:00 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438