Bug 1943845 - Router pods should have startup probes configured
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On:
Blocks: 1996897
Reported: 2021-03-27 20:10 UTC by Miciah Dashiel Butler Masters
Modified: 2022-08-04 22:32 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:56:00 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-ingress-operator pull 583 | 0 | None | open | Bug 1943845: Add startup probe to the router deployment | 2021-03-27 20:14:39 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:56:36 UTC

Description Miciah Dashiel Butler Masters 2021-03-27 20:10:40 UTC
Description of problem:

Router pods should specify startup probes for the router container.  Without a startup probe, the kubelet starts performing liveness probes after the liveness probe's initial delay of 10 seconds.  If the router takes a long time to synchronize (for example, because the cluster has an extremely large number of routes or endpoints), failed liveness probes can cause the kubelet to restart the container before the router has finished its initial synchronization.

The deployment should specify a startup probe that allows a generous amount of time (for example, 2 minutes) for the router to start up when initial synchronization is slow, while still letting the kubelet begin liveness and readiness probes promptly when initial synchronization is quick.
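
The timing argument above can be sketched with a little arithmetic. This is only an approximation under stated assumptions: the function names are hypothetical, the budget uses the common failureThreshold * periodSeconds estimate, and the real kubelet adds jitter and per-probe timeouts that are ignored here.

```python
def startup_budget_seconds(failure_threshold, period_seconds, initial_delay_seconds=0):
    """Approximate upper bound on how long the kubelet waits for the startup probe."""
    return initial_delay_seconds + failure_threshold * period_seconds


def liveness_handoff_seconds(sync_seconds, failure_threshold, period_seconds):
    """Approximate time at which liveness and readiness probing begins.

    Returns None when synchronization outlasts the startup budget, in which
    case the kubelet would restart the container instead.
    """
    budget = startup_budget_seconds(failure_threshold, period_seconds)
    if sync_seconds > budget:
        return None
    # Assumption: the startup probe succeeds on the first tick at or after sync completes.
    ticks = -(-sync_seconds // period_seconds)  # ceiling division
    return ticks * period_seconds


print(startup_budget_seconds(120, 1))         # 120
print(liveness_handoff_seconds(3, 120, 1))    # 3
print(liveness_handoff_seconds(150, 120, 1))  # None
```

With a 1-second period and a failure threshold of 120, a quick synchronization hands off to the liveness and readiness probes within a few seconds, while a slow one gets roughly 2 minutes before the kubelet gives up.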


Version-Release number of selected component (if applicable):

Startup probes graduated to beta in Kubernetes 1.18 (OpenShift 4.5) and to stable in Kubernetes 1.20.  See <https://github.com/kubernetes/enhancements/blob/c1cec820b3b3d0fa18dede73107a2cbb43e27e33/keps/sig-node/950-liveness-probe-holdoff/README.md#implementation-history>.


How reproducible:

100%.


Steps to Reproduce:

1. Check the default router deployment's definition:

    oc -n openshift-ingress get deployments/router-default -o yaml


Actual results:

No startup probe is defined.


Expected results:

A startup probe should be defined on the "router" container:

          startupProbe:
            failureThreshold: 120
            httpGet:
              path: /healthz/ready
              port: 1936
            periodSeconds: 1
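
The check in the steps above could also be automated. The following is a minimal sketch, assuming the deployment has been fetched as JSON (for example with the `-o json` output format) and parsed into a dictionary; the helper name `find_startup_probe` and the inline sample are illustrative and not part of the operator.

```python
def find_startup_probe(deployment, container_name="router"):
    """Return the startupProbe of the named container, or None if it is absent."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        if container.get("name") == container_name:
            return container.get("startupProbe")
    return None


# Sample modeled on the expected results in this report (illustrative only).
sample = {
    "spec": {"template": {"spec": {"containers": [{
        "name": "router",
        "startupProbe": {
            "failureThreshold": 120,
            "httpGet": {"path": "/healthz/ready", "port": 1936},
            "periodSeconds": 1,
        },
    }]}}}
}

probe = find_startup_probe(sample)
print("startup probe defined" if probe else "no startup probe defined")
```

Against a live cluster, the same function could be fed the result of `json.load(sys.stdin)` with the deployment JSON piped in.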

Comment 2 Arvind iyengar 2021-04-06 05:12:39 UTC
Verified in release version "4.8.0-0.nightly-2021-04-05-174735". With this payload, the router deployment now includes the startup probe:
-------

oc get clusterversion              
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-05-174735   True        False         48m     Cluster version is 4.8.0-0.nightly-2021-04-05-174735


oc -n openshift-ingress get deployments/router-default -o yaml
        startupProbe:
          failureThreshold: 120
          httpGet:
            path: /healthz/ready
            port: 1936
            scheme: HTTP
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
-------

Comment 5 errata-xmlrpc 2021-07-27 22:56:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

