2076193 – oc patch command for the liveness probe and readiness probe parameters of an OpenShift router deployment doesn't take effect


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.11.0
Assignee:	Miciah Dashiel Butler Masters
QA Contact:	Shudi Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-04-18 07:38 UTC by Shudi Li
Modified:	2022-08-10 11:07 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-08-10 11:07:26 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-ingress-operator pull 742	0	None	open	Bug 2076193: Remove special-casing for default ingresscontroller probe parameters	2022-04-19 18:42:32 UTC
Red Hat Product Errata	RHSA-2022:5069	0	None	None	None	2022-08-10 11:07:44 UTC

Description Shudi Li 2022-04-18 07:38:05 UTC

Description of problem: The liveness probe and readiness probe of the router deployment have a default timeout of 1s. Changing the timeout to 5s with the oc patch command does not take effect. A PodSecurity warning message is also printed when the oc command runs.


OpenShift release version:
4.11.0-0.nightly-2022-04-16-163450

Cluster Platform:


How reproducible:
Run the oc patch command below, then look for the change in the router deployment and the router pods in the openshift-ingress namespace.

oc -n openshift-ingress patch deploy/router-default --type=strategic --patch='{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"timeoutSeconds":5},"readinessProbe":{"timeoutSeconds":5}}]}}}}'
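
The patch body above can also be generated for an arbitrary timeout. This is a minimal sketch, not part of the original report; build_probe_patch is a hypothetical helper name, and the container name "router" is taken from the command above.

```shell
# Hypothetical helper: prints a strategic merge patch that sets timeoutSeconds
# on both probes of the "router" container to the given number of seconds.
build_probe_patch() {
  timeout=${1:?usage: build_probe_patch SECONDS}
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"timeoutSeconds":%d},"readinessProbe":{"timeoutSeconds":%d}}]}}}}\n' \
    "$timeout" "$timeout"
}
```

Usage would be: oc -n openshift-ingress patch deploy/router-default --type=strategic --patch="$(build_probe_patch 5)"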

Steps to Reproduce (in detail):
1. Change the timeout to 5s with the oc patch command

% oc -n openshift-ingress patch deploy/router-default --type=strategic --patch='{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"timeoutSeconds":5},"readinessProbe":{"timeoutSeconds":5}}]}}}}'
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "router" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "router" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "router" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "router" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/router-default patched
% 

2. Run oc describe deploy/router-default; the timeout is still 1s
% oc -n openshift-ingress describe deploy/router-default | grep -e Liveness: -e Readiness:
    Liveness:   http-get http://:1936/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:1936/healthz/ready delay=0s timeout=1s period=10s #success=1 #failure=3
% 

3. Check the YAML of deploy/router-default
% oc -n openshift-ingress get  deploy/router-default -o yaml | grep -A8 template:
  template:
    metadata:
      annotations:
        target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
        unsupported.do-not-use.openshift.io/override-liveness-grace-period-seconds: "10"     <------
      creationTimestamp: null
      labels:
        ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
        ingresscontroller.operator.openshift.io/hash: 6d57464bc8
% 

% oc -n openshift-ingress get  deploy/router-default -o yaml | grep -A8 nessProbe:
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 1936
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
--
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz/ready
            port: 1936
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
%

4. Check the router pod
% oc -n openshift-ingress get pods
NAME                              READY   STATUS        RESTARTS   AGE
router-default-5944d9f999-zjsp6   1/1     Terminating   0          8m51s
router-default-5d6fd94455-9wdkn   1/1     Running       0          92m
router-default-5d6fd94455-ftcbc   1/1     Running       0          8m50s
%

% oc -n openshift-ingress get pod router-default-5d6fd94455-ftcbc -o yaml | grep -A8 nessProbe:
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 1936
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
--
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz/ready
        port: 1936
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
%

5. Check the cluster version
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-16-163450   True        False         6h      Cluster version is 4.11.0-0.nightly-2022-04-16-163450
%
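
The timeout check in step 2 can be reduced to just the two values of interest. This is a minimal sketch that parses the oc describe output format shown above; probe_timeouts is a hypothetical helper name, not an oc feature.

```shell
# Hypothetical helper: reads `oc describe deploy/...` output on stdin and
# prints "<probe> <timeout>" for each Liveness:/Readiness: line.
probe_timeouts() {
  awk '$1 == "Liveness:" || $1 == "Readiness:" {
         probe = $1
         sub(/:$/, "", probe)              # drop the trailing colon
         for (i = 2; i <= NF; i++)
           if ($i ~ /^timeout=/) {         # e.g. timeout=1s
             value = $i
             sub(/^timeout=/, "", value)
             print probe, value
           }
       }'
}
```

Usage would be: oc -n openshift-ingress describe deploy/router-default | probe_timeouts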

Actual results:
The timeout of the liveness probe and the readiness probe is 1s.

Expected results:
The timeout of the liveness probe and the readiness probe is 5s.

Impact of the problem:


Additional info:




Comment 1 Shudi Li 2022-04-18 10:14:41 UTC

The scc-subject-review command cannot list the security context constraints of the router pod
1.
% oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS   AGE
router-default-5d6fd94455-9wdkn   1/1     Running   0          4h5m
router-default-5d6fd94455-ftcbc   1/1     Running   0          161m
%

2.
% oc -n openshift-ingress get pod router-default-5d6fd94455-9wdkn | oc adm policy scc-subject-review -f -
unable to decode "STDIN": couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
%
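
The decode error above is consistent with oc get pod printing its default table rather than a manifest, so there is no apiVersion or kind for -f - to read. A minimal sketch, assuming scc-subject-review accepts a Pod manifest on stdin, requests YAML output first; scc_review_pod is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: namespace and pod name are passed as arguments.
scc_review_pod() {
  ns=${1:?usage: scc_review_pod NAMESPACE POD}
  pod=${2:?usage: scc_review_pod NAMESPACE POD}
  # -o yaml emits a full manifest (apiVersion, kind, spec) instead of a table.
  oc -n "$ns" get pod "$pod" -o yaml | oc adm policy scc-subject-review -f -
}
```

Usage would be: scc_review_pod openshift-ingress router-default-5d6fd94455-9wdkn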

Comment 6 Shudi Li 2022-04-25 02:54:31 UTC

Verified it with 4.11.0-0.nightly-2022-04-24-135651

% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-24-135651   True        False         70m     Cluster version is 4.11.0-0.nightly-2022-04-24-135651
% oc -n openshift-ingress patch deploy/router-default --type=strategic --patch='{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"timeoutSeconds":5},"readinessProbe":{"timeoutSeconds":5}}]}}}}'
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "router" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "router" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "router" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "router" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/router-default patched
%

2. Check the timeout of the liveness probe and readiness probe in the oc describe output for deploy/router-default; it is 5s.
% oc -n openshift-ingress describe deploy/router-default | grep -e Liveness: -e Readiness:
    Liveness:   http-get http://:1936/healthz delay=0s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:1936/healthz/ready delay=0s timeout=5s period=10s #success=1 #failure=3
% 

3. Check timeoutSeconds of the liveness probe and readiness probe in deploy/router-default; it is 5.
% oc -n openshift-ingress get  deploy/router-default -o yaml | grep -A8 nessProbe:
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 1936
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
--
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz/ready
            port: 1936
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
% 

4. List the router pods
% oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS   AGE
router-default-56d5fcbbdf-g64tn   1/1     Running   0          2m48s
router-default-56d5fcbbdf-p6n9h   1/1     Running   0          2m48s
% 

5. Check timeoutSeconds of the liveness probe and readiness probe in a router pod; it is 5.
% oc -n openshift-ingress get pod router-default-56d5fcbbdf-g64tn -o yaml | grep -A8 nessProbe:
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 1936
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
--
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz/ready
        port: 1936
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
%
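
The YAML checks in this verification can be turned into a pass/fail test. This is a minimal sketch, not the procedure used above; check_probe_timeouts is a hypothetical helper name, and it assumes its input contains only the two probe blocks, as the grep -A8 nessProbe: output does.

```shell
# Hypothetical helper: reads probe YAML on stdin and exits 0 only if at least
# two timeoutSeconds lines are present and every one equals the expected value.
check_probe_timeouts() {
  expected=${1:?usage: check_probe_timeouts SECONDS}
  awk -v want="$expected" '
    $1 == "timeoutSeconds:" { seen++; if ($2 != want) bad++ }
    END { exit (seen >= 2 && bad == 0) ? 0 : 1 }
  '
}
```

Usage would be: oc -n openshift-ingress get deploy/router-default -o yaml | grep -A8 nessProbe: | check_probe_timeouts 5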

Comment 8 errata-xmlrpc 2022-08-10 11:07:26 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
