Bug 2076193 - oc patch command for the liveness probe and readiness probe parameters of an OpenShift router deployment doesn't take effect
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: Shudi Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-18 07:38 UTC by Shudi Li
Modified: 2022-08-10 11:07 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:07:26 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-ingress-operator pull 742 | 0 | None | open | Bug 2076193: Remove special-casing for default ingresscontroller probe parameters | 2022-04-19 18:42:32 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 11:07:44 UTC

Description Shudi Li 2022-04-18 07:38:05 UTC
Description of problem: The default timeout of the router's liveness and readiness probes is 1s. Patching it to 5s with the oc patch command reports success, but the change doesn't take effect. The command also prints a PodSecurity ("restricted:latest") warning.


OpenShift release version:
4.11.0-0.nightly-2022-04-16-163450

Cluster Platform:


How reproducible:
Run the oc patch command below, then look for the change in the router deployment and router pods in the openshift-ingress namespace.

oc -n openshift-ingress patch deploy/router-default --type=strategic --patch='{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"timeoutSeconds":5},"readinessProbe":{"timeoutSeconds":5}}]}}}}'

Steps to Reproduce (in detail):
1. Change the probe timeout to 5s with the oc patch command.

% oc -n openshift-ingress patch deploy/router-default --type=strategic --patch='{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"timeoutSeconds":5},"readinessProbe":{"timeoutSeconds":5}}]}}}}'
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "router" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "router" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "router" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "router" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/router-default patched
% 
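
The inline JSON patch in step 1 is hard to read at a glance. As a minimal sketch, a small shell function could build the same strategic merge patch for a given timeout. build_probe_patch is a hypothetical helper name, not part of oc, and it assumes the container is named "router", as in the command above.

```shell
# build_probe_patch: hypothetical helper (not part of oc).
# Prints a strategic merge patch that sets timeoutSeconds on both the
# liveness and readiness probes of the container named "router".
# $1 = timeout in seconds (non-negative integer)
build_probe_patch() {
  case "$1" in
    ''|*[!0-9]*)
      echo "usage: build_probe_patch <seconds>" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"timeoutSeconds":%s},"readinessProbe":{"timeoutSeconds":%s}}]}}}}\n' "$1" "$1"
}
```

It could then be passed to the same command, for example: oc -n openshift-ingress patch deploy/router-default --type=strategic --patch="$(build_probe_patch 5)"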

2. oc describe deploy/router-default still shows a timeout of 1s.
% oc -n openshift-ingress describe deploy/router-default | grep -e Liveness: -e Readiness:
    Liveness:   http-get http://:1936/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:1936/healthz/ready delay=0s timeout=1s period=10s #success=1 #failure=3
% 
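
The grep in step 2 prints whole probe lines, and the timeout has to be picked out by eye. A possible sketch that reduces the describe output to just the probe name and timeout is shown here; probe_timeouts is a hypothetical helper, and it assumes the "Liveness:" / "Readiness:" line format shown above.

```shell
# probe_timeouts: hypothetical helper (not part of oc).
# Reads `oc describe deploy/...` output on stdin and prints one line per
# probe in the form "<Probe> <timeout>", e.g. "Liveness 1s".
probe_timeouts() {
  sed -n -E 's/^[[:space:]]*(Liveness|Readiness):.* timeout=([0-9]+s).*/\1 \2/p'
}
```

Usage: oc -n openshift-ingress describe deploy/router-default | probe_timeouts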

3. Check the YAML of deploy/router-default; timeoutSeconds is still 1.
% oc -n openshift-ingress get  deploy/router-default -o yaml | grep -A8 template:
  template:
    metadata:
      annotations:
        target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
        unsupported.do-not-use.openshift.io/override-liveness-grace-period-seconds: "10"     <------
      creationTimestamp: null
      labels:
        ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
        ingresscontroller.operator.openshift.io/hash: 6d57464bc8
% 

% oc -n openshift-ingress get  deploy/router-default -o yaml | grep -A8 nessProbe:
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 1936
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
--
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz/ready
            port: 1936
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
%
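
Instead of grepping the YAML with a fixed context window, the single field can be read with a JSONPath query. This is a sketch under assumptions: probe_timeout_seconds is a hypothetical helper, and the namespace, deployment name, and container name are taken from the commands in this report.

```shell
# probe_timeout_seconds: hypothetical helper (not part of oc).
# Prints timeoutSeconds for one probe of the "router" container in
# deploy/router-default (namespace openshift-ingress).
# $1 = probe field name: livenessProbe or readinessProbe
probe_timeout_seconds() {
  oc -n openshift-ingress get deploy/router-default \
    -o jsonpath="{.spec.template.spec.containers[?(@.name==\"router\")].$1.timeoutSeconds}"
}
```

Usage: probe_timeout_seconds livenessProbe; probe_timeout_seconds readinessProbe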

4. Check the router pod
% oc -n openshift-ingress get pods
NAME                              READY   STATUS        RESTARTS   AGE
router-default-5944d9f999-zjsp6   1/1     Terminating   0          8m51s
router-default-5d6fd94455-9wdkn   1/1     Running       0          92m
router-default-5d6fd94455-ftcbc   1/1     Running       0          8m50s
%

% oc -n openshift-ingress get pod router-default-5d6fd94455-ftcbc -o yaml | grep -A8 nessProbe:
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 1936
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
--
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz/ready
        port: 1936
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
%

5. Check the cluster version.
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-16-163450   True        False         6h      Cluster version is 4.11.0-0.nightly-2022-04-16-163450
%

Actual results:
The liveness and readiness probe timeouts are still 1s.

Expected results:
The liveness and readiness probe timeouts are 5s.

Impact of the problem:


Additional info:




Comment 1 Shudi Li 2022-04-18 10:14:41 UTC
The scc-subject-review command can't list the security context constraints of the router pod.
1.
% oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS   AGE
router-default-5d6fd94455-9wdkn   1/1     Running   0          4h5m
router-default-5d6fd94455-ftcbc   1/1     Running   0          161m
%

2.
% oc -n openshift-ingress get pod router-default-5d6fd94455-9wdkn | oc adm policy scc-subject-review -f -
unable to decode "STDIN": couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
%
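
The decode error in step 2 is what one would expect when table output is piped into a command that reads a manifest: oc get pod without -o prints a human-readable table, while scc-subject-review -f - expects YAML or JSON on stdin. A minimal sketch of two alternatives follows. Both function names are hypothetical, and reading the openshift.io/scc annotation assumes the pod was admitted through SCC admission, which normally records the selected SCC there.

```shell
# review_pod_scc: hypothetical helper (not part of oc).
# Feeds the pod manifest as YAML (not the default table) to scc-subject-review.
# $1 = namespace, $2 = pod name
review_pod_scc() {
  oc -n "$1" get pod "$2" -o yaml | oc adm policy scc-subject-review -f -
}

# pod_scc: hypothetical helper (not part of oc).
# Prints the SCC name recorded in the pod's openshift.io/scc annotation.
# The backslash escapes the dot inside the annotation key for JSONPath.
# $1 = namespace, $2 = pod name
pod_scc() {
  oc -n "$1" get pod "$2" -o jsonpath='{.metadata.annotations.openshift\.io/scc}'
}
```

Usage: review_pod_scc openshift-ingress router-default-5d6fd94455-9wdkn, or pod_scc openshift-ingress router-default-5d6fd94455-9wdkn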

Comment 6 Shudi Li 2022-04-25 02:54:31 UTC
Verified with 4.11.0-0.nightly-2022-04-24-135651.

% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-24-135651   True        False         70m     Cluster version is 4.11.0-0.nightly-2022-04-24-135651
%

1. Change the probe timeout to 5s with the oc patch command.
% oc -n openshift-ingress patch deploy/router-default --type=strategic --patch='{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"timeoutSeconds":5},"readinessProbe":{"timeoutSeconds":5}}]}}}}'
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "router" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "router" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "router" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "router" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/router-default patched
%

2. Check the liveness and readiness probe timeouts in the oc describe output for deploy/router-default; both are 5s.
% oc -n openshift-ingress describe deploy/router-default | grep -e Liveness: -e Readiness:
    Liveness:   http-get http://:1936/healthz delay=0s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:1936/healthz/ready delay=0s timeout=5s period=10s #success=1 #failure=3
% 

3. Check timeoutSeconds of the liveness and readiness probes in deploy/router-default; it is 5.
% oc -n openshift-ingress get  deploy/router-default -o yaml | grep -A8 nessProbe:
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 1936
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
--
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz/ready
            port: 1936
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
% 

4. List the router pods.
% oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS   AGE
router-default-56d5fcbbdf-g64tn   1/1     Running   0          2m48s
router-default-56d5fcbbdf-p6n9h   1/1     Running   0          2m48s
% 

5. Check timeoutSeconds of the liveness and readiness probes in a router pod; it is 5.
% oc -n openshift-ingress get pod router-default-56d5fcbbdf-g64tn -o yaml | grep -A8 nessProbe:
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 1936
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
--
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz/ready
        port: 1936
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
%

Comment 8 errata-xmlrpc 2022-08-10 11:07:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

