Bug 2007246 - Openshift Container Platform - Ingress Controller does not set allowPrivilegeEscalation in the router deployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: x86_64
OS: Linux
Importance: high / medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Chad Scribner
QA Contact: Shudi Li
URL:
Whiteboard:
Depends On:
Blocks: 2079034
 
Reported: 2021-09-23 11:56 UTC by Simon Reber
Modified: 2022-12-15 09:08 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The default IngressController Deployment creates a container named "router" without requesting sufficient permissions in the `securityContext` of the container. Consequence: Normally, this will not cause an issue but in cases where clusters have a Security Context Constraint (SCC) that's similar enough to the hostnetwork SCC could result in router pods failing to start. Fix: Set `allowPrivilegeEscalation: true` in the `router` container's `securityContext` to ensure that it matches the default hostnetwork SCC. Result: The router pods will be admitted to the correct SCC and be created without error.
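The fix described in the doc text corresponds to a container-level `securityContext` like the following in the router Deployment (a sketch abbreviated to the relevant fragment; surrounding Deployment fields are omitted):

```yaml
# Sketch of the relevant fragment of the router Deployment after the fix.
# Only the fields relevant to this bug are shown.
spec:
  template:
    spec:
      containers:
      - name: router
        securityContext:
          allowPrivilegeEscalation: true
```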
Clone Of:
: 2079034 (view as bug list)
Environment:
Last Closed: 2022-08-10 10:37:47 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-ingress-operator pull 718 0 None Merged Bug 2007246: Ingress Controller does not set allowPrivilegeEscalation in the router deployment 2022-04-19 17:35:15 UTC
Github openshift cluster-ingress-operator pull 741 0 None Merged Revert "Bug 2007246: Ingress Controller does not set allowPrivilegeEscalation in the router deployment" 2022-04-19 17:35:17 UTC
Github openshift cluster-ingress-operator pull 743 0 None Merged Bug 2007246: Add allowPrivilegeEscalation to the router container 2022-04-26 16:31:21 UTC
Red Hat Knowledge Base (Solution) 6355842 0 None None None 2021-09-23 12:31:54 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:38:04 UTC

Description Simon Reber 2021-09-23 11:56:58 UTC
Description of problem:

According to the "Red Hat OpenShift 4 Hardening Guide v1.1" (attached; see 5.2.5, "Minimize admission of containers with Allow Privilege Escalation set to true and SELinux context set to RunAsAny (Manual)"), `allowPrivilegeEscalation` should be set to `false` in a customer-specific Security Context Constraint (SCC), so that as many application pods as possible run with `allowPrivilegeEscalation` set to `false`. Applications that require `allowPrivilegeEscalation` set to `true` should either specify this in their `deployment`, so that the default `restricted` SCC is selected, or else provide/use a specific SCC that addresses their use case.

Thus, when a custom restricted SCC with `allowPrivilegeEscalation` set to `false` is created, the `router` is unable to run (the exact reason is unclear, but it appears to rely on the `no_new_privs` flag). Since this is the case, the `IngressController` should set `allowPrivilegeEscalation` to `true` in the `securityContext` of its `deployment`, to make sure it always picks the `restricted` SCC, which has `allowPrivilegeEscalation` set to `true`.

Alternatively, the `IngressController` (respectively the Ingress Operator) could provide a specific SCC and link it to the `router` `ServiceAccount`, so the router always runs with that SCC. This would make the router more independent and prevent failures when customers create more restrictive SCCs.

Take a look at https://docs.openshift.com/container-platform/4.8/authentication/managing-security-context-constraints.html#admission_configuring-internal-oauth to understand how the SCC is selected if nothing is defined.
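The SCC attachment referenced above is not reproduced in this report. Purely as an illustration, a custom SCC that can trigger the failure might look like the following; the name and most field values are hypothetical, and the key point is `allowPrivilegeEscalation: false` combined with being "similar enough" to the default `hostnetwork` SCC that the router pods match it:

```yaml
# Hypothetical example of a restrictive custom SCC (the actual attached SCC
# is not reproduced here). The decisive field is allowPrivilegeEscalation.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: restricted-no-privesc   # hypothetical name
allowPrivilegeEscalation: false # causes the router failure described above
allowHostNetwork: true          # lets host-network router pods match this SCC
allowHostPorts: true
allowPrivilegedContainer: false
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
fsGroup:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users: []
groups: []
```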


OpenShift release version:

 - OpenShift Container Platform 4.9.0-rc.1, 4.8.*, 4.7.*

Cluster Platform:

 - Any

How reproducible:

 - Always

Steps to Reproduce (in detail):
1. Create a SCC as attached
2. Restart the `router` pod and see how it's failing to start


Actual results:

If an SCC with `allowPrivilegeEscalation` set to `false` is selected, the router fails to start and logs the following errors.

> [NOTICE] 265/114634 (19) : haproxy version is 2.2.15-5e8f49d
> [NOTICE] 265/114634 (19) : path to executable is /usr/sbin/haproxy
> [ALERT] 265/114634 (19) : Starting frontend public: cannot bind socket [0.0.0.0:80]
> [ALERT] 265/114634 (19) : Starting frontend public_ssl: cannot bind socket [0.0.0.0:443]
> E0923 11:46:58.907432       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> E0923 11:47:02.963872       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> E0923 11:47:28.874274       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> E0923 11:47:32.961877       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> E0923 11:47:58.878652       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> E0923 11:48:02.971619       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> E0923 11:48:28.881073       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> I0923 11:48:28.983101       1 template.go:704] router "msg"="Shutdown requested, waiting 45s for new connections to cease"  
> E0923 11:48:32.962518       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused

Expected results:

The `router` should either be able to run with `allowPrivilegeEscalation` set to `false`, or else specify that requirement in its `Deployment`, or provide its own specific SCC, to prevent issues when customers create an SCC more restrictive than what is provided by default.

Impact of the problem:

The router fails to start and thus does not work. In the worst case, the environment could become completely unavailable because the routers are not working as expected: they select the most restrictive SCC even though they require certain capabilities.

Additional info:

Comment 3 Miciah Dashiel Butler Masters 2021-09-23 16:17:00 UTC
Out of curiosity, did you test on OpenShift 4.6 and determine that the issue does not affect it, or have you just not tested on OpenShift 4.6?

Comment 4 Simon Reber 2021-09-23 18:01:00 UTC
(In reply to Miciah Dashiel Butler Masters from comment #3)
> Out of curiosity, did you test on OpenShift 4.6 and determine that the issue
> does not affect it, or have you just not tested on OpenShift 4.6?
Sorry, I only checked on 4.9-rc as well as on 4.8 and 4.7 - but I suspect the behavior was always like that and therefore OpenShift Container Platform 4 in general is affected.

Comment 31 Shudi Li 2022-04-27 05:57:48 UTC
Verified it with 4.11.0-0.nightly-2022-04-26-181148: 
1. securityContext with allowPrivilegeEscalation true is added to the deployment/router-default
a,
% oc -n openshift-ingress get deployment.apps/router-default -o yaml | grep -i -A1 securityContext
        securityContext:
          allowPrivilegeEscalation: true
--
      securityContext: {}
      serviceAccount: router
%
b,
% oc -n openshift-ingress get pod/router-default-54c658ddc-8r99h -o yaml | grep -i -A1 securityContext
    securityContext:
      allowPrivilegeEscalation: true
--
  securityContext:
    fsGroup: 1000590000
%

2. Create the custom SCC, delete a router pod, and verify that a new router pod is created successfully (same as comment 23)

3. Create an IngressController; a securityContext with allowPrivilegeEscalation true is also added to its deployment
%oc -n openshift-ingress get deployment.apps/router-internalapps2   -o yaml | grep -i -A1 securityContext     
        securityContext:
          allowPrivilegeEscalation: true
--
      securityContext: {}
      serviceAccount: router
%

4. Try to modify deployment/router-default to set securityContext/allowPrivilegeEscalation to false; the ingress operator reverts it to true. A router pod is terminated and a new one is created.
a,
%oc -n openshift-ingress get pods
NAME                             READY   STATUS    RESTARTS   AGE
router-default-54c658ddc-b9sdx   1/1     Running   0          4h28m
router-default-54c658ddc-zdxpq   1/1     Running   0          137m
%

b, Edit deployment/router-default and try to set spec.containers.securityContext.allowPrivilegeEscalation to false
% oc -n openshift-ingress edit deployment/router-default
Warning: would violate PodSecurity "restricted:latest": unrestricted capabilities (container "router" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "router" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "router" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/router-default edited
% 
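The PodSecurity warning above enumerates exactly the fields a container would need to satisfy the "restricted" profile. For reference, a sketch of such a securityContext is below; note the router intentionally does not use this, since it needs allowPrivilegeEscalation set to true:

```yaml
# securityContext satisfying the "restricted" PodSecurity profile, as
# enumerated by the warning above. Shown for reference only; the router's
# actual securityContext sets allowPrivilegeEscalation: true instead.
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
```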

c,
% oc -n openshift-ingress get all                       
NAME                                  READY   STATUS        RESTARTS   AGE
pod/router-default-54c658ddc-8t6hr    0/1     Pending       0          36s
pod/router-default-54c658ddc-b9sdx    1/1     Running       0          4h35m
pod/router-default-54c658ddc-zdxpq    1/1     Terminating   0          144m
pod/router-default-58dc79958d-rdwv2   1/1     Terminating   0          37s

NAME                              TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
service/router-default            LoadBalancer   172.30.243.173   34.122.142.53   80:32169/TCP,443:30398/TCP   4h35m
service/router-internal-default   ClusterIP      172.30.99.215    <none>          80/TCP,443/TCP,1936/TCP      4h35m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/router-default   1/2     2            1           4h35m

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/router-default-54c658ddc    2         2         1       4h35m
replicaset.apps/router-default-58dc79958d   0         0         0       149m
% 

d, 
% oc -n openshift-ingress get pods
NAME                             READY   STATUS    RESTARTS   AGE
router-default-54c658ddc-8t6hr   1/1     Running   0          5m13s
router-default-54c658ddc-b9sdx   1/1     Running   0          4h40m
%

Comment 33 errata-xmlrpc 2022-08-10 10:37:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

