Description of problem:

According to the "Red Hat OpenShift 4 Hardening Guide v1.1" (attached; see 5.2.5 "Minimize admission of containers with Allow Privilege Escalation set to true and SELinux context set to RunAsAny (Manual)"), `allowPrivilegeEscalation` should be set to `false` in a customer-specific Security Context Constraint (SCC) so that as many application pods as possible run with `allowPrivilegeEscalation` set to `false`. Applications that require `allowPrivilegeEscalation` set to `true` should either specify that in their `Deployment`, so that the default `restricted` SCC is selected, or provide/use a specific SCC for their use case.

However, when a custom restricted SCC with `allowPrivilegeEscalation` set to `false` is created, the `router` is unable to run (the exact reason is unclear, but it appears to rely on the `no_new_privs` flag). Given that, the `IngressController` should set `allowPrivilegeEscalation` to `true` in the `securityContext` of its `deployment` to make sure it always picks the `restricted` SCC, which has `allowPrivilegeEscalation` set to `true`. Alternatively, the `IngressController`, respectively the `IngressOperator`, could provide a specific SCC and link it to the `router` `ServiceAccount`, which would make the router more independent and prevent failures when more restrictive SCCs are created by the customer.

See https://docs.openshift.com/container-platform/4.8/authentication/managing-security-context-constraints.html#admission_configuring-internal-oauth to understand how the SCC is selected if nothing is defined.

OpenShift release version:
- OpenShift Container Platform 4.9.0-rc.1, 4.8.*, 4.7.*

Cluster Platform:
- Any

How reproducible:
- Always

Steps to Reproduce (in detail):
1. Create an SCC as attached.
2. Restart the `router` pod and observe that it fails to start.

Actual results:

If an SCC with `allowPrivilegeEscalation` set to `false` is selected, the router fails to start and logs the following errors:

> [NOTICE] 265/114634 (19) : haproxy version is 2.2.15-5e8f49d
> [NOTICE] 265/114634 (19) : path to executable is /usr/sbin/haproxy
> [ALERT] 265/114634 (19) : Starting frontend public: cannot bind socket [0.0.0.0:80]
> [ALERT] 265/114634 (19) : Starting frontend public_ssl: cannot bind socket [0.0.0.0:443]
> E0923 11:46:58.907432 1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> E0923 11:47:02.963872 1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> E0923 11:47:28.874274 1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> E0923 11:47:32.961877 1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> E0923 11:47:58.878652 1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> E0923 11:48:02.971619 1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> E0923 11:48:28.881073 1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
> I0923 11:48:28.983101 1 template.go:704] router "msg"="Shutdown requested, waiting 45s for new connections to cease"
> E0923 11:48:32.962518 1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused

Expected results:

The `router` should either be able to run with `allowPrivilegeEscalation` set to `false`, or specify that requirement in its `Deployment`, or provide its own specific SCC, to prevent issues when customers create an SCC that is more restrictive than the one provided by default.

Impact of the problem:

The router fails to start and thus won't work. In the worst case, the environment could become completely unavailable because the routers are not working as expected: they select the most restrictive SCC even though they require certain capabilities.

Additional info:
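The customer SCC referenced above is in the attachment and is not reproduced here. As a rough, hypothetical illustration only, an SCC of the kind that triggers this failure could look like the following; the name and most field values are assumptions, modeled on the default `restricted` SCC with `allowPrivilegeEscalation` flipped to `false`:

```yaml
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: restricted-no-privesc       # hypothetical name; the attached SCC may differ
allowPrivilegeEscalation: false     # the setting that prevents the router from starting
allowPrivilegedContainer: false
allowHostNetwork: false
allowHostPorts: false
allowHostPID: false
allowHostIPC: false
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
fsGroup:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
volumes:
- configMap
- downwardAPI
- emptyDir
- projected
- secret
```

Because SCC admission picks the most restrictive SCC that the pod's requirements allow, a cluster-wide SCC like this can be selected for the router pod unless the router's `Deployment` explicitly requests `allowPrivilegeEscalation: true`.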
Out of curiosity, did you test on OpenShift 4.6 and determine that the issue does not affect it, or have you just not tested on OpenShift 4.6?
(In reply to Miciah Dashiel Butler Masters from comment #3)
> Out of curiosity, did you test on OpenShift 4.6 and determine that the issue
> does not affect it, or have you just not tested on OpenShift 4.6?

Sorry, I only checked on 4.9-rc as well as on 4.8 and 4.7, but I suspect the behavior has always been like this, and therefore OpenShift Container Platform 4 in general is affected.
Verified it with 4.11.0-0.nightly-2022-04-26-181148:

1. A `securityContext` with `allowPrivilegeEscalation: true` is added to deployment/router-default.

a.
```
% oc -n openshift-ingress get deployment.apps/router-default -o yaml | grep -i -A1 securityContext
        securityContext:
          allowPrivilegeEscalation: true
--
      securityContext: {}
      serviceAccount: router
%
```

b.
```
% oc -n openshift-ingress get pod/router-default-54c658ddc-8r99h -o yaml | grep -i -A1 securityContext
    securityContext:
      allowPrivilegeEscalation: true
--
  securityContext:
    fsGroup: 1000590000
410 %
```

2. Create the customer SCC and delete a router pod; a new router pod is created successfully (same as bug comment 23).

3. Create an ingress controller; a `securityContext` with `allowPrivilegeEscalation: true` is also added to its deployment.

```
% oc -n openshift-ingress get deployment.apps/router-internalapps2 -o yaml | grep -i -A1 securityContext
        securityContext:
          allowPrivilegeEscalation: true
--
      securityContext: {}
      serviceAccount: router
%
```

4. Try to modify deployment/router-default with `securityContext.allowPrivilegeEscalation` set to `false`; the ingress controller reverts it to `true`. A router pod is terminated and a new one is created.
a.
```
% oc -n openshift-ingress get pods
NAME                             READY   STATUS    RESTARTS   AGE
router-default-54c658ddc-b9sdx   1/1     Running   0          4h28m
router-default-54c658ddc-zdxpq   1/1     Running   0          137m
%
```

b. Edit deployment/router-default and try to set `spec.template.spec.containers[].securityContext.allowPrivilegeEscalation` to `false`:
```
% oc -n openshift-ingress edit deployment/router-default
Warning: would violate PodSecurity "restricted:latest": unrestricted capabilities (container "router" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "router" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "router" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/router-default edited
%
```

c.
```
% oc -n openshift-ingress get all
NAME                                  READY   STATUS        RESTARTS   AGE
pod/router-default-54c658ddc-8t6hr    0/1     Pending       0          36s
pod/router-default-54c658ddc-b9sdx    1/1     Running       0          4h35m
pod/router-default-54c658ddc-zdxpq    1/1     Terminating   0          144m
pod/router-default-58dc79958d-rdwv2   1/1     Terminating   0          37s

NAME                              TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
service/router-default            LoadBalancer   172.30.243.173   34.122.142.53   80:32169/TCP,443:30398/TCP   4h35m
service/router-internal-default   ClusterIP      172.30.99.215    <none>          80/TCP,443/TCP,1936/TCP      4h35m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/router-default   1/2     2            1           4h35m

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/router-default-54c658ddc     2         2         1       4h35m
replicaset.apps/router-default-58dc79958d    0         0         0       149m
%
```

d.
```
% oc -n openshift-ingress get pods
NAME                             READY   STATUS    RESTARTS   AGE
router-default-54c658ddc-8t6hr   1/1     Running   0          5m13s
router-default-54c658ddc-b9sdx   1/1     Running   0          4h40m
%
```
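For reference, the verified fix shows up in the router `Deployment` as a `securityContext` fragment along these lines. This is reconstructed from the `grep` output above, not a complete manifest; surrounding fields are omitted:

```yaml
# Fragment of deployment.apps/router-default in the openshift-ingress
# namespace (illustrative reconstruction, not a full manifest).
spec:
  template:
    spec:
      containers:
      - name: router
        securityContext:
          # Pinned by the ingress operator so the pod always matches an SCC
          # that permits privilege escalation; manual edits to false are reverted.
          allowPrivilegeEscalation: true
```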
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069