Bug 1934400 - [ocp_4][4.6][apiserver-auth] OAuth API servers are not ready - PreconditionNotReady
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Standa Laznicka
QA Contact: pmali
URL:
Whiteboard:
Depends On:
Blocks: 1967359 1989060
 
Reported: 2021-03-03 07:12 UTC by Vincent Lours
Modified: 2022-03-31 08:56 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A custom SCC that contains an unlikely, but possible, combination of the `defaultAllowPrivilegeEscalation: false` and `allowPrivilegedContainer: true` fields was causing the privileged openshift-apiserver and oauth-apiserver pods to fail, because the SCC mutates the pods into a state that fails API validation.
Consequence: openshift-apiserver and oauth-apiserver pods would be prevented from starting, which might cause an outage of the OpenShift APIs.
Fix: Make the security context mutator ignore the `defaultAllowPrivilegeEscalation` field on containers that are already privileged.
Result: A custom SCC should no longer be able to block privileged pods from starting.
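
To illustrate the cause, a custom SCC with the problematic combination would look roughly like this (a trimmed, hypothetical manifest; the name and the strategy fields are placeholders, not taken from any affected cluster):

~~~
# Hypothetical custom SCC sketching the combination described above. With a
# priority set, it can win admission for privileged pods and then default
# allowPrivilegeEscalation to false, yielding a pod spec that fails API
# validation (privileged: true combined with allowPrivilegeEscalation: false).
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: example-custom-scc               # placeholder name
priority: 1
allowPrivilegedContainer: true            # privileged containers are allowed...
defaultAllowPrivilegeEscalation: false    # ...but escalation is defaulted to false
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
volumes:
- '*'
~~~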
Clone Of:
Environment:
Last Closed: 2021-07-27 22:49:27 UTC
Target Upstream Version:
Embargoed:
Flags: rkshirsa: needinfo-


Attachments


Links
- GitHub: openshift/apiserver-library-go pull 47 (open) Bug 1934400: scc-admission: don't apply defaultAllowPrivilegeEscalation:false when container is privileged (last updated 2021-04-12 12:49:12 UTC)
- GitHub: openshift/kubernetes pull 673 (open) Bug 1934400: bump(apiserver-library-go): scc-admission: don't apply defaultAllowPrivilegeEscalation:false when container... (last updated 2021-04-16 08:30:11 UTC)
- Red Hat Product Errata: RHSA-2021:2438 (last updated 2021-07-27 22:51:00 UTC)

Comment 10 Standa Laznicka 2021-03-03 12:04:13 UTC
(from a private comment)
> ~~~
>    - lastTransitionTime: "2021-03-02T06:33:06Z"
>      lastUpdateTime: "2021-03-02T06:33:06Z"
>      message: 'Pod "apiserver-d476db957-9csf9" is invalid: [spec.containers[0].securityContext: Invalid value: core.SecurityContext{Capabilities:(*core.Capabilities)(nil), Privileged:(*bool)(0xc0702fc836), SELinuxOptions:(*core.SELinuxOptions)(nil), WindowsOptions:(*core.WindowsSecurityContextOptions)(nil), RunAsUser:(*int64)(nil), RunAsGroup:(*int64)(nil), RunAsNonRoot:(*bool)(nil), ReadOnlyRootFilesystem:(*bool)(nil), AllowPrivilegeEscalation:(*bool) (0xc0702fc39c), ProcMount:(*core.ProcMountType)(nil), SeccompProfile:(*core.SeccompProfile)(nil)}: cannot set `allowPrivilegeEscalation` to false and `privileged` to true, spec.initContainers[0].securityContext: Invalid value: core.SecurityContext{Capabilities:(*core.Capabilities)(nil), Privileged:(*bool)(0xc0702fc835), SELinuxOptions:(*core.SELinuxOptions)(nil), WindowsOptions:(*core.WindowsSecurityContextOptions)(nil), RunAsUser:(*int64)(nil), RunAsGroup:(*int64)(nil), RunAsNonRoot:(*bool)(nil), ReadOnlyRootFilesystem:(*bool)(nil), AllowPrivilegeEscalation:(*bool)(0xc0702fc39c), ProcMount:(*core.ProcMountType)(nil), SeccompProfile:(*core.SeccompProfile)(nil)}: cannot set `allowPrivilegeEscalation` to false and `privileged` to true]'
>      reason: FailedCreate
>      status: "True"
>      type: ReplicaFailure
> ~~~
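
Stripped of the Go struct dump, the mutated containers end up with this securityContext (a simplified sketch of just the two conflicting fields, not the full pod spec):

~~~
# Simplified view of the invalid state produced by the SCC mutation (sketch):
securityContext:
  privileged: true                 # required by the apiserver pods
  allowPrivilegeEscalation: false  # injected via the SCC's defaultAllowPrivilegeEscalation
~~~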

Did they add any SCC that would allow privileged pods but would still cause `allowPrivilegeEscalation` to default to false? Are there custom SCCs in this cluster, even if possibly deployed by a 3rd party product?

Comment 11 Standa Laznicka 2021-03-03 12:08:46 UTC
nvm, I see we already have SCCs in the must-gather. I'll check those.

Comment 12 Standa Laznicka 2021-03-03 12:12:40 UTC
Can you please get me the audit logs for the cluster?

Comment 13 Standa Laznicka 2021-03-03 14:36:10 UTC
Actually, I can see the issue, I overlooked a small detail.

The "vulnerability-advisor-scc" is to blame here. It apparently matches the pod's security context, and while it configures `allowPrivilegedContainer`, it also has `defaultAllowPrivilegeEscalation: false` and `priority: 1` which causes this behavior.

We can fix this, but to work around this behavior, either remove the `vulnerability-advisor-scc` SCC, or remove its `defaultAllowPrivilegeEscalation: false` field.
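
For the second option, the relevant part of the SCC looks roughly like this (a sketch of only the fields mentioned above, not the full vulnerability-advisor-scc):

~~~
# Sketch of the relevant vulnerability-advisor-scc fields; other fields omitted.
allowPrivilegedContainer: true
priority: 1
defaultAllowPrivilegeEscalation: false   # removing this field works around the issue
~~~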

I assume this is not only going to be a problem for the oauth-apiserver, but for openshift-apiserver pods, too.

Comment 14 Vincent Lours 2021-03-04 04:11:04 UTC
Hi Standa,

Thank you so much for the workaround.
In addition to removing the `defaultAllowPrivilegeEscalation: false` field from the `vulnerability-advisor-scc` SCC, we also had to remove the same field from the following SCCs:
- mutation-advisor-scc
- management-restricted

I've reached out to IBM to check with them whether the SCCs are part of the MCM core installation.

Do you think there is something that should be changed in the `openshift-oauth-apiserver` config to avoid it being impacted by a `defaultAllowPrivilegeEscalation: false` field in an SCC?

In addition to that, do you think I should create a new BZ for the missing `openshift-oauth-apiserver` from the must-gather?

Thanks again for your support.

Cheers,
Vincent

Comment 16 Standa Laznicka 2021-03-04 08:32:50 UTC
I don't think you need to be concerned with SCCs that set `defaultAllowPrivilegeEscalation: false` if they're not setting `allowPrivilegedContainer` to `true`.
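
For contrast, a sketch of an SCC that sets the field without allowing privileged containers, which should be harmless:

~~~
# Harmless variant (sketch): without allowPrivilegedContainer: true, defaulting
# allowPrivilegeEscalation to false does not conflict with the pod's settings.
allowPrivilegedContainer: false
defaultAllowPrivilegeEscalation: false
~~~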

> In addition to that, do you think I should create a new BZ for the missing `openshift-oauth-apiserver` from the must-gather?
I am not sure about the must-gather thing being a bug.

> Do you think there is something that should be changed in the `openshift-oauth-apiserver` config to avoid it being impacted by a `defaultAllowPrivilegeEscalation: false` field in an SCC?
I'll open a PR that should fix this.

Comment 18 Vincent Lours 2021-03-05 02:59:39 UTC
The KCS 5859251[1] has been created to address the issue and provide the workaround.

[1] https://access.redhat.com/solutions/5859251

Comment 29 Standa Laznicka 2021-04-30 10:04:30 UTC
Vincent, I don't see those failing pods anywhere in the must-gather and you did not share the pods' statuses. When you get them and if they match the description from comment 10, then this fix is going to cover it. Otherwise it might be a different bugzilla.

Comment 47 errata-xmlrpc 2021-07-27 22:49:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

