Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1898933

Summary: set "protectKernelDefaults" by default to "true"
Product: OpenShift Container Platform Reporter: Juan Antonio Osorio <josorior>
Component: Node Assignee: Ryan Phillips <rphillips>
Node sub component: Kubelet QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED NOTABUG Docs Contact:
Severity: medium    
Priority: unspecified CC: aos-bugs, knewcome, mkalinin, rphillips, sreber, travier
Version: 4.7 Keywords: Reopened
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-23 14:30:40 UTC Type: Bug

Description Juan Antonio Osorio 2020-11-18 10:54:43 UTC
Description of problem:

Currently, the "protectKernelDefaults" flag in the kubelet is left unset. The flag defaults to "false", which allows the kubelet to change sysctls to the values it needs. There are only a few of them:

```
kernel.keys.root_maxbytes=25000000
kernel.keys.root_maxkeys=1000000
kernel.panic=10
kernel.panic_on_oops=1
vm.overcommit_memory=1
vm.panic_on_oom=0
```

While this works just fine, the CIS benchmark's control 4.2.6 recommends setting this to true.

This means that the kubelet will expect the values to be set beforehand; otherwise, the kubelet will error out.
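As a rough illustration of that pre-flight expectation, the following Python sketch compares the values listed above against what the kernel currently reports under `/proc/sys`. It is not the kubelet's actual code; the function names `sysctl_path` and `find_mismatches` are hypothetical, and the `root` parameter exists only so the check can be pointed at a test directory.

```python
from pathlib import Path

# Values copied from the list in this description.
EXPECTED = {
    "kernel.keys.root_maxbytes": "25000000",
    "kernel.keys.root_maxkeys": "1000000",
    "kernel.panic": "10",
    "kernel.panic_on_oops": "1",
    "vm.overcommit_memory": "1",
    "vm.panic_on_oom": "0",
}


def sysctl_path(name, root=Path("/proc/sys")):
    """Map a dotted sysctl name (e.g. vm.panic_on_oom) to its file under root."""
    return Path(root).joinpath(*name.split("."))


def find_mismatches(expected, root=Path("/proc/sys")):
    """Return {name: (expected, actual)} for every sysctl that differs.

    A sysctl whose file is missing or unreadable is reported with actual=None.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            actual = sysctl_path(name, root).read_text().strip()
        except OSError:
            actual = None
        if actual != want:
            mismatches[name] = (want, actual)
    return mismatches
```

Calling `find_mismatches(EXPECTED)` on a node returns an empty dict when every value is already in place, which is the state the kubelet would require with `protectKernelDefaults` set to `true`.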

Ideally, we should:

* Include these sysctls as defaults in RHCOS (via the hyperkube rpm, maybe?)
* Set `protectKernelDefaults` to `true` in the default kubelet configuration
* Ensure that the kubelet is not able to set sysctls in general (by setting `ProtectKernelTunables=true` in the kubelet's systemd unit... though I'm not sure if this is currently possible)

This would improve our security by limiting one attack vector in case of a kubelet compromise.


Version-Release number of selected component (if applicable):

* All

How reproducible:

* Always

Steps to Reproduce:
1. Deploy an OpenShift cluster
2. Look at /etc/kubernetes/kubelet.conf on any host

Actual results:

$ oc debug node/$NODE -- grep protectKernelDefaults /host/etc/kubernetes/kubelet.conf
Creating debug namespace/openshift-debug-node-jkmvf ...
Starting pod/ip-10-0-128-17ec2internal-debug ...
To use host binaries, run `chroot /host`

Removing debug pod ...
Removing debug namespace/openshift-debug-node-jkmvf ...
error: non-zero exit code from debug container


Expected results:
$ oc debug node/$NODE -- grep protectKernelDefaults /host/etc/kubernetes/kubelet.conf
Creating debug namespace/openshift-debug-node-ltvnd ...
Starting pod/ip-10-0-137-231ec2internal-debug ...
To use host binaries, run `chroot /host`
  "protectKernelDefaults": true,

Removing debug pod ...
Removing debug namespace/openshift-debug-node-ltvnd ...

Additional info:

Currently, it takes a two-step process to be able to set this parameter:

* First: Create a MachineConfig object that sets the sysctls (wait for the pools to update)
* Second: Create a KubeletConfig object that sets the parameter

It takes two steps because, when this is applied as a day-2 operation, the kubelet runs before the MachineConfigDaemon. The kubelet would then fail to start: it is no longer allowed to set the sysctls itself, and the MCD would not have set them yet.
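The two objects could look roughly like the output of the Python sketch below, which builds both manifests as JSON-serializable dicts. Several details are assumptions rather than anything stated in this bug: the object names, the `/etc/sysctl.d/90-kubelet.conf` path, the Ignition spec version `3.1.0`, and the pool selector label. Adjust them to the cluster in question.

```python
import base64
import json

# Values copied from the list in the description.
SYSCTLS = {
    "kernel.keys.root_maxbytes": "25000000",
    "kernel.keys.root_maxkeys": "1000000",
    "kernel.panic": "10",
    "kernel.panic_on_oops": "1",
    "vm.overcommit_memory": "1",
    "vm.panic_on_oom": "0",
}


def render_sysctl_conf(sysctls):
    """Render a sysctl.d-style file: one name=value pair per line."""
    return "".join(f"{name}={value}\n" for name, value in sysctls.items())


def machine_config(role, sysctls):
    """Step 1: a MachineConfig that drops the sysctl file onto the nodes."""
    encoded = base64.b64encode(render_sysctl_conf(sysctls).encode()).decode()
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": f"75-{role}-kubelet-sysctls",  # hypothetical name
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {"config": {
            "ignition": {"version": "3.1.0"},  # assumed spec version
            "storage": {"files": [{
                "path": "/etc/sysctl.d/90-kubelet.conf",  # hypothetical path
                "mode": 0o644,  # serialized as decimal 420 in JSON
                "overwrite": True,
                "contents": {
                    "source": "data:text/plain;charset=utf-8;base64," + encoded
                },
            }]},
        }},
    }


def kubelet_config(pool_labels=None):
    """Step 2: a KubeletConfig that turns on protectKernelDefaults."""
    if pool_labels is None:
        # Assumed label on the worker MachineConfigPool.
        pool_labels = {"pools.operator.machineconfiguration.openshift.io/worker": ""}
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "KubeletConfig",
        "metadata": {"name": "protect-kernel-defaults"},  # hypothetical name
        "spec": {
            "machineConfigPoolSelector": {"matchLabels": pool_labels},
            "kubeletConfig": {"protectKernelDefaults": True},
        },
    }
```

Dumping `machine_config("worker", SYSCTLS)` with `json.dumps` gives a manifest to apply first; the `kubelet_config()` output would only be applied once the pool has finished updating, for the ordering reason described above.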

I tried to document the whole process here: https://jaosorior.dev/2020/protectkerneldefaults-in-openshift/

Comment 1 Ryan Phillips 2020-11-20 15:59:27 UTC
I do not believe we should do this. There are other authoritative components in the system that set the kernel defaults. RHCOS and Node Tuning Operator are the documented components to be changing sysctl tunables.

Comment 2 Juan Antonio Osorio 2020-11-21 10:05:53 UTC
But wouldn't they still be able to do this, since they're handled by CRI-O? CRI-O wouldn't be limited by this flag. On the other hand, `protectKernelDefaults: true` is merely a code flag telling the kubelet not to try to set those defaults. Real protection against such changes should be done on the systemd side, by setting `ProtectKernelTunables=true` in the kubelet's service unit.

Comment 3 Ryan Phillips 2020-11-23 14:30:40 UTC
Currently, the kubelet sets the sysctls to what it needs to start up. I do not believe we would enable this option, since it creates a dependency on the correct sysctls being installed on the system. Setting the sysctls on Metal or bring-your-own RHEL would require user intervention to make sure the settings were configured correctly. Upgrades from older versions of OpenShift could potentially break as well.

Comment 4 Juan Antonio Osorio 2020-11-23 14:33:37 UTC
If the sysctls were delivered via the hyperkube rpm and applied via the systemd unit, this would be feasible and would work even in BYO-RHEL environments.

Comment 5 Timothée Ravier 2020-11-23 14:57:42 UTC
I think that shipping the right sysctl values in an RPM, for them to be applied at boot time by systemd, is more reliable and discoverable than relying on the kubelet setting them later from a list hardcoded in code.
This is even more interesting in the RHCOS case, which exists only to set good defaults for OCP.
So even if the systemd restrictions and the kubelet config changes were not to be kept (and left for CIS benchmarks users to easily apply via a MC), I think setting the sysctls in an RPM is an improvement.

Comment 6 Ryan Phillips 2020-11-23 15:11:45 UTC
There are two values that are set by the kubelet that are not the default:

Nov 23 15:05:56 test1-hdrwh-bootstrap hyperkube[2223]: I1123 15:05:56.874595    2223 container_manager_linux.go:437] Updating kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0
Nov 23 15:05:56 test1-hdrwh-bootstrap hyperkube[2223]: I1123 15:05:56.874694    2223 container_manager_linux.go:437] Updating kernel flag: kernel/panic, expected value: 10, actual value: 0
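To collect this information across nodes, the journal lines above could be parsed with a small Python sketch like the following. The regular expression is inferred only from the two log lines quoted here, so the message format is an assumption for other kubelet versions, and `parse_kernel_flag_updates` is a hypothetical helper name.

```python
import re

# Pattern inferred from the two journal lines quoted in this comment.
FLAG_RE = re.compile(
    r"Updating kernel flag: (?P<flag>[\w/]+), "
    r"expected value: (?P<expected>-?\d+), "
    r"actual value: (?P<actual>-?\d+)"
)


def parse_kernel_flag_updates(log_text):
    """Return {sysctl_name: (expected, actual)} for each update found.

    The kubelet logs flags with slashes (vm/overcommit_memory); they are
    converted to the dotted form used by sysctl (vm.overcommit_memory).
    """
    updates = {}
    for match in FLAG_RE.finditer(log_text):
        name = match["flag"].replace("/", ".")
        updates[name] = (int(match["expected"]), int(match["actual"]))
    return updates
```

Fed the two lines above, this yields `vm.overcommit_memory` and `kernel.panic`, each with its expected value and the kernel default it replaced.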

Comment 9 Timothée Ravier 2022-10-19 15:42:31 UTC
Making each customer do the work of figuring out which sysctl to set to which value and updating them for each OCP release to have them meet external security requirements is not a great user experience.
I think we should do that by default in RHCOS so that it's set to a good default value and tested in CI.

Comment 10 Red Hat Bugzilla 2023-09-18 00:23:27 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days