Bug 2091546

Summary: Machine config pool paused when trying to apply remediation after applying machine config for kubeletconfig
Product: OpenShift Container Platform
Reporter: xiyuan
Component: Compliance Operator
Assignee: Andrew Taylor <antaylor>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Version: 4.11
CC: antaylor, jhrozek, lbragsta, mrogers, suprs, wenshen, xiyuan
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-07-05 12:51:59 UTC
Type: Bug

Description xiyuan 2022-05-30 09:52:41 UTC
*Description of problem:*
If a user applies a KubeletConfig and then tries to apply remediations for the ocp4-cis and ocp4-cis-node profiles, the scan completes successfully. However, the machine config pools remain paused and no remediation is triggered. The compliance-operator logs report “Waiting until all kubeletconfigs are rendered before un-pausing”.
*Version-Release number of selected components (if applicable):*
4.11.0-0.nightly-2022-05-25-193227 + compliance-operator.v0.1.52

*How reproducible:*
 Always

*Steps to Reproduce:*
1. Install compliance operator 0.1.52-2
2. Create KubeletConfig:
$ oc label machineconfigpool worker cis-hardening=true
machineconfigpool.machineconfiguration.openshift.io/worker labeled
$ oc label machineconfigpool master cis-hardening=true
machineconfigpool.machineconfiguration.openshift.io/master labeled
$ oc apply -f -<<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cis-hardening
spec:
  machineConfigPoolSelector:
    matchLabels:
      cis-hardening: "true"
  kubeletConfig:
    eventRecordQPS: 5
    tlsCipherSuites:
    - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    protectKernelDefaults: false
    evictionSoftGracePeriod:
      memory.available:  "5m"
      nodefs.available:  "5m"
      nodefs.inodesFree: "5m"
      imagefs.available: "5m"
    evictionHard:
      memory.available:  "100Mi"
      nodefs.available:  "10%"
      nodefs.inodesFree: "5%"
      imagefs.available: "15%"
    evictionSoft:
      memory.available:  "100Mi"
      nodefs.available:  "10%"
      nodefs.inodesFree: "5%"
      imagefs.available: "15%"
EOF
kubeletconfig.machineconfiguration.openshift.io/cis-hardening created
3. Wait until the cluster reboot has finished, then create the ScanSettingBinding (ssb):
$ oc patch ss default-auto-apply -p '{"debug":true}' --type='merge'
scansetting.compliance.openshift.io/default-auto-apply patched

$ oc apply -f -<<EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: my-ssb-r
profiles:
  - name: ocp4-cis
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
  - name: ocp4-cis-node
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  name: default-auto-apply
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
EOF


*Actual results:*
The scan completes successfully. However, the machine config pools remain paused and no remediation is triggered. The compliance-operator logs report “Waiting until all kubeletconfigs are rendered before un-pausing”.
$ oc get suite
NAME       PHASE   RESULT
my-ssb-r   DONE    NON-COMPLIANT


$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-08fc805cc060ea39fd03b7295ebe0ea6   False     False      False      3              0                   0                     0                      159m
worker   rendered-worker-c6b27222bddf6af46d4ab0cda23e8c01   False     False      False      3              0                   0                     0                      159m



$ oc logs pod/compliance-operator-c574d54b-gsxpq
{"level":"info","ts":1653902446.8609312,"logger":"suitectrl","msg":"All scans are in Done phase. Post-processing remediations","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r"}
{"level":"info","ts":1653902446.8651483,"logger":"suitectrl","msg":"Waiting until all kubeletconfigs are rendered before un-pausing","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","MachineConfigPool.Name":"master"}
{"level":"info","ts":1653902446.8653052,"logger":"suitectrl","msg":"KubeletConfig render diff:","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","MachineConfigPool.Name":"master","Diff":"kubeletconfig cis-hardening is not subset of rendered MC 99-master-generated-kubelet, diff: [[Path: /protectKernelDefaults Expected: %!s(bool=false) Got: NOT FOUND]]"}
{"level":"info","ts":1653902456.888762,"logger":"suitectrl","msg":"Reconciling ComplianceSuite","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r"}
{"level":"info","ts":1653902456.8889003,"logger":"suitectrl","msg":"Not updating scan, the phase is the same","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","ComplianceScan.Name":"ocp4-cis","ComplianceScan.Phase":"DONE"}
{"level":"info","ts":1653902456.888921,"logger":"suitectrl","msg":"Generating events for suite","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r"}
{"level":"info","ts":1653902456.889031,"logger":"suitectrl","msg":"Scan is up to date","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","ComplianceScan.Name":"ocp4-cis"}
{"level":"info","ts":1653902456.889078,"logger":"suitectrl","msg":"Not updating scan, the phase is the same","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","ComplianceScan.Name":"ocp4-cis-node-master","ComplianceScan.Phase":"DONE"}
{"level":"info","ts":1653902456.889086,"logger":"suitectrl","msg":"Generating events for suite","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r"}
{"level":"info","ts":1653902456.889126,"logger":"suitectrl","msg":"Scan is up to date","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","ComplianceScan.Name":"ocp4-cis-node-master"}
{"level":"info","ts":1653902456.889147,"logger":"suitectrl","msg":"Not updating scan, the phase is the same","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","ComplianceScan.Name":"ocp4-cis-node-worker","ComplianceScan.Phase":"DONE"}
{"level":"info","ts":1653902456.8891559,"logger":"suitectrl","msg":"Generating events for suite","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r"}
{"level":"info","ts":1653902456.8891933,"logger":"suitectrl","msg":"Scan is up to date","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","ComplianceScan.Name":"ocp4-cis-node-worker"}
{"level":"info","ts":1653902456.943579,"logger":"suitectrl","msg":"Setting Remediation to applied","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","ComplianceRemediation.Name":"ocp4-cis-api-server-encryption-provider-config"}
{"level":"info","ts":1653902457.6052217,"logger":"suitectrl","msg":"Setting Remediation to applied","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","ComplianceRemediation.Name":"ocp4-cis-api-server-encryption-provider-cipher"}
{"level":"info","ts":1653902458.3994703,"logger":"suitectrl","msg":"All scans are in Done phase. Post-processing remediations","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r"}
{"level":"info","ts":1653902458.4001622,"logger":"suitectrl","msg":"Waiting until all kubeletconfigs are rendered before un-pausing","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","MachineConfigPool.Name":"worker"}
{"level":"info","ts":1653902458.4001853,"logger":"suitectrl","msg":"KubeletConfig render diff:","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","MachineConfigPool.Name":"worker","Diff":"kubeletconfig cis-hardening is not subset of rendered MC 99-worker-generated-kubelet, diff: [[Path: /protectKernelDefaults Expected: %!s(bool=false) Got: NOT FOUND]]"}
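
The render-diff entries are easy to miss among the other reconcile messages. A minimal sketch of a filter for them follows, assuming the log has been saved as text with one JSON object per line, as in the output above. The function name `render_diffs` is hypothetical and is not part of the Compliance Operator; the sample lines are copied from the log above.

```python
import json

# Message text used by the operator for the diff entries shown in this report.
RENDER_DIFF_MSG = "KubeletConfig render diff:"


def render_diffs(log_text):
    """Return (pool name, diff text) pairs for every render-diff log entry."""
    found = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON log entry (shell prompt, blank line, ...)
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or otherwise unparsable line
        if entry.get("msg") == RENDER_DIFF_MSG:
            found.append((entry.get("MachineConfigPool.Name"), entry.get("Diff")))
    return found


# Two entries copied from the log in this report.
SAMPLE = '''
{"level":"info","ts":1653902446.8651483,"logger":"suitectrl","msg":"Waiting until all kubeletconfigs are rendered before un-pausing","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","MachineConfigPool.Name":"master"}
{"level":"info","ts":1653902446.8653052,"logger":"suitectrl","msg":"KubeletConfig render diff:","Request.Namespace":"openshift-compliance","Request.Name":"my-ssb-r","MachineConfigPool.Name":"master","Diff":"kubeletconfig cis-hardening is not subset of rendered MC 99-master-generated-kubelet, diff: [[Path: /protectKernelDefaults Expected: %!s(bool=false) Got: NOT FOUND]]"}
'''

if __name__ == "__main__":
    for pool, diff in render_diffs(SAMPLE):
        print(f"{pool}: {diff}")
```

In practice the input would be the saved output of `oc logs` for the operator pod rather than the embedded sample.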

$ oc get kubeletconfig -o yaml
apiVersion: v1
items:
- apiVersion: machineconfiguration.openshift.io/v1
  kind: KubeletConfig
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"machineconfiguration.openshift.io/v1","kind":"KubeletConfig","metadata":{"annotations":{},"name":"cis-hardening"},"spec":{"kubeletConfig":{"eventRecordQPS":5,"evictionHard":{"imagefs.available":"15%","memory.available":"100Mi","nodefs.available":"10%","nodefs.inodesFree":"5%"},"evictionSoft":{"imagefs.available":"15%","memory.available":"100Mi","nodefs.available":"10%","nodefs.inodesFree":"5%"},"evictionSoftGracePeriod":{"imagefs.available":"5m","memory.available":"5m","nodefs.available":"5m","nodefs.inodesFree":"5m"},"protectKernelDefaults":false,"tlsCipherSuites":["TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256"]},"machineConfigPoolSelector":{"matchLabels":{"cis-hardening":"true"}}}}
    creationTimestamp: "2022-05-30T08:14:41Z"
    finalizers:
    - 99-master-generated-kubelet
    - 99-worker-generated-kubelet
    generation: 6
    name: cis-hardening
    resourceVersion: "94531"
    uid: a4f6dbe7-ee98-41fd-bfbe-dc741a9e24a4
  spec:
    kubeletConfig:
      eventRecordQPS: 5
      evictionHard:
        imagefs.available: 15%
        imagefs.inodesFree: 5%
        memory.available: 100Mi
        nodefs.available: 10%
        nodefs.inodesFree: 5%
      evictionPressureTransitionPeriod: 0s
      evictionSoft:
        imagefs.available: 15%
        imagefs.inodesFree: 10%
        memory.available: 100Mi
        nodefs.available: 10%
        nodefs.inodesFree: 5%
      evictionSoftGracePeriod:
        imagefs.available: 5m
        imagefs.inodesFree: 1m30s
        memory.available: 5m
        nodefs.available: 5m
        nodefs.inodesFree: 5m
      makeIPTablesUtilChains: true
      protectKernelDefaults: false
      tlsCipherSuites:
      - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
      - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
      - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
      - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    machineConfigPoolSelector:
      matchLabels:
        cis-hardening: "true"
  status:
    conditions:
    - lastTransitionTime: "2022-05-30T09:05:28Z"
      message: Success
      status: "True"
      type: Success
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc get mc
NAME                                                           GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                      58e73629d83330deae8b829b890b004de22b836b   3.2.0             158m
00-worker                                                      58e73629d83330deae8b829b890b004de22b836b   3.2.0             158m
01-master-container-runtime                                    58e73629d83330deae8b829b890b004de22b836b   3.2.0             158m
01-master-kubelet                                              58e73629d83330deae8b829b890b004de22b836b   3.2.0             158m
01-worker-container-runtime                                    58e73629d83330deae8b829b890b004de22b836b   3.2.0             158m
01-worker-kubelet                                              58e73629d83330deae8b829b890b004de22b836b   3.2.0             158m
75-ocp4-cis-node-master-kubelet-enable-protect-kernel-sysctl                                              3.1.0             25m
75-ocp4-cis-node-worker-kubelet-enable-protect-kernel-sysctl                                              3.1.0             25m
99-master-fips                                                                                            3.2.0             170m
99-master-generated-kubelet                                    58e73629d83330deae8b829b890b004de22b836b   3.2.0             76m
99-master-generated-registries                                 58e73629d83330deae8b829b890b004de22b836b   3.2.0             158m
99-master-ssh                                                                                             3.2.0             170m
99-worker-fips                                                                                            3.2.0             170m
99-worker-generated-kubelet                                    58e73629d83330deae8b829b890b004de22b836b   3.2.0             76m
99-worker-generated-registries                                 58e73629d83330deae8b829b890b004de22b836b   3.2.0             158m
99-worker-ssh                                                                                             3.2.0             170m
master-chrony-configuration                                                                               3.1.0             170m
rendered-master-08fc805cc060ea39fd03b7295ebe0ea6               58e73629d83330deae8b829b890b004de22b836b   3.2.0             76m
rendered-master-34f434a96760b6314b768403938c04c5               58e73629d83330deae8b829b890b004de22b836b   3.2.0             25m
rendered-master-3528f5b5add5e716ffee55827207d5d0               58e73629d83330deae8b829b890b004de22b836b   3.2.0             25m
rendered-master-be8ffd817a5e0fd857b8f86da6b02241               58e73629d83330deae8b829b890b004de22b836b   3.2.0             120m
rendered-master-eb453438f6c7d8619d5a4867258392d2               58e73629d83330deae8b829b890b004de22b836b   3.2.0             158m
rendered-worker-c6b27222bddf6af46d4ab0cda23e8c01               58e73629d83330deae8b829b890b004de22b836b   3.2.0             76m
rendered-worker-e7fe9cf355b34b211eb67148da9f92ae               58e73629d83330deae8b829b890b004de22b836b   3.2.0             25m
rendered-worker-f2b87e335ecc35bb48cc9fb2d95a4d07               58e73629d83330deae8b829b890b004de22b836b   3.2.0             120m
rendered-worker-fa22a35402bd90183b590edbfc6c3064               58e73629d83330deae8b829b890b004de22b836b   3.2.0             158m
worker-chrony-configuration                                                                               3.1.0             170m


 
*Expected results:*
The remediations should be applied successfully.

*Additional info:*

Comment 2 Jakub Hrozek 2022-06-02 16:17:20 UTC
Sounds like a bug from the description, needs to be reproduced.

Comment 3 Vincent Shen 2022-06-10 06:11:37 UTC
As discussed here: https://coreos.slack.com/archives/CHCRR73PF/p1653877281147329, let's change this one to a documentation bug instead. 

Hi @antaylor, I am wondering if we could add something like the following under "Troubleshooting the Compliance Operator": https://docs.openshift.com/container-platform/4.10/security/compliance_operator/compliance-operator-troubleshooting.html

Avoid setting the Kubelet configuration option "protectKernelDefaults" to false: the value is not rendered into the machine config, which causes the machine config pool to pause unexpectedly.
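
To illustrate why a value that is not rendered keeps the pool paused, here is a minimal sketch of a subset check between a KubeletConfig and a rendered config. It is not the operator's actual code. The function name `subset_diff` is hypothetical, the diff wording only imitates the log message in this report, and the assumption that the rendered config omits `protectKernelDefaults` when it is false is taken from the "Got: NOT FOUND" entry in the log.

```python
def subset_diff(expected, rendered, path=""):
    """List every key of `expected` that is missing or different in `rendered`."""
    diffs = []
    for key, want in expected.items():
        here = f"{path}/{key}"
        if key not in rendered:
            # An explicit false still counts as "expected", so a missing key fails.
            diffs.append(f"Path: {here} Expected: {want} Got: NOT FOUND")
        elif isinstance(want, dict) and isinstance(rendered[key], dict):
            diffs.extend(subset_diff(want, rendered[key], here))
        elif rendered[key] != want:
            diffs.append(f"Path: {here} Expected: {want} Got: {rendered[key]}")
    return diffs


# Abbreviated values from the KubeletConfig in this report.
expected = {"eventRecordQPS": 5, "protectKernelDefaults": False}
# Assumption: the rendered machine config drops the key when the value is false.
rendered = {"eventRecordQPS": 5}

if __name__ == "__main__":
    for diff in subset_diff(expected, rendered):
        print(diff)
```

Under that assumption the check can never pass while the KubeletConfig carries the explicit false, which matches the repeated "Waiting until all kubeletconfigs are rendered before un-pausing" message.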

Comment 6 Vincent Shen 2022-06-21 15:08:46 UTC
@antaylor

Sorry for the late reply, maybe we can add the following:

There is a known Compliance Operator (CO) issue: setting `protectKernelDefaults: false` in the KubeletConfig causes the MachineConfigPool to pause unexpectedly.

And I think the location you mentioned looks good.
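
Until the note is documented, the setting can be checked for before a scan is started. The following is a minimal sketch, assuming the KubeletConfig is available as JSON (for example from `oc get kubeletconfig cis-hardening -o json`). The helper name `sets_protect_kernel_defaults_false` is hypothetical, and the sample object is an abbreviated copy of the one in this report.

```python
import json


def sets_protect_kernel_defaults_false(kubelet_config_obj):
    """Return True if spec.kubeletConfig sets protectKernelDefaults explicitly to false."""
    spec = kubelet_config_obj.get("spec") or {}
    kubelet = spec.get("kubeletConfig") or {}
    # `is False` distinguishes an explicit false from a missing key (None).
    return kubelet.get("protectKernelDefaults") is False


# Abbreviated copy of the KubeletConfig from this report.
SAMPLE = json.loads('''
{"apiVersion": "machineconfiguration.openshift.io/v1",
 "kind": "KubeletConfig",
 "metadata": {"name": "cis-hardening"},
 "spec": {"kubeletConfig": {"eventRecordQPS": 5, "protectKernelDefaults": false}}}
''')

if __name__ == "__main__":
    if sets_protect_kernel_defaults_false(SAMPLE):
        name = SAMPLE["metadata"]["name"]
        print(f"KubeletConfig {name} sets protectKernelDefaults: false")
```

The check only flags the explicit false described in this bug; a KubeletConfig that leaves the option unset is not reported.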

Comment 7 Andrew Taylor 2022-06-21 15:14:52 UTC
Thanks Vincent, I'll get a pull request to you to review this week.

Comment 8 Andrew Taylor 2022-06-27 21:10:09 UTC
Hey Vincent, 

I created a pull request to add a note to the documentation. Just add a /lgtm, or let me know if you have any suggestions for improvement.

https://github.com/openshift/openshift-docs/pull/47148

Just to confirm - this only applies to 4.11, correct?

Comment 9 Vincent Shen 2022-06-29 07:13:30 UTC
I think it applies to all OCP versions.