Bug 2075029 - Cluster Autoscaler is failing to scale down Nodes because of Compliance Operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Compliance Operator
Version: 4.8
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jakub Hrozek
QA Contact: xiyuan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-13 12:54 UTC by Simon Reber
Modified: 2022-07-13 12:24 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-06 14:39:50 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift compliance-operator pull 820 | 0 | None | open | scans: Delete scan pods and aggregator when they're done | 2022-04-21 14:53:34 UTC
Red Hat Knowledge Base (Solution) | 6821651 | 0 | None | None | None | 2022-04-13 12:55:42 UTC
Red Hat Product Errata | RHBA-2022:4657 | 0 | None | None | None | 2022-06-06 14:39:55 UTC

Description Simon Reber 2022-04-13 12:54:05 UTC
Description of problem:

The Cluster Autoscaler in OpenShift Container Platform 4.8.35 cannot scale down OpenShift Container Platform 4 nodes because the Compliance Operator creates pods that the Cluster Autoscaler cannot remove.

I0413 12:21:04.217745       1 cluster.go:148] Fast evaluation: foo-bar-compute-standard-static-2-w4g6h for removal
I0413 12:21:04.217754       1 cluster.go:169] Fast evaluation: node foo-bar-compute-standard-static-2-w4g6h cannot be removed: openshift-compliance/openscap-pod-961119612ca9c5a100003570fd6cf6ded5cd287d is not replicated
I0413 12:21:04.217758       1 cluster.go:148] Fast evaluation: foo-bar-compute-standard-static-2-ffpbs for removal
I0413 12:21:04.217767       1 cluster.go:169] Fast evaluation: node foo-bar-compute-standard-static-2-ffpbs cannot be removed: openshift-compliance/openscap-pod-577b48b11774b6bc044b430a1a3fe60d613f12d5 is not replicated
I0413 12:21:04.217771       1 cluster.go:148] Fast evaluation: rfoo-bar-compute-standard-static-1-2cd92 for removal
I0413 12:21:04.217777       1 cluster.go:169] Fast evaluation: node rfoo-bar-compute-standard-static-1-2cd92 cannot be removed: openshift-compliance/openscap-pod-87f097a0e1f3e43ee8368f19f6e4f86d81c86c9e is not replicated
I0413 12:21:04.217781       1 cluster.go:148] Fast evaluation: foo-bar-compute-standard-static-3-wdprs for removal
I0413 12:21:04.217793       1 cluster.go:169] Fast evaluation: node foo-bar-compute-standard-static-3-wdprs cannot be removed: openshift-compliance/openscap-pod-95160c35fc2106c3f145f8c825f502fd6bfc2c57 is not replicated

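According to https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node, the Cluster Autoscaler cannot remove pods such as those created by the Compliance Operator, because they are not backed by a controller object (the log above reports them as "not replicated").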
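
A minimal sketch for pulling the blocking pods out of Cluster Autoscaler log output, assuming the message format shown in the log lines above. The function name blocking_pods is hypothetical and not part of oc or the autoscaler:

```shell
# blocking_pods: hypothetical helper, not part of oc or the autoscaler.
# Reads Cluster Autoscaler log lines on stdin and prints
# "<node> <namespace/pod>" for every node that could not be removed
# because a pod on it is reported as "not replicated".
blocking_pods() {
  sed -n 's/.*Fast evaluation: node \([^ ]*\) cannot be removed: \([^ ]*\) is not replicated.*/\1 \2/p'
}
```

It could be fed from the autoscaler logs, for example with `oc logs -n openshift-machine-api deployment/cluster-autoscaler-default | blocking_pods`. The deployment name here is an assumption and may differ per cluster.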

Being unable to scale down OpenShift Container Platform 4 nodes because of the Compliance Operator has a financial impact. This should be addressed so that the pods can be removed and, if needed, re-created on other available OpenShift Container Platform 4 nodes.

Version-Release number of selected component (if applicable):

 - OpenShift Container Platform 4.8.35 (likely all versions of the Compliance Operator are impacted)

How reproducible:

 - Always

Steps to Reproduce:
1. Install Compliance Operator on OpenShift Container Platform 4
2. Configure Cluster Autoscaler as per https://docs.openshift.com/container-platform/4.10/post_installation_configuration/cluster-tasks.html#cluster-autoscaler-cr_post-install-cluster-tasks
3. Place Compliance Operator pods on OpenShift Container Platform 4 nodes that are dynamically added and removed
4. Watch the Cluster Autoscaler fail to scale down because the Compliance Operator pods fall under https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node

Actual results:

The Compliance Operator prevents OpenShift Container Platform 4 node scale-down from working

Expected results:

Red Hat provided products should not cause OpenShift Container Platform 4 node scale-down to fail

Additional info:

Please see https://access.redhat.com/solutions/6821651, which documents this issue, and https://bugzilla.redhat.com/show_bug.cgi?id=2019963, which was a similar issue.

Comment 8 xiyuan 2022-05-27 14:29:14 UTC
Hi Jakub,
I tried to verify the bug with debug set to true and to false. However, I could not reproduce the node scale-down failure in either case. All passed.
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-25-193227   True        False         9h      Cluster version is 4.11.0-0.nightly-2022-05-25-193227
$ oc get ip
NAME            CSV                           APPROVAL    APPROVED
install-88n4p   compliance-operator.v0.1.52   Automatic   true
$ oc get csv
NAME                           DISPLAY                            VERSION   REPLACES   PHASE
compliance-operator.v0.1.52    Compliance Operator                0.1.52               Succeeded
elasticsearch-operator.5.4.2   OpenShift Elasticsearch Operator   5.4.2                Succeeded


Prepare: create autoscaler and MachineAutoscaler:
oc apply -f -<<EOF
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:                                    
  balanceSimilarNodeGroups: true
  resourceLimits:
    maxNodesTotal: 20
  scaleDown:
    enabled: true
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s
    unneededTime: 10s
EOF
 
oc apply -f -<<EOF
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  finalizers:
  - machinetarget.autoscaling.openshift.io
  name: machineautoscaler1
  namespace: openshift-machine-api
spec:                
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: xiyuan28-c-bsg57-worker-a                # MachineSet name
EOF
 
Scenario 1: debug: true
1. Trigger automatic scale-up with a workload:
$ oc apply -f -<<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-up
  labels:
    app: scale-up
spec:
  replicas: 80
  selector:
    matchLabels:
      app: scale-up
  template:
    metadata:
      labels:
        app: scale-up
    spec:
      containers:
      - name: busybox
        image: quay.io/openshifttest/busybox@sha256:afe605d272837ce1732f390966166c2afff5391208ddd57de10942748694049d
        resources:
          requests:
            memory: 4Gi
        command:
        - /bin/sh
        - "-c"
        - "echo 'this should be in the logs' && sleep 86400"
      terminationGracePeriodSeconds: 0
EOF
W0527 20:30:37.488961   16840 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "busybox" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "busybox" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "busybox" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "busybox" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/scale-up created
$ oc get machineset  -n openshift-machine-api -w
NAME                        DESIRED   CURRENT   READY   AVAILABLE   AGE
xiyuan28-c-bsg57-worker-a   10        10        1       1           8h
xiyuan28-c-bsg57-worker-b   1         1         1       1           8h
xiyuan28-c-bsg57-worker-c   1         1         1       1           8h
xiyuan28-c-bsg57-worker-f   0         0                             8h
xiyuan28-c-bsg57-worker-a   10        10        2       2           8h
xiyuan28-c-bsg57-worker-a   10        10        3       3           8h
xiyuan28-c-bsg57-worker-a   10        10        4       4           8h
xiyuan28-c-bsg57-worker-a   10        10        5       5           8h
xiyuan28-c-bsg57-worker-a   10        10        6       6           8h
xiyuan28-c-bsg57-worker-a   10        10        7       7           8h
xiyuan28-c-bsg57-worker-a   10        10        8       8           8h
xiyuan28-c-bsg57-worker-a   10        10        9       9           8h
xiyuan28-c-bsg57-worker-a   10        10        10      10          8h
 
##. Create the ScanSettingBinding (ssb) and wait until it is done:
$ oc patch ss default -p '{"debug":true}' --type='merge'      ### to keep the pods
$ oc apply -f -<<EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: my-ssb-r
profiles:
  - name: ocp4-cis
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
  - name: ocp4-cis-node
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  name: default
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
EOF
$ oc get pod
NAME                                                    READY   STATUS      RESTARTS   AGE
aggregator-pod-ocp4-cis                                 0/1     Completed   0          2m28s
aggregator-pod-ocp4-cis-node-master                     0/1     Completed   0          108s
aggregator-pod-ocp4-cis-node-worker                     0/1     Completed   0          58s
compliance-operator-59b569f68d-nzt96                    1/1     Running     0          3h16m
ocp4-cis-api-checks-pod                                 0/2     Completed   0          3m10s
ocp4-openshift-compliance-pp-5cd896b74c-zfmg4           1/1     Running     0          3h16m
openscap-pod-16868888f4b9b294c485cf4d0fae522d2711f11a   0/2     Completed   0          3m9s
openscap-pod-2cbabd88cf69664dab369a361f955e5091d15e86   0/2     Completed   0          3m9s
openscap-pod-2d79298421fcff2f0ae485f605f1229af412a1c3   0/2     Completed   0          3m9s
openscap-pod-2da7af0e42f11f0b250aa439b7283ee21680b970   0/2     Completed   0          3m11s
openscap-pod-2e0e7464cdd0a367ce33a822b4c83d7a4e4b8c82   0/2     Completed   0          3m10s
openscap-pod-411d106269a6bd7663210b648720e8122c4b3dd1   0/2     Completed   0          3m9s
openscap-pod-4953187ed5203309285dcd48e1702276dea0df2b   0/2     Completed   0          3m9s
openscap-pod-52325de3be7282576ff8803691d9d2b36015c70a   0/2     Completed   0          3m10s
openscap-pod-57551c87d1ec49c2375c73a21ca67d7b14f4735c   0/2     Completed   0          3m10s
openscap-pod-57749d280bb87de1b20878cf8b083ded57116b94   0/2     Completed   0          3m10s
openscap-pod-6226566951529d52aa0d288eb88682488dd0ecfb   0/2     Completed   0          3m10s
openscap-pod-688e459e8cd71b31bb0aac1a086952b7049d7590   0/2     Completed   0          3m11s
openscap-pod-6bb60245ca232d5b84c308997b5311bf51e1f9db   0/2     Completed   0          3m9s
openscap-pod-a97bba570517e6f018146d894c30c46fef786a3a   0/2     Completed   0          3m9s
openscap-pod-fe8ea3ec722771b8ffd019433aaab1d819e00efd   0/2     Completed   0          3m10s
rhcos4-openshift-compliance-pp-78bf7c5bf9-mhhkk         1/1     Running     0          3h16m
scale-up-587b994bdd-2vzr2                               1/1     Running     0          9m46s
scale-up-587b994bdd-2wpg6                               0/1     Pending     0          9m45s
scale-up-587b994bdd-2zkg7                               1/1     Running     0          9m46s
scale-up-587b994bdd-499sf                               1/1     Running     0          9m46s
scale-up-587b994bdd-4nbjz                               1/1     Running     0          9m46s
scale-up-587b994bdd-4qqc2                               1/1     Running     0          9m46s
scale-up-587b994bdd-5g5xj                               0/1     Pending     0          9m46s
scale-up-587b994bdd-5zcrk                               0/1     Pending     0          9m46s
scale-up-587b994bdd-69848                               0/1     Pending     0          9m46s
scale-up-587b994bdd-6lxhx                               0/1     Pending     0          9m46s
scale-up-587b994bdd-6wgdp                               0/1     Pending     0          9m46s
scale-up-587b994bdd-7j2vc                               1/1     Running     0          9m46s
scale-up-587b994bdd-7l5xc                               0/1     Pending     0          9m46s
scale-up-587b994bdd-7rtcm                               0/1     Pending     0          9m46s
scale-up-587b994bdd-7xd7q                               0/1     Pending     0          9m46s
scale-up-587b994bdd-8frkx                               0/1     Pending     0          9m45s
scale-up-587b994bdd-8qv8j                               1/1     Running     0          9m46s
scale-up-587b994bdd-8z7s4                               1/1     Running     0          9m46s
scale-up-587b994bdd-92kzk                               1/1     Running     0          9m46s
scale-up-587b994bdd-98j2r                               0/1     Pending     0          9m45s
scale-up-587b994bdd-98mgq                               1/1     Running     0          9m45s
scale-up-587b994bdd-9kd64                               0/1     Pending     0          9m45s
scale-up-587b994bdd-9nxmc                               1/1     Running     0          9m46s
scale-up-587b994bdd-b477x                               0/1     Pending     0          9m46s
scale-up-587b994bdd-cd8l6                               0/1     Pending     0          9m46s
scale-up-587b994bdd-cvm2l                               1/1     Running     0          9m46s
scale-up-587b994bdd-d74tv                               0/1     Pending     0          9m46s
scale-up-587b994bdd-dsqvj                               1/1     Running     0          9m46s
scale-up-587b994bdd-gkcb7                               0/1     Pending     0          9m46s
scale-up-587b994bdd-h5dhc                               0/1     Pending     0          9m45s
scale-up-587b994bdd-h7ncj                               0/1     Pending     0          9m46s
scale-up-587b994bdd-hrzgw                               1/1     Running     0          9m46s
scale-up-587b994bdd-j9qrl                               0/1     Pending     0          9m46s
scale-up-587b994bdd-jk67j                               0/1     Pending     0          9m46s
scale-up-587b994bdd-jxwmm                               1/1     Running     0          9m45s
scale-up-587b994bdd-k9t9s                               1/1     Running     0          9m46s
scale-up-587b994bdd-kgf8s                               0/1     Pending     0          9m46s
scale-up-587b994bdd-ks6cz                               1/1     Running     0          9m45s
scale-up-587b994bdd-l9f58                               0/1     Pending     0          9m46s
scale-up-587b994bdd-ll7pp                               1/1     Running     0          9m46s
scale-up-587b994bdd-mkcst                               0/1     Pending     0          9m46s
scale-up-587b994bdd-n7qdp                               1/1     Running     0          9m46s
scale-up-587b994bdd-n9x6h                               0/1     Pending     0          9m46s
scale-up-587b994bdd-nhmcv                               1/1     Running     0          9m46s
scale-up-587b994bdd-nn9fj                               1/1     Running     0          9m46s
scale-up-587b994bdd-p6sqp                               1/1     Running     0          9m46s
scale-up-587b994bdd-phszs                               1/1     Running     0          9m46s
scale-up-587b994bdd-q2xb2                               0/1     Pending     0          9m46s
scale-up-587b994bdd-qnq8v                               0/1     Pending     0          9m46s
scale-up-587b994bdd-qqxts                               1/1     Running     0          9m46s
scale-up-587b994bdd-qs28t                               0/1     Pending     0          9m45s
scale-up-587b994bdd-qz6qm                               0/1     Pending     0          9m45s
scale-up-587b994bdd-rk5p2                               0/1     Pending     0          9m46s
scale-up-587b994bdd-rv5kf                               1/1     Running     0          9m45s
scale-up-587b994bdd-rxwrh                               1/1     Running     0          9m46s
scale-up-587b994bdd-s2f7z                               1/1     Running     0          9m46s
scale-up-587b994bdd-s6mhw                               0/1     Pending     0          9m46s
scale-up-587b994bdd-s72nb                               0/1     Pending     0          9m45s
scale-up-587b994bdd-s8fns                               0/1     Pending     0          9m46s
scale-up-587b994bdd-sdh2x                               0/1     Pending     0          9m46s
scale-up-587b994bdd-szr94                               0/1     Pending     0          9m45s
scale-up-587b994bdd-tcvbp                               1/1     Running     0          9m46s
scale-up-587b994bdd-trdxj                               1/1     Running     0          9m46s
scale-up-587b994bdd-tvt57                               0/1     Pending     0          9m46s
scale-up-587b994bdd-tw9j7                               0/1     Pending     0          9m46s
scale-up-587b994bdd-v2c8d                               0/1     Pending     0          9m46s
scale-up-587b994bdd-vd826                               0/1     Pending     0          9m46s
scale-up-587b994bdd-vnnvv                               0/1     Pending     0          9m46s
scale-up-587b994bdd-vzmwn                               1/1     Running     0          9m46s
scale-up-587b994bdd-w2n2q                               1/1     Running     0          9m45s
scale-up-587b994bdd-wb6tp                               1/1     Running     0          9m46s
scale-up-587b994bdd-xh5kx                               0/1     Pending     0          9m46s
scale-up-587b994bdd-xlxpm                               0/1     Pending     0          9m45s
scale-up-587b994bdd-xtnrg                               0/1     Pending     0          9m46s
scale-up-587b994bdd-xvvvr                               0/1     Pending     0          9m46s
scale-up-587b994bdd-xx659                               0/1     Pending     0          9m45s
scale-up-587b994bdd-xxl94                               0/1     Pending     0          9m46s
scale-up-587b994bdd-zdf64                               1/1     Running     0          9m46s
scale-up-587b994bdd-zm6l7                               0/1     Pending     0          9m46s
scale-up-587b994bdd-zpcxt                               1/1     Running     0          9m45s
$ oc get suite
NAME       PHASE   RESULT
my-ssb-r   DONE    NON-COMPLIANT
 
##. Delete the workload:
$ oc delete deployment scale-up
deployment.apps "scale-up" deleted
$ oc get machineset  -n openshift-machine-api -w
NAME                        DESIRED   CURRENT   READY   AVAILABLE   AGE
xiyuan28-c-bsg57-worker-a   10        10        10      10          8h
xiyuan28-c-bsg57-worker-b   1         1         1       1           8h
xiyuan28-c-bsg57-worker-c   1         1         1       1           8h
xiyuan28-c-bsg57-worker-f   0         0                             8h
xiyuan28-c-bsg57-worker-a   9         10        10      10          8h
xiyuan28-c-bsg57-worker-a   9         10        10      10          8h
xiyuan28-c-bsg57-worker-a   9         9         9       9           8h
xiyuan28-c-bsg57-worker-a   8         9         9       9           8h
xiyuan28-c-bsg57-worker-a   8         9         9       9           8h
xiyuan28-c-bsg57-worker-a   8         8         8       8           8h
xiyuan28-c-bsg57-worker-a   7         8         8       8           8h
xiyuan28-c-bsg57-worker-a   7         8         8       8           8h
xiyuan28-c-bsg57-worker-a   7         7         7       7           8h
xiyuan28-c-bsg57-worker-a   6         7         7       7           8h
xiyuan28-c-bsg57-worker-a   6         7         7       7           8h
xiyuan28-c-bsg57-worker-a   6         6         6       6           8h
xiyuan28-c-bsg57-worker-a   5         6         6       6           8h
xiyuan28-c-bsg57-worker-a   5         6         6       6           8h
xiyuan28-c-bsg57-worker-a   5         5         5       5           8h
xiyuan28-c-bsg57-worker-a   4         5         5       5           8h
xiyuan28-c-bsg57-worker-a   4         5         5       5           8h
xiyuan28-c-bsg57-worker-a   4         4         4       4           8h
xiyuan28-c-bsg57-worker-a   3         4         4       4           8h
xiyuan28-c-bsg57-worker-a   3         4         4       4           8h
xiyuan28-c-bsg57-worker-a   3         3         3       3           8h
xiyuan28-c-bsg57-worker-a   2         3         3       3           8h
xiyuan28-c-bsg57-worker-a   2         3         3       3           8h
xiyuan28-c-bsg57-worker-a   2         2         2       2           8h
xiyuan28-c-bsg57-worker-a   1         2         2       2           8h
xiyuan28-c-bsg57-worker-a   1         2         2       2           8h
xiyuan28-c-bsg57-worker-a   1         1         1       1           8h

===============================================================


Scenario 2: debug: false
$ oc patch ss default -p '{"debug":false}' --type='merge' 
scansetting.compliance.openshift.io/default patched

$ oc apply -f -<<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-up
  labels:
    app: scale-up
spec:
  replicas: 80
  selector:
    matchLabels:
      app: scale-up
  template:
    metadata:
      labels:
        app: scale-up
    spec:
      containers:
      - name: busybox
        image: quay.io/openshifttest/busybox@sha256:afe605d272837ce1732f390966166c2afff5391208ddd57de10942748694049d
        resources:
          requests:
            memory: 4Gi
        command:
        - /bin/sh
        - "-c"
        - "echo 'this should be in the logs' && sleep 86400"
      terminationGracePeriodSeconds: 0
EOF
W0527 21:13:11.198881   19536 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "busybox" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "busybox" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "busybox" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "busybox" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/scale-up created
$ oc get machineset  -n openshift-machine-api -w
NAME                        DESIRED   CURRENT   READY   AVAILABLE   AGE
xiyuan28-c-bsg57-worker-a   10        10        1       1           10h
xiyuan28-c-bsg57-worker-b   1         1         1       1           10h
xiyuan28-c-bsg57-worker-c   1         1         1       1           10h
xiyuan28-c-bsg57-worker-f   0         0                             10h
xiyuan28-c-bsg57-worker-a   10        10        2       2           10h
xiyuan28-c-bsg57-worker-a   10        10        3       3           10h
xiyuan28-c-bsg57-worker-a   10        10        4       4           10h
xiyuan28-c-bsg57-worker-a   10        10        5       5           10h
xiyuan28-c-bsg57-worker-a   10        10        6       6           10h
xiyuan28-c-bsg57-worker-a   10        10        7       7           10h
xiyuan28-c-bsg57-worker-a   10        10        8       8           10h
xiyuan28-c-bsg57-worker-a   10        10        9       9           10h
xiyuan28-c-bsg57-worker-a   10        10        10      10          10h
$ oc apply -f -<<EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: my-ssb-r
profiles:
  - name: ocp4-moderate-node
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
  - name: ocp4-cis
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
  - name: ocp4-cis-node
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  name: default
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
EOF
scansettingbinding.compliance.openshift.io/my-ssb-r created
$ oc get suite
NAME       PHASE       RESULT
my-ssb-r   LAUNCHING   NOT-AVAILABLE
$ oc get suite
NAME       PHASE     RESULT
my-ssb-r   PENDING   NOT-AVAILABLE
$ oc get suite
NAME       PHASE     RESULT
my-ssb-r   PENDING   NOT-AVAILABLE
$ oc get suite
NAME       PHASE       RESULT
my-ssb-r   LAUNCHING   NOT-AVAILABLE
$ oc delete deployment scale-up
deployment.apps "scale-up" deleted
$ oc get machineset  -n openshift-machine-api -w
NAME                        DESIRED   CURRENT   READY   AVAILABLE   AGE
xiyuan28-c-bsg57-worker-a   10        10        10      10          10h
xiyuan28-c-bsg57-worker-b   1         1         1       1           10h
xiyuan28-c-bsg57-worker-c   1         1         1       1           10h
xiyuan28-c-bsg57-worker-f   0         0                             10h
xiyuan28-c-bsg57-worker-a   9         10        10      10          10h
xiyuan28-c-bsg57-worker-a   9         10        10      10          10h
xiyuan28-c-bsg57-worker-a   9         9         9       9           10h
xiyuan28-c-bsg57-worker-a   8         9         9       9           10h
xiyuan28-c-bsg57-worker-a   8         9         9       9           10h
xiyuan28-c-bsg57-worker-a   8         8         8       8           10h
xiyuan28-c-bsg57-worker-a   7         8         8       8           10h
xiyuan28-c-bsg57-worker-a   7         8         8       8           10h
xiyuan28-c-bsg57-worker-a   7         7         7       7           10h
xiyuan28-c-bsg57-worker-a   6         7         7       7           10h
xiyuan28-c-bsg57-worker-a   6         7         7       7           10h
xiyuan28-c-bsg57-worker-a   6         6         6       6           10h
xiyuan28-c-bsg57-worker-a   5         6         6       6           10h
xiyuan28-c-bsg57-worker-a   5         6         6       6           10h
xiyuan28-c-bsg57-worker-a   5         5         5       5           10h
xiyuan28-c-bsg57-worker-a   4         5         5       5           10h
xiyuan28-c-bsg57-worker-a   4         5         5       5           10h
xiyuan28-c-bsg57-worker-a   4         4         4       4           10h
xiyuan28-c-bsg57-worker-a   3         4         4       4           10h
xiyuan28-c-bsg57-worker-a   3         4         4       4           10h
xiyuan28-c-bsg57-worker-a   3         3         3       3           10h
xiyuan28-c-bsg57-worker-a   2         3         3       3           10h
xiyuan28-c-bsg57-worker-a   2         3         3       3           10h
xiyuan28-c-bsg57-worker-a   2         2         2       2           10h
xiyuan28-c-bsg57-worker-a   1         2         2       2           10h
xiyuan28-c-bsg57-worker-a   1         2         2       2           10h
xiyuan28-c-bsg57-worker-a   1         1         1       1           10h
$ oc get suite
NAME       PHASE         RESULT
my-ssb-r   AGGREGATING   NOT-AVAILABLE

Comment 9 Jakub Hrozek 2022-05-29 14:57:01 UTC
Did the autoscaler scale down the cluster during your testing?
I think the simplest test would be to drain one of the worker nodes after finishing a scan with debug=false.
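
Before draining, it may help to confirm that no scan or aggregator pods are left behind. A minimal sketch, assuming the pod naming seen in the `oc get pod` output in Comment 8 (openscap-pod-*, aggregator-pod-*, *-api-checks-pod); leftover_scan_pods is a hypothetical helper name:

```shell
# leftover_scan_pods: hypothetical helper, not part of oc.
# Reads `oc get pod` tabular output on stdin and prints "<pod> <status>"
# for every pod whose name matches the scan/aggregator naming assumed above.
leftover_scan_pods() {
  awk '$1 != "NAME" && ($1 ~ /^openscap-pod-/ || $1 ~ /^aggregator-pod-/ || $1 ~ /-api-checks-pod$/) { print $1, $3 }'
}
```

For example, `oc get pod -n openshift-compliance | leftover_scan_pods` would be expected to print nothing once a scan with debug=false has finished and its pods were cleaned up. The drain itself could then be something like `oc adm drain <node> --ignore-daemonsets --delete-emptydir-data`, with the node name filled in.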

Comment 10 xiyuan 2022-05-31 05:20:28 UTC
$ oc apply -f -<<EOF
> apiVersion: autoscaling.openshift.io/v1beta1
> kind: MachineAutoscaler
> metadata:
>   finalizers:
>   - machinetarget.autoscaling.openshift.io
>   name: machineautoscaler1
>   namespace: openshift-machine-api
> spec:                
>   maxReplicas: 10
>   minReplicas: 1
>   scaleTargetRef:
>     apiVersion: machine.openshift.io/v1beta1
>     kind: MachineSet
>     name: xiyuan31-11-f4gjb-worker-us-east-2a
> EOF
machineautoscaler.autoscaling.openshift.io/machineautoscaler1 created
$ oc apply -f -<<EOF
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: scale-up
>   labels:
>     app: scale-up
> spec:
>   replicas: 80
>   selector:
>     matchLabels:
>       app: scale-up
>   template:
>     metadata:
>       labels:
>         app: scale-up
>     spec:
>       containers:
>       - name: busybox
>         image: quay.io/openshifttest/busybox@sha256:afe605d272837ce1732f390966166c2afff5391208ddd57de10942748694049d
>         resources:
>           requests:
>             memory: 4Gi
>         command:
>         - /bin/sh
>         - "-c"
>         - "echo 'this should be in the logs' && sleep 86400"
>       terminationGracePeriodSeconds: 0
> EOF
deployment.apps/scale-up created
$ oc get machineset  -n openshift-machine-api -w
NAME                                  DESIRED   CURRENT   READY   AVAILABLE   AGE
xiyuan31-11-f4gjb-worker-us-east-2a   1         1         1       1           3h54m
xiyuan31-11-f4gjb-worker-us-east-2b   1         1         1       1           3h54m
xiyuan31-11-f4gjb-worker-us-east-2c   1         1         1       1           3h54m
xiyuan31-11-f4gjb-worker-us-east-2a   10        1         1       1           3h54m
xiyuan31-11-f4gjb-worker-us-east-2a   10        1         1       1           3h54m
xiyuan31-11-f4gjb-worker-us-east-2a   10        10        1       1           3h54m
xiyuan31-11-f4gjb-worker-us-east-2a   10        10        2       2           3h58m
xiyuan31-11-f4gjb-worker-us-east-2a   10        10        3       3           3h58m
xiyuan31-11-f4gjb-worker-us-east-2a   10        10        4       4           3h58m
xiyuan31-11-f4gjb-worker-us-east-2a   10        10        5       5           3h58m
xiyuan31-11-f4gjb-worker-us-east-2a   10        10        6       6           3h58m
xiyuan31-11-f4gjb-worker-us-east-2a   10        10        7       7           3h58m
xiyuan31-11-f4gjb-worker-us-east-2a   10        10        8       8           3h58m
xiyuan31-11-f4gjb-worker-us-east-2a   10        10        9       9           3h58m
xiyuan31-11-f4gjb-worker-us-east-2a   10        10        10      10          3h59m
$ oc apply -f -<<EOF
> apiVersion: compliance.openshift.io/v1alpha1
> kind: ScanSettingBinding
> metadata:
>   name: my-ssb-r
> profiles:
>   - name: ocp4-cis
>     kind: Profile
>     apiGroup: compliance.openshift.io/v1alpha1
>   - name: ocp4-cis-node
>     kind: Profile
>     apiGroup: compliance.openshift.io/v1alpha1
> settingsRef:
>   name: default
>   kind: ScanSetting
>   apiGroup: compliance.openshift.io/v1alpha1
> EOF
scansettingbinding.compliance.openshift.io/my-ssb-r created


$ oc get machineset  -n openshift-machine-api -w
NAME                                  DESIRED   CURRENT   READY   AVAILABLE   AGE
xiyuan31-11-f4gjb-worker-us-east-2a   10        10        10      10          4h
xiyuan31-11-f4gjb-worker-us-east-2b   1         1         1       1           4h
xiyuan31-11-f4gjb-worker-us-east-2c   1         1         1       1           4h
$ oc get suite -w
NAME       PHASE     RESULT
my-ssb-r   RUNNING   NOT-AVAILABLE
my-ssb-r   RUNNING   NOT-AVAILABLE
my-ssb-r   RUNNING   NOT-AVAILABLE
my-ssb-r   AGGREGATING   NOT-AVAILABLE
my-ssb-r   AGGREGATING   NOT-AVAILABLE
my-ssb-r   AGGREGATING   NOT-AVAILABLE
my-ssb-r   DONE          NON-COMPLIANT
my-ssb-r   DONE          NON-COMPLIANT
$ oc get cm
NAME                                                    DATA   AGE
compliance-operator-lock                                0      93m
kube-root-ca.crt                                        1      3h12m
ocp4-cis-api-checks-pod                                 3      12m
ocp4-cis-node-master-openscap-container-entrypoint      1      13m
ocp4-cis-node-master-openscap-env-map                   4      13m
ocp4-cis-node-master-openscap-env-map-platform          3      13m
ocp4-cis-node-worker-openscap-container-entrypoint      1      13m
ocp4-cis-node-worker-openscap-env-map                   4      13m
ocp4-cis-node-worker-openscap-env-map-platform          3      13m
ocp4-cis-openscap-container-entrypoint                  1      13m
ocp4-cis-openscap-env-map                               4      13m
ocp4-cis-openscap-env-map-platform                      3      13m
openscap-pod-01f686dee5e362c787f5b42e170127435042fcf0   3      12m
openscap-pod-37cdc96215d8e43d4eefbbe325a8726631974f1c   3      12m
openscap-pod-43d971f28ade6342f18ef30cac1e71fcc5f935ac   3      12m
openscap-pod-6353dc45a366a339011200b34cf6acdef01e5a22   3      12m
openscap-pod-816bd2594489f218c3ae7fcf445445ed5d3c923a   3      12m
openscap-pod-846c3a0459c9cb8b4d658c32cdcb38fd5f8b59b8   3      12m
openscap-pod-89a0cc594f6d9088eff9f189b870bcbaa490faaf   3      12m
openscap-pod-8f8d0ee2426bc43da587627559bbcc5baae3824c   3      12m
openscap-pod-8faa57d7cb1f5fa95c311b35f3b2e7e3d8468060   3      12m
openscap-pod-a65d508f883706e2a906b9822790841aec21aee2   3      12m
openscap-pod-ba34ed76ce9ab4fb5c4f6ca7efc6aaa58bf77bf3   3      12m
openscap-pod-c92fabc7f4ea8e432377034aae538636b5cf12dd   3      12m
openscap-pod-d9bb91d36e788496120615bbc287d362124992a7   3      12m
openscap-pod-efc534f72bffec33dbb25f2447aa3b3809db39c7   3      12m
openscap-pod-f614b363710192b8c5e8ebe3cf2c036c5cff3eea   3      12m
openshift-service-ca.crt                                1      3h12m
test-node-tp                                            1      157m
$ oc get machineset  -n openshift-machine-api -w
NAME                                  DESIRED   CURRENT   READY   AVAILABLE   AGE
xiyuan31-11-f4gjb-worker-us-east-2a   10        10        10      10          4h7m
xiyuan31-11-f4gjb-worker-us-east-2b   1         1         1       1           4h7m
xiyuan31-11-f4gjb-worker-us-east-2c   1         1         1       1           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   9         10        10      10          4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   9         10        10      10          4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   9         9         9       9           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   8         9         9       9           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   8         9         9       9           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   8         8         8       8           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   7         8         8       8           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   7         8         8       8           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   7         7         7       7           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   6         7         7       7           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   6         7         7       7           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   6         6         6       6           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   5         6         6       6           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   5         6         6       6           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   5         5         5       5           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   4         5         5       5           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   4         5         5       5           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   4         4         4       4           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   3         4         4       4           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   3         4         4       4           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   3         3         3       3           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   2         3         3       3           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   2         3         3       3           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   2         2         2       2           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   1         2         2       2           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   1         2         2       2           4h7m
xiyuan31-11-f4gjb-worker-us-east-2a   1         1         1       1           4h7m

Comment 11 xiyuan 2022-05-31 05:22:04 UTC
The Cluster Autoscaler was able to scale down the cluster both during and after Compliance Operator testing.
Per https://bugzilla.redhat.com/show_bug.cgi?id=2075029#c8 and https://bugzilla.redhat.com/show_bug.cgi?id=2075029#c10, moving this bug to VERIFIED.

Comment 13 errata-xmlrpc 2022-06-06 14:39:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Compliance Operator bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4657

