Bug 2070118

Summary: Compliance operator scan pods crash-looping OOM
Product: OpenShift Container Platform Reporter: vyoganan <vyoganan>
Component: Compliance Operator Assignee: Jakub Hrozek <jhrozek>
Status: CLOSED DUPLICATE QA Contact: xiyuan
Severity: high Docs Contact:
Priority: high    
Version: 4.6 CC: agawand, dpateriy, igreen, jhrozek, lbragsta, mbagga, mrogers, suprs, wenshen, xiyuan
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-06-16 12:54:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description vyoganan 2022-03-30 12:40:29 UTC
Description of problem:
The compliance scanner pod for the ocp4-pci-dss cluster checks starts, but it crash-loops after an init container is killed due to an out-of-memory (OOM) condition. The pod loops when starting the "api-resource-collector" init container.

Output of "oc get pod -w":
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:0/2            0          2s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:1/2            0          6s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:1/2            0          9s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:OOMKilled      0          16s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:1/2            1          20s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:OOMKilled      1          30s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:CrashLoopBackOff   1          43s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:1/2                2          45s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:OOMKilled          2          55s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:CrashLoopBackOff   2          70s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:1/2                3          85s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:OOMKilled          3          93s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:CrashLoopBackOff   3          105s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:1/2                4          2m17s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:OOMKilled          4          2m27s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:CrashLoopBackOff   4          2m39s
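
The watch output above can be summarized programmatically when comparing runs. The following is a minimal sketch, not part of the original report: it assumes the five-column layout shown above (NAME, READY, STATUS, RESTARTS, AGE), and the function name `summarize_watch` is hypothetical. The sample is a subset of the lines above with whitespace collapsed. Newer clients may print RESTARTS with an extra "(... ago)" suffix, which this sketch does not handle.

```python
from collections import Counter

# Subset of the "oc get pod -w" lines from this report (whitespace collapsed).
WATCH_OUTPUT = """\
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:0/2 0 2s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:1/2 0 6s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:OOMKilled 0 16s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:1/2 1 20s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:OOMKilled 1 30s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:CrashLoopBackOff 1 43s
"""


def summarize_watch(text):
    """Count occurrences of each STATUS and track the highest RESTARTS value."""
    statuses = Counter()
    max_restarts = 0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        _name, _ready, status, restarts, _age = fields
        if not restarts.isdigit():
            continue  # skip a header row such as "NAME READY STATUS RESTARTS AGE"
        statuses[status] += 1
        max_restarts = max(max_restarts, int(restarts))
    return statuses, max_restarts


statuses, max_restarts = summarize_watch(WATCH_OUTPUT)
print("OOMKilled transitions:", statuses["Init:OOMKilled"])
print("highest restart count:", max_restarts)
```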


Version-Release number of selected component (if applicable):
Currently running OCP 4.6.52 with compliance-operator v0.1.48

How reproducible:
Currently running OCP 4.6.52 with compliance-operator v0.1.48

Steps to Reproduce:
1.
2.
3.

Actual results:
The Compliance Operator should be in a running state.

Expected results:
Scan results need to be generated. Results are saved in two formats: XCCDF and ARF.

Additional info:
Action taken
==
Please execute inside the pod
--
$ grep '^oom_kill ' /sys/fs/cgroup/memory/memory.oom_control
oom_kill 0 // Example Output
$ sed -e '' </dev/zero  # provoke an OOM kill
Killed // Example Output
$ echo $?
137 // Example Output
$ grep '^oom_kill ' /sys/fs/cgroup/memory/memory.oom_control
oom_kill 1 // Example Output
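
For reference, the two checks in the commands above (the `oom_kill` counter and exit status 137) can be read as follows. This is a minimal sketch under stated assumptions: the input is the text of a cgroup v1 `memory.oom_control` file, as used in the path above, and the helper names `parse_oom_kill` and `signal_from_exit_status` are hypothetical. Exit status 137 follows the usual shell convention of 128 plus the signal number, so 137 - 128 = 9, which is SIGKILL on Linux.

```python
import signal


def parse_oom_kill(oom_control_text):
    """Return the oom_kill counter from memory.oom_control text, or None if absent."""
    for line in oom_control_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "oom_kill":  # exact match, so "oom_kill_disable" is ignored
            return int(value)
    return None


def signal_from_exit_status(status):
    """Map a shell exit status above 128 to the name of the terminating signal."""
    if status > 128:
        return signal.Signals(status - 128).name
    return None


# Example file content in the cgroup v1 layout (values are illustrative).
sample = "oom_kill_disable 0\nunder_oom 0\noom_kill 1\n"
print(parse_oom_kill(sample))
print(signal_from_exit_status(137))
```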

Comment 2 vyoganan 2022-04-01 10:14:52 UTC
I have searched but have not found a relevant solution. Please look into this error.

// Compliance Operator logs 


Logs for openshift-compliance_compliance-operator
==
[vyoganan@supportshell compliance-operator]$ pwd
/home/remote/vyoganan/03146955/sosreport-20220331-081339/master0.cacf-ais-ocp.dcx.dlh.de/var/log/pods/openshift-compliance_compliance-operator-56944ddddb-vwlrx_5fc514dd-4aa4-4249-a88a-15bcc58bbe35/compliance-operator

2022-03-25T08:59:47.317717590+00:00 stderr F {"level":"error","ts":1648198787.3175986,"logger":"controller","msg":"Reconciler error","controller":"compliancesuite-controller","name":"ocp4-pci-cluster-binding","namespace":"openshift-compliance","error":"Error setting ready status for suite: Operation cannot be fulfilled on compliancesuites.compliance.openshift.io \"ocp4-pci-cluster-binding\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:90"}
2022-03-25T09:02:04.131751053+00:00 stderr F {"level":"error","ts":1648198924.1316404,"logger":"scanctrl","msg":"Cannot retrieve pod","Request.Namespace":"openshift-compliance","Request.Name":"ocp4-pci-dss-modified","Pod.Name":"aggregator-pod-ocp4-pci-dss-modified","error":"Pod \"aggregator-pod-ocp4-pci-dss-modified\" not found","stacktrace":"github.com/openshift/compliance-operator/pkg/controller/compliancescan.isAggregatorRunning\n\t/remote-source/app/pkg/controller/compliancescan/aggregator.go:133\ngithub.com/openshift/compliance-operator/pkg/controller/compliancescan.(*ReconcileComplianceScan).phaseAggregatingHandler\n\t/remote-source/app/pkg/controller/compliancescan/compliancescan_controller.go:407\ngithub.com/openshift/compliance-operator/pkg/controller/compliancescan.(*ReconcileComplianceScan).Reconcile\n\t/remote-source/app/pkg/controller/compliancescan/compliancescan_controller.go:172\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:90"}


[vyoganan@supportshell compliance-operator]$ pwd
/home/remote/vyoganan/03146955/sosreport-20220331-081339/master0.cacf-ais-ocp.dcx.dlh.de/var/log/pods/openshift-compliance_compliance-operator-56944ddddb-vwlrx_5fc514dd-4aa4-4249-a88a-15bcc58bbe35/compliance-operator


2022-03-31T07:55:21.427182630+00:00 stderr F {"level":"error","ts":1648713321.4270937,"logger":"suitectrl","msg":"Retriable error","Request.Namespace":"openshift-compliance","Request.Name":"ocp4-pci-cluster-binding","error":"Operation cannot be fulfilled on compliancesuites.compliance.openshift.io \"ocp4-pci-cluster-binding\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/openshift/compliance-operator/pkg/controller/compliancesuite.(*ReconcileComplianceSuite).Reconcile\n\t/remote-source/app/pkg/controller/compliancesuite/compliancesuite_controller.go:174\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:90"}
2022-03-31T07:55:21.427182630+00:00 stderr F {"level":"error","ts":1648713321.4271314,"logger":"controller","msg":"Reconciler error","controller":"compliancesuite-controller","name":"ocp4-pci-cluster-binding","namespace":"openshift-compliance","error":"Operation cannot be fulfilled on compliancesuites.compliance.openshift.io \"ocp4-pci-cluster-binding\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:90"}
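
The operator log lines above are hard to read inline. The following is a minimal sketch for extracting the key fields, assuming each line has the layout seen above: a timestamp, a stream name, a one-letter tag, and then a JSON payload. The function name `parse_log_line` is hypothetical, and the sample line is shortened from the log above (stack trace and some fields omitted). The "object has been modified" messages are update conflicts that the operator itself logs as retriable, so they may be unrelated to the OOM kills.

```python
import json


def parse_log_line(line):
    """Split one pod log line into its prefix fields and selected JSON fields."""
    timestamp, stream, tag, payload = line.split(" ", 3)
    record = json.loads(payload)
    return {
        "timestamp": timestamp,
        "stream": stream,
        "tag": tag,
        "level": record.get("level"),
        "logger": record.get("logger"),
        "msg": record.get("msg"),
        "error": record.get("error"),
    }


# Shortened sample based on the suitectrl line in this report.
sample_line = (
    '2022-03-31T07:55:21.427182630+00:00 stderr F '
    '{"level":"error","ts":1648713321.4270937,"logger":"suitectrl",'
    '"msg":"Retriable error","error":"the object has been modified"}'
)

parsed = parse_log_line(sample_line)
print(parsed["logger"], "-", parsed["msg"], "-", parsed["error"])
```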

Comment 4 vyoganan 2022-04-07 08:48:44 UTC
Please find the scan binding.
---------
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"compliance.openshift.io/v1alpha1","kind":"ScanSettingBinding","metadata":{"annotations":{},"name":"ocp4-pci-cluster-binding","namespace":"openshift-compliance"},"profiles":[{"apiGroup":"compliance.openshift.io/v1alpha1","kind":"TailoredProfile","name":"ocp4-pci-dss-modified"}],"settingsRef":{"apiGroup":"compliance.openshift.io/v1alpha1","kind":"ScanSetting","name":"default"}}
  creationTimestamp: "2022-03-22T09:14:48Z"
  generation: 2
  managedFields:
  - apiVersion: compliance.openshift.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:profiles: {}
      f:settingsRef:
        .: {}
        f:apiGroup: {}
        f:kind: {}
    manager: OpenAPI-Generator
    operation: Update
    time: "2022-03-22T09:14:48Z"
  - apiVersion: compliance.openshift.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:conditions: {}
        f:outputRef:
          .: {}
          f:apiGroup: {}
          f:kind: {}
          f:name: {}
    manager: compliance-operator
    operation: Update
    time: "2022-03-22T09:15:08Z"
  - apiVersion: compliance.openshift.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:settingsRef:
        f:name: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2022-03-25T08:52:15Z"
  name: ocp4-pci-cluster-binding
  namespace: openshift-compliance
  resourceVersion: "453951817"
  selfLink: /apis/compliance.openshift.io/v1alpha1/namespaces/openshift-compliance/scansettingbindings/ocp4-pci-cluster-binding
  uid: c21eed25-80a7-418a-ae71-a72beedd9275
profiles:
- apiGroup: compliance.openshift.io/v1alpha1
  kind: TailoredProfile
  name: ocp4-pci-dss-modified
settingsRef:
  apiGroup: compliance.openshift.io/v1alpha1
  kind: ScanSetting
  name: default
status:
  conditions:
  - lastTransitionTime: "2022-03-22T09:15:08Z"
    message: The scan setting binding was successfully processed
    reason: Processed
    status: "True"
    type: Ready
  outputRef:
    apiGroup: compliance.openshift.io
    kind: ComplianceSuite
    name: ocp4-pci-cluster-binding

Comment 13 vyoganan 2022-05-31 14:08:34 UTC
Dear Team,
We found the "Back-off restarting failed container" event for the pod `ocp4-pci-dss-modified-api-checks-pod`:

[vyoganan@supportshell-1 0140-openshift-compliance-inspect.tgz]$ omg get events -n openshift-compliance
LAST SEEN  TYPE     REASON   OBJECT                                    MESSAGE
1h12m      Normal   Pulling  pod/ocp4-pci-dss-modified-api-checks-pod  Pulling image "registry.redhat.io/compliance/openshift-compliance-rhel8-operator@sha256:b910fd7322b2e6b1d486d0732e191917fcae9df240df93d1a667be948e63c553"
2m15s      Warning  BackOff  pod/ocp4-pci-dss-modified-api-checks-pod  Back-off restarting failed container

Comment 17 Jakub Hrozek 2022-06-16 12:54:47 UTC

*** This bug has been marked as a duplicate of bug 2094854 ***

Comment 18 Red Hat Bugzilla 2023-09-15 01:53:25 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days