Bug 2070118 - Compliance operator scan pods crash-looping OOM
Summary: Compliance operator scan pods crash-looping OOM
Keywords:
Status: CLOSED DUPLICATE of bug 2094854
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Compliance Operator
Version: 4.6
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jakub Hrozek
QA Contact: xiyuan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-30 12:40 UTC by vyoganan
Modified: 2023-09-15 01:53 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-16 12:54:47 UTC
Target Upstream Version:
Embargoed:



Description vyoganan 2022-03-30 12:40:29 UTC
Description of problem:
The compliance scanner pod for the ocp4-pci-dss cluster checks starts, but then crash-loops after an init container is killed due to an out-of-memory (OOM) condition. The pod loops while starting the "api-resource-collector" init container.

Output of "oc get pod -w":
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:0/2            0          2s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:1/2            0          6s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:1/2            0          9s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:OOMKilled      0          16s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:1/2            1          20s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:OOMKilled      1          30s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:CrashLoopBackOff   1          43s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:1/2                2          45s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:OOMKilled          2          55s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:CrashLoopBackOff   2          70s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:1/2                3          85s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:OOMKilled          3          93s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:CrashLoopBackOff   3          105s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:1/2                4          2m17s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:OOMKilled          4          2m27s
ocp4-pci-dss-modified-api-checks-pod                           0/2     Init:CrashLoopBackOff   4          2m39s
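
For triage, the watch output above can be summarized with a short script. This is a minimal sketch and not part of the original report: the function name summarize_watch is hypothetical, and it assumes every row has exactly five whitespace-separated columns (NAME, READY, STATUS, RESTARTS, AGE), as in the output above. Client versions that print extra text in the RESTARTS column would need a different split.

```python
from collections import Counter

# Rows copied from the "oc get pod -w" output above (column padding collapsed).
WATCH_OUTPUT = """\
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:0/2 0 2s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:1/2 0 6s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:1/2 0 9s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:OOMKilled 0 16s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:1/2 1 20s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:OOMKilled 1 30s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:CrashLoopBackOff 1 43s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:1/2 2 45s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:OOMKilled 2 55s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:CrashLoopBackOff 2 70s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:1/2 3 85s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:OOMKilled 3 93s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:CrashLoopBackOff 3 105s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:1/2 4 2m17s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:OOMKilled 4 2m27s
ocp4-pci-dss-modified-api-checks-pod 0/2 Init:CrashLoopBackOff 4 2m39s
"""


def summarize_watch(text):
    """Return (Counter of STATUS values, highest RESTARTS value seen)."""
    statuses = Counter()
    max_restarts = 0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # blank line or a row with an unexpected shape
        _name, _ready, status, restarts, _age = fields
        if not restarts.isdigit():
            continue  # header row (NAME READY STATUS RESTARTS AGE)
        statuses[status] += 1
        max_restarts = max(max_restarts, int(restarts))
    return statuses, max_restarts


statuses, max_restarts = summarize_watch(WATCH_OUTPUT)
print(dict(statuses), "max restarts:", max_restarts)
```

For the rows above, this counts five Init:OOMKilled states and a highest restart count of 4, so every start of the init container in the captured window ended in an OOM kill.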


Version-Release number of selected component (if applicable):
Currently running OCP 4.6.52 with compliance-operator v0.1.48

How reproducible:
Currently running OCP 4.6.52 with compliance-operator v0.1.48

Steps to Reproduce:
1.
2.
3.

Actual results:
Compliance operator should be in the running state.

Expected results:
Scan results need to be generated. Results are saved in two formats: XCCDF and ARF.

Additional info:
Action taken
==
Please execute inside the pod
--
$ grep '^oom_kill ' /sys/fs/cgroup/memory/memory.oom_control
oom_kill 0 // Example Output
$ sed -e '' </dev/zero  # provoke an OOM kill
Killed // Example Output
$ echo $?
137 // Example Output
$ grep '^oom_kill ' /sys/fs/cgroup/memory/memory.oom_control
oom_kill 1 // Example Output
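
The check above relies on two facts: an exit status above 128 means the process was terminated by signal (status - 128), so 137 is signal 9 (SIGKILL), and the "oom_kill" line in memory.oom_control counts OOM kills in that cgroup. The sketch below illustrates both. It is not from the original report: the function names are hypothetical, the sample file content is illustrative, and it assumes a Linux host with the cgroup v1 memory controller (on cgroup v2 the counter is in memory.events instead).

```python
import signal


def describe_exit_status(status):
    """Explain a shell-style exit status such as the 137 shown above."""
    if status > 128:
        signum = status - 128  # shells report 128 + signal number
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with code {status}"


def oom_kill_count(oom_control_text):
    """Return the oom_kill counter, like grep '^oom_kill ' does; None if absent."""
    for line in oom_control_text.splitlines():
        # The trailing space keeps "oom_kill_disable" from matching.
        if line.startswith("oom_kill "):
            return int(line.split()[1])
    return None


# Illustrative file content in the cgroup v1 memory.oom_control layout.
SAMPLE_OOM_CONTROL = "oom_kill_disable 0\nunder_oom 0\noom_kill 1\n"

print(describe_exit_status(137))
print(oom_kill_count(SAMPLE_OOM_CONTROL))
```

A counter that increases after the container exits with 137 indicates the kill came from the cgroup memory limit rather than from another source of SIGKILL.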

Comment 2 vyoganan 2022-04-01 10:14:52 UTC
In my search, I haven't found a relevant solution. Please look into this error.

// Compliance Operator logs 


Logs for openshift-compliance_compliance-operator
==
[vyoganan@supportshell compliance-operator]$ pwd
/home/remote/vyoganan/03146955/sosreport-20220331-081339/master0.cacf-ais-ocp.dcx.dlh.de/var/log/pods/openshift-compliance_compliance-operator-56944ddddb-vwlrx_5fc514dd-4aa4-4249-a88a-15bcc58bbe35/compliance-operator

2022-03-25T08:59:47.317717590+00:00 stderr F {"level":"error","ts":1648198787.3175986,"logger":"controller","msg":"Reconciler error","controller":"compliancesuite-controller","name":"ocp4-pci-cluster-binding","namespace":"openshift-compliance","error":"Error setting ready status for suite: Operation cannot be fulfilled on compliancesuites.compliance.openshift.io \"ocp4-pci-cluster-binding\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:90"}
2022-03-25T09:02:04.131751053+00:00 stderr F {"level":"error","ts":1648198924.1316404,"logger":"scanctrl","msg":"Cannot retrieve pod","Request.Namespace":"openshift-compliance","Request.Name":"ocp4-pci-dss-modified","Pod.Name":"aggregator-pod-ocp4-pci-dss-modified","error":"Pod \"aggregator-pod-ocp4-pci-dss-modified\" not found","stacktrace":"github.com/openshift/compliance-operator/pkg/controller/compliancescan.isAggregatorRunning\n\t/remote-source/app/pkg/controller/compliancescan/aggregator.go:133\ngithub.com/openshift/compliance-operator/pkg/controller/compliancescan.(*ReconcileComplianceScan).phaseAggregatingHandler\n\t/remote-source/app/pkg/controller/compliancescan/compliancescan_controller.go:407\ngithub.com/openshift/compliance-operator/pkg/controller/compliancescan.(*ReconcileComplianceScan).Reconcile\n\t/remote-source/app/pkg/controller/compliancescan/compliancescan_controller.go:172\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:90"}


[vyoganan@supportshell compliance-operator]$ pwd
/home/remote/vyoganan/03146955/sosreport-20220331-081339/master0.cacf-ais-ocp.dcx.dlh.de/var/log/pods/openshift-compliance_compliance-operator-56944ddddb-vwlrx_5fc514dd-4aa4-4249-a88a-15bcc58bbe35/compliance-operator


2022-03-31T07:55:21.427182630+00:00 stderr F {"level":"error","ts":1648713321.4270937,"logger":"suitectrl","msg":"Retriable error","Request.Namespace":"openshift-compliance","Request.Name":"ocp4-pci-cluster-binding","error":"Operation cannot be fulfilled on compliancesuites.compliance.openshift.io \"ocp4-pci-cluster-binding\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/openshift/compliance-operator/pkg/controller/compliancesuite.(*ReconcileComplianceSuite).Reconcile\n\t/remote-source/app/pkg/controller/compliancesuite/compliancesuite_controller.go:174\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:90"}
2022-03-31T07:55:21.427182630+00:00 stderr F {"level":"error","ts":1648713321.4271314,"logger":"controller","msg":"Reconciler error","controller":"compliancesuite-controller","name":"ocp4-pci-cluster-binding","namespace":"openshift-compliance","error":"Operation cannot be fulfilled on compliancesuites.compliance.openshift.io \"ocp4-pci-cluster-binding\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.11/pkg/util/wait/wait.go:90"}
[supportshell.prod.useraccess-us-west-2.redhat.com] [07:24:34+0000]
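
Each operator log line above has the container runtime log layout (timestamp, stream, tag, then the message), and the message is a JSON object. The sketch below shows one way to pull out the fields that matter for triage. It is an illustration only: the function name is hypothetical, and the sample is the last log line above with its "stacktrace" field omitted for brevity.

```python
import json

# Last operator log line from above, with the "stacktrace" field omitted.
SAMPLE_LINE = r'2022-03-31T07:55:21.427182630+00:00 stderr F {"level":"error","ts":1648713321.4271314,"logger":"controller","msg":"Reconciler error","controller":"compliancesuite-controller","name":"ocp4-pci-cluster-binding","namespace":"openshift-compliance","error":"Operation cannot be fulfilled on compliancesuites.compliance.openshift.io \"ocp4-pci-cluster-binding\": the object has been modified; please apply your changes to the latest version and try again"}'


def parse_operator_log_line(line):
    """Split a runtime-prefixed log line and decode its JSON payload."""
    # Layout: <timestamp> <stream> <tag> <message>; split only three times
    # so that spaces inside the JSON message are preserved.
    timestamp, stream, _tag, payload = line.split(" ", 3)
    record = json.loads(payload)
    return {
        "timestamp": timestamp,
        "stream": stream,
        "level": record.get("level"),
        "logger": record.get("logger"),
        "msg": record.get("msg"),
        "error": record.get("error"),
    }


summary = parse_operator_log_line(SAMPLE_LINE)
print(summary["logger"], "|", summary["msg"], "|", summary["error"])
```

The "object has been modified" text is the usual Kubernetes message for an update made against a stale resourceVersion, and the operator logs it as a retriable error, so these entries may be unrelated to the OOM kills of the init container.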

Comment 4 vyoganan 2022-04-07 08:48:44 UTC
Please find the scan binding.
---------
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"compliance.openshift.io/v1alpha1","kind":"ScanSettingBinding","metadata":{"annotations":{},"name":"ocp4-pci-cluster-binding","namespace":"openshift-compliance"},"profiles":[{"apiGroup":"compliance.openshift.io/v1alpha1","kind":"TailoredProfile","name":"ocp4-pci-dss-modified"}],"settingsRef":{"apiGroup":"compliance.openshift.io/v1alpha1","kind":"ScanSetting","name":"default"}}
  creationTimestamp: "2022-03-22T09:14:48Z"
  generation: 2
  managedFields:
  - apiVersion: compliance.openshift.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:profiles: {}
      f:settingsRef:
        .: {}
        f:apiGroup: {}
        f:kind: {}
    manager: OpenAPI-Generator
    operation: Update
    time: "2022-03-22T09:14:48Z"
  - apiVersion: compliance.openshift.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:conditions: {}
        f:outputRef:
          .: {}
          f:apiGroup: {}
          f:kind: {}
          f:name: {}
    manager: compliance-operator
    operation: Update
    time: "2022-03-22T09:15:08Z"
  - apiVersion: compliance.openshift.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:settingsRef:
        f:name: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2022-03-25T08:52:15Z"
  name: ocp4-pci-cluster-binding
  namespace: openshift-compliance
  resourceVersion: "453951817"
  selfLink: /apis/compliance.openshift.io/v1alpha1/namespaces/openshift-compliance/scansettingbindings/ocp4-pci-cluster-binding
  uid: c21eed25-80a7-418a-ae71-a72beedd9275
profiles:
- apiGroup: compliance.openshift.io/v1alpha1
  kind: TailoredProfile
  name: ocp4-pci-dss-modified
settingsRef:
  apiGroup: compliance.openshift.io/v1alpha1
  kind: ScanSetting
  name: default
status:
  conditions:
  - lastTransitionTime: "2022-03-22T09:15:08Z"
    message: The scan setting binding was successfully processed
    reason: Processed
    status: "True"
    type: Ready
  outputRef:
    apiGroup: compliance.openshift.io
    kind: ComplianceSuite
    name: ocp4-pci-cluster-binding

Comment 13 vyoganan 2022-05-31 14:08:34 UTC
Dear Team,
We found a "Back-off restarting failed container" event for the pod `ocp4-pci-dss-modified-api-checks-pod`:

[vyoganan@supportshell-1 0140-openshift-compliance-inspect.tgz]$ omg get events -n openshift-compliance
LAST SEEN  TYPE     REASON   OBJECT                                    MESSAGE
1h12m      Normal   Pulling  pod/ocp4-pci-dss-modified-api-checks-pod  Pulling image "registry.redhat.io/compliance/openshift-compliance-rhel8-operator@sha256:b910fd7322b2e6b1d486d0732e191917fcae9df240df93d1a667be948e63c553"
2m15s      Warning  BackOff  pod/ocp4-pci-dss-modified-api-checks-pod  Back-off restarting failed container

Comment 17 Jakub Hrozek 2022-06-16 12:54:47 UTC

*** This bug has been marked as a duplicate of bug 2094854 ***

Comment 18 Red Hat Bugzilla 2023-09-15 01:53:25 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

