Bug 2117268 - ocp4-pci-dss-api-checks-pod in CrashLoopBackoff state due to ignition spec.config not in MC
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Compliance Operator
Version: 4.11
Hardware: s390
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.12.0
Assignee: Jakub Hrozek
QA Contact: xiyuan
Docs Contact: Jeana Routh
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-08-10 13:14 UTC by mapillai
Modified: 2022-12-22 19:37 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
* Previously, the Compliance Operator failed to fetch API resources when parsing machine configurations without ignition specifications. This caused the `api-check-pods` check to crash loop. With this release, the Compliance Operator is updated to gracefully handle machine config pools without ignition specifications. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2117268[*BZ#2117268*])
Clone Of:
Environment:
Last Closed: 2022-11-02 16:00:55 UTC
Target Upstream Version:
Embargoed:


Attachments
The logs from the pod for all the containers. (6.46 KB, text/plain)
2022-08-10 13:14 UTC, mapillai


Links
System ID Private Priority Status Summary Last Updated
Github ComplianceAsCode compliance-operator pull 81 0 None open Bug 2117268: api-resource-collector: Don't attempt to parse empty Ignition 2022-08-11 12:41:53 UTC
Red Hat Product Errata RHBA-2022:6657 0 None None None 2022-11-02 16:01:00 UTC

Description mapillai 2022-08-10 13:14:23 UTC
Created attachment 1904711
The logs from the pod for all the containers.

Description of problem:
While enabling the PCI scan, the ocp4-pci-dss-api-checks-pod goes into CrashLoopBackOff because its containers do not start. I had a word with jhrozek, and it looks like the issue is that the ignition spec.config is not present in the MC.

Version-Release number of selected component (if applicable):
CO version 0.1.53
OCP version 4.11.0-rc.2

How reproducible:


Steps to Reproduce:
1. Install the Compliance Operator on s390x.
2. Create and apply the ScanSetting and ScanSettingBinding for the ocp4-pci-dss and ocp4-pci-dss-node profiles.
3.

Actual results:


ocp4-pci-dss-api-checks-pod                       0/2     Init:CrashLoopBackOff   40 (4m23s ago)   3h13m

Expected results:

ocp4-pci-dss-api-checks-pod                       2/2     Running   40 (4m23s ago)   3h13m


Additional info:

Comment 1 mapillai 2022-08-10 13:17:09 UTC
MC content of 99-master-kargs-mpath
oc get mc 99-master-kargs-mpath -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"machineconfiguration.openshift.io/v1","kind":"MachineConfig","metadata":{"annotations":{},"labels":{"machineconfiguration.openshift.io/role":"master"},"name":"99-master-kargs-mpath"},"spec":{"kernelArguments":["rd.multipath=default","root=/dev/disk/by-label/dm-mpath-root"]}}
  creationTimestamp: "2022-08-09T17:20:24Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-master-kargs-mpath
  resourceVersion: "32552"
  uid: 676bdf51-e8e6-4d47-9b12-eb5b85763193
spec:
  kernelArguments:
  - rd.multipath=default
  - root=/dev/disk/by-label/dm-mpath-root


oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          35d79621a58766190071f95415f0bef74ee204a7   3.2.0             19h
00-worker                                          35d79621a58766190071f95415f0bef74ee204a7   3.2.0             19h
01-master-container-runtime                        35d79621a58766190071f95415f0bef74ee204a7   3.2.0             19h
01-master-kubelet                                  35d79621a58766190071f95415f0bef74ee204a7   3.2.0             19h
01-worker-container-runtime                        35d79621a58766190071f95415f0bef74ee204a7   3.2.0             19h
01-worker-kubelet                                  35d79621a58766190071f95415f0bef74ee204a7   3.2.0             19h
99-master-fips                                                                                3.2.0             19h
99-master-generated-registries                     35d79621a58766190071f95415f0bef74ee204a7   3.2.0             19h
99-master-kargs-mpath                                                                                           19h
99-master-ssh                                                                                 3.2.0             19h
99-worker-fips                                                                                3.2.0             19h
99-worker-generated-registries                     35d79621a58766190071f95415f0bef74ee204a7   3.2.0             19h
99-worker-ssh                                                                                 3.2.0             19h
rendered-master-31752550f462eb064898b4438d3eddc0   35d79621a58766190071f95415f0bef74ee204a7   3.2.0             19h
rendered-master-7775ea95940677bd253bce8f83e30a0f   35d79621a58766190071f95415f0bef74ee204a7   3.2.0             19h
rendered-master-db9a448ba06608879ebcbd032edd2834   35d79621a58766190071f95415f0bef74ee204a7   3.2.0             5h37m
rendered-worker-34e13f499a246c46951d0ae7efee91b7   35d79621a58766190071f95415f0bef74ee204a7   3.2.0             19h
rendered-worker-482af1dbfb11969f0d7236345ceb8b4e   35d79621a58766190071f95415f0bef74ee204a7   3.2.0             5h3
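
In the listing above, 99-master-kargs-mpath is the only MC with an empty IGNITIONVERSION column. A minimal sketch of finding such MCs programmatically, assuming input shaped like the output of `oc get mc -o json` (the function name and the sample data are hypothetical and only for illustration):

```python
def machine_configs_without_ignition(mc_list):
    """Return the names of MachineConfigs whose spec carries no Ignition config."""
    names = []
    for item in mc_list.get("items", []):
        spec = item.get("spec") or {}
        # spec.config holds the embedded Ignition document;
        # MCs that only set kernelArguments omit it entirely.
        if not spec.get("config"):
            names.append(item["metadata"]["name"])
    return names


# Hypothetical sample, trimmed to the fields that matter here.
sample = {
    "items": [
        {"metadata": {"name": "99-master-ssh"},
         "spec": {"config": {"ignition": {"version": "3.2.0"}}}},
        {"metadata": {"name": "99-master-kargs-mpath"},
         "spec": {"kernelArguments": ["rd.multipath=default"]}},
    ]
}

print(machine_configs_without_ignition(sample))  # ['99-master-kargs-mpath']
```

On a live cluster the same function could be fed with `json.load(sys.stdin)` while piping in `oc get mc -o json`.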

Comment 2 Jakub Hrozek 2022-08-10 13:25:40 UTC
As discussed on Slack, this is a legit bug.

Comment 3 Jakub Hrozek 2022-08-11 12:41:07 UTC
This might be an easier MC to test with:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: "2022-08-11T10:20:36Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-kargs-audit
  resourceVersion: "58789"
  uid: bafa4e44-976b-4628-bec3-4b0e07655c05
spec:
  kernelArguments:
  - audit=1

Comment 4 Jakub Hrozek 2022-08-11 12:43:24 UTC
Setting blocker- because, even if embarrassing, this bug has an easy workaround: either merge the non-ignition parameters into another MC that has ignition, or the other way around.
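
A rough sketch of the first variant of that workaround, folding the kernelArguments of an ignition-less MC into one that already carries ignition. The helper name and the sample MCs are hypothetical. It assumes both MCs target the same role, and that the merged result would be applied in place of the kargs-only MC:

```python
import copy


def merge_kargs_into(ignition_mc, kargs_mc):
    """Return a copy of ignition_mc that also carries kargs_mc's kernelArguments."""
    merged = copy.deepcopy(ignition_mc)
    spec = merged.setdefault("spec", {})
    combined = list(spec.get("kernelArguments") or [])
    extra = (kargs_mc.get("spec") or {}).get("kernelArguments") or []
    for arg in extra:
        # Keep the original order and skip arguments that are already present.
        if arg not in combined:
            combined.append(arg)
    spec["kernelArguments"] = combined
    return merged


# Hypothetical inputs, trimmed to the relevant fields.
ssh_mc = {
    "metadata": {"name": "99-worker-ssh"},
    "spec": {"config": {"ignition": {"version": "3.2.0"}}},
}
audit_mc = {
    "metadata": {"name": "99-worker-kargs-audit"},
    "spec": {"kernelArguments": ["audit=1"]},
}

merged_mc = merge_kargs_into(ssh_mc, audit_mc)
print(merged_mc["spec"]["kernelArguments"])  # ['audit=1']
```

The inputs are left untouched, so the merged MC can be reviewed before anything is applied to the cluster.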

Comment 5 xiyuan 2022-08-16 10:35:16 UTC
Verification passed with the pre-merge process.
$ git log |head
commit 3bcc1bd367a6ec8270b5c92b94d4a06a9f479804
Author: Jakub Hrozek <jhrozek>
Date:   Thu Aug 11 14:33:26 2022 +0200

    api-resource-collector: Don't attempt to parse empty Ignition
    
    While filtering files out of MachineConfigs, we tried to also parse
    ignition specification of MachineConfigs that didn't specify any,
    leading to an error.
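
The idea in the commit message can be illustrated with a small guard. This is not the operator's actual code (the fix itself is in the linked pull request); it is a Python sketch with a hypothetical function name, assuming an MC shaped like the API object:

```python
import json


def ignition_file_paths(machine_config):
    """Return file paths from an MC's Ignition config, or [] if it has none."""
    raw = (machine_config.get("spec") or {}).get("config")
    if not raw:
        # No Ignition specified (e.g. a kargs-only MC): nothing to parse.
        return []
    # The API serves spec.config as an embedded object; a raw JSON
    # string or bytes payload is tolerated here as well.
    ign = json.loads(raw) if isinstance(raw, (str, bytes)) else raw
    files = (ign.get("storage") or {}).get("files") or []
    return [f["path"] for f in files if "path" in f]


# Hypothetical inputs for illustration.
kargs_only = {"spec": {"kernelArguments": ["audit=1"]}}
with_files = {
    "spec": {
        "config": {
            "ignition": {"version": "3.2.0"},
            "storage": {"files": [{"path": "/etc/example.conf"}]},
        }
    }
}

print(ignition_file_paths(kargs_only))  # []
print(ignition_file_paths(with_files))  # ['/etc/example.conf']
```

Returning an empty list for the ignition-less case lets the caller keep iterating over the remaining MCs instead of failing on the first one without spec.config.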
    
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-08-15-150248   True        False         8h      Cluster version is 4.12.0-0.nightly-2022-08-15-150248

1. Create an MC without ignition defined:
$ oc apply -f -<<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: "2022-08-11T10:20:36Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-kargs-audit
  resourceVersion: "58789"
  uid: bafa4e44-976b-4628-bec3-4b0e07655c05
spec:
  kernelArguments:
  - audit=1
EOF
machineconfig.machineconfiguration.openshift.io/99-worker-kargs-audit created
2. Create a large number of MCs:
# python3 genmc.py --in-file=/usr/share/dict/linux.words --num-files=150 --start-num=0 --create
machineconfig.machineconfiguration.openshift.io/75-testmc-1.rule created
machineconfig.machineconfiguration.openshift.io/75-testmc-2.rule created
machineconfig.machineconfiguration.openshift.io/75-testmc-3.rule created
machineconfig.machineconfiguration.openshift.io/75-testmc-4.rule created
machineconfig.machineconfiguration.openshift.io/75-testmc-5.rule created
machineconfig.machineconfiguration.openshift.io/75-testmc-6.rule created
machineconfig.machineconfiguration.openshift.io/75-testmc-7.rule created
machineconfig.machineconfiguration.openshift.io/75-testmc-8.rule created
machineconfig.machineconfiguration.openshift.io/75-testmc-9.rule created
machineconfig.machineconfiguration.openshift.io/75-testmc-10.rule created
...
machineconfig.machineconfiguration.openshift.io/75-testmc-150.rule created
$ oc get mc -o json | pv -b >/dev/null
 172MiB

3. Trigger the test:
# oc apply -f -<<EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: my-ssb-r
profiles:
  - name: ocp4-pci-dss
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
  - name: ocp4-pci-dss-node
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  name: default
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
EOF
scansettingbinding.compliance.openshift.io/my-ssb-r created
# oc get suite -w
NAME       PHASE     RESULT
my-ssb-r   RUNNING   NOT-AVAILABLE
my-ssb-r   AGGREGATING   NOT-AVAILABLE
my-ssb-r   DONE          NON-COMPLIANT
my-ssb-r   DONE          NON-COMPLIANT

Comment 8 xiyuan 2022-09-21 09:28:28 UTC
Verified with 4.12.0-0.nightly-2022-09-20-095559 + compliance-operator.v0.1.55.
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-09-20-095559   True        False         7h40m   Cluster version is 4.12.0-0.nightly-2022-09-20-095559
$ oc get csv
NAME                           DISPLAY                            VERSION   REPLACES                                    PHASE
compliance-operator.v0.1.55    Compliance Operator                0.1.55                                                Succeeded

$ oc apply -f -<<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: "2022-08-11T10:20:36Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-kargs-audit
  resourceVersion: "58789"
  uid: bafa4e44-976b-4628-bec3-4b0e07655c05
spec:
  kernelArguments:
  - audit=1
EOF
machineconfig.machineconfiguration.openshift.io/99-worker-kargs-audit created

$ oc apply -f -<<EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: my-ssb-r
profiles:
  - name: ocp4-pci-dss
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
  - name: ocp4-pci-dss-node
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  name: default
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
EOF
scansettingbinding.compliance.openshift.io/my-ssb-r created

$ oc get suite -w
NAME             PHASE     RESULT
my-ssb-r         RUNNING   NOT-AVAILABLE
my-ssb-r         RUNNING   NOT-AVAILABLE
my-ssb-r         RUNNING   NOT-AVAILABLE
my-ssb-r         AGGREGATING   NOT-AVAILABLE
my-ssb-r         AGGREGATING   NOT-AVAILABLE
my-ssb-r         AGGREGATING   NOT-AVAILABLE
my-ssb-r         DONE          NON-COMPLIANT
my-ssb-r         DONE          NON-COMPLIANT
$ oc get scan
NAME                       PHASE   RESULT
ocp4-pci-dss               DONE    NON-COMPLIANT
ocp4-pci-dss-node-master   DONE    COMPLIANT
ocp4-pci-dss-node-worker   DONE    COMPLIANT

Comment 10 errata-xmlrpc 2022-11-02 16:00:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Compliance Operator bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6657

