Description of problem:
After upgrading RHOCP from v4.6.30 to v4.6.34, two Compliance Operator `rules` were deleted by the `profile parser`.

Version-Release number of selected component (if applicable):
RHOCP 4.6.34
Compliance Operator 0.1.35

How reproducible:
Always

Steps to Reproduce:
1. Install Compliance Operator v0.1.35 on RHOCP 4.6.30 and upgrade to v4.6.34

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.30    True        False         13m     Cluster version is 4.6.30

$ oc get rules.compliance -n openshift-compliance | wc -l
750

$ oc get profilebundles.compliance
NAME     CONTENTIMAGE                                                                                                                               CONTENTFILE         STATUS
ocp4     registry.redhat.io/compliance/openshift-compliance-content-rhel8@sha256:5058d9130943ddf3d8a0e64dae5f81bf3fd612767d708e4f52e6ff19d6accf8f   ssg-ocp4-ds.xml     VALID
rhcos4   registry.redhat.io/compliance/openshift-compliance-content-rhel8@sha256:5058d9130943ddf3d8a0e64dae5f81bf3fd612767d708e4f52e6ff19d6accf8f   ssg-rhcos4-ds.xml   VALID

2. After the upgrade, check the rules

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.34    True        False         13m     Cluster version is 4.6.34

$ oc get rules.compliance -n openshift-compliance | wc -l
748

$ oc get profilebundles.compliance
NAME     CONTENTIMAGE                                                                                                                               CONTENTFILE         STATUS
ocp4     registry.redhat.io/compliance/openshift-compliance-content-rhel8@sha256:5058d9130943ddf3d8a0e64dae5f81bf3fd612767d708e4f52e6ff19d6accf8f   ssg-ocp4-ds.xml     INVALID
rhcos4   registry.redhat.io/compliance/openshift-compliance-content-rhel8@sha256:5058d9130943ddf3d8a0e64dae5f81bf3fd612767d708e4f52e6ff19d6accf8f   ssg-rhcos4-ds.xml   VALID

$ oc describe profilebundles.compliance ocp4
Name:         ocp4
Namespace:    openshift-compliance
Labels:       <none>
Annotations:  <none>
API Version:  compliance.openshift.io/v1alpha1
Kind:         ProfileBundle
Metadata:
  Creation Timestamp:  2021-07-30T04:36:26Z
  Finalizers:
    profilebundle.finalizers.compliance.openshift.io
  Generation:  1
  Managed Fields:
    API Version:  compliance.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"profilebundle.finalizers.compliance.openshift.io":
      f:spec:
        .:
        f:contentFile:
        f:contentImage:
      f:status:
        .:
        f:conditions:
        f:dataStreamStatus:
        f:errorMessage:
    Manager:         compliance-operator
    Operation:       Update
    Time:            2021-07-30T05:58:02Z
  Resource Version:  292580
  Self Link:         /apis/compliance.openshift.io/v1alpha1/namespaces/openshift-compliance/profilebundles/ocp4
  UID:               e7674866-a3f9-4f97-af70-f970cce326c6
Spec:
  Content File:   ssg-ocp4-ds.xml
  Content Image:  registry.redhat.io/compliance/openshift-compliance-content-rhel8@sha256:5058d9130943ddf3d8a0e64dae5f81bf3fd612767d708e4f52e6ff19d6accf8f
Status:
  Conditions:
    Last Transition Time:  2021-07-30T05:58:02Z
    Message:               Couldn't parse profile bundle
    Reason:                Invalid
    Status:                False
    Type:                  Ready
  Data Stream Status:      INVALID
  Error Message:           Operation cannot be fulfilled on profiles.compliance.openshift.io "ocp4-cis-node": the object has been modified; please apply your changes to the latest version and try again
Events:                    <none>

$ oc logs rhcos4-openshift-compliance-pp-xxxx -c profileparser
~~~
{"level":"info","ts":1627626752.1480162,"logger":"profileparser","msg":"Deleting object no longer used by the current profileBundle","kind":"Rule","name":"rhcos4-auditd-data-disk-full-action"}
{"level":"info","ts":1627626752.358363,"logger":"profileparser","msg":"Deleting object no longer used by the current profileBundle","kind":"Rule","name":"rhcos4-network-nmcli-permissions"}
~~~

[quicklab@upi-0 ~]$ oc get rules.compliance -n openshift-compliance | grep rhcos4-network-nmcli-permissions
[quicklab@upi-0 ~]$ oc get rules.compliance -n openshift-compliance | grep rhcos4-auditd-data-disk-full-action

Actual results:
- The number of rules belonging to a profile changes after an upgrade.
- Rules are deleted.

Expected results:
- The number of rules belonging to a profile does not change after an upgrade.
- Rules are not deleted.

Additional info:
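The steps above detect the deletion by comparing `wc -l` counts and then grepping for individual names. A small helper can list exactly which objects vanished, assuming the names were captured to two files before and after the upgrade (for example with `oc get rules.compliance -n openshift-compliance -o name > before.txt`). The program name `rulediff` and the functions `readNames` and `missingNames` are hypothetical, a sketch rather than part of any shipped tooling.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
)

// readNames returns the non-empty lines of the file at path, e.g. the output
// of `oc get rules.compliance -n openshift-compliance -o name`.
func readNames(path string) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var names []string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if line := scanner.Text(); line != "" {
			names = append(names, line)
		}
	}
	return names, scanner.Err()
}

// missingNames returns the sorted, de-duplicated names that appear in before
// but not in after, i.e. the objects that disappeared during the upgrade.
func missingNames(before, after []string) []string {
	present := make(map[string]bool, len(after))
	for _, name := range after {
		present[name] = true
	}
	var missing []string
	for _, name := range before {
		if !present[name] {
			missing = append(missing, name)
			present[name] = true // report each missing name only once
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: rulediff BEFORE_FILE AFTER_FILE")
		return
	}
	before, err := readNames(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	after, err := readNames(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, name := range missingNames(before, after) {
		fmt.Println(name)
	}
}
```

Run as `go run rulediff.go before.txt after.txt`; it prints one line per object present before the upgrade but absent afterwards, which avoids relying on line counts that also include the `oc get` header.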
Thank you for the detailed bug report.

(In reply to Sayali Bhavsar from comment #0)
> [quicklab@upi-0 ~]$ oc get rules.compliance -n openshift-compliance | grep
> rhcos4-network-nmcli-permissions

I suspect this rule was removed because at one point we removed the NCP profile, which had been shipped by accident and was the only profile using this rule.

> [quicklab@upi-0 ~]$ oc get rules.compliance -n openshift-compliance | grep
> rhcos4-auditd-data-disk-full-action

Here I'm not sure. I do see the rule being used in the rhcos4-moderate profile, so I need to check things locally. However, as a general note, the rhcos4-moderate profile, which is the only OCP-related profile using this rule, is not production-ready yet.

Also...

> Actual results:
> - The number of rules belonging to a profile changes after an upgrade.
> - Rules are deleted.
>
> Expected results:
> - The number of rules belonging to a profile does not change after an
> upgrade.
> - Rules are not deleted.

...this expectation doesn't necessarily have to hold. Rules can become obsolete and get removed. I tend to agree that this should be documented, though, to avoid surprises.
There seems to be another issue here:

Status:
  Conditions:
    Last Transition Time:  2021-07-30T05:58:02Z
    Message:               Couldn't parse profile bundle
    Reason:                Invalid
    Status:                False
    Type:                  Ready
  Data Stream Status:      INVALID
  Error Message:           Operation cannot be fulfilled on profiles.compliance.openshift.io "ocp4-cis-node": the object has been modified; please apply your changes to the latest version and try again
Events:                    <none>

We probably shouldn't mark the profile bundle as invalid, but just retry.
(In reply to Jakub Hrozek from comment #5)
> Here seems to be another issue:
>
> Status:
>   Conditions:
>     Last Transition Time:  2021-07-30T05:58:02Z
>     Message:               Couldn't parse profile bundle
>     Reason:                Invalid
>     Status:                False
>     Type:                  Ready
>   Data Stream Status:      INVALID
>   Error Message:           Operation cannot be fulfilled on
> profiles.compliance.openshift.io "ocp4-cis-node": the object has been
> modified; please apply your changes to the latest version and try again
> Events:                    <none>
>
> We shouldn't probably mark the profile bundle as invalid, but just retry..

No manual changes were made to the profile bundle at any point: not before, during, or after the upgrade. It became invalid after the upgrade, and it is not clear why.
btw, just to set the severity right: the disappearance of the Rule objects is very annoying, but it does not mean that the check itself is gone, especially if the whole datastream, and not a subset selected via tailoring, is used. Looking at the CO codebase, the only place where Rule objects are actually used is when a tailored profile is created. Existing tailored profiles aren't affected, because the ConfigMap they produce already contains the XCCDF reference; only creating new tailored profiles is affected. So there is a bug, but we are not suddenly skipping that check.
Sayali kindly reproduced the bug. In that environment, I saw this in the profileparser logs:

{"level":"info","ts":1627626705.0728796,"logger":"profileparser","msg":"Creating object","kind":"Rule","key":{"namespace":"openshift-compliance","name":"rhcos4-auditd-data-disk-full-action"}}
{"level":"error","ts":1627626705.2833045,"logger":"profileparser","msg":"couldn't execute action for rule","error":"Get \"https://172.30.0.1:443/apis/compliance.openshift.io/v1alpha1/namespaces/openshift-compliance/rules/rhcos4-auditd-data-disk-full-action\": dial tcp 172.30.0.1:443: connect: connection refused"} --> here
{"level":"info","ts":1627626705.2833915,"logger":"profileparser","msg":"Found rule","id":"xccdf_org.ssgproject.content_rule_network_nmcli_permissions"}
{"level":"info","ts":1627626705.3059528,"logger":"profileparser","msg":"Creating object","kind":"Rule","key":{"namespace":"openshift-compliance","name":"rhcos4-network-nmcli-permissions"}}
{"level":"error","ts":1627626705.318621,"logger":"profileparser","msg":"couldn't execute action for rule","error":"Get \"https://172.30.0.1:443/apis/compliance.openshift.io/v1alpha1/namespaces/openshift-compliance/rules/rhcos4-network-nmcli-permissions\": dial tcp 172.30.0.1:443: connect: connection refused"} --> and here
{"level":"info","ts":1627626705.3187175,"logger":"profileparser","msg":"Found rule","id":"xccdf_org.ssgproject.content_rule_sysctl_net_ipv4_conf_default_secure_redirects"}

So it appears we're not retrying errors in the profileparser at all, which sort of makes sense, since it's not a controller loop. But it also means that any error leaves the affected rules without the new digest annotation, so they are subsequently removed.
Oh this one is even better:

{"level":"info","ts":1627624679.4095025,"logger":"profileparser","msg":"Creating object","kind":"Rule","key":{"namespace":"openshift-compliance","name":"ocp4-api-server-client-ca"}}
{"level":"error","ts":1627624682.5629635,"logger":"profileparser","msg":"couldn't execute action","error":"Operation cannot be fulfilled on profiles.compliance.openshift.io \"ocp4-cis-node\": the object has been modified; please apply you --> here we get a conflict
{"level":"info","ts":1627624682.563104,"logger":"profileparser","msg":"Checking for unused object","kind":"Profile","owner":"ocp4","namespace":"openshift-compliance"}
{"level":"info","ts":1627624682.6403625,"logger":"profileparser","msg":"Deleting object no longer used by the current profileBundle","kind":"Profile","name":"ocp4-cis"} --> and we just nuke the whole profile, oops
WIP patch: https://github.com/openshift/compliance-operator/pull/675 It's totally untested, but I wanted to throw something out there before I leave for 2 weeks. Maybe someone else can prettify the PR in the meantime or create their own version.
After a bit of discussion, here is a different approach: https://github.com/openshift/compliance-operator/pull/692
Apparently this BZ was not linked properly with the GH PR, but the PR was merged and the fix will be released in 0.1.40 upstream.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Compliance Operator bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4530