Bug 1988259
Summary: | Rules missing from compliance operator after upgrade from 4.6.30 to 4.6.34 | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Sayali Bhavsar <sbhavsar> |
Component: | Compliance Operator | Assignee: | Jakub Hrozek <jhrozek> |
Status: | CLOSED ERRATA | QA Contact: | Prashant Dhamdhere <pdhamdhe> |
Severity: | low | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.6.z | CC: | jhrozek, josorior, mrogers, xiyuan |
Target Milestone: | --- | ||
Target Release: | 4.9.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: When an updated profile is parsed, each object parsed from the profile is updated with an annotation reflecting the new profile version. Any object still missing this annotation after the whole profile has been parsed is considered removed from the profile and is deleted. Because the profileparser used to ignore errors, some rules might not have been annotated and would then be considered removed.
Consequence: During a cluster upgrade, if a transient error occurred while an object such as a compliance rule was being processed, that object was considered removed from the profile and deleted.
Fix: The profileparser now propagates errors instead of ignoring them. An error restarts the whole profile parsing operation, which eventually processes the content correctly.
Result: No rules, variables, or profiles are removed during a Compliance Operator (CO) upgrade, even when transient errors occur (for example, when connecting to the API server fails).
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-11-10 07:37:22 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
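The Cause and Fix in the Doc Text describe a mark-and-sweep pass: every parsed object receives the new version annotation, and anything left unmarked is deleted. Below is a minimal Go sketch of the fixed behavior, assuming an in-memory stand-in for the API server. The names `cluster`, `parseProfile`, `sweep`, and `syncBundle` are hypothetical and not taken from the Compliance Operator code, and the in-process retry loop is an assumption (the Doc Text only says the parse is restarted). The point it illustrates is that the sweep runs only after every object has been annotated successfully.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical sketch: errors propagate out of the parse, the sweep is
// skipped, and the whole parse is retried. Not the operator's real code.

// cluster is an in-memory stand-in for the API server.
type cluster struct {
	digests  map[string]string // object name -> version/digest annotation
	failures int               // number of transient failures to simulate
}

var errTransient = errors.New("connection refused")

// annotate sets the digest annotation, failing while failures remain.
func (c *cluster) annotate(name, digest string) error {
	if c.failures > 0 {
		c.failures--
		return errTransient
	}
	c.digests[name] = digest
	return nil
}

// parseProfile annotates every parsed object and returns the first error
// instead of logging it and carrying on.
func parseProfile(c *cluster, objects []string, digest string) error {
	for _, name := range objects {
		if err := c.annotate(name, digest); err != nil {
			return fmt.Errorf("annotating %s: %w", name, err)
		}
	}
	return nil
}

// sweep deletes objects that do not carry the current digest. It is only
// safe to call once parseProfile has succeeded for the whole profile.
func sweep(c *cluster, digest string) {
	for name, d := range c.digests {
		if d != digest {
			delete(c.digests, name) // deleting during range is allowed in Go
		}
	}
}

// syncBundle reruns the whole parse until it succeeds or attempts run out.
// On final failure it returns the error and deletes nothing.
func syncBundle(c *cluster, objects []string, digest string, attempts int, wait time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = parseProfile(c, objects, digest); err == nil {
			sweep(c, digest)
			return nil
		}
		time.Sleep(wait)
	}
	return err
}

func main() {
	c := &cluster{
		digests: map[string]string{
			"ocp4-cis":                  "old",
			"ocp4-api-server-client-ca": "old",
			"hypothetical-removed-rule": "old", // genuinely gone from new content
		},
		failures: 2,
	}
	objects := []string{"ocp4-cis", "ocp4-api-server-client-ca"}
	err := syncBundle(c, objects, "new", 5, 10*time.Millisecond)
	fmt.Println(err, len(c.digests))
}
```

Running it prints `<nil> 2`: the two transient failures are retried, the hypothetical obsolete object is swept, and both parsed objects survive. If the attempts run out, `syncBundle` returns the error without deleting anything, which is the property the fix relies on.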
Description
Sayali Bhavsar
2021-07-30 07:53:36 UTC
Thank you for the detailed bug report.

(In reply to Sayali Bhavsar from comment #0)
> [quicklab@upi-0 ~]$ oc get rules.compliance -n openshift-compliance | grep rhcos4-network-nmcli-permissions

I suspect that this rule got removed because at one point we removed the NCP profile, which was shipped by accident, and the NCP profile was the only one using this rule.

> [quicklab@upi-0 ~]$ oc get rules.compliance -n openshift-compliance | grep rhcos4-auditd-data-disk-full-action

Here I'm not sure. I do see the rule being used in the rhcos4-moderate profile, so I need to check things locally. However, as a general note, the rhcos4-moderate profile, which is the only OCP-related profile using this rule, is not production-ready yet. Also...

> Actual results:
> - The number of rules belonging to a profile change after an upgrade.
> - rules are deleted
>
> Expected results:
> - The number of rules belonging to a profile does not change after an upgrade
> - rules are not deleted

...this expectation doesn't necessarily have to be true. Rules can become obsolete and get removed. I tend to agree that this should be documented, though, to avoid surprises.

There seems to be another issue here:

Status:
  Conditions:
    Last Transition Time:  2021-07-30T05:58:02Z
    Message:               Couldn't parse profile bundle
    Reason:                Invalid
    Status:                False
    Type:                  Ready
  Data Stream Status:      INVALID
  Error Message:           Operation cannot be fulfilled on profiles.compliance.openshift.io "ocp4-cis-node": the object has been modified; please apply your changes to the latest version and try again
Events:  <none>

We probably shouldn't mark the profile bundle as invalid, but just retry.
(In reply to Jakub Hrozek from comment #5)
> There seems to be another issue here:
>
> Status:
>   Conditions:
>     Last Transition Time:  2021-07-30T05:58:02Z
>     Message:               Couldn't parse profile bundle
>     Reason:                Invalid
>     Status:                False
>     Type:                  Ready
>   Data Stream Status:      INVALID
>   Error Message:           Operation cannot be fulfilled on profiles.compliance.openshift.io "ocp4-cis-node": the object has been modified; please apply your changes to the latest version and try again
> Events:  <none>
>
> We probably shouldn't mark the profile bundle as invalid, but just retry.

No manual changes were made to the profile bundle at any point, neither before, during, nor after the upgrade. After the upgrade it became invalid; I'm not sure why.

Btw, just to set the severity right: the fact that the Rule object disappeared is very annoying, but it does not mean that the check itself is gone, especially if the whole datastream and not a subset (via tailoring) is used. Looking at the CO codebase, the only place where the Rule objects are actually used is when a tailored profile is created. Existing tailored profiles wouldn't be affected because the ConfigMap they produce already contains an XCCDF reference; "only" creating new tailored profiles would be. So there seems to be a bug, but it's not as if we're suddenly not performing that check.

Sayali was kind enough to reproduce the bug.
In that environment, I saw this in the profileparser logs:

{"level":"info","ts":1627626705.0728796,"logger":"profileparser","msg":"Creating object","kind":"Rule","key":{"namespace":"openshift-compliance","name":"rhcos4-auditd-data-disk-full-action"}}
{"level":"error","ts":1627626705.2833045,"logger":"profileparser","msg":"couldn't execute action for rule","error":"Get \"https://172.30.0.1:443/apis/compliance.openshift.io/v1alpha1/namespaces/openshift-compliance/rules/rhcos4-auditd-data-disk-full-action\": dial tcp 172.30.0.1:443: connect: connection refused"}   --> here
{"level":"info","ts":1627626705.2833915,"logger":"profileparser","msg":"Found rule","id":"xccdf_org.ssgproject.content_rule_network_nmcli_permissions"}
{"level":"info","ts":1627626705.3059528,"logger":"profileparser","msg":"Creating object","kind":"Rule","key":{"namespace":"openshift-compliance","name":"rhcos4-network-nmcli-permissions"}}
{"level":"error","ts":1627626705.318621,"logger":"profileparser","msg":"couldn't execute action for rule","error":"Get \"https://172.30.0.1:443/apis/compliance.openshift.io/v1alpha1/namespaces/openshift-compliance/rules/rhcos4-network-nmcli-permissions\": dial tcp 172.30.0.1:443: connect: connection refused"}   --> and here
{"level":"info","ts":1627626705.3187175,"logger":"profileparser","msg":"Found rule","id":"xccdf_org.ssgproject.content_rule_sysctl_net_ipv4_conf_default_secure_redirects"}

So it appears we're not retrying errors in the profileparser at all, which sort of makes sense, since it's not a controller loop. But this also means that any error leaves the affected rule without the new digest annotation, and the rule is subsequently removed.
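The failure mode described above fits in a few lines. This is a minimal, hypothetical Go model (`store`, `rule`, and `parseIgnoringErrors` are invented for illustration and are not the profileparser's code): per-rule errors are only logged, the sweep still runs, and a rule whose update hit a transient error is deleted because it still carries the old digest.

```go
package main

import (
	"errors"
	"fmt"
)

// rule is a hypothetical stand-in for a Rule object stored in the cluster.
type rule struct {
	name   string
	digest string // stands in for the digest annotation
}

// store is an in-memory stand-in for the API server.
type store struct {
	rules   map[string]*rule
	failFor map[string]bool // names whose update "fails", e.g. connection refused
}

var errConnRefused = errors.New("dial tcp 172.30.0.1:443: connect: connection refused")

// annotate marks a rule as belonging to the bundle with the given digest.
func (s *store) annotate(name, digest string) error {
	if s.failFor[name] {
		return errConnRefused
	}
	r, ok := s.rules[name]
	if !ok {
		r = &rule{name: name}
		s.rules[name] = r
	}
	r.digest = digest
	return nil
}

// parseIgnoringErrors mimics the buggy behavior: per-rule errors are only
// logged, so the sweep below still runs and removes unannotated rules.
func parseIgnoringErrors(s *store, parsed []string, digest string) {
	for _, name := range parsed {
		if err := s.annotate(name, digest); err != nil {
			fmt.Println("couldn't execute action for rule:", err)
		}
	}
	// Sweep: anything not carrying the new digest is treated as removed.
	for name, r := range s.rules {
		if r.digest != digest {
			delete(s.rules, name)
		}
	}
}

func main() {
	s := &store{
		rules: map[string]*rule{
			"rhcos4-auditd-data-disk-full-action": {name: "rhcos4-auditd-data-disk-full-action", digest: "old"},
			"rhcos4-network-nmcli-permissions":    {name: "rhcos4-network-nmcli-permissions", digest: "old"},
		},
		// Simulate a transient API failure for one of the two rules.
		failFor: map[string]bool{"rhcos4-auditd-data-disk-full-action": true},
	}
	parsed := []string{"rhcos4-auditd-data-disk-full-action", "rhcos4-network-nmcli-permissions"}
	parseIgnoringErrors(s, parsed, "new")
	_, ok := s.rules["rhcos4-auditd-data-disk-full-action"]
	fmt.Println("rule still present:", ok)
}
```

After logging the simulated error, the program's last line is `rule still present: false`: the rule is still part of the content, yet it is swept because its annotation update failed. The nmcli rule survives only because this model lets its update succeed.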
Oh, this one is even better:

{"level":"info","ts":1627624679.4095025,"logger":"profileparser","msg":"Creating object","kind":"Rule","key":{"namespace":"openshift-compliance","name":"ocp4-api-server-client-ca"}}
{"level":"error","ts":1627624682.5629635,"logger":"profileparser","msg":"couldn't execute action","error":"Operation cannot be fulfilled on profiles.compliance.openshift.io \"ocp4-cis-node\": the object has been modified; please apply you   --> here we get a conflict
{"level":"info","ts":1627624682.563104,"logger":"profileparser","msg":"Checking for unused object","kind":"Profile","owner":"ocp4","namespace":"openshift-compliance"}
{"level":"info","ts":1627624682.6403625,"logger":"profileparser","msg":"Deleting object no longer used by the current profileBundle","kind":"Profile","name":"ocp4-cis"}   --> and we just nuke the whole profile, oops

WIP patch: https://github.com/openshift/compliance-operator/pull/675
It's totally untested, but I wanted to throw something out there before I leave for 2 weeks. Maybe someone else can polish the PR in the meantime or create their own version.

After a bit of discussion, here is a different approach: https://github.com/openshift/compliance-operator/pull/692

Apparently this BZ was not linked properly with the GitHub PR, but the PR was merged and the fix will be released upstream in 0.1.40.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Compliance Operator bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4530