Description of problem:
The ComplianceRemediations go into an Error state when there are configuration differences between nodes of the same role.

Version-Release:
Cluster version 4.6.0-0.nightly-2020-08-04-002217

How reproducible:
Always

Steps to Reproduce:
1. Install the compliance operator:

1.1 Clone the compliance-operator git repo:
$ git clone https://github.com/openshift/compliance-operator.git

1.2 Create the 'openshift-compliance' namespace:
$ oc create -f compliance-operator/deploy/ns.yaml

1.3 Switch to the 'openshift-compliance' namespace:
$ oc project openshift-compliance

1.4 Deploy the CustomResourceDefinitions:
$ for f in $(ls -1 compliance-operator/deploy/crds/*crd.yaml); do oc create -f $f; done

1.5 Deploy the compliance-operator:
$ oc create -f compliance-operator/deploy/

2. Create an /etc/securetty file on one of the worker nodes to introduce a difference between the nodes:

$ oc debug node/ip-10-0-158-85.us-east-2.compute.internal
Starting pod/ip-10-0-158-85us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.158.85
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cd etc/
sh-4.4# ls | grep securetty
sh-4.4# touch securetty
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...

3. Create a ComplianceScan:

$ oc create -f - <<EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: ComplianceScan
metadata:
  name: example-compliancescan3
spec:
  profile: xccdf_org.ssgproject.content_profile_moderate
  content: ssg-rhcos4-ds.xml
  debug: true
  nodeSelector:
    node-role.kubernetes.io/worker: ""
EOF
compliancescan.compliance.openshift.io "example-compliancescan3" created

Actual results:
The ComplianceRemediations go into an Error state when there are configuration differences between nodes of the same role, and the ComplianceScan result shows INCONSISTENT:

$ oc get compliancescans
NAME                      PHASE   RESULT
example-compliancescan3   DONE    INCONSISTENT

$ oc get compliancecheckresults --selector compliance.openshift.io/inconsistent-check
NAME                                            STATUS         SEVERITY
example-compliancescan3-no-direct-root-logins   INCONSISTENT   medium

$ oc get complianceremediations
NAME                                                                 STATE
example-compliancescan3-audit-rules-dac-modification-chmod           Error
example-compliancescan3-audit-rules-dac-modification-chown           Error
example-compliancescan3-audit-rules-dac-modification-fchmod          Error
example-compliancescan3-audit-rules-dac-modification-fchmodat        Error
example-compliancescan3-audit-rules-dac-modification-fchown          Error
example-compliancescan3-audit-rules-dac-modification-fchownat        Error
example-compliancescan3-audit-rules-dac-modification-fremovexattr    Error
...

Expected results:
The STATE of the ComplianceRemediations should show NotApplied while the ComplianceScan result is INCONSISTENT.
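To see why an individual remediation is in the Error state, its status can be queried directly. An illustrative query (the remediation name is taken from the listing above; the errorMessage status field is an assumption for illustration):

$ oc get complianceremediation example-compliancescan3-audit-rules-dac-modification-chmod -o jsonpath='{.status.errorMessage}'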
This is probably fixed by a PR I did yesterday: https://github.com/openshift/compliance-operator/commit/cdb91ba0aa62a044ce66a2d1f35f9e4557088954 This would set the default initial state to NotApplied for remediations. Would this be sufficient in your opinion?
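For illustration, the intent of that change is that a freshly created remediation would start out with a status stanza roughly like this (a sketch; the applicationState field name is assumed from the remediation status, with the rest of the object omitted):

status:
  applicationState: NotApplied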
(In reply to Juan Antonio Osorio from comment #2)
> This is probably fixed by a PR I did yesterday:
> https://github.com/openshift/compliance-operator/commit/cdb91ba0aa62a044ce66a2d1f35f9e4557088954
>
> This would set the default initial state to NotApplied for remediations.
> Would this be sufficient in your opinion?

According to the logs, the remediation controller is complaining about labels missing from the remediation object. How does the default state help with that?
(In reply to Jakub Hrozek from comment #3)
> (In reply to Juan Antonio Osorio from comment #2)
> > This is probably fixed by a PR I did yesterday:
> > https://github.com/openshift/compliance-operator/commit/cdb91ba0aa62a044ce66a2d1f35f9e4557088954
> >
> > This would set the default initial state to NotApplied for remediations.
> > Would this be sufficient in your opinion?
>
> According to the logs, the remediation controller is complaining about
> labels missing from the remediation object. How does the default state help
> with that?

I had misunderstood the issue. The new default value won't help.
I'm going to look at this one.
I think I know what's going on, but I'm not sure whether we should fix the issue.

The root cause is that you are using a Scan, not a Suite. The issue has nothing to do with the INCONSISTENT result; it would have happened even with a regular scan where all nodes are the same. Because you are using a scan without a suite, the scan doesn't have a Suite label, which we use to construct the composite MachineConfig name. This leads to:

status:
  applicationState: Error
  errorMessage: could not construct MC name, check if it has the correct labels

So I can see three options:

1) Support Scans without Suites and construct the MC name from what we have. I don't like this because, in general, scans shouldn't be used on their own; I would much rather keep their usage restricted.
2) Don't create remediation objects for scans without a suite.
3) Do nothing, or at most document the limitation.

My preference is 2). Ozz, Matt, what do you think?
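For the record, the missing label is easy to confirm on a live cluster. An illustrative check (this assumes the suite label key is compliance.openshift.io/suite; a suite-owned scan would carry it, while the bare scan from this report would not):

$ oc get compliancescan example-compliancescan3 --show-labels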
I think a scan should *ideally* work out of the box, since that would reduce the tech debt and it's what folks would expect (principle of least surprise). However, we are indeed telling folks to stay away from bare scans and to use suites instead (see the sketch below for the suite-based equivalent).

So for this release I would say: go for whatever is easiest. We are gonna document that folks should use suites only (not scans). So, if the easiest thing is to skip creating remediations when a result comes from a scan without a suite, let's do that. We don't want to introduce too many changes now that feature freeze is in place.
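For completeness, the suite-based equivalent of the reproducer would look roughly like this sketch (the suite name is made up for the example, the scans list layout follows the operator's sample manifests, and a contentImage field may also be needed depending on the content source):

$ oc create -f - <<EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: ComplianceSuite
metadata:
  name: example-compliancesuite3
spec:
  autoApplyRemediations: false
  scans:
    - name: example-compliancescan3
      profile: xccdf_org.ssgproject.content_profile_moderate
      content: ssg-rhcos4-ds.xml
      debug: true
      nodeSelector:
        node-role.kubernetes.io/worker: ""
EOF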
WIP PR: https://github.com/openshift/compliance-operator/pull/403
Merged as https://github.com/openshift/compliance-operator/commit/4cb233633aa4eb01f751e45fd8a169e8b0a3eea7
The state of the ComplianceRemediations now shows NotApplied when the ComplianceScan result is INCONSISTENT.

Verified on:
OCP 4.6.0-0.nightly-2020-08-27-005538
compliance-operator.v0.1.13

The results below were observed when there were configuration differences between nodes of the same role:

$ oc get pods -w
NAME                                                    READY   STATUS      RESTARTS   AGE
aggregator-pod-example-compliancescan3                  0/1     Completed   0          62s
compliance-operator-869646dd4f-5vq7z                    1/1     Running     0          34m
ocp4-pp-7f89f556cc-zzmkj                                1/1     Running     0          33m
openscap-pod-2f626fd3ddb8168e3c1c510b4c0d519d61e16862   0/2     Completed   0          4m23s
openscap-pod-5987d061fa159773b69e6c4ea1df6c8b8317e8f8   0/2     Completed   0          4m23s
openscap-pod-c3902b5df863b3e5c2c655e19862f9116423fb50   0/2     Completed   0          4m23s
rhcos4-pp-7c44999587-bckrn                              1/1     Running     0          33m

$ oc get compliancescan
NAME                      PHASE   RESULT
example-compliancescan3   DONE    INCONSISTENT

$ oc get complianceremediations | head
NAME                                                                 STATE
example-compliancescan3-audit-rules-dac-modification-chmod           NotApplied
example-compliancescan3-audit-rules-dac-modification-chown           NotApplied
example-compliancescan3-audit-rules-dac-modification-fchmod          NotApplied
example-compliancescan3-audit-rules-dac-modification-fchmodat        NotApplied
example-compliancescan3-audit-rules-dac-modification-fchown          NotApplied
example-compliancescan3-audit-rules-dac-modification-fchownat        NotApplied
example-compliancescan3-audit-rules-dac-modification-fremovexattr    NotApplied
example-compliancescan3-audit-rules-dac-modification-fsetxattr       NotApplied
example-compliancescan3-audit-rules-dac-modification-lchown          NotApplied

The results below were observed when the configuration was identical across nodes of the same role:

$ oc get pods -w
NAME                                                    READY   STATUS      RESTARTS   AGE
aggregator-pod-example-compliancescan3                  0/1     Completed   0          10m
compliance-operator-869646dd4f-5vq7z                    1/1     Running     0          55m
ocp4-pp-7f89f556cc-zzmkj                                1/1     Running     0          54m
openscap-pod-2f626fd3ddb8168e3c1c510b4c0d519d61e16862   0/2     Completed   0          12m
openscap-pod-5987d061fa159773b69e6c4ea1df6c8b8317e8f8   0/2     Completed   0          12m
openscap-pod-c3902b5df863b3e5c2c655e19862f9116423fb50   0/2     Completed   0          12m
rhcos4-pp-7c44999587-bckrn                              1/1     Running     0          54m

$ oc get compliancescan
NAME                      PHASE   RESULT
example-compliancescan3   DONE    NON-COMPLIANT

$ oc get complianceremediations | tail
example-compliancescan3-sysctl-net-ipv4-conf-default-rp-filter               NotApplied
example-compliancescan3-sysctl-net-ipv4-conf-default-secure-redirects        NotApplied
example-compliancescan3-sysctl-net-ipv4-conf-default-send-redirects          NotApplied
example-compliancescan3-sysctl-net-ipv4-icmp-echo-ignore-broadcasts          NotApplied
example-compliancescan3-sysctl-net-ipv4-icmp-ignore-bogus-error-responses    NotApplied
example-compliancescan3-sysctl-net-ipv4-tcp-syncookies                       NotApplied
example-compliancescan3-sysctl-net-ipv6-conf-all-accept-source-route         NotApplied
example-compliancescan3-sysctl-net-ipv6-conf-default-accept-ra               NotApplied
example-compliancescan3-sysctl-net-ipv6-conf-default-accept-redirects        NotApplied
example-compliancescan3-sysctl-net-ipv6-conf-default-accept-source-route     NotApplied
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196