Bug 1866244

Summary: The ComplianceRemediations go into the Error state when created from a ComplianceScan, not a ComplianceSuite
Product: OpenShift Container Platform
Component: Compliance Operator
Version: 4.6
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Reporter: xiyuan
Assignee: Jakub Hrozek <jhrozek>
QA Contact: Prashant Dhamdhere <pdhamdhe>
CC: josorior, mrogers, nkinder, xiyuan
Severity: medium
Priority: medium
Status: CLOSED ERRATA
Type: Bug
Flags: xiyuan: needinfo-
Target Milestone: ---
Last Closed: 2020-10-27 16:24:53 UTC

Description xiyuan 2020-08-05 07:55:26 UTC
Description of problem:
The ComplianceRemediations go into the Error state when there are
configuration differences between nodes of the same role.

Version-Release (cluster version):
4.6.0-0.nightly-2020-08-04-002217

How reproducible:
Always

Steps to Reproduce:
1. Install the compliance operator:
 1.1 clone compliance-operator git repo
 $ git clone https://github.com/openshift/compliance-operator.git
 1.2 Create 'openshift-compliance' namespace
 $ oc create -f compliance-operator/deploy/ns.yaml  
 1.3 Switch to 'openshift-compliance' namespace
 $ oc project openshift-compliance
 1.4 Deploy the CustomResourceDefinitions.
 $ for f in $(ls -1 compliance-operator/deploy/crds/*crd.yaml); do oc create -f $f; done
 1.5 Deploy the compliance operator.
 $ oc create -f compliance-operator/deploy/
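 (Optional check, not part of the original report.) Before moving on, it can help to confirm the operator rolled out; the compliance-operator pod and, once the ProfileBundles are parsed, the ocp4-pp and rhcos4-pp pods should all be Running:
 $ oc get pods -n openshift-compliance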

2. Create an /etc/securetty file on one of the worker nodes to trigger a configuration difference between nodes of the same role:
 $ oc debug node/ip-10-0-158-85.us-east-2.compute.internal
 Starting pod/ip-10-0-158-85us-east-2computeinternal-debug ...
 To use host binaries, run `chroot /host`
 Pod IP: 10.0.158.85
 If you don't see a command prompt, try pressing enter.
 sh-4.2# chroot /host
 sh-4.4# cd etc/
 sh-4.4# ls  | grep securetty
 sh-4.4# touch securetty
 sh-4.4# exit
 exit
 sh-4.2# exit
 exit
 Removing debug pod ...
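 (Optional check, not in the original report.) A one-liner to confirm the file now exists on that node, using the node name from the transcript above; repeat against the other worker nodes to confirm it is absent there:
 $ oc debug node/ip-10-0-158-85.us-east-2.compute.internal -- chroot /host ls -l /etc/securetty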

3. Create a ComplianceScan directly, without a ComplianceSuite:
 $ oc create -f - <<EOF
 apiVersion: compliance.openshift.io/v1alpha1
 kind: ComplianceScan
 metadata:
   name: example-compliancescan3
 spec:
   profile: xccdf_org.ssgproject.content_profile_moderate
   content: ssg-rhcos4-ds.xml               
   debug: true
   nodeSelector:
     node-role.kubernetes.io/worker: ""
 EOF
 compliancescan.compliance.openshift.io "example-compliancescan3" created
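 (Not part of the original steps.) The scan takes a few minutes; one way to know when to look at the remediations is to watch the scan until its PHASE reaches DONE:
 $ oc get compliancescan example-compliancescan3 -w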

Actual results:
The ComplianceRemediations go into the Error state when there are
configuration differences between nodes of the same role, and the ComplianceScan result shows INCONSISTENT.


$ oc get compliancescans
NAME                      PHASE   RESULT
example-compliancescan3   DONE    INCONSISTENT
$ oc get compliancecheckresults --selector compliance.openshift.io/inconsistent-check
NAME                                            STATUS         SEVERITY
example-compliancescan3-no-direct-root-logins   INCONSISTENT   medium
$ oc get complianceremediations
NAME                                                                          STATE
example-compliancescan3-audit-rules-dac-modification-chmod                    Error
example-compliancescan3-audit-rules-dac-modification-chown                    Error
example-compliancescan3-audit-rules-dac-modification-fchmod                   Error
example-compliancescan3-audit-rules-dac-modification-fchmodat                 Error
example-compliancescan3-audit-rules-dac-modification-fchown                   Error
example-compliancescan3-audit-rules-dac-modification-fchownat                 Error
example-compliancescan3-audit-rules-dac-modification-fremovexattr             Error
...

Expected results:
The STATE of the ComplianceRemediations should show NotApplied while the ComplianceScan result is INCONSISTENT.

Comment 2 Juan Antonio Osorio 2020-08-05 08:19:22 UTC
This is probably fixed by a PR I did yesterday: https://github.com/openshift/compliance-operator/commit/cdb91ba0aa62a044ce66a2d1f35f9e4557088954

This would set the default initial state to NotApplied for remediations. Would this be sufficient in your opinion?

Comment 3 Jakub Hrozek 2020-08-05 09:45:04 UTC
(In reply to Juan Antonio Osorio from comment #2)
> This is probably fixed by a PR I did yesterday:
> https://github.com/openshift/compliance-operator/commit/
> cdb91ba0aa62a044ce66a2d1f35f9e4557088954
> 
> This would set the default initial state to NotApplied for remediations.
> Would this be sufficient in your opinion?

According to the logs, the remediation controller is complaining about labels missing from the remediation object. How does the default state help with that?

Comment 4 Juan Antonio Osorio 2020-08-05 10:42:08 UTC
(In reply to Jakub Hrozek from comment #3)
> (In reply to Juan Antonio Osorio from comment #2)
> > This is probably fixed by a PR I did yesterday:
> > https://github.com/openshift/compliance-operator/commit/
> > cdb91ba0aa62a044ce66a2d1f35f9e4557088954
> > 
> > This would set the default initial state to NotApplied for remediations.
> > Would this be sufficient in your opinion?
> 
> According to the logs, the remediation controller is complaining about
> labels missing from the remediation object. How does the default state help
> with that?

I had misunderstood the issue. The new default value won't help.

Comment 5 Jakub Hrozek 2020-08-12 13:09:28 UTC
I'm going to look at this one.

Comment 6 Jakub Hrozek 2020-08-12 15:28:53 UTC
I think I know what's going on, but I'm not sure whether we should fix the issue.

The root cause is that you are using a Scan, not a Suite. The issue has nothing to do with the inconsistent result; it would have happened even with a regular scan where all the nodes are the same. Because the scan was created without a suite, it doesn't carry the suite label, which we use to construct the composite MachineConfig (MC) name. This leads to:

  status:
    applicationState: Error
    errorMessage: could not construct MC name, check if it has the correct labels

So I can see three options:
1) Support Scans without Suites and construct the MC name from what we have. I don't like this because, in general, scans shouldn't be used on their own; I would much rather keep their usage restricted.
2) Don't create remediation objects for scans without a suite.
3) Do nothing or at most document the limitation.

My preference is 2). Ozz, Matt, what do you think?
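(Reproducer note, not from the original comment.) A rough way to confirm this root cause on a cluster is to compare the scan's labels with the errorMessage on one of the remediations; the exact label key the controller expects is an assumption here (something like compliance.openshift.io/suite) and should be checked against the operator source:
 $ oc get compliancescan example-compliancescan3 --show-labels
 $ oc get complianceremediations example-compliancescan3-audit-rules-dac-modification-chmod -o jsonpath='{.status.errorMessage}'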

Comment 7 Juan Antonio Osorio 2020-08-17 05:36:30 UTC
I think a scan should *ideally* work out of the box, since that would reduce the tech debt and it's what folks would expect (principle of least surprise). However, we are indeed telling folks to stay away from bare scans and to use suites instead. So for this release I would say: go for whatever is easiest.

We are going to document that folks should use suites only (not scans). So, if the easiest thing is to skip creating remediations when a result comes from a scan without a suite, let's do that.

We don't want to introduce too many changes now that feature freeze is in place.
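(Illustration only, not from the original comment.) A minimal sketch of what wrapping the same scan settings in a suite might look like; the object names here are made up for the example, and the exact v1alpha1 field names (autoApplyRemediations, scans) are my assumption and should be checked against the installed CRDs:
 $ oc create -f - <<EOF
 apiVersion: compliance.openshift.io/v1alpha1
 kind: ComplianceSuite
 metadata:
   name: example-compliancesuite3
 spec:
   autoApplyRemediations: false
   scans:
     - name: example-suite-scan3
       profile: xccdf_org.ssgproject.content_profile_moderate
       content: ssg-rhcos4-ds.xml
       debug: true
       nodeSelector:
         node-role.kubernetes.io/worker: ""
 EOF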

Comment 8 Jakub Hrozek 2020-08-17 18:54:21 UTC
WIP PR: https://github.com/openshift/compliance-operator/pull/403

Comment 12 Prashant Dhamdhere 2020-08-27 05:38:35 UTC
Now the STATE of the ComplianceRemediations shows NotApplied for a ComplianceScan created without a ComplianceSuite.

Verified on: 
OCP 4.6.0-0.nightly-2020-08-27-005538
compliance-operator.v0.1.13

The results below were observed when there were configuration differences between nodes of the same role:

$ oc get pods -w
NAME                                                    READY   STATUS      RESTARTS   AGE
aggregator-pod-example-compliancescan3                  0/1     Completed   0          62s
compliance-operator-869646dd4f-5vq7z                    1/1     Running     0          34m
ocp4-pp-7f89f556cc-zzmkj                                1/1     Running     0          33m
openscap-pod-2f626fd3ddb8168e3c1c510b4c0d519d61e16862   0/2     Completed   0          4m23s
openscap-pod-5987d061fa159773b69e6c4ea1df6c8b8317e8f8   0/2     Completed   0          4m23s
openscap-pod-c3902b5df863b3e5c2c655e19862f9116423fb50   0/2     Completed   0          4m23s
rhcos4-pp-7c44999587-bckrn                              1/1     Running     0          33m

$ oc get compliancescan
NAME                      PHASE   RESULT
example-compliancescan3   DONE    INCONSISTENT

$ oc get complianceremediations |head
NAME                                                                          STATE
example-compliancescan3-audit-rules-dac-modification-chmod                    NotApplied
example-compliancescan3-audit-rules-dac-modification-chown                    NotApplied
example-compliancescan3-audit-rules-dac-modification-fchmod                   NotApplied
example-compliancescan3-audit-rules-dac-modification-fchmodat                 NotApplied
example-compliancescan3-audit-rules-dac-modification-fchown                   NotApplied
example-compliancescan3-audit-rules-dac-modification-fchownat                 NotApplied
example-compliancescan3-audit-rules-dac-modification-fremovexattr             NotApplied
example-compliancescan3-audit-rules-dac-modification-fsetxattr                NotApplied
example-compliancescan3-audit-rules-dac-modification-lchown                   NotApplied

The results below were observed when the configuration was identical on nodes of the same role:

$ oc get pods -w
NAME                                                    READY   STATUS      RESTARTS   AGE
aggregator-pod-example-compliancescan3                  0/1     Completed   0          10m
compliance-operator-869646dd4f-5vq7z                    1/1     Running     0          55m
ocp4-pp-7f89f556cc-zzmkj                                1/1     Running     0          54m
openscap-pod-2f626fd3ddb8168e3c1c510b4c0d519d61e16862   0/2     Completed   0          12m
openscap-pod-5987d061fa159773b69e6c4ea1df6c8b8317e8f8   0/2     Completed   0          12m
openscap-pod-c3902b5df863b3e5c2c655e19862f9116423fb50   0/2     Completed   0          12m
rhcos4-pp-7c44999587-bckrn                              1/1     Running     0          54m

$ oc get compliancescan
NAME                      PHASE   RESULT
example-compliancescan3   DONE    NON-COMPLIANT

$ oc get complianceremediations |tail
example-compliancescan3-sysctl-net-ipv4-conf-default-rp-filter                NotApplied
example-compliancescan3-sysctl-net-ipv4-conf-default-secure-redirects         NotApplied
example-compliancescan3-sysctl-net-ipv4-conf-default-send-redirects           NotApplied
example-compliancescan3-sysctl-net-ipv4-icmp-echo-ignore-broadcasts           NotApplied
example-compliancescan3-sysctl-net-ipv4-icmp-ignore-bogus-error-responses     NotApplied
example-compliancescan3-sysctl-net-ipv4-tcp-syncookies                        NotApplied
example-compliancescan3-sysctl-net-ipv6-conf-all-accept-source-route          NotApplied
example-compliancescan3-sysctl-net-ipv6-conf-default-accept-ra                NotApplied
example-compliancescan3-sysctl-net-ipv6-conf-default-accept-redirects         NotApplied
example-compliancescan3-sysctl-net-ipv6-conf-default-accept-source-route      NotApplied
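
(Quick sanity check, not part of the original verification.) After the fix, listing any remediation whose STATE is something other than NotApplied should return nothing:
 $ oc get complianceremediations --no-headers | awk '$2 != "NotApplied"'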

Comment 14 errata-xmlrpc 2020-10-27 16:24:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196