Created attachment 1680531 [details]
LogsForPodWithErrorConfiguration

Description of problem:
Wrong NodeStatus is reported by the file-integrity scan when there is a configuration error in the aide.conf file.

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-04-18-184707

How reproducible:
Always

Steps to Reproduce:
1. Install the operator.
2. Create a config file with an invalid format:
$ cp file-integrity-operator/aide.conf.rhel8 file-integrity-operator/aide.conf.rhel8.err
3. Update the content in aide.conf.rhel8.err, changing
"NORMAL = p+i+n+u+g+s+m+c+acl+selinux+xattrs+sha512+md5"
to
"NORMAL = p+i+n+u+g+s+m+c+acl+selinux+xattrs+sha512+md5+XXXXXX"
4. Create the configmap and the FileIntegrity object:
$ oc create configmap myconferr --from-file=aide-conf=file-integrity-operator/aide.conf.rhel8.err
configmap/myconferr created
$ oc apply -f - <<EOF
apiVersion: file-integrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: example-fileintegrity
  namespace: openshift-file-integrity
spec:
  config:
    name: myconferr
    namespace: openshift-file-integrity
    key: aide-conf
  nodeSelector:
    node.openshift.io/os_id: rhcos
EOF
fileintegrity.file-integrity.openshift.io/example-fileintegrity configured

Actual results:
The aide-check pods are in the `Running` state even though there is a "Configuration error" in the init container aide-ds-init, and the file-integrity scan status of the nodes shows `Succeeded`.
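For reference, the "Configuration error" comes from AIDE rejecting the unknown attribute keyword XXXXXX in the group definition. A minimal sketch of that kind of check (the keyword set below is a partial, illustrative subset of AIDE's attributes; this is not operator or AIDE code):

```python
# Illustrative only: find unrecognized attribute tokens on an aide.conf
# group definition line of the form "NAME = a+b+c". The attribute set is
# a partial subset chosen for this example.
KNOWN_ATTRS = {
    "p", "i", "n", "u", "g", "s", "m", "c", "acl", "selinux",
    "xattrs", "sha512", "md5", "sha1", "sha256",
}

def invalid_attrs(group_line: str) -> list:
    """Return the attribute tokens that are not in the known set."""
    _, _, expr = group_line.partition("=")
    tokens = [t.strip() for t in expr.split("+")]
    return [t for t in tokens if t not in KNOWN_ATTRS]

print(invalid_attrs(
    "NORMAL = p+i+n+u+g+s+m+c+acl+selinux+xattrs+sha512+md5+XXXXXX"))
# -> ['XXXXXX']
```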
$ oc get pod
NAME                                           READY   STATUS    RESTARTS   AGE
pod/aide-ds-example-fileintegrity-2fmsz        2/2     Running   0          64s
pod/aide-ds-example-fileintegrity-4crt4        2/2     Running   0          64s
pod/aide-ds-example-fileintegrity-5pg68        2/2     Running   0          64s
pod/aide-ds-example-fileintegrity-6v29c        2/2     Running   0          64s
pod/aide-ds-example-fileintegrity-gw5r7        2/2     Running   0          64s
pod/aide-ds-example-fileintegrity-p5mqw        2/2     Running   0          64s
pod/file-integrity-operator-6db7d44557-r9w89   1/1     Running   0          57m

$ oc logs pod/aide-ds-example-fileintegrity-4crt4 -c aide-ds-init
reinitializing AIDE db
68:Error in expression:XXX
Configuration error

$ oc logs pod/aide-ds-example-fileintegrity-4crt4 -c aide
running AIDE check..
68:Error in expression:XXX
Configuration error
...

$ oc get pod/aide-ds-example-fileintegrity-4crt4 -o yaml
...
  initContainerStatuses:
  - containerID: cri-o://6527f1a04ff9fadbf718289a8531c29b4b24258c59c78090340ccb68d0c4a7eb
    image: quay.io/file-integrity-operator/aide:latest
    imageID: quay.io/file-integrity-operator/aide@sha256:97cd05fbbd81349f11666675e545bfcf1cc2e735852bdf6202522f25cf53e01e
    lastState: {}
    name: aide-ds-init
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://6527f1a04ff9fadbf718289a8531c29b4b24258c59c78090340ccb68d0c4a7eb
        exitCode: 0              <<-------
        finishedAt: "2020-04-21T09:58:23Z"
        reason: Completed        <<-------
        startedAt: "2020-04-21T09:58:22Z"
  phase: Running                 <<-------
  podIP: 10.128.0.69
  podIPs:
  - ip: 10.128.0.69
  qosClass: BestEffort
  startTime: "2020-04-21T09:58:19Z"

$ oc get fileintegrities.file-integrity.openshift.io example-fileintegrity -o yaml
...
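The inconsistency above can also be checked programmatically: the init container logged "Configuration error" yet terminated with exitCode 0, so Kubernetes marks it Completed and lets the pod run. A hypothetical admin-side sketch (the helper name and structure are mine, not part of the operator):

```python
# Hypothetical helper: flag an init container that logged a configuration
# error but still exited 0 (which is exactly the misleading state shown
# in the pod YAML above).
def init_status_is_misleading(pod_status: dict, init_log: str) -> bool:
    for cs in pod_status.get("initContainerStatuses", []):
        term = cs.get("state", {}).get("terminated")
        if term and term.get("exitCode") == 0 and "Configuration error" in init_log:
            return True
    return False

# Sample data mirroring the output above.
status = {
    "initContainerStatuses": [
        {"name": "aide-ds-init",
         "state": {"terminated": {"exitCode": 0, "reason": "Completed"}}}
    ]
}
log = "reinitializing AIDE db\n68:Error in expression:XXX\nConfiguration error\n"
print(init_status_is_misleading(status, log))  # -> True
```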
spec:
  config:
    key: aide-conf
    name: myconferr
    namespace: openshift-file-integrity
  nodeSelector:
    node.openshift.io/os_id: rhcos
status:
  nodeStatus:
  - condition: Succeeded
    lastProbeTime: "2020-04-21T10:03:51Z"
    nodeName: ip-10-0-52-155.us-east-2.compute.internal
  - condition: Succeeded
    lastProbeTime: "2020-04-21T10:03:51Z"
    nodeName: ip-10-0-71-184.us-east-2.compute.internal
  - condition: Succeeded
    lastProbeTime: "2020-04-21T10:03:51Z"
    nodeName: ip-10-0-49-177.us-east-2.compute.internal
  - condition: Succeeded
    lastProbeTime: "2020-04-21T10:03:51Z"
    nodeName: ip-10-0-53-110.us-east-2.compute.internal
  - condition: Succeeded
    lastProbeTime: "2020-04-21T10:03:52Z"
    nodeName: ip-10-0-76-227.us-east-2.compute.internal
  - condition: Succeeded
    lastProbeTime: "2020-04-21T10:03:52Z"
    nodeName: ip-10-0-63-179.us-east-2.compute.internal
  phase: Active

Expected results:
The aide-check pods should be in the `Init` state instead of `Running`, since there is a "Configuration error" in the init container aide-ds-init. Also, the file-integrity scan status of the nodes should be `Error` instead of `Succeeded`.

Additional info:
See the logs in the attachment.
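Under the expected behavior, filtering the nodeStatus list would reveal the failing nodes; with the buggy behavior every entry reads Succeeded, so nothing is flagged. A small illustrative sketch (the Errored condition value in the sample data is an assumption used for illustration; the report above only shows Succeeded):

```python
# Hypothetical admin-side helper: list nodes whose file-integrity condition
# is anything other than Succeeded, from a parsed FileIntegrity status.
def failed_nodes(fi_status: dict) -> list:
    return [ns["nodeName"]
            for ns in fi_status.get("nodeStatus", [])
            if ns.get("condition") != "Succeeded"]

# Sample data; the "Errored" value is assumed for illustration.
fi_status = {"nodeStatus": [
    {"condition": "Succeeded",
     "nodeName": "ip-10-0-52-155.us-east-2.compute.internal"},
    {"condition": "Errored",
     "nodeName": "ip-10-0-71-184.us-east-2.compute.internal"},
]}
print(failed_nodes(fi_status))
# -> ['ip-10-0-71-184.us-east-2.compute.internal']
```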
I finally started on the bug today. There are several issues:
- The daemon (running in the DS pods) deadlocks if the config is wrong. This is addressed with https://github.com/openshift/file-integrity-operator/pull/108
- We don't surface the error to the admin. I'm going to work on this part of the problem next.
(In reply to Jakub Hrozek from comment #6)
> I started on the bug finally today. There are several issues:
> - the daemon (running in the DS pods) deadlocks if the config is wrong.
>   This is addressed with
>   https://github.com/openshift/file-integrity-operator/pull/108
> - we don't surface the error to the admin. I'm going to work on this part
>   of the problem next.

Update: the PR was updated so that the problem now surfaces to the nodeintegritystatus objects; working on surfacing the problem all the way up to the FileIntegrity object.
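For context, the general shape of the daemon-side fix is to never block on a bad config and to always propagate a failure result. The following is only a sketch of that pattern in Python, not the actual Go code in the PR; the default command and timeout are assumptions:

```python
# Sketch of the "don't deadlock, propagate failure" pattern: run the aide
# check with a timeout and map a hang or configuration error to a nonzero
# result instead of blocking the daemon loop.
import subprocess

def run_aide_check(cmd=("aide", "--check"), timeout=300) -> int:
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return 1  # hung or missing binary: report failure, don't block
    if "Configuration error" in (proc.stdout + proc.stderr):
        return 1  # aide printed a config error even if it exited 0
    return proc.returncode

# Simulate the buggy case from this report: error printed, exit code 0.
print(run_aide_check(cmd=("sh", "-c", "echo 'Configuration error'; exit 0")))
# -> 1
```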
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633