Description of problem:
Install the file integrity operator, create a FileIntegrity CR, and check the metrics from the CLI:

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://metrics.openshift-file-integrity.svc:8585/metrics-fio'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8510    0  8510    0     0   169k      0 --:--:-- --:--:-- --:--:--  173k
# HELP file_integrity_operator_daemonset_update_total The total number of updates to the FileIntegrity AIDE daemonSet
# TYPE file_integrity_operator_daemonset_update_total counter
file_integrity_operator_daemonset_update_total{operation="update"} 1
# HELP file_integrity_operator_node_failed A gauge that is set to 1 when a node has unresolved integrity failures, and 0 when it is healthy
# TYPE file_integrity_operator_node_failed gauge
file_integrity_operator_node_failed{node="ip-10-0-143-254.us-east-2.compute.internal"} 1
file_integrity_operator_node_failed{node="ip-10-0-156-222.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-165-43.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-173-214.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-203-71.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-216-210.us-east-2.compute.internal"} 0
# HELP file_integrity_operator_node_status_total The total number of FileIntegrityNodeStatus transitions, per condition and node
# TYPE file_integrity_operator_node_status_total counter
file_integrity_operator_node_status_total{condition="Failed",node="ip-10-0-143-254.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-143-254.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-156-222.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-165-43.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-173-214.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-203-71.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-216-210.us-east-2.compute.internal"} 1
# HELP file_integrity_operator_phase_total The total number of transitions to the FileIntegrity phase
# TYPE file_integrity_operator_phase_total counter
file_integrity_operator_phase_total{phase="Active"} 1
file_integrity_operator_phase_total{phase="Initializing"} 1
file_integrity_operator_phase_total{phase="Pending"} 1
…

However, logging into the console, navigating to Observe -> Metrics, and querying {__name__=~"file_integrity.*"} returns the error "No datapoints found".

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-29-010334 + file-integrity-operator.v0.1.18

How reproducible:
Always

Steps to Reproduce:
1. Install the file integrity operator.
2. Create a FileIntegrity CR:
oc apply -f - <<EOF
apiVersion: fileintegrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: example-fileintegrity
  namespace: openshift-file-integrity
spec:
  # Change to debug: true to enable more verbose logging from the logcollector
  # container in the aide pods
  debug: false
  config:
    gracePeriod: 15
EOF
3. Trigger one error on node ip-10-0-143-254.us-east-2.compute.internal:
$ oc debug node/ip-10-0-143-254.us-east-2.compute.internal -- chroot /host mkdir /root/test
Starting pod/ip-10-0-143-254us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Removing debug pod ...
$ oc get fileintegritynodestatus
NAME                                                               NODE                                         STATUS
example-fileintegrity-ip-10-0-165-43.us-east-2.compute.internal    ip-10-0-165-43.us-east-2.compute.internal    Succeeded
example-fileintegrity-ip-10-0-156-222.us-east-2.compute.internal   ip-10-0-156-222.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-216-210.us-east-2.compute.internal   ip-10-0-216-210.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-173-214.us-east-2.compute.internal   ip-10-0-173-214.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-143-254.us-east-2.compute.internal   ip-10-0-143-254.us-east-2.compute.internal   Failed
4. Log in to the console, navigate to Observe -> Metrics, and run the query {__name__=~"file_integrity.*"}.

Actual results:
The query returns the error "No datapoints found".

Expected results:
The file_integrity metrics are displayed in the GUI.

Additional info:
$ oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0 | grep file-integrity
ts=2021-08-30T10:02:03.053Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:447: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-file-integrity\""
ts=2021-08-30T10:02:03.063Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:446: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-file-integrity\""
ts=2021-08-30T10:02:03.064Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:445: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-file-integrity\""
ts=2021-08-30T10:02:03.900Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:447: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-file-integrity\""
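For reference, the Prometheus errors above indicate that the prometheus-k8s service account lacks list/watch permissions on pods, services, and endpoints in the openshift-file-integrity namespace, so the scrape target is never discovered. A minimal sketch of a manual workaround follows (the resource names here are hypothetical and only for illustration; the actual fix in file-integrity-operator v0.1.19 ships its own file-integrity-operator-metrics ClusterRole/ClusterRoleBinding, shown in the verification comments below):

$ oc apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s-file-integrity   # hypothetical name, for this sketch only
  namespace: openshift-file-integrity
rules:
# allow the cluster monitoring Prometheus to discover and scrape targets in this namespace
- apiGroups: [""]
  resources: ["pods", "services", "endpoints"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s-file-integrity   # hypothetical name, for this sketch only
  namespace: openshift-file-integrity
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s-file-integrity
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring
EOF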
Accepting this bug; it's the same situation as https://bugzilla.redhat.com/show_bug.cgi?id=1999374.
[Bug_verification]
Looks good to me. The metrics are getting reported over the GUI for the file-integrity operator.

Verified on: 4.8.13-x86_64 + file-integrity-operator.v0.1.19

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.13    True        False         10h     Cluster version is 4.8.13

$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES   PHASE
elasticsearch-operator.5.1.3-12   OpenShift Elasticsearch Operator   5.1.3-12              Succeeded
file-integrity-operator.v0.1.19   File Integrity Operator            0.1.19                Succeeded

$ oc get pods
NAME                                       READY   STATUS    RESTARTS   AGE
aide-example-fileintegrity-88tvc           1/1     Running   0          65m
aide-example-fileintegrity-bpcmw           1/1     Running   0          65m
aide-example-fileintegrity-phm98           1/1     Running   0          65m
aide-example-fileintegrity-q4sns           1/1     Running   0          65m
aide-example-fileintegrity-xjqjb           1/1     Running   0          65m
aide-example-fileintegrity-xz2hv           1/1     Running   0          65m
file-integrity-operator-65b844f87f-ffwvh   1/1     Running   1          68m

$ oc get pod file-integrity-operator-65b844f87f-ffwvh -oyaml | grep -A3 "RELATED_IMAGE"
    - name: RELATED_IMAGE_OPERATOR
      value: registry.redhat.io/compliance/openshift-file-integrity-rhel8-operator@sha256:084f91dc5b5e8f43305b1bd25999578b694e180e986f8779ec1092251f4826ad
    - name: OPERATOR_CONDITION_NAME
      value: file-integrity-operator.v0.1.19

$ oc apply -f - <<EOF
> apiVersion: fileintegrity.openshift.io/v1alpha1
> kind: FileIntegrity
> metadata:
>   name: example-fileintegrity
>   namespace: openshift-file-integrity
> spec:
>   # Change to debug: true to enable more verbose logging from the logcollector
>   # container in the aide pods
>   debug: false
>   config:
>     gracePeriod: 15
> EOF
fileintegrity.fileintegrity.openshift.io/example-fileintegrity created

$ oc debug node/ip-10-0-210-208.us-east-2.compute.internal -- chroot /host mkdir /root/test
Starting pod/ip-10-0-210-208us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Removing debug pod ...
$ oc get fileintegritynodestatus
NAME                                                               NODE                                         STATUS
example-fileintegrity-ip-10-0-141-80.us-east-2.compute.internal    ip-10-0-141-80.us-east-2.compute.internal    Succeeded
example-fileintegrity-ip-10-0-148-195.us-east-2.compute.internal   ip-10-0-148-195.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-176-70.us-east-2.compute.internal    ip-10-0-176-70.us-east-2.compute.internal    Succeeded
example-fileintegrity-ip-10-0-190-247.us-east-2.compute.internal   ip-10-0-190-247.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-209-102.us-east-2.compute.internal   ip-10-0-209-102.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-210-208.us-east-2.compute.internal   ip-10-0-210-208.us-east-2.compute.internal   Failed

$ token=`oc -n openshift-monitoring sa get-token prometheus-k8s`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://metrics.openshift-file-integrity.svc:8585/metrics-fio'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8514    0  8514    0     0   346k      0 --:--:-- --:--:-- --:--:--  346k
# HELP file_integrity_operator_daemonset_update_total The total number of updates to the FileIntegrity AIDE daemonSet
# TYPE file_integrity_operator_daemonset_update_total counter
file_integrity_operator_daemonset_update_total{operation="update"} 1
# HELP file_integrity_operator_node_failed A gauge that is set to 1 when a node has unresolved integrity failures, and 0 when it is healthy
# TYPE file_integrity_operator_node_failed gauge
file_integrity_operator_node_failed{node="ip-10-0-141-80.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-148-195.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-176-70.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-190-247.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-209-102.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-210-208.us-east-2.compute.internal"} 1
# HELP file_integrity_operator_node_status_total The total number of FileIntegrityNodeStatus transitions, per condition and node
# TYPE file_integrity_operator_node_status_total counter
file_integrity_operator_node_status_total{condition="Failed",node="ip-10-0-210-208.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-141-80.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-148-195.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-176-70.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-190-247.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-209-102.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-210-208.us-east-2.compute.internal"} 1
# HELP file_integrity_operator_phase_total The total number of transitions to the FileIntegrity phase
# TYPE file_integrity_operator_phase_total counter
file_integrity_operator_phase_total{phase="Active"} 1
file_integrity_operator_phase_total{phase="Initializing"} 1
file_integrity_operator_phase_total{phase="Pending"} 1
.....
......
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 217
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

$ oc get project openshift-file-integrity --show-labels
NAME                        DISPLAY NAME   STATUS   LABELS
openshift-file-integrity                   Active   kubernetes.io/metadata.name=openshift-file-integrity,olm.operatorgroup.uid/3101a112-3647-4b46-b2d4-fb010d153205=,olm.operatorgroup.uid/9d433093-5efd-44e3-86d8-0f6e88ee2d7a=,openshift.io/cluster-monitoring=true

Attaching screenshot of the metrics displayed over the GUI.
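Besides scraping the operator's metrics endpoint directly, it is also possible to confirm from the CLI that the samples reach the cluster monitoring stack by querying the Thanos querier API. A minimal sketch, reusing the prometheus-k8s token from above and assuming the standard thanos-querier service and port in openshift-monitoring:

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
    curl -k -H "Authorization: Bearer $token" \
    'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=file_integrity_operator_node_failed'

A non-empty "result" array in the JSON response corresponds to the datapoints the console shows under Observe -> Metrics.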
[Bug_verification]
Looks good on OCP 4.9 as well. The metrics are getting reported over the GUI for the file-integrity operator.

Verified on: 4.9.0-0.nightly-2021-10-05-004711 + file-integrity-operator.v0.1.19

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-10-05-004711    True        False         39m     Cluster version is 4.9.0-0.nightly-2021-10-05-004711

$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES   PHASE
elasticsearch-operator.5.2.2-21   OpenShift Elasticsearch Operator   5.2.2-21              Succeeded
file-integrity-operator.v0.1.19   File Integrity Operator            0.1.19                Succeeded

$ oc get pods
NAME                                       READY   STATUS    RESTARTS        AGE
file-integrity-operator-65b844f87f-9ngb5   1/1     Running   1 (8m37s ago)   9m4s

$ oc get pod file-integrity-operator-65b844f87f-9ngb5 -oyaml | grep -A3 "RELATED_IMAGE"
    - name: RELATED_IMAGE_OPERATOR
      value: registry.redhat.io/compliance/openshift-file-integrity-rhel8-operator@sha256:084f91dc5b5e8f43305b1bd25999578b694e180e986f8779ec1092251f4826ad
    - name: OPERATOR_CONDITION_NAME
      value: file-integrity-operator.v0.1.19

$ oc apply -f - <<EOF
> apiVersion: fileintegrity.openshift.io/v1alpha1
> kind: FileIntegrity
> metadata:
>   name: example-fileintegrity
>   namespace: openshift-file-integrity
> spec:
>   # Change to debug: true to enable more verbose logging from the logcollector
>   # container in the aide pods
>   debug: false
>   config:
>     gracePeriod: 15
> EOF
fileintegrity.fileintegrity.openshift.io/example-fileintegrity created

$ oc get pods -w
NAME                                       READY   STATUS    RESTARTS      AGE
aide-example-fileintegrity-6fbqd           1/1     Running   0             41s
aide-example-fileintegrity-7l26g           1/1     Running   0             41s
aide-example-fileintegrity-kj4mw           1/1     Running   0             41s
aide-example-fileintegrity-m2rsg           1/1     Running   0             41s
aide-example-fileintegrity-ncldz           1/1     Running   0             41s
aide-example-fileintegrity-rt7ww           1/1     Running   0             41s
file-integrity-operator-65b844f87f-9ngb5   1/1     Running   1 (10m ago)   10m

$ oc get fileintegritynodestatus -w
NAME                                                                NODE                                          STATUS
example-fileintegrity-ip-10-0-141-110.us-east-2.compute.internal    ip-10-0-141-110.us-east-2.compute.internal    Succeeded
example-fileintegrity-ip-10-0-161-55.us-east-2.compute.internal     ip-10-0-161-55.us-east-2.compute.internal     Succeeded
example-fileintegrity-ip-10-0-168-12.us-east-2.compute.internal     ip-10-0-168-12.us-east-2.compute.internal     Succeeded
example-fileintegrity-ip-10-0-204-114.us-east-2.compute.internal    ip-10-0-204-114.us-east-2.compute.internal    Succeeded
example-fileintegrity-ip-10-0-207-172.us-east-2.compute.internal    ip-10-0-207-172.us-east-2.compute.internal    Succeeded
example-fileintegrity-ip-10-0-131-24.us-east-2.compute.internal     ip-10-0-131-24.us-east-2.compute.internal     Succeeded

$ oc debug node/ip-10-0-207-172.us-east-2.compute.internal -- chroot /host mkdir /root/test
Starting pod/ip-10-0-207-172us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Removing debug pod ...
$ oc get fileintegritynodestatus
NAME                                                                NODE                                          STATUS
example-fileintegrity-ip-10-0-131-24.us-east-2.compute.internal     ip-10-0-131-24.us-east-2.compute.internal     Succeeded
example-fileintegrity-ip-10-0-141-110.us-east-2.compute.internal    ip-10-0-141-110.us-east-2.compute.internal    Succeeded
example-fileintegrity-ip-10-0-161-55.us-east-2.compute.internal     ip-10-0-161-55.us-east-2.compute.internal     Succeeded
example-fileintegrity-ip-10-0-168-12.us-east-2.compute.internal     ip-10-0-168-12.us-east-2.compute.internal     Succeeded
example-fileintegrity-ip-10-0-204-114.us-east-2.compute.internal    ip-10-0-204-114.us-east-2.compute.internal    Succeeded
example-fileintegrity-ip-10-0-207-172.us-east-2.compute.internal    ip-10-0-207-172.us-east-2.compute.internal    Failed

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://metrics.openshift-file-integrity.svc:8585/metrics-fio'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP file_integrity_operator_daemonset_update_total The total number of updates to the FileIntegrity AIDE daemonSet
# TYPE file_integrity_operator_daemonset_update_total counter
file_integrity_operator_daemonset_update_total{operation="update"} 1
# HELP file_integrity_operator_node_failed A gauge that is set to 1 when a node has unresolved integrity failures, and 0 when it is healthy
# TYPE file_integrity_operator_node_failed gauge
file_integrity_operator_node_failed{node="ip-10-0-131-24.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-141-110.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-161-55.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-168-12.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-204-114.us-east-2.compute.internal"} 0
file_integrity_operator_node_failed{node="ip-10-0-207-172.us-east-2.compute.internal"} 1
# HELP file_integrity_operator_node_status_total The total number of FileIntegrityNodeStatus transitions, per condition and node
# TYPE file_integrity_operator_node_status_total counter
file_integrity_operator_node_status_total{condition="Failed",node="ip-10-0-207-172.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-131-24.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-141-110.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-161-55.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-168-12.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-204-114.us-east-2.compute.internal"} 1
file_integrity_operator_node_status_total{condition="Succeeded",node="ip-10-0-207-172.us-east-2.compute.internal"} 1
# HELP file_integrity_operator_phase_total The total number of transitions to the FileIntegrity phase
# TYPE file_integrity_operator_phase_total counter
file_integrity_operator_phase_total{phase="Active"} 1
file_integrity_operator_phase_total{phase="Initializing"} 1
file_integrity_operator_phase_total{phase="Pending"} 1
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.8841e-05
...
.....
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 54
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
100  8504    0  8504    0     0   519k      0 --:--:-- --:--:-- --:--:--  519k

$ oc get project openshift-file-integrity --show-labels
NAME                        DISPLAY NAME   STATUS   LABELS
openshift-file-integrity                   Active   kubernetes.io/metadata.name=openshift-file-integrity,olm.operatorgroup.uid/06b8e004-9fd1-4ad6-871f-5b39def1158c=,olm.operatorgroup.uid/dbb88c49-9d45-425d-aeb3-14836521c141=,openshift.io/cluster-monitoring=true

$ oc get ClusterRole file-integrity-operator-metrics -oyaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2021-10-06T03:52:00Z"
  name: file-integrity-operator-metrics
  resourceVersion: "36342"
  uid: aeb0457e-3893-406c-a7c2-6993db30aa39
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - endpoints
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  - /metrics-fio
  verbs:
  - get

$ oc get ClusterRoleBinding file-integrity-operator-metrics -oyaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2021-10-06T03:52:00Z"
  name: file-integrity-operator-metrics
  resourceVersion: "36345"
  uid: e6ada866-1903-4e17-82fe-d89716b980d5
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: file-integrity-operator-metrics
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring

Attaching screenshot of the metrics displayed over the GUI.
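Since file_integrity_operator_node_failed is now reliably available in cluster monitoring, it can also be used for alerting. A minimal sketch of a PrometheusRule built on that gauge; the rule name, labels, and "for" duration are hypothetical and not something the operator ships, and whether cluster monitoring evaluates rules from this namespace depends on the cluster's monitoring configuration:

$ oc apply -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: file-integrity-node-failed   # hypothetical name, for this sketch only
  namespace: openshift-file-integrity
spec:
  groups:
  - name: file-integrity
    rules:
    - alert: FileIntegrityNodeFailed
      # fires while a node reports unresolved AIDE integrity failures
      expr: file_integrity_operator_node_failed == 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Node {{ $labels.node }} has unresolved file integrity failures"
EOF

The heredoc delimiter is quoted ('EOF') so the shell does not expand $labels in the annotation template.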
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (File Integrity Operator version 0.1.21 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4631