Bug 1999374

Summary: Metrics not available on GUI for Compliance Operator
Product: OpenShift Container Platform
Reporter: xiyuan
Component: Compliance Operator
Assignee: Matt Rogers <mrogers>
Status: CLOSED ERRATA
QA Contact: Prashant Dhamdhere <pdhamdhe>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.9
CC: jhrozek, josorior, mrogers, xiyuan
Target Milestone: ---
Flags: xiyuan: needinfo-
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-11-10 07:37:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description xiyuan 2021-08-31 04:01:26 UTC
Description of problem:
Install the Compliance Operator, trigger a scan with a ScanSettingBinding, and check the metrics from the CLI:
$  oc run --rm -i --restart=Never --image=registry.fedoraproject.org/fedora-minimal:latest -n openshift-compliance test-metrics -- bash -c 'curl -ks -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://metrics.openshift-compliance.svc:8585/metrics-co' | grep compliance
# HELP compliance_operator_compliance_remediation_status_total A counter for the total number of updates to the status of a ComplianceRemediation
# TYPE compliance_operator_compliance_remediation_status_total counter
compliance_operator_compliance_remediation_status_total{name="ocp4-moderate-oauth-or-oauthclient-token-maxage",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-master-chronyd-or-ntpd-set-maxpoll",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-master-chronyd-or-ntpd-specify-multiple-servers",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-master-chronyd-or-ntpd-specify-remote-server",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-master-configure-usbguard-auditbackend",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-master-coreos-vsyscall-kernel-argument",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-master-service-usbguard-enabled",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-master-usbguard-allow-hid-and-hub",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-worker-chronyd-or-ntpd-set-maxpoll",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-worker-chronyd-or-ntpd-specify-multiple-servers",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-worker-chronyd-or-ntpd-specify-remote-server",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-worker-configure-usbguard-auditbackend",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-worker-coreos-vsyscall-kernel-argument",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-worker-service-usbguard-enabled",state="NotApplied"} 1
compliance_operator_compliance_remediation_status_total{name="rhcos4-moderate-worker-usbguard-allow-hid-and-hub",state="NotApplied"} 1

However, after logging in to the console and navigating to Observe -> Metrics, querying the metric compliance_operator_compliance_scan_status_total returns the error “No datapoints found”.
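
The same check can be made from the CLI, which helps rule out a console-only problem. The sketch below assumes the default thanos-querier route in openshift-monitoring is present and that the logged-in user is allowed to query cluster metrics; query_metric is a hypothetical helper name, not part of oc.

```shell
# Hypothetical helper: run an instant query against the cluster monitoring stack.
# Assumes the thanos-querier route exists in openshift-monitoring.
query_metric() {
  token="$(oc whoami -t)"
  host="$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')"
  # -G sends the URL-encoded query as a GET parameter to the Prometheus HTTP API.
  curl -ksG -H "Authorization: Bearer ${token}" \
    --data-urlencode "query=$1" \
    "https://${host}/api/v1/query"
}

# Usage (needs a logged-in oc session):
#   query_metric compliance_operator_compliance_scan_status_total
```

An empty "result" array in the JSON response would correspond to the “No datapoints found” message in the console.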

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-29-010334 + compliance-operator.v0.1.39

How reproducible:
Always

Steps to Reproduce:
1. Install the Compliance Operator.
2. Trigger a scan with a ScanSettingBinding (ssb):
$ oc create -f -<<EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: my-ssb-r
profiles:
  - name: ocp4-moderate
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
  - name: rhcos4-moderate
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  name: default
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
EOF
3. Log in to the console and navigate to Observe -> Metrics.
4. Query the metric compliance_operator_compliance_scan_status_total.

Actual results:
The GUI shows the error “No datapoints found”.

Expected results:
The metrics are displayed in the GUI.

Additional info:
$ oc get namespace openshift-compliance --show-labels
NAME                   STATUS   AGE     LABELS
openshift-compliance   Active   3h30m   kubernetes.io/metadata.name=openshift-compliance,olm.operatorgroup.uid/331ab3eb-dfb4-47ef-8ecc-845e5d1a4d19=,olm.operatorgroup.uid/c04342d4-2aa4-4505-99ab-90cf3206e305=,openshift.io/cluster-monitoring=true
$ oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0 | grep "compliance"
ts=2021-08-30T13:26:21.626Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:446: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-compliance\""
ts=2021-08-30T13:26:21.640Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:447: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-compliance\""
ts=2021-08-30T13:26:21.690Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:445: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-compliance\""
ts=2021-08-30T13:26:22.713Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:447: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-compliance\""
ts=2021-08-30T13:26:22.864Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:445: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-compliance\""
ts=2021-08-30T13:26:23.212Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:446: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-compliance\""
ts=2021-08-30T13:26:25.211Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:446: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-compliance\""
ts=2021-08-30T13:26:25.546Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:447: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-compliance\""
ts=2021-08-30T13:26:25.701Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:445: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-compliance\""
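
The errors above show the prometheus-k8s service account being denied list access to services, pods, and endpoints in openshift-compliance. One way to confirm this without reading logs is oc auth can-i with impersonation. The sketch below assumes the caller may impersonate service accounts (a cluster-admin can); check_prometheus_rbac is a hypothetical helper name.

```shell
# Hypothetical helper: ask the API server whether the Prometheus service
# account may list and watch the resources named in the errors above.
check_prometheus_rbac() {
  ns="$1"
  sa="system:serviceaccount:openshift-monitoring:prometheus-k8s"
  for resource in services endpoints pods; do
    for verb in list watch; do
      # oc auth can-i prints "yes" or "no" and exits non-zero on "no".
      answer="$(oc auth can-i "$verb" "$resource" -n "$ns" --as="$sa" 2>/dev/null)" || true
      echo "$verb $resource: ${answer:-unknown}"
    done
  done
}

# Usage (needs a logged-in oc session):
#   check_prometheus_rbac openshift-compliance
```

On a cluster showing the log errors above, the list lines would be expected to report "no".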

Comment 2 Matt Rogers 2021-09-03 16:04:03 UTC
Accepting and targeting 4.9 - Fix is ongoing https://github.com/openshift/compliance-operator/pull/694
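
The linked pull request is not reproduced here. For orientation only: permission errors like the ones in the description are commonly resolved with a namespaced Role and RoleBinding for the prometheus-k8s service account. The sketch below is an assumption about the general shape of such a fix, not the contents of the PR; the object name prometheus-k8s and the file name are illustrative.

```shell
# Write an illustrative RBAC manifest; names are assumptions, not taken from the PR.
cat > prometheus-k8s-rbac.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: openshift-compliance
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: openshift-compliance
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
  - kind: ServiceAccount
    name: prometheus-k8s
    namespace: openshift-monitoring
EOF

# Apply on a test cluster with:
#   oc apply -f prometheus-k8s-rbac.yaml
```

The Role covers exactly the three resources named in the Prometheus errors, and the RoleBinding subject matches the user shown in those messages.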

Comment 11 Prashant Dhamdhere 2021-09-27 14:19:07 UTC
[Bug_verification]

Looks good to me. The metrics are now reported in the GUI for the Compliance Operator.

Verified on:

4.9.0-0.nightly-2021-09-25-094414 + compliance-operator.v0.1.41

$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES   PHASE
compliance-operator.v0.1.41       Compliance Operator                0.1.41                Succeeded
elasticsearch-operator.5.2.2-10   OpenShift Elasticsearch Operator   5.2.2-10              Succeeded

$ oc get pods 
NAME                                              READY   STATUS    RESTARTS      AGE
compliance-operator-656bb958f-kvzkk               1/1     Running   1 (16m ago)   16m
ocp4-openshift-compliance-pp-64dbd7c98f-8rcs5     1/1     Running   0             15m
rhcos4-openshift-compliance-pp-66575dc885-4sg4j   1/1     Running   0             15m


$ oc get pod compliance-operator-656bb958f-kvzkk -oyaml |grep -A3 "RELATED_IMAGE"
    - name: RELATED_IMAGE_OPENSCAP
      value: registry.redhat.io/compliance/openshift-compliance-openscap-rhel8@sha256:20656dd9b1e06a699f2294f4a9ac8e52606d9409d0ec75a055578e94117f4a5d
    - name: RELATED_IMAGE_OPERATOR
      value: registry.redhat.io/compliance/openshift-compliance-rhel8-operator@sha256:298e116c5840047f4c38ac9976a7016e52076b5398448fb977c83c1fae132d1d
    - name: RELATED_IMAGE_PROFILE
      value: registry.redhat.io/compliance/openshift-compliance-content-rhel8@sha256:b28ff0ae5ec3e8338ede1eea5379270f340e3b60d7bae509ebdf2b24f5289197
    - name: OPERATOR_CONDITION_NAME
      value: compliance-operator.v0.1.41

$ oc create -f -<<EOF
> apiVersion: compliance.openshift.io/v1alpha1
> kind: ScanSettingBinding
> metadata:
>   name: my-ssb-r
> profiles:
>   - name: ocp4-moderate
>     kind: Profile
>     apiGroup: compliance.openshift.io/v1alpha1
>   - name: rhcos4-moderate
>     kind: Profile
>     apiGroup: compliance.openshift.io/v1alpha1
> settingsRef:
>   name: default
>   kind: ScanSetting
>   apiGroup: compliance.openshift.io/v1alpha1
> EOF
scansettingbinding.compliance.openshift.io/my-ssb-r created


$ oc get suite 
NAME       PHASE   RESULT
my-ssb-r   DONE    NON-COMPLIANT


$ oc get scan
NAME                     PHASE   RESULT
ocp4-moderate            DONE    NON-COMPLIANT
rhcos4-moderate-master   DONE    NON-COMPLIANT
rhcos4-moderate-worker   DONE    NON-COMPLIANT


$ oc run --rm -i --restart=Never --image=registry.fedoraproject.org/fedora-minimal:latest -n openshift-compliance test-metrics -- bash -c 'curl -ks -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://metrics.openshift-compliance.svc:8585/metrics-co' | grep "compliance_operator_compliance_scan_status_total"
# HELP compliance_operator_compliance_scan_status_total A counter for the total number of updates to the status of a ComplianceScan
# TYPE compliance_operator_compliance_scan_status_total counter
compliance_operator_compliance_scan_status_total{name="ocp4-moderate",phase="AGGREGATING",result="NOT-AVAILABLE"} 4
compliance_operator_compliance_scan_status_total{name="ocp4-moderate",phase="DONE",result="NON-COMPLIANT"} 1
compliance_operator_compliance_scan_status_total{name="ocp4-moderate",phase="LAUNCHING",result="NOT-AVAILABLE"} 1
compliance_operator_compliance_scan_status_total{name="ocp4-moderate",phase="PENDING",result=""} 1
compliance_operator_compliance_scan_status_total{name="ocp4-moderate",phase="RUNNING",result="NOT-AVAILABLE"} 1
compliance_operator_compliance_scan_status_total{name="rhcos4-moderate-master",phase="AGGREGATING",result="NOT-AVAILABLE"} 5
compliance_operator_compliance_scan_status_total{name="rhcos4-moderate-master",phase="DONE",result="NON-COMPLIANT"} 1
compliance_operator_compliance_scan_status_total{name="rhcos4-moderate-master",phase="LAUNCHING",result="NOT-AVAILABLE"} 1
compliance_operator_compliance_scan_status_total{name="rhcos4-moderate-master",phase="PENDING",result=""} 1
compliance_operator_compliance_scan_status_total{name="rhcos4-moderate-master",phase="RUNNING",result="NOT-AVAILABLE"} 1
compliance_operator_compliance_scan_status_total{name="rhcos4-moderate-worker",phase="AGGREGATING",result="NOT-AVAILABLE"} 4
compliance_operator_compliance_scan_status_total{name="rhcos4-moderate-worker",phase="DONE",result="NON-COMPLIANT"} 1
compliance_operator_compliance_scan_status_total{name="rhcos4-moderate-worker",phase="LAUNCHING",result="NOT-AVAILABLE"} 1
compliance_operator_compliance_scan_status_total{name="rhcos4-moderate-worker",phase="PENDING",result=""} 1
compliance_operator_compliance_scan_status_total{name="rhcos4-moderate-worker",phase="RUNNING",result="NOT-AVAILABLE"} 1
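
In the output above, only the series with phase="DONE" carry a final result. A minimal sketch for pulling just those out of the raw metrics text follows; done_scans is a hypothetical helper name, and it assumes the label order (name, phase, result) shown above.

```shell
# Hypothetical helper: print "<scan name> <result>" for every series whose
# phase label is DONE. Reads Prometheus text-format lines on stdin.
done_scans() {
  sed -n 's/^compliance_operator_compliance_scan_status_total{name="\([^"]*\)",phase="DONE",result="\([^"]*\)"}.*/\1 \2/p'
}

# Usage: pipe the curl output from the oc run command above into done_scans.
```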

$ oc get ns openshift-compliance --show-labels
NAME                   STATUS   AGE   LABELS
openshift-compliance   Active   58m   kubernetes.io/metadata.name=openshift-compliance,olm.operatorgroup.uid/0ffdfe52-fbeb-44dc-97fa-4ccae9230644=,olm.operatorgroup.uid/978d5930-fc16-44df-98a1-a984d0c42634=,openshift.io/cluster-monitoring=true


Attaching some screenshots of the metrics displayed in the GUI.

Comment 18 errata-xmlrpc 2021-11-10 07:37:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Compliance Operator bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4530