Bug 2060726

Summary: Compliance operator does not generate alert notification for non-control namespace
Product: OpenShift Container Platform
Reporter: Prashant Dhamdhere <pdhamdhe>
Component: Compliance Operator
Assignee: Matt Rogers <mrogers>
Status: CLOSED ERRATA
QA Contact: xiyuan
Severity: low
Docs Contact: Jeana Routh <jrouth>
Priority: low
Version: 4.10
CC: jhrozek, jrouth, lbragsta, mrogers, xiyuan
Target Milestone: ---
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
* Previously, the Compliance Operator hard-coded notifications to the default namespace. As a result, notifications from the Operator would not appear if the Operator was installed in a different namespace. This issue is fixed in this release. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2060726[*BZ#2060726*])
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-11-02 16:00:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Prashant Dhamdhere 2022-03-04 06:15:17 UTC
Description of problem:
The Compliance Operator does not generate an alert notification if the operator is deployed in a
non-control namespace, that is, any namespace other than openshift-compliance.

#  oc create -f - << EOF
> apiVersion: compliance.openshift.io/v1alpha1
> kind: ScanSettingBinding
> metadata:
>   name: moderate-test
> profiles:
>   - name: ocp4-moderate
>     kind: Profile
>     apiGroup: compliance.openshift.io/v1alpha1
> settingsRef:
>   name: default
>   kind: ScanSetting
>   apiGroup: compliance.openshift.io/v1alpha1
> EOF
scansettingbinding.compliance.openshift.io/moderate-test created

# oc get suite 
NAME            PHASE   RESULT
moderate-test   DONE    NON-COMPLIANT

# oc get pods
NAME                                         READY   STATUS      RESTARTS      AGE
aggregator-pod-ocp4-moderate                 0/1     Completed   0             71s
compliance-operator-6fb484b5cd-g244t         1/1     Running     1 (10m ago)   10m
ocp4-compliance-test-pp-75d888d7db-2wgnv     1/1     Running     0             9m43s
ocp4-moderate-api-checks-pod                 0/2     Completed   0             113s
rhcos4-compliance-test-pp-867b989956-p6snl   1/1     Running     0             9m43s

# oc get ccr -ncompliance-test |grep compliance-notification
ocp4-moderate-compliance-notification-enabled                           PASS     medium

# oc get prometheusrules --all-namespaces -o json | jq '[.items[] | select(.metadata.name =="compliance") | .metadata.name]'
[
  "compliance",
  "compliance"
]

# oc get route alertmanager-main -n openshift-monitoring
NAME                HOST/PORT                                                                         PATH   SERVICES            PORT   TERMINATION          WILDCARD
alertmanager-main   alertmanager-main-openshift-monitoring.apps.coci632.qe.devcluster.openshift.com   /api   alertmanager-main   web    reencrypt/Redirect   None

# ALERT_MANAGER=$(oc get route alertmanager-main -n openshift-monitoring -o jsonpath='{@.spec.host}')

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)"  https://$ALERT_MANAGER/api/v1/alerts |jq '.data[] | select(.labels.alertname | contains("NonCompliant"))'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 33720    0 33720    0     0  57928      0 --:--:-- --:--:-- --:--:-- 57938


Version-Release number of selected component (if applicable):
4.10.0 + compliance-operator.v0.1.48

How reproducible:
Always

Steps to Reproduce:

1. Deploy the latest version of the Compliance Operator in a non-control namespace (compliance-test in this example)
2. Create a ScanSettingBinding object

# oc project compliance-test
Now using project "compliance-test" on server "https://api.coci632.qe.devcluster.openshift.com:6443".

#  oc create -f - << EOF
> apiVersion: compliance.openshift.io/v1alpha1
> kind: ScanSettingBinding
> metadata:
>   name: moderate-test
> profiles:
>   - name: ocp4-moderate
>     kind: Profile
>     apiGroup: compliance.openshift.io/v1alpha1
> settingsRef:
>   name: default
>   kind: ScanSetting
>   apiGroup: compliance.openshift.io/v1alpha1
> EOF
scansettingbinding.compliance.openshift.io/moderate-test created
 
3. Once the scan completes, check the suite and the ccr

# oc get suite 
NAME            PHASE   RESULT
moderate-test   DONE    NON-COMPLIANT

# oc get pods
NAME                                         READY   STATUS      RESTARTS      AGE
aggregator-pod-ocp4-moderate                 0/1     Completed   0             71s
compliance-operator-6fb484b5cd-g244t         1/1     Running     1 (10m ago)   10m
ocp4-compliance-test-pp-75d888d7db-2wgnv     1/1     Running     0             9m43s
ocp4-moderate-api-checks-pod                 0/2     Completed   0             113s
rhcos4-compliance-test-pp-867b989956-p6snl   1/1     Running     0             9m43s

# oc get ccr -ncompliance-test |grep compliance-notification
ocp4-moderate-compliance-notification-enabled                           PASS     medium

4. Check whether the alert notification is generated

# oc get prometheusrules --all-namespaces -o json | jq '[.items[] | select(.metadata.name =="compliance") | .metadata.name]'
[
  "compliance",
  "compliance"
]

# oc get route alertmanager-main -n openshift-monitoring
NAME                HOST/PORT                                                                         PATH   SERVICES            PORT   TERMINATION          WILDCARD
alertmanager-main   alertmanager-main-openshift-monitoring.apps.coci632.qe.devcluster.openshift.com   /api   alertmanager-main   web    reencrypt/Redirect   None

# ALERT_MANAGER=$(oc get route alertmanager-main -n openshift-monitoring -o jsonpath='{@.spec.host}')

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)"  https://$ALERT_MANAGER/api/v1/alerts |jq '.data[] | select(.labels.alertname | contains("NonCompliant"))'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 33720    0 33720    0     0  57928      0 --:--:-- --:--:-- --:--:-- 57938
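
Note on step 3: the steps above wait for the scan to finish by hand. A minimal sketch of that wait, assuming oc is logged in and that the PHASE and RESULT columns of `oc get suite` are backed by .status.phase and .status.result. wait_for_suite is a hypothetical helper name, not something shipped with the operator.

```shell
# Poll a ComplianceSuite until it reaches the DONE phase, then print its result.
# Assumption: .status.phase and .status.result back the PHASE and RESULT columns.
# wait_for_suite is a hypothetical helper name used only for illustration.
wait_for_suite() {
  suite=$1
  ns=$2
  tries=${3:-60}      # maximum number of polls
  interval=${4:-10}   # seconds to sleep between polls
  i=0
  while [ "$i" -lt "$tries" ]; do
    phase=$(oc get suite "$suite" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" = "DONE" ]; then
      # Print the final result (for example NON-COMPLIANT) followed by a newline.
      oc get suite "$suite" -n "$ns" -o jsonpath='{.status.result}'
      echo
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "suite $suite in namespace $ns did not reach DONE after $tries polls" >&2
  return 1
}
```

For the run shown above, `wait_for_suite moderate-test compliance-test` should return once the suite is DONE, at which point step 4 can be executed.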


Actual results:
The Compliance Operator does not generate an alert notification if the operator is deployed in a non-control namespace.

Expected results:
The Compliance Operator should generate an alert notification when it is deployed in a non-control namespace as well.

Additional info:

1. No alert notification is generated when the operator runs in a non-control namespace


# oc project compliance-test
Now using project "compliance-test" on server "https://api.coci632.qe.devcluster.openshift.com:6443".

# oc get csv
NAME                          DISPLAY               VERSION   REPLACES   PHASE
compliance-operator.v0.1.48   Compliance Operator   0.1.48               Succeeded

# oc get pods
NAME                                         READY   STATUS    RESTARTS        AGE
compliance-operator-6fb484b5cd-g244t         1/1     Running   1 (8m17s ago)   8m54s
ocp4-compliance-test-pp-75d888d7db-2wgnv     1/1     Running   0               7m38s
rhcos4-compliance-test-pp-867b989956-p6snl   1/1     Running   0               7m38s

#  oc create -f - << EOF
> apiVersion: compliance.openshift.io/v1alpha1
> kind: ScanSettingBinding
> metadata:
>   name: moderate-test
> profiles:
>   - name: ocp4-moderate
>     kind: Profile
>     apiGroup: compliance.openshift.io/v1alpha1
> settingsRef:
>   name: default
>   kind: ScanSetting
>   apiGroup: compliance.openshift.io/v1alpha1
> EOF
scansettingbinding.compliance.openshift.io/moderate-test created

# oc get suite -w
NAME            PHASE       RESULT
moderate-test   LAUNCHING   NOT-AVAILABLE
moderate-test   RUNNING     NOT-AVAILABLE
moderate-test   LAUNCHING   NOT-AVAILABLE
moderate-test   RUNNING     NOT-AVAILABLE
moderate-test   AGGREGATING   NOT-AVAILABLE
moderate-test   DONE          NON-COMPLIANT
moderate-test   DONE          NON-COMPLIANT

# oc get suite 
NAME            PHASE   RESULT
moderate-test   DONE    NON-COMPLIANT

# oc get pods
NAME                                         READY   STATUS      RESTARTS      AGE
aggregator-pod-ocp4-moderate                 0/1     Completed   0             71s
compliance-operator-6fb484b5cd-g244t         1/1     Running     1 (10m ago)   10m
ocp4-compliance-test-pp-75d888d7db-2wgnv     1/1     Running     0             9m43s
ocp4-moderate-api-checks-pod                 0/2     Completed   0             113s
rhcos4-compliance-test-pp-867b989956-p6snl   1/1     Running     0             9m43s

# oc get ccr -ncompliance-test |grep compliance-notification
ocp4-moderate-compliance-notification-enabled                           PASS     medium

# oc get prometheusrules --all-namespaces -o json | jq '[.items[] | select(.metadata.name =="compliance") | .metadata.name]'
[
  "compliance",
  "compliance"
]

# oc get route alertmanager-main -n openshift-monitoring
NAME                HOST/PORT                                                                         PATH   SERVICES            PORT   TERMINATION          WILDCARD
alertmanager-main   alertmanager-main-openshift-monitoring.apps.coci632.qe.devcluster.openshift.com   /api   alertmanager-main   web    reencrypt/Redirect   None

# ALERT_MANAGER=$(oc get route alertmanager-main -n openshift-monitoring -o jsonpath='{@.spec.host}')

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)"  https://$ALERT_MANAGER/api/v1/alerts |jq '.data[] | select(.labels.alertname | contains("NonCompliant"))'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 33720    0 33720    0     0  57928      0 --:--:-- --:--:-- --:--:-- 57938
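
The jq filter used above prints only .metadata.name, so the output cannot show which namespaces the two "compliance" PrometheusRule objects live in. A small sketch that also prints the namespace, assuming jq is installed. list_compliance_rules is a hypothetical helper name.

```shell
# Reads `oc get prometheusrules --all-namespaces -o json` on stdin and prints
# "<namespace>/<name>" for every PrometheusRule object named "compliance".
# list_compliance_rules is a hypothetical helper name used only for illustration.
list_compliance_rules() {
  jq -r '.items[] | select(.metadata.name == "compliance") | "\(.metadata.namespace)/\(.metadata.name)"'
}

# Usage (requires a logged-in oc session):
#   oc get prometheusrules --all-namespaces -o json | list_compliance_rules
```

This makes it easier to see whether a rule exists in the namespace where the operator is actually installed.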


2. An alert notification is generated only when the operator runs in the openshift-compliance namespace

# oc project openshift-compliance
Now using project "openshift-compliance" on server "https://api.coci632.qe.devcluster.openshift.com:6443".

# oc get csv
NAME                          DISPLAY               VERSION   REPLACES   PHASE
compliance-operator.v0.1.48   Compliance Operator   0.1.48               Succeeded

# oc get pods 
NAME                                              READY   STATUS    RESTARTS        AGE
compliance-operator-7f46b76c5d-26h2g              1/1     Running   1 (2m32s ago)   3m9s
ocp4-openshift-compliance-pp-8469fd7544-hhr64     1/1     Running   0               112s
rhcos4-openshift-compliance-pp-6b45c98fd8-rsjrp   1/1     Running   0               112s

#  oc create -f - << EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: moderate-test
profiles:
  - name: ocp4-moderate
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  name: default
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
EOF
scansettingbinding.compliance.openshift.io/moderate-test created

# oc get suite
NAME            PHASE   RESULT
moderate-test   DONE    NON-COMPLIANT

# oc get pods
NAME                                              READY   STATUS      RESTARTS        AGE
aggregator-pod-ocp4-moderate                      0/1     Completed   0               4m57s
compliance-operator-7f46b76c5d-26h2g              1/1     Running     1 (8m23s ago)   9m
ocp4-moderate-api-checks-pod                      0/2     Completed   0               5m37s
ocp4-openshift-compliance-pp-8469fd7544-hhr64     1/1     Running     0               7m43s
rhcos4-openshift-compliance-pp-6b45c98fd8-rsjrp   1/1     Running     0               7m43s

# oc get ccr -nopenshift-compliance |grep compliance-notification
ocp4-moderate-compliance-notification-enabled                           PASS     medium

# oc get prometheusrules --all-namespaces -o json | jq '[.items[] | select(.metadata.name =="compliance") | .metadata.name]'
[
  "compliance",
  "compliance"
]

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)"  https://$ALERT_MANAGER/api/v1/alerts |jq '.data[] | select(.labels.alertname | contains("NonCompliant"))'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 35936    0 35936    0     0  63191      0 --:--:-- --:--:-- --:--:-- 63267
{
  "labels": {
    "alertname": "NonCompliant",
    "endpoint": "metrics-co",
    "instance": "10.129.0.196:8585",
    "job": "metrics",
    "name": "moderate-test",
    "namespace": "openshift-compliance",
    "openshift_io_alert_source": "platform",
    "pod": "compliance-operator-7f46b76c5d-26h2g",
    "prometheus": "openshift-monitoring/k8s",
    "service": "metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "The compliance suite moderate-test returned as NON-COMPLIANT, ERROR, or INCONSISTENT",
    "summary": "The cluster is out-of-compliance"
  },
  "startsAt": "2022-03-04T05:55:00.825Z",
  "endsAt": "2022-03-04T06:03:37.902Z",
  "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.coci632.qe.devcluster.openshift.com/graph?g0.expr=compliance_operator_compliance_state%7Bname%3D~%22.%2B%22%7D+%3E+0&g0.tab=1",
  "status": {
    "state": "active",
    "silencedBy": null,
    "inhibitedBy": null
  },
  "receivers": [
    "Default"
  ],
  "fingerprint": "83c6a1886c47b1bd"
}
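
The two runs above differ in whether a NonCompliant alert exists for the operator's namespace. A sketch for checking that per namespace, assuming jq is installed and that the response has the /api/v1/alerts shape shown above (.data[] with .labels.alertname, .labels.namespace and .labels.name). noncompliant_alerts_in is a hypothetical helper name.

```shell
# Reads the Alertmanager /api/v1/alerts response on stdin and prints the suite
# name (.labels.name) of each NonCompliant alert raised from the given namespace.
# noncompliant_alerts_in is a hypothetical helper name used only for illustration.
noncompliant_alerts_in() {
  jq -r --arg ns "$1" '
    .data[]
    | select(.labels.alertname == "NonCompliant" and .labels.namespace == $ns)
    | .labels.name'
}

# Usage, reusing the ALERT_MANAGER variable and token command from the report:
#   curl -sk -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" \
#     "https://$ALERT_MANAGER/api/v1/alerts" | noncompliant_alerts_in compliance-test
```

Empty output corresponds to the failing case in item 1; a suite name such as moderate-test corresponds to the working case in item 2.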

Comment 1 Jakub Hrozek 2022-03-10 13:47:27 UTC
It seems that everything should already be created in the operator's namespace. At least, looking at the patches that added the alerts, I don't see an obvious reason why it shouldn't work. Matt would probably know better, though.

That said, why do we try to test this use-case? IIRC even with ACM integration, the operator is installed into openshift-compliance and just watches resources in other namespaces, right?

Comment 2 Jakub Hrozek 2022-03-10 13:48:20 UTC
Lowering severity and unsetting blocker because this doesn't seem to be a super common use-case.

Comment 10 xiyuan 2022-09-23 06:10:42 UTC
Verification passed with 4.12.0-0.nightly-2022-09-22-153054 + compliance-operator.v0.1.55

#######1. install operator in a non-control namespace:
$ oc apply -f -<<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: co
  labels:
    openshift.io/cluster-monitoring: "true"
    security.openshift.io/scc.podSecurityLabelSync: "false"
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
   name: openshift-compliance-abcd
   namespace: co
spec:
   targetNamespaces:
   - co
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
   name: openshift-compliance-operator
   namespace: co
spec:
   channel: "release-0.1"
   installPlanApproval: Automatic
   name: compliance-operator
   source: qe-app-registry
   sourceNamespace: openshift-marketplace
EOF
namespace/co created
operatorgroup.operators.coreos.com/openshift-compliance-abcd created
subscription.operators.coreos.com/openshift-compliance-operator created
$ oc project co
Now using project "co" on server "https://api.xiyuan23-1.qe.azure.devcluster.openshift.com:6443".
$ oc get pod
NAME                                   READY   STATUS    RESTARTS      AGE
compliance-operator-75c4687f47-thjdr   1/1     Running   1 (22m ago)   3m
ocp4-co-pp-746bfb6c5c-d4c5h            1/1     Running   0             3m
rhcos4-co-pp-7c5946fdb9-d5bdb          1/1     Running   0             3m
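
The Namespace manifest above sets openshift.io/cluster-monitoring: "true". Assuming that label is what lets the platform monitoring stack pick up the operator's metrics in a custom namespace (this is not stated in this report), it may be worth confirming the label is present before checking for alerts. has_cluster_monitoring_label is a hypothetical helper name.

```shell
# Returns 0 when the namespace carries openshift.io/cluster-monitoring="true".
# The dots in the label key are escaped so jsonpath treats it as a single key.
# has_cluster_monitoring_label is a hypothetical helper name for illustration.
has_cluster_monitoring_label() {
  value=$(oc get namespace "$1" -o jsonpath='{.metadata.labels.openshift\.io/cluster-monitoring}' 2>/dev/null)
  [ "$value" = "true" ]
}

# Usage (requires a logged-in oc session):
#   has_cluster_monitoring_label co && echo "label present"
```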

#############2. create ssb:
$ oc apply -f -<<EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: my-ssb-r
profiles:
  - name: ocp4-moderate
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  name: default
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
EOF
$ oc get suite
NAME       PHASE   RESULT
my-ssb-r   DONE    NON-COMPLIANT

##########3. check alert:
$ oc get route alertmanager-main -n openshift-monitoring
NAME                HOST/PORT                                                                                  PATH   SERVICES            PORT   TERMINATION          WILDCARD
alertmanager-main   alertmanager-main-openshift-monitoring.apps.xiyuan23-1.qe.azure.devcluster.openshift.com   /api   alertmanager-main   web    reencrypt/Redirect   None
$ ALERT_MANAGER=$(oc get route alertmanager-main -n openshift-monitoring -o jsonpath='{@.spec.host}')
$  curl -k -H "Authorization: Bearer $(oc create token prometheus-k8s -n openshift-monitoring)"  https://$ALERT_MANAGER/api/v1/alerts |jq '.data[] | select(.labels.alertname | contains("NonCompliant"))'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5490    0  5490    0     0   3188      0 --:--:--  0:00:01 --:--:--  3188
{
  "labels": {
    "alertname": "NonCompliant",
    "endpoint": "metrics-co",
    "instance": "10.130.0.75:8585",
    "job": "metrics",
    "name": "my-ssb-r",
    "namespace": "co",
    "openshift_io_alert_source": "platform",
    "pod": "compliance-operator-75c4687f47-thjdr",
    "prometheus": "openshift-monitoring/k8s",
    "service": "metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "The compliance suite my-ssb-r returned as NON-COMPLIANT, ERROR, or INCONSISTENT",
    "summary": "The cluster is out-of-compliance"
  },
  "startsAt": "2022-09-23T05:52:22.939Z",
  "endsAt": "2022-09-23T05:57:52.939Z",
  "generatorURL": "https:///console-openshift-console.apps.xiyuan23-1.qe.azure.devcluster.openshift.com/monitoring/graph?g0.expr=compliance_operator_compliance_state%7Bname%3D~%22.%2B%22%7D+%3E+0&g0.tab=1",
  "status": {
    "state": "active",
    "silencedBy": null,
    "inhibitedBy": null
  },
  "receivers": [
    "Default"
  ],
  "fingerprint": "0e7e6f43de393147"
}
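
The original report obtains the bearer token with `oc sa get-token`, while this verification uses `oc create token`. A sketch of a helper that tries the newer command first and falls back to the older one, so the same check can run on both cluster versions. sa_token is a hypothetical helper name.

```shell
# Print a bearer token for a service account.
# Tries `oc create token` (used in this verification on 4.12) and falls back to
# `oc sa get-token` (used in the original report on 4.10) if the first fails.
# sa_token is a hypothetical helper name used only for illustration.
sa_token() {
  sa=$1
  ns=$2
  oc create token "$sa" -n "$ns" 2>/dev/null || oc sa get-token "$sa" -n "$ns"
}

# Usage (requires a logged-in oc session):
#   TOKEN=$(sa_token prometheus-k8s openshift-monitoring)
#   curl -sk -H "Authorization: Bearer $TOKEN" "https://$ALERT_MANAGER/api/v1/alerts"
```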

Comment 12 errata-xmlrpc 2022-11-02 16:00:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Compliance Operator bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6657