Bug 2101393 - NodeHasIntegrityFailure alert doesn't set namespace label
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: File Integrity Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.12.0
Assignee: Jakub Hrozek
QA Contact: xiyuan
Docs Contact: Jeana Routh
Depends On:
Reported: 2022-06-27 11:21 UTC by Katya Gordeeva
Modified: 2022-12-22 21:52 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
* Previously, alerts issued by the File Integrity Operator did not set a namespace. This made it difficult to understand where the alert was coming from, or what component was responsible for issuing it. With this release, the Operator includes the namespace it was installed into in the alert, making it easier to narrow down what component needs attention. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2101393[*2101393*])
Clone Of:
Last Closed: 2022-08-02 08:17:03 UTC
Target Upstream Version:

Attachments

System ID Private Priority Status Summary Last Updated
Github openshift file-integrity-operator pull 240 0 None open bug 2101393: operator: Set namespace label for alert 2022-06-27 16:38:51 UTC
Red Hat Product Errata RHBA-2022:5538 0 None None None 2022-08-02 08:17:09 UTC

Description Katya Gordeeva 2022-06-27 11:21:09 UTC
Description of problem:

According to the alerting style guide (https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide), an alert should set the namespace label to identify which component is raising the alert. However, NodeHasIntegrityFailure does not set the namespace label: https://github.com/openshift/file-integrity-operator/blob/master/cmd/manager/operator.go#L266
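For reference, the style guide's requirement can be sketched as a PrometheusRule fragment. This is an illustration only, not the operator's actual rule: the metric expression, severity, and timing are placeholders.

```yaml
# Illustration only: an alert definition that satisfies the style guide by
# carrying a namespace label. expr/for/severity are placeholders, not the
# operator's real values.
- alert: NodeHasIntegrityFailure
  expr: file_integrity_operator_node_failed > 0   # placeholder expression
  for: 1s
  labels:
    severity: warning
    namespace: openshift-file-integrity   # the label this bug is about
  annotations:
    summary: Node {{ $labels.node }} has a file integrity failure.
```

Note that the eventual fix sets the label to whatever namespace the operator was installed into; openshift-file-integrity is only the default.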

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install the File Integrity Operator.
2. Make NodeHasIntegrityFailure fire from a node.
3. Inspect the alert labels.
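Inspecting the labels does not require the console. The sketch below illustrates the check against a saved alert payload; the amtool command in the comment and the payload contents are assumptions for illustration, not output captured from a real cluster.

```shell
# Sketch of inspecting alert labels from the CLI. On a cluster the firing
# alerts could be saved with something like:
#   oc -n openshift-monitoring exec alertmanager-main-0 -c alertmanager -- \
#       amtool --alertmanager.url=http://localhost:9093 alert query -o json > alerts.json
# The payload below is a hypothetical stand-in reproducing the buggy labels.
cat > alerts.json <<'EOF'
[{"labels":{"alertname":"NodeHasIntegrityFailure","node":"ip-10-0-141-72","severity":"warning"}}]
EOF
# The style guide expects a namespace label; with this bug it is absent.
if grep -q '"namespace"' alerts.json; then
  echo "namespace label set"
else
  echo "namespace label missing"   # printed for the payload above
fi
```

With the fix applied, the same check should find namespace=openshift-file-integrity among the labels.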

Actual results:

NodeHasIntegrityFailure doesn't set the namespace label.

Expected results:

NodeHasIntegrityFailure should be fired from the namespace the operator is installed in (by default openshift-file-integrity).

Additional info:

Comment 3 xiyuan 2022-07-01 13:38:18 UTC
Verification passed with the latest code and 4.11.0-0.nightly-2022-06-30-005428.
Applied the PR, deployed the fileintegrity, and triggered a fileintegrity failure on a node. Then checked the alert in the GUI, filtering on the label namespace=openshift-file-integrity; the alert is displayed in https://bugzilla.redhat.com/attachment.cgi?id=1893920

# oc get fileintegritynodestatus
NAME                                                               NODE                                         STATUS
example-fileintegrity-ip-10-0-141-72.us-east-2.compute.internal    ip-10-0-141-72.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-145-158.us-east-2.compute.internal   ip-10-0-145-158.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-169-133.us-east-2.compute.internal   ip-10-0-169-133.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-177-171.us-east-2.compute.internal   ip-10-0-177-171.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-197-182.us-east-2.compute.internal   ip-10-0-197-182.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-209-136.us-east-2.compute.internal   ip-10-0-209-136.us-east-2.compute.internal   Succeeded
# oc extract cm/aide-example-fileintegrity-ip-10-0-141-72.us-east-2.compute.internal-failed --confirm
# cat integritylog 
Start timestamp: 2022-07-01 13:35:25 +0000 (AIDE 0.16)
AIDE found differences between database and filesystem!!

  Total number of entries:	35777
  Added entries:		1
  Removed entries:		0
  Changed entries:		0

Added entries:

d++++++++++++++++: /hostroot/root/test

The attributes of the (uncompressed) database(s):

  MD5      : u4xAmCIexJayzdvPcSdujQ==
  SHA1     : kV+goC3QvMpxMFxhlyQ04Pb0M2w=
  RMD160   : HnckhVVNj3jsMZB24CyKGEm9LFY=
  TIGER    : K23fBlZ+nH1MBp6avqDXuxGV32LVLrYv
  SHA256   : YgdronUu0anW0xmVuOl2bxto4r+VW6fx
  SHA512   : tPFTZMJnJLWsd+C6x6akOYAzDDa0nYSw

End timestamp: 2022-07-01 13:35:55 +0000 (run time: 0m 30s)

Comment 8 xiyuan 2022-07-06 14:39:22 UTC
Verification failed with FIO v0.1.26-2.
It is a little odd: verification passed with the pre-merge process but failed with FIO v0.1.26-2.

1. install FIO v0.1.26-2
$ oc get ip
NAME            CSV                               APPROVAL    APPROVED
install-84sxn   file-integrity-operator.v0.1.26   Automatic   true
$ oc get csv
NAME                              DISPLAY                            VERSION   REPLACES   PHASE
elasticsearch-operator.v5.5.0     OpenShift Elasticsearch Operator   5.5.0                Succeeded
file-integrity-operator.v0.1.26   File Integrity Operator            0.1.26               Succeeded
$ oc get pod
NAME                                       READY   STATUS    RESTARTS      AGE
aide-example-fileintegrity-bjptx           1/1     Running   0             19m
aide-example-fileintegrity-jpz4n           1/1     Running   0             19m
aide-example-fileintegrity-qjqhs           1/1     Running   0             19m
aide-example-fileintegrity-s2nhk           1/1     Running   0             19m
aide-example-fileintegrity-sfqmd           1/1     Running   0             19m
aide-example-fileintegrity-zdtj7           1/1     Running   0             19m
file-integrity-operator-764d7bf547-c5lf7   1/1     Running   1 (30m ago)   30m
$ oc describe pod/file-integrity-operator-764d7bf547-c5lf7 | grep Image
    Image:         registry.redhat.io/compliance/openshift-file-integrity-rhel8-operator@sha256:0694bab8e31fd75558c3fcba2e17bf548e668948fbf6124844ce57063a1ad00e
    Image ID:      registry.redhat.io/compliance/openshift-file-integrity-rhel8-operator@sha256:0694bab8e31fd75558c3fcba2e17bf548e668948fbf6124844ce57063a1ad00e
Per https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2076094, it is from build FIO v0.1.26-2.

2. trigger failures:
$ oc get fileintegritynodestatus
NAME                                                               NODE                                         STATUS
example-fileintegrity-ip-10-0-136-242.us-east-2.compute.internal   ip-10-0-136-242.us-east-2.compute.internal   Failed
example-fileintegrity-ip-10-0-159-42.us-east-2.compute.internal    ip-10-0-159-42.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-175-108.us-east-2.compute.internal   ip-10-0-175-108.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-190-195.us-east-2.compute.internal   ip-10-0-190-195.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-193-242.us-east-2.compute.internal   ip-10-0-193-242.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-202-200.us-east-2.compute.internal   ip-10-0-202-200.us-east-2.compute.internal   Succeeded

3. Check on GUI:
The alert shows with the label alertname=NodeHasIntegrityFailure, as seen in https://bugzilla.redhat.com/attachment.cgi?id=1894946;
but it does not show with the label namespace=openshift-file-integrity, as seen in https://bugzilla.redhat.com/attachment.cgi?id=1894947
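As a cross-check that does not rely on screenshots, the rendered alerting rule can be grepped for the label. This is a sketch only: the object kind and namespace in the comment, and the stand-in rule below, are assumptions based on where the operator defines the alert.

```shell
# Sketch: check whether the deployed alerting rule carries a namespace label.
# On a live cluster the rules could be dumped with something like:
#   oc -n openshift-file-integrity get prometheusrules -o yaml > fio-rules.yaml
# The stand-in below mirrors the broken v0.1.26-2 behaviour (no namespace label).
cat > fio-rules.yaml <<'EOF'
- alert: NodeHasIntegrityFailure
  labels:
    severity: warning
EOF
if grep -q 'namespace:' fio-rules.yaml; then
  echo "namespace label present"
else
  echo "namespace label still missing"   # printed for the stand-in above
fi
```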

Comment 9 Matt Rogers 2022-07-06 16:22:03 UTC
I see what happened: The fix for the bug merged right after our 0.1.26 release commit.

https://code.engineering.redhat.com/gerrit/c/openshift-file-integrity/+/418181 updated our source to e0f24ac1804d10c08da56acf606c1ba67ae5c709

https://github.com/openshift/file-integrity-operator/commits/master shows the fix commit b188b1b7afd958eca33364e28e1327f8d16e16f2, which comes later in the git history.

We'll increment the release to 0.1.27 and rebuild.

Comment 12 xiyuan 2022-07-19 06:44:46 UTC
Verification passed with payload 4.12.0-0.nightly-2022-07-17-215842 and file-integrity-operator.v0.1.28.
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-07-17-215842   True        False         32m     Cluster version is 4.12.0-0.nightly-2022-07-17-215842
$ oc get ip
NAME            CSV                               APPROVAL    APPROVED
install-trbq9   file-integrity-operator.v0.1.28   Automatic   true
$ oc get csv
NAME                              DISPLAY                            VERSION   REPLACES   PHASE
elasticsearch-operator.v5.5.0     OpenShift Elasticsearch Operator   5.5.0                Succeeded
file-integrity-operator.v0.1.28   File Integrity Operator            0.1.28               Succeeded

Verification steps:
1. Install file-integrity-operator.v0.1.28.
2. Create the fileintegrity:
$ oc apply -f - <<EOF
apiVersion: fileintegrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: example-fileintegrity
spec:
  config:
    gracePeriod: 20
    maxBackups: 5
  debug: true
EOF
fileintegrity.fileintegrity.openshift.io/example-fileintegrity created
$ oc get fileintegrity example-fileintegrity -o=jsonpath={.status}
3. Trigger a fileintegrity check failure:
$ oc debug node/xiyuan19-2-wmvpn-worker-us-east-1a-p7w76 -- chroot /host mkdir /root/test
Starting pod/xiyuan19-2-wmvpn-worker-us-east-1a-p7w76-debug ...
To use host binaries, run `chroot /host`

Removing debug pod ...
$ oc get fileintegritynodestatus
NAME                                                             NODE                                       STATUS
example-fileintegrity-xiyuan19-2-wmvpn-master-0                  xiyuan19-2-wmvpn-master-0                  Succeeded
example-fileintegrity-xiyuan19-2-wmvpn-master-1                  xiyuan19-2-wmvpn-master-1                  Succeeded
example-fileintegrity-xiyuan19-2-wmvpn-master-2                  xiyuan19-2-wmvpn-master-2                  Succeeded
example-fileintegrity-xiyuan19-2-wmvpn-worker-us-east-1a-p7w76   xiyuan19-2-wmvpn-worker-us-east-1a-p7w76   Failed
example-fileintegrity-xiyuan19-2-wmvpn-worker-us-east-1b-5x64v   xiyuan19-2-wmvpn-worker-us-east-1b-5x64v   Succeeded
example-fileintegrity-xiyuan19-2-wmvpn-worker-us-east-1b-gz6v6   xiyuan19-2-wmvpn-worker-us-east-1b-gz6v6   Succeeded
$ oc get cm
NAME                                                                         DATA   AGE
962a0cf2.openshift.io                                                        0      14m
aide-example-fileintegrity-xiyuan19-2-wmvpn-worker-us-east-1a-p7w76-failed   1      88s
aide-pause                                                                   1      6m10s
aide-reinit                                                                  1      6m10s
example-fileintegrity                                                        1      6m10s
kube-root-ca.crt                                                             1      15m
openshift-service-ca.crt                                                     1      15m

$ oc extract cm/aide-example-fileintegrity-xiyuan19-2-wmvpn-worker-us-east-1a-p7w76-failed --confirm 
$ cat integritylog 

Start timestamp: 2022-07-19 06:14:35 +0000 (AIDE 0.16)
AIDE found differences between database and filesystem!!

  Total number of entries:	35766
  Added entries:		1
  Removed entries:		0
  Changed entries:		0

Added entries:

d++++++++++++++++: /hostroot/root/test

The attributes of the (uncompressed) database(s):

  MD5      : H6vsxWQgCXCONNpNAsbK8A==
  SHA1     : 7izTbkVOjc1dKr26GYzBI0VsGvA=
  RMD160   : VkTQP+6k8oYdu3PddmF50vp3NCw=
  TIGER    : L34A9vnYnQkrqUFgvToKaUn4kQ35DGw7
  SHA256   : nxtcnEDrMBVgzwhysqTLpBQIdeVsOiPg
  SHA512   : OQun2sGxusc97s/DRcoZ21FffQGPyaAb

End timestamp: 2022-07-19 06:15:05 +0000 (run time: 0m 30s)

4. Check on the GUI:
Details are seen in https://bugzilla.redhat.com/attachment.cgi?id=1898038

Comment 15 errata-xmlrpc 2022-08-02 08:17:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift File Integrity Operator bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

