Bug 1994257

Summary: Audit errors alert not created
Product: OpenShift Container Platform Reporter: Juan Antonio Osorio <josorior>
Component: kube-apiserver Assignee: Juan Antonio Osorio <josorior>
Status: CLOSED ERRATA QA Contact: Rahul Gangwar <rgangwar>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.9 CC: aos-bugs, mfojtik, rgangwar, sttts, xxia
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-18 17:46:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Juan Antonio Osorio 2021-08-17 07:24:49 UTC
Description of problem:

Although the deployment ships with many useful alerts, there is currently no built-in way to know whether the API server is failing to write audit logs. Some deployments need this for compliance reasons, so today such an alert has to be created manually.

Version-Release number of selected component (if applicable):
All


Additional info:

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1166 originally added this alert, but due to an oversight the alert wasn't being created.

Comment 2 Rahul Gangwar 2021-08-19 04:11:43 UTC

NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-08-18-144658   True        False         10m     Cluster version is 4.9.0-0.nightly-2021-08-18-144658

oc -n openshift-kube-apiserver get prometheusrule audit-errors -o yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2021-08-19T03:38:29Z"
  generation: 1
  managedFields:
  - apiVersion: monitoring.coreos.com/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:groups: {}
    manager: cluster-kube-apiserver-operator
    operation: Update
    time: "2021-08-19T03:38:29Z"
  name: audit-errors
  namespace: openshift-kube-apiserver
  resourceVersion: "6217"
  uid: ac9f627f-d493-4c5f-b691-a0eae17f6799
spec:
  groups:
  - name: apiserver-audit
    rules:
    - alert: AuditLogError
      annotations:
        description: An API Server had an error writing to an audit log.
        summary: |-
          An API Server instance was unable to write audit logs. This could be
          triggered by the node running out of space, or a malicious actor
          tampering with the audit logs.
      expr: |
        sum by (apiserver,instance)(rate(apiserver_audit_error_total{apiserver=~".+-apiserver"}[5m])) / sum by (apiserver,instance) (rate(apiserver_audit_event_total{apiserver=~".+-apiserver"}[5m])) > 0
      for: 1m
      labels:
        severity: warning
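
For readers unfamiliar with the expression in the rule above, here is a rough Python sketch of what it appears to compute: the per-(apiserver, instance) ratio of audit error rate to audit event rate, keeping only series where that ratio is greater than zero. This is an illustration under assumptions, not how Prometheus evaluates the rule internally. The function names and sample label values are hypothetical, the 5m `rate()` values are taken as given inputs, and the `for: 1m` hold period is not modeled.

```python
import math
import re
from collections import defaultdict

# Prometheus label regexes are fully anchored, hence fullmatch() below.
APISERVER_RE = re.compile(r".+-apiserver")


def sum_by_apiserver_instance(samples):
    """Mimic `sum by (apiserver,instance)(...)` over (labels, rate) pairs,
    keeping only series whose apiserver label matches `.+-apiserver`."""
    totals = defaultdict(float)
    for labels, rate in samples:
        apiserver = labels.get("apiserver", "")
        if APISERVER_RE.fullmatch(apiserver):
            totals[(apiserver, labels.get("instance", ""))] += rate
    return dict(totals)


def firing_series(error_samples, event_samples):
    """Return {(apiserver, instance): ratio} for series where
    error rate / event rate > 0, as in the alert expression."""
    errors = sum_by_apiserver_instance(error_samples)
    events = sum_by_apiserver_instance(event_samples)
    firing = {}
    for key, err in errors.items():
        if key not in events:
            continue  # no series with the same labels on the right-hand side
        total = events[key]
        if total == 0:
            # Float division semantics: x/0 is +Inf, 0/0 is NaN.
            ratio = math.inf if err > 0 else math.nan
        else:
            ratio = err / total
        if ratio > 0:  # NaN compares false, so it never fires
            firing[key] = ratio
    return firing


# Hypothetical per-second rates, for illustration only.
error_samples = [
    ({"apiserver": "kube-apiserver", "instance": "10.0.0.1:6443"}, 0.5),
    ({"apiserver": "kube-apiserver", "instance": "10.0.0.2:6443"}, 0.0),
    ({"apiserver": "other", "instance": "10.0.0.3:6443"}, 1.0),
]
event_samples = [
    ({"apiserver": "kube-apiserver", "instance": "10.0.0.1:6443"}, 50.0),
    ({"apiserver": "kube-apiserver", "instance": "10.0.0.2:6443"}, 40.0),
    ({"apiserver": "other", "instance": "10.0.0.3:6443"}, 10.0),
]

result = firing_series(error_samples, event_samples)
```

With these made-up inputs, only the first instance would be reported: the second has no audit errors, and the third is dropped by the label matcher. Because the threshold is `> 0`, any nonzero audit error rate on a matching API server satisfies the expression.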

Comment 5 errata-xmlrpc 2021-10-18 17:46:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759