Bug 1844957 - PrometheusRule would ignore the invalid for/labels/annotations key
Summary: PrometheusRule would ignore the invalid for/labels/annotations key
Keywords:
Status: CLOSED DUPLICATE of bug 1848408
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-08 06:19 UTC by Junqi Zhao
Modified: 2020-08-11 10:25 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-11 10:25:28 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
monitoring crd file (121.37 KB, application/gzip)
2020-06-19 01:54 UTC, Junqi Zhao

Description Junqi Zhao 2020-06-08 06:19:55 UTC
Description of problem:
# oc get ValidatingWebhookConfiguration prometheusrules.openshift.io
NAME                           WEBHOOKS   AGE
prometheusrules.openshift.io   1          4h27m

Create an invalid PrometheusRule, for example one using the invalid keys FOR, LABELS, and ANNOTATIONS:
# oc -n test create -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: story-rules
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: HighRequestLatency
      expr: request_latency_seconds:mean5m{job="myjob"} > 0.5
      FOR: 5m
      LABELS:
        severity: warning
      ANNOTATIONS:
        message: This is an alert meant to ensure that the entire alerting pipeline is functional.
EOF

The created PrometheusRule silently drops the invalid FOR/LABELS/ANNOTATIONS keys:
# oc -n test get PrometheusRule story-rules -oyaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2020-06-08T06:16:35Z"
  generation: 1
  managedFields:
  - apiVersion: monitoring.coreos.com/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:groups: {}
    manager: oc
    operation: Update
    time: "2020-06-08T06:16:35Z"
  name: story-rules
  namespace: test
  resourceVersion: "169271"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/test/prometheusrules/story-rules
  uid: 4f57d8ae-f7fd-4528-822e-10bf3581cc69
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: HighRequestLatency
      expr: request_latency_seconds:mean5m{job="myjob"} > 0.5

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-06-07-065515

How reproducible:
always

Steps to Reproduce:
1. See the description.

Actual results:
The invalid FOR/LABELS/ANNOTATIONS keys are silently ignored and the PrometheusRule is created.

Expected results:
The PrometheusRule should be rejected.

Additional info:
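For reference, the lowercase spelling is what the PrometheusRule schema defines, so a corrected rule keeps all three settings (a sketch for comparison, reusing the example above with the upstream field names):

# oc -n test create -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: story-rules
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: HighRequestLatency
      expr: request_latency_seconds:mean5m{job="myjob"} > 0.5
      for: 5m
      labels:
        severity: warning
      annotations:
        message: This is an alert meant to ensure that the entire alerting pipeline is functional.
EOF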

Comment 1 Paul Gier 2020-06-17 19:54:39 UTC
I was not able to reproduce this using clusterbot.  When I tried applying the example rule against a 4.6 cluster, the API server correctly rejected the rule.

[pgier@pgier-laptop ~]$ k create -f - << EOF
> apiVersion: monitoring.coreos.com/v1
> kind: PrometheusRule
> metadata:
>   name: story-rules
> spec:
>   groups:
>   - name: alerting rules
>     rules:
>     - alert: HighRequestLatency
>       expr: request_latency_seconds:mean5m{job="myjob"} > 0.5
>       FOR: 5m
> EOF
error: error validating "STDIN": error validating data: ValidationError(PrometheusRule.spec.groups[0].rules[0]): unknown field "FOR" in com.coreos.monitoring.v1.PrometheusRule.spec.groups.rules; if you choose to ignore these errors, turn validation off with --validate=false

Note that this error is coming from the API server validating against the CRD, and not coming from the PrometheusRule validating webhook.  I'm wondering if your test server had some difference in the CRD?
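One way to tell the two validation paths apart (a sketch; `oc explain` reads the OpenAPI schema the API server publishes from the CRD):

# Show which rule fields the CRD schema actually defines (lowercase for/labels/annotations)
$ oc explain prometheusrules.spec.groups.rules
# And the webhook that would separately inspect the request
$ oc get validatingwebhookconfiguration prometheusrules.openshift.io -o yaml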

Comment 2 Junqi Zhao 2020-06-19 01:52:57 UTC
(In reply to Paul Gier from comment #1)
> I was not able to reproduce this using clusterbot.  When I tried applying
> the example rule against a 4.6 cluster, the API server correctly rejected
> the rule.
> 
> [pgier@pgier-laptop ~]$ k create -f - << EOF
> > apiVersion: monitoring.coreos.com/v1
> > kind: PrometheusRule
> > metadata:
> >   name: story-rules
> > spec:
> >   groups:
> >   - name: alerting rules
> >     rules:
> >     - alert: HighRequestLatency
> >       expr: request_latency_seconds:mean5m{job="myjob"} > 0.5
> >       FOR: 5m
> > EOF
> error: error validating "STDIN": error validating data:
> ValidationError(PrometheusRule.spec.groups[0].rules[0]): unknown field "FOR"
> in com.coreos.monitoring.v1.PrometheusRule.spec.groups.rules; if you choose
> to ignore these errors, turn validation off with --validate=false
> 
> Note that this error is coming from the API server validating against the
> CRD, and not coming from the PrometheusRule validating webhook.  I'm
> wondering if your test server had some difference in the CRD?

I followed your steps: no error was thrown, the PrometheusRule was created successfully, and the invalid `FOR` setting was removed from the created object.
I have attached the monitoring CRD files.

Comment 3 Junqi Zhao 2020-06-19 01:54:03 UTC
Created attachment 1698027 [details]
monitoring crd file

Comment 4 Paul Gier 2020-06-24 18:46:53 UTC
I applied the attached CRD (just the prometheusrule one) in minikube and tried to create the prometheus rule, and the API server correctly prevented the rule from being created with an error similar to the previous one.  So I'm still not sure why the rule is created without error in your cluster.

@junqi, if you try a similar change (lowercase to uppercase) on other custom resources, such as an Alertmanager or ServiceMonitor, are those also created without error, or is it just the PrometheusRule that has this issue?

The other thing we can try: after seeing the failure, delete the prometheusrules ValidatingWebhookConfiguration just to see if that has any effect. I don't think it should, but at least we can rule that out.
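For concreteness, that deletion step would be (using the webhook name shown in the description):

# oc delete validatingwebhookconfiguration prometheusrules.openshift.io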

Comment 5 Junqi Zhao 2020-06-28 06:14:45 UTC
(In reply to Paul Gier from comment #4)
> I applied the attached CRD (just the prometheusrule one) in minikube and
> tried to create the prometheus rule, and the API server correctly prevented
> the rule from being created with an error similar to the previous one.  So
> I'm still not sure why the rule is created without error in your cluster.
> 
> @junqi if you try a similar change (lower case to upper case) on other
> custom resources such as an alertmanager or servicemonitor are those also
> created without error or is it just the prometheusrule that has this issue?
> 
> The other thing I'm thinking we can try is after seeing the failure, delete
> the prometheusrule validatingwebhookconfiguration just to see if that has
> any effect.  I don't think it should, but at least we can rule that out.

I created a ServiceMonitor with the capitalized key "INTERVAL", and it is likewise ignored in the created ServiceMonitor.
After deleting ValidatingWebhookConfiguration/prometheusrules.openshift.io, we can still create the PrometheusRule successfully with the settings from Comment 2.
Maybe next time I can preserve a cluster so you can check on it directly.

# oc -n test create -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: prometheus-example-monitor
  name: prometheus-example-monitor
spec:
  endpoints:
  - INTERVAL: 30s
    port: web
    scheme: http
  namespaceSelector:
    matchNames:
    - test
  selector:
    matchLabels:
      app: prometheus-example-app
EOF
*******************
# oc -n test get ServiceMonitor/prometheus-example-monitor -oyaml
...
spec:
  endpoints:
  - port: web
    scheme: http
  namespaceSelector:
    matchNames:
    - test
  selector:
    matchLabels:
      app: prometheus-example-app
...

Comment 7 Paul Gier 2020-06-29 15:39:51 UTC
I should have followed your initial instructions more carefully: I was using `kubectl` instead of `oc` when testing. Apparently kubectl defaults to `--validate=true` while oc defaults to `--validate=false`, so the error is caught by kubectl but not by oc. This doesn't seem to be related to monitoring, and perhaps the difference can be explained by the API server team.
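Given that, explicitly enabling client-side validation should make oc surface the same error kubectl reports (a sketch reusing the rule from comment #1; --validate is the standard kubectl/oc create flag):

$ oc -n test create --validate=true -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: story-rules
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: HighRequestLatency
      expr: request_latency_seconds:mean5m{job="myjob"} > 0.5
      FOR: 5m
EOF

The expected result is the same ValidationError about the unknown field "FOR" as in comment #1.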

Comment 8 Junqi Zhao 2020-06-30 06:39:40 UTC
(In reply to Paul Gier from comment #7)
> I should have followed your initial instructions more carefully.  I was
> using `kubectl` instead of `oc` when testing.  Apparently, kubectl defaults
> to the option `--validate=true` while oc defaults to `--validate=false`.  So
> the error is caught by kubectl but not by oc.  This doesn't seem to be
> related to monitoring, and maybe the difference can be explained by the api
> server team.

Yes, indeed. Thanks for debugging this.

Comment 10 Stefan Schimanski 2020-08-03 10:04:06 UTC
This is due to missing client-side validation in oc.

kube-apiserver prunes unknown fields from objects but never rejects them. This is done for compatibility between different client and server versions.
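For illustration, pruning is the default behavior for structural apiextensions.k8s.io/v1 CRDs, which the monitoring CRDs are (a sketch; the jsonpath output will be empty or "false" unless the schema opts out of pruning):

# v1 CRDs must leave preserveUnknownFields unset/false, so unknown fields are silently dropped on write
$ oc get crd prometheusrules.monitoring.coreos.com -o jsonpath='{.spec.preserveUnknownFields}'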

Reassigning.

Comment 11 Maciej Szulik 2020-08-11 10:25:28 UTC

*** This bug has been marked as a duplicate of bug 1848408 ***

