Description of problem:
# oc get ValidatingWebhookConfiguration prometheusrules.openshift.io
NAME                           WEBHOOKS   AGE
prometheusrules.openshift.io   1          4h27m

Create an invalid PrometheusRule, for example one with the invalid keys FOR, LABELS, ANNOTATIONS:
# oc -n test create -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: story-rules
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: HighRequestLatency
      expr: request_latency_seconds:mean5m{job="myjob"} > 0.5
      FOR: 5m
      LABELS:
        severity: warning
      ANNOTATIONS:
        message: This is an alert meant to ensure that the entire alerting pipeline is functional.
EOF

The PrometheusRule is created, and the invalid FOR/LABELS/ANNOTATIONS keys are silently dropped:
# oc -n test get PrometheusRule story-rules -oyaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2020-06-08T06:16:35Z"
  generation: 1
  managedFields:
  - apiVersion: monitoring.coreos.com/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:groups: {}
    manager: oc
    operation: Update
    time: "2020-06-08T06:16:35Z"
  name: story-rules
  namespace: test
  resourceVersion: "169271"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/test/prometheusrules/story-rules
  uid: 4f57d8ae-f7fd-4528-822e-10bf3581cc69
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: HighRequestLatency
      expr: request_latency_seconds:mean5m{job="myjob"} > 0.5

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-06-07-065515

How reproducible:
always

Steps to Reproduce:
1. See the description.

Actual results:
The invalid FOR/LABELS/ANNOTATIONS keys are silently ignored.

Expected results:
The invalid PrometheusRule should be rejected.

Additional info:
I was not able to reproduce this using clusterbot. When I tried applying the example rule against a 4.6 cluster, the API server correctly rejected the rule.

[pgier@pgier-laptop ~]$ k create -f - << EOF
> apiVersion: monitoring.coreos.com/v1
> kind: PrometheusRule
> metadata:
>   name: story-rules
> spec:
>   groups:
>   - name: alerting rules
>     rules:
>     - alert: HighRequestLatency
>       expr: request_latency_seconds:mean5m{job="myjob"} > 0.5
>       FOR: 5m
> EOF
error: error validating "STDIN": error validating data: ValidationError(PrometheusRule.spec.groups[0].rules[0]): unknown field "FOR" in com.coreos.monitoring.v1.PrometheusRule.spec.groups.rules; if you choose to ignore these errors, turn validation off with --validate=false

Note that this error is coming from the API server validating against the CRD, and not coming from the PrometheusRule validating webhook. I'm wondering if your test server had some difference in the CRD?
(In reply to Paul Gier from comment #1)
> I was not able to reproduce this using clusterbot. When I tried applying
> the example rule against a 4.6 cluster, the API server correctly rejected
> the rule.
> 
> [pgier@pgier-laptop ~]$ k create -f - << EOF
> > apiVersion: monitoring.coreos.com/v1
> > kind: PrometheusRule
> > metadata:
> >   name: story-rules
> > spec:
> >   groups:
> >   - name: alerting rules
> >     rules:
> >     - alert: HighRequestLatency
> >       expr: request_latency_seconds:mean5m{job="myjob"} > 0.5
> >       FOR: 5m
> > EOF
> error: error validating "STDIN": error validating data:
> ValidationError(PrometheusRule.spec.groups[0].rules[0]): unknown field "FOR"
> in com.coreos.monitoring.v1.PrometheusRule.spec.groups.rules; if you choose
> to ignore these errors, turn validation off with --validate=false
> 
> Note that this error is coming from the API server validating against the
> CRD, and not coming from the PrometheusRule validating webhook. I'm
> wondering if your test server had some difference in the CRD?

I followed your steps, but no error is thrown and the PrometheusRule is created successfully; the invalid `FOR` setting is removed from the created PrometheusRule.

Attaching the monitoring CRD files.
Created attachment 1698027 [details] monitoring crd file
I applied the attached CRD (just the prometheusrule one) in minikube and tried to create the prometheus rule, and the API server correctly prevented the rule from being created with an error similar to the previous one. So I'm still not sure why the rule is created without error in your cluster.

@junqi if you try a similar change (lower case to upper case) on other custom resources such as an alertmanager or servicemonitor, are those also created without error, or is it just the prometheusrule that has this issue?

The other thing I'm thinking we can try is, after seeing the failure, delete the prometheusrule validatingwebhookconfiguration just to see if that has any effect. I don't think it should, but at least we can rule that out.
(In reply to Paul Gier from comment #4)
> I applied the attached CRD (just the prometheusrule one) in minikube and
> tried to create the prometheus rule, and the API server correctly prevented
> the rule from being created with an error similar to the previous one. So
> I'm still not sure why the rule is created without error in your cluster.
> 
> @junqi if you try a similar change (lower case to upper case) on other
> custom resources such as an alertmanager or servicemonitor are those also
> created without error or is it just the prometheusrule that has this issue?
> 
> The other thing I'm thinking we can try is after seeing the failure, delete
> the prometheusrule validatingwebhookconfiguration just to see if that has
> any effect. I don't think it should, but at least we can rule that out.

I created a ServiceMonitor with the capitalized key `INTERVAL`, and it is also silently dropped from the created ServiceMonitor.

After deleting ValidatingWebhookConfiguration/prometheusrules.openshift.io, we can still create the PrometheusRule successfully with the settings in Comment 2, so the webhook is ruled out.

Maybe next time I can preserve a cluster, so you can check on our cluster directly.

# oc -n test create -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: prometheus-example-monitor
  name: prometheus-example-monitor
spec:
  endpoints:
  - INTERVAL: 30s
    port: web
    scheme: http
  namespaceSelector:
    matchNames:
    - test
  selector:
    matchLabels:
      app: prometheus-example-app
EOF
*******************
# oc -n test get ServiceMonitor/prometheus-example-monitor -oyaml
...
spec:
  endpoints:
  - port: web
    scheme: http
  namespaceSelector:
    matchNames:
    - test
  selector:
    matchLabels:
      app: prometheus-example-app
...
I should have followed your initial instructions more carefully. I was using `kubectl` instead of `oc` when testing. Apparently, kubectl defaults to the option `--validate=true` while oc defaults to `--validate=false`. So the error is caught by kubectl but not by oc. This doesn't seem to be related to monitoring, and maybe the difference can be explained by the api server team.
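To make the difference explicit, one can toggle the flag by hand. A hedged sketch (these commands assume a live cluster and a hypothetical `rule.yaml` containing the invalid PrometheusRule from the description; they are not runnable standalone):

```shell
# With client-side schema validation enabled (the kubectl default at the
# time), the unknown key is rejected before the request reaches the server:
oc -n test create --validate=true -f rule.yaml

# With client-side validation disabled (the oc default at the time), the
# object is accepted and the server silently prunes the unknown key:
oc -n test create --validate=false -f rule.yaml
```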
(In reply to Paul Gier from comment #7)
> I should have followed your initial instructions more carefully. I was
> using `kubectl` instead of `oc` when testing. Apparently, kubectl defaults
> to the option `--validate=true` while oc defaults to `--validate=false`. So
> the error is caught by kubectl but not by oc. This doesn't seem to be
> related to monitoring, and maybe the difference can be explained by the api
> server team.

Yes, indeed. Thanks for debugging.
This is due to missing client-side validation in oc. kube-apiserver prunes unknown fields from submitted objects but never rejects them; this is done for compatibility between different client and server versions. Reassigning.
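The pruning behavior described above can be illustrated with a toy model (this is a sketch in plain shell, not the real apiextensions code path; the `prune` function and the hard-coded known-field list are illustrative assumptions):

```shell
#!/bin/sh
# Toy model of server-side pruning: keep only fields that appear in the
# schema's known-field list and silently drop everything else.
# Field matching is case-sensitive, so "FOR" does not match "for" --
# mirroring why the capitalized keys in the bug report were dropped.
prune() {
  known="alert expr for labels annotations"
  while IFS=: read -r key value; do
    for k in $known; do
      # emit the line only when the key is schema-known
      [ "$key" = "$k" ] && echo "$key:$value"
    done
  done
}

# "FOR" is unknown (case-sensitive mismatch) and is dropped without error:
printf 'alert: HighRequestLatency\nexpr: up == 0\nFOR: 5m\n' | prune
# → alert: HighRequestLatency
#   expr: up == 0
```

Note that, as in the real API server, the caller gets no signal that anything was removed; that is why the only reliable guard at the time was client-side validation.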
*** This bug has been marked as a duplicate of bug 1848408 ***