Bug 1940392
| Summary: | [sig-instrumentation] Prometheus when installed on the cluster shouldn't have failing rules evaluation [Suite:openshift/conformance/parallel] | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Nikolaos Leandros Moraitis <nmoraiti> |
| Component: | kube-controller-manager | Assignee: | Jan Chaloupka <jchaloup> |
| Status: | CLOSED DUPLICATE | QA Contact: | zhou ying <yinzhou> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.5 | CC: | alegrand, anpicker, aos-bugs, erooth, kakkoyun, lcosic, mfojtik, pkrupa, spasquie, surbania |
| Target Milestone: | --- | | |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | [sig-instrumentation] Prometheus when installed on the cluster shouldn't have failing rules evaluation [Suite:openshift/conformance/parallel] |
| Last Closed: | 2021-03-22 08:53:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description

Nikolaos Leandros Moraitis 2021-03-18 11:02:19 UTC
Looking at the Prometheus logs [1], the rule that triggers the evaluation failures is "PodDisruptionBudgetAtLimit":
level=warn ts=2021-03-18T07:39:16.084Z caller=manager.go:525 component="rule manager" group=cluster-version msg="Evaluating rule failed" rule="alert: PodDisruptionBudgetAtLimit\nexpr: kube_poddisruptionbudget_status_expected_pods == on(namespace, poddisruptionbudget,\n service) kube_poddisruptionbudget_status_desired_healthy\nfor: 15m\nlabels:\n severity: warning\nannotations:\n message: The pod disruption budget is preventing further disruption to pods because\n it is at the minimum allowed level.\n" err="found duplicate series for the match group {namespace=\"e2e-k8s-service-lb-available-2118\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"} on the right hand-side of the operation: [{__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.131.0.7:8443\", job=\"kube-state-metrics\", namespace=\"e2e-k8s-service-lb-available-2118\", pod=\"kube-state-metrics-8676f85877-kqzcb\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"}, {__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.129.2.23:8443\", job=\"kube-state-metrics\", namespace=\"e2e-k8s-service-lb-available-2118\", pod=\"kube-state-metrics-8676f85877-ffc5j\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
Reassigning to the kube-controller-manager component since the alert is shipped by the cluster-kube-controller-manager-operator [2].
[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-gcp-ovn-upgrade/1372432196983328768/artifacts/e2e-gcp-ovn-upgrade/gather-extra/artifacts/pods/openshift-monitoring_prometheus-k8s-0_prometheus.log
[2] https://github.com/openshift/cluster-kube-controller-manager-operator/blob/aa8461632b6e78236e6d84659b97f511e8e55632/manifests/0000_90_kube-controller-manager-operator_05_alerts.yaml#L30-L37
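For context, `==` combined with `on(...)` performs one-to-one vector matching in PromQL, so every (namespace, poddisruptionbudget, service) group on the right-hand side must resolve to exactly one series. A diagnostic sketch (illustrative, not taken from the bug; label names are as they appear in the alert expression) that surfaces the offending duplicates when run against the cluster's Prometheus:

```
# Diagnostic sketch (illustrative, not from the bug): list the match groups
# where the right-hand side of the alert resolves to more than one series,
# which is exactly the condition that breaks the one-to-one match.
count by (namespace, poddisruptionbudget, service) (
  kube_poddisruptionbudget_status_desired_healthy
) > 1
```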
*** Bug 1940393 has been marked as a duplicate of this bug. ***

Some tests are failing to pull the centos:7 image:
```
Mar 13 17:57:35.432: INFO: At 2021-03-13 17:54:38 +0000 UTC - event for execpods2mk9: {kubelet compute-1} Failed: Failed to pull image "centos:7": rpc error: code = Unknown desc = Error determining manifest MIME type for docker://centos:7: Error reading manifest sha256:e4ca2ed0202e76be184e75fb26d14bf974193579039d5573fb2348664deef76e in docker.io/library/centos: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
```
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.5-e2e-vsphere-upi/1372839231512121344
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.5-e2e-vsphere-upi/1371993470729719808
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.5-e2e-vsphere-upi/1370785635849211904
From https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1372797703263621120/artifacts/e2e-aws-workers-rhel7/gather-extra/artifacts/pods/openshift-monitoring_prometheus-k8s-0_prometheus.log:
```
level=warn ts=2021-03-19T07:26:05.311Z caller=manager.go:598 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(cluster, node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"} * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="found duplicate series for the match group {namespace=\"openshift-monitoring\", pod=\"alertmanager-main-0\"} on the right hand-side of the operation: [{__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-171-164.us-west-2.compute.internal\", pod=\"alertmanager-main-0\"}, {__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-159-177.us-west-2.compute.internal\", pod=\"alertmanager-main-0\"}];many-to-many matching not allowed: matching labels must be unique on one side"
```
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1372797703263621120
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1372616311028322304
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1372435340991664128
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1372344750400606208
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1372254044298416128
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1372072739900231680
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1371801130840887296
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1371710530372243456
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1371438694300389376
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1371257483598761984
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1371076289380749312
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1370985694431809536
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1370895099193462784
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1370713924474769408
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-workers-rhel7/1370623334252810240
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.6-upgrade-from-stable-4.5-e2e-aws-ovn-upgrade/1372238974067675136
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.6-upgrade-from-stable-4.5-e2e-aws-ovn-upgrade/1371142819141390336
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.6-upgrade-from-stable-4.5-e2e-aws-ovn-upgrade/1371100478640754688
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.6-upgrade-from-stable-4.5-e2e-aws-ovn-upgrade/1370928518262689792
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.6-upgrade-from-stable-4.5-e2e-gcp-ovn-upgrade/1370766260731645952
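The node:node_num_cpu:sum recording rule above fails for the same underlying reason: during pod churn, node_namespace_pod:kube_pod_info: briefly returns two series for the same (namespace, pod) pair, so the group_left join has no unique right-hand side. Below is a sketch of the de-duplication pattern used upstream in kubernetes-mixin for this class of problem (the exact expression shipped in any given release may differ):

```
# Sketch only (based on the upstream kubernetes-mixin de-duplication pattern;
# not necessarily the exact rule shipped in 4.6): topk by (namespace, pod)
# keeps a single node_namespace_pod:kube_pod_info: series per match group,
# so the group_left join sees a unique right-hand side even during pod churn.
record: node:node_num_cpu:sum
expr: |
  count by (cluster, node) (
    sum by (node, cpu) (
      node_cpu_seconds_total{job="node-exporter"}
      * on (namespace, pod) group_left (node)
        topk by (namespace, pod) (1, node_namespace_pod:kube_pod_info:)
    )
  )
```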
From https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-gcp-ovn-upgrade/1372795443267768320/artifacts/e2e-gcp-ovn-upgrade/gather-extra/artifacts/pods/openshift-monitoring_prometheus-k8s-0_prometheus.log:
```
level=warn ts=2021-03-19T07:54:46.084Z caller=manager.go:525 component="rule manager" group=cluster-version msg="Evaluating rule failed" rule="alert: PodDisruptionBudgetAtLimit\nexpr: kube_poddisruptionbudget_status_expected_pods == on(namespace, poddisruptionbudget,\n service) kube_poddisruptionbudget_status_desired_healthy\nfor: 15m\nlabels:\n severity: warning\nannotations:\n message: The pod disruption budget is preventing further disruption to pods because\n it is at the minimum allowed level.\n" err="found duplicate series for the match group {namespace=\"e2e-k8s-service-lb-available-697\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"} on the right hand-side of the operation: [{__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.129.2.9:8443\", job=\"kube-state-metrics\", namespace=\"e2e-k8s-service-lb-available-697\", pod=\"kube-state-metrics-8676f85877-lx9dj\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"}, {__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.128.2.16:8443\", job=\"kube-state-metrics\", namespace=\"e2e-k8s-service-lb-available-697\", pod=\"kube-state-metrics-8676f85877-x4d7v\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-03-19T07:54:46.084Z caller=manager.go:525 component="rule manager" group=cluster-version msg="Evaluating rule failed" rule="alert: PodDisruptionBudgetLimit\nexpr: kube_poddisruptionbudget_status_expected_pods < on(namespace, poddisruptionbudget,\n service) kube_poddisruptionbudget_status_desired_healthy\nfor: 15m\nlabels:\n severity: critical\nannotations:\n message: The pod disruption budget is below the minimum number allowed pods.\n" err="found duplicate series for the match group {namespace=\"e2e-k8s-service-lb-available-697\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"} on the right hand-side of the operation: [{__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.129.2.9:8443\", job=\"kube-state-metrics\", namespace=\"e2e-k8s-service-lb-available-697\", pod=\"kube-state-metrics-8676f85877-lx9dj\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"}, {__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.128.2.16:8443\", job=\"kube-state-metrics\", namespace=\"e2e-k8s-service-lb-available-697\", pod=\"kube-state-metrics-8676f85877-x4d7v\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-03-19T07:55:16.084Z caller=manager.go:525 component="rule manager" group=cluster-version msg="Evaluating rule failed" rule="alert: PodDisruptionBudgetAtLimit\nexpr: kube_poddisruptionbudget_status_expected_pods == on(namespace, poddisruptionbudget,\n service) kube_poddisruptionbudget_status_desired_healthy\nfor: 15m\nlabels:\n severity: warning\nannotations:\n message: The pod disruption budget is preventing further disruption to pods because\n it is at the minimum allowed level.\n" err="found duplicate series for the match group {namespace=\"e2e-k8s-service-lb-available-697\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"} on the right hand-side of the operation: [{__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.129.2.9:8443\", job=\"kube-state-metrics\", namespace=\"e2e-k8s-service-lb-available-697\", pod=\"kube-state-metrics-8676f85877-lx9dj\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"}, {__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.128.2.16:8443\", job=\"kube-state-metrics\", namespace=\"e2e-k8s-service-lb-available-697\", pod=\"kube-state-metrics-8676f85877-x4d7v\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-03-19T07:55:16.084Z caller=manager.go:525 component="rule manager" group=cluster-version msg="Evaluating rule failed" rule="alert: PodDisruptionBudgetLimit\nexpr: kube_poddisruptionbudget_status_expected_pods < on(namespace, poddisruptionbudget,\n service) kube_poddisruptionbudget_status_desired_healthy\nfor: 15m\nlabels:\n severity: critical\nannotations:\n message: The pod disruption budget is below the minimum number allowed pods.\n" err="found duplicate series for the match group {namespace=\"e2e-k8s-service-lb-available-697\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"} on the right hand-side of the operation: [{__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.129.2.9:8443\", job=\"kube-state-metrics\", namespace=\"e2e-k8s-service-lb-available-697\", pod=\"kube-state-metrics-8676f85877-lx9dj\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"}, {__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.128.2.16:8443\", job=\"kube-state-metrics\", namespace=\"e2e-k8s-service-lb-available-697\", pod=\"kube-state-metrics-8676f85877-x4d7v\", poddisruptionbudget=\"service-test\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
```
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-gcp-ovn-upgrade/1372795443267768320
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-gcp-ovn-upgrade/1372432196983328768
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-gcp-ovn-upgrade/1372341401290805248
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-gcp-ovn-upgrade/1371887414431191040
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-gcp-ovn-upgrade/1371796655946338304
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-gcp-ovn-upgrade/1371615105338314752
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-gcp-ovn-upgrade/1370616835891793920
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-azure-ovn-upgrade/1372732399577731072
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-azure-ovn-upgrade/1372641596482260992
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-azure-ovn-upgrade/1372550763884056576
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-azure-ovn-upgrade/1371461299535351808
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-azure-ovn-upgrade/1371189192205275136
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-azure-ovn-upgrade/1371098462539485184
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-azure-ovn-upgrade/1370826126393348096
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-azure-ovn-upgrade/1370463116701208576
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.2-to-4.3-to-4.4-to-4.5-ci/1372462035782078464
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.2-to-4.3-to-4.4-to-4.5-ci/1372099413719126016
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.2-to-4.3-to-4.4-to-4.5-ci/1371374007357542400
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-aws-ovn-upgrade/1371303583558930432
found duplicate series for the match group {namespace="e2e-k8s-service-lb-available-697", poddisruptionbudget="service-test", service="kube-state-metrics"} on the right hand-side of the operation:
[
{__name__="kube_poddisruptionbudget_status_desired_healthy", endpoint="https-main", instance="10.129.2.9:8443", job="kube-state-metrics", namespace="e2e-k8s-service-lb-available-697", pod="kube-state-metrics-8676f85877-lx9dj", poddisruptionbudget="service-test", service="kube-state-metrics"},
{__name__="kube_poddisruptionbudget_status_desired_healthy", endpoint="https-main", instance="10.128.2.16:8443", job="kube-state-metrics", namespace="e2e-k8s-service-lb-available-697", pod="kube-state-metrics-8676f85877-x4d7v", poddisruptionbudget="service-test", service="kube-state-metrics"}
];
many-to-many matching not allowed: matching labels must be unique on one side
The only difference between the two series is in the "instance" and "pod" labels.
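Since the two series differ only in the scrape-target labels, one generic remedy is to aggregate those labels away before matching, so each group is unique on both sides. A sketch of such a rewrite (for illustration only; not necessarily the change that actually landed for bug 1806640):

```
# Illustrative rewrite, not the actual shipped fix: max by (...) drops the
# instance/pod (and kube-state-metrics service) labels, so old and new
# kube-state-metrics pods can no longer produce duplicate series on either
# side of the comparison.
alert: PodDisruptionBudgetAtLimit
expr: |
  max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods)
    == max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_desired_healthy)
for: 15m
labels:
  severity: warning
```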
From the upgrade logs https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-gcp-ovn-upgrade/1372795443267768320/build-log.txt:
```
Mar 19 07:29:48.275 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ reason/Created
Mar 19 07:29:48.285 I ns/openshift-monitoring replicaset/kube-state-metrics-8676f85877 reason/SuccessfulCreate Created pod: kube-state-metrics-8676f85877-lx9dj
Mar 19 07:29:48.309 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl reason/Scheduled
Mar 19 07:29:50.888 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-state-metrics reason/Pulling image/registry.ci.openshift.org/ocp/4.5-2021-03-15-063422@sha256:45f9dff539d7a43efd674bd82f95d0515f5d6c26f47eba6250001a40eab40625
Mar 19 07:29:52.815 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-state-metrics reason/Pulled image/registry.ci.openshift.org/ocp/4.5-2021-03-15-063422@sha256:45f9dff539d7a43efd674bd82f95d0515f5d6c26f47eba6250001a40eab40625
Mar 19 07:29:52.987 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-state-metrics reason/Created
Mar 19 07:29:53.019 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-state-metrics reason/Started
Mar 19 07:29:53.025 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-rbac-proxy-main reason/Pulled image/registry.ci.openshift.org/ocp/4.5-2021-03-15-063422@sha256:1aa5bb03d0485ec2db2c7871a1eeaef83e9eabf7e9f1bc2c841cf1a759817c99
Mar 19 07:29:53.242 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-rbac-proxy-main reason/Created
Mar 19 07:29:53.284 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-rbac-proxy-main reason/Started
Mar 19 07:29:53.290 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-rbac-proxy-self reason/Pulled image/registry.ci.openshift.org/ocp/4.5-2021-03-15-063422@sha256:1aa5bb03d0485ec2db2c7871a1eeaef83e9eabf7e9f1bc2c841cf1a759817c99
Mar 19 07:29:53.527 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-rbac-proxy-self reason/Created
Mar 19 07:29:53.564 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-rbac-proxy-self reason/Started
Mar 19 07:29:54.150 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-rbac-proxy-self reason/Ready
Mar 19 07:29:54.150 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-state-metrics reason/Ready
Mar 19 07:29:54.150 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-rbac-proxy-main reason/Ready
Mar 19 07:31:11.365 E ns/openshift-ovn-kubernetes pod/ovnkube-node-l2c6r node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/ovn-controller container exited with code 143 (Error): .0.32.4:38062<->10.0.0.4:9642) at lib/stream-ssl.c:832 (86% CPU usage)\n2021-03-19T07:29:25Z|00388|poll_loop|INFO|wakeup due to [POLLIN] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (86% CPU usage)\n2021-03-19T07:29:25Z|00389|poll_loop|INFO|wakeup due to [POLLIN] on fd 18 (10.0.32.4:38062<->10.0.0.4:9642) at lib/stream-ssl.c:832 (86% CPU usage)\n2021-03-19T07:29:25Z|00390|poll_loop|INFO|wakeup due to [POLLIN] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (86% CPU usage)\n2021-03-19T07:29:26Z|00391|poll_loop|INFO|wakeup due to [POLLIN] on fd 18 (10.0.32.4:38062<->10.0.0.4:9642) at lib/stream-ssl.c:832 (86% CPU usage)\n2021-03-19T07:29:26Z|00392|poll_loop|INFO|wakeup due to [POLLIN] on fd 18 (10.0.32.4:38062<->10.0.0.4:9642) at lib/stream-ssl.c:832 (86% CPU usage)\n2021-03-19T07:29:26Z|00393|poll_loop|INFO|wakeup due to [POLLIN] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (86% CPU usage)\n2021-03-19T07:29:26Z|00394|poll_loop|INFO|wakeup due to [POLLIN] on fd 18 (10.0.32.4:38062<->10.0.0.4:9642) at lib/stream-ssl.c:832 (86% CPU usage)\n2021-03-19T07:29:26Z|00395|poll_loop|INFO|wakeup due to [POLLIN] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (86% CPU usage)\n2021-03-19T07:29:26Z|00396|poll_loop|INFO|wakeup due to [POLLIN] on fd 18 (10.0.32.4:38062<->10.0.0.4:9642) at lib/stream-ssl.c:832 (86% CPU usage)\n2021-03-19T07:29:27Z|00397|poll_loop|INFO|wakeup due to [POLLIN] on fd 18 (10.0.32.4:38062<->10.0.0.4:9642) at lib/stream-ssl.c:832 (86% CPU usage)\n2021-03-19T07:29:27Z|00398|poll_loop|INFO|wakeup due to [POLLIN] on fd 18 (10.0.32.4:38062<->10.0.0.4:9642) at lib/stream-ssl.c:832 (86% CPU usage)\n2021-03-19T07:29:48Z|00399|binding|INFO|Claiming lport openshift-monitoring_kube-state-metrics-8676f85877-lx9dj for this chassis.\n2021-03-19T07:29:48Z|00400|binding|INFO|openshift-monitoring_kube-state-metrics-8676f85877-lx9dj: Claiming dynamic\n2021-03-19T07:31:10Z|00001|fatal_signal(urcu1)|WARN|terminating with signal 15 (Terminated)\n
Mar 19 07:53:09.087 W ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl reason/GracefulDelete in 30s
Mar 19 07:53:09.235 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ reason/Created
Mar 19 07:53:09.252 I ns/openshift-monitoring replicaset/kube-state-metrics-8676f85877 reason/SuccessfulCreate Created pod: kube-state-metrics-8676f85877-x4d7v
Mar 19 07:53:09.269 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx reason/Scheduled
Mar 19 07:53:10.075 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-rbac-proxy-self reason/Killing
Mar 19 07:53:10.247 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-rbac-proxy-main reason/Killing
Mar 19 07:53:10.452 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-state-metrics reason/Killing
Mar 19 07:53:12.667 W ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl invariant violation (bug): pod should not transition Running->Pending even when terminated
Mar 19 07:53:12.667 W ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-rbac-proxy-main reason/NotReady
Mar 19 07:53:12.667 W ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-state-metrics reason/NotReady
Mar 19 07:53:12.667 W ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl container/kube-rbac-proxy-self reason/NotReady
Mar 19 07:53:13.376 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v reason/AddedInterface Add eth0 [10.128.2.16/23]
Mar 19 07:53:14.176 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx container/kube-state-metrics reason/Pulling image/registry.ci.openshift.org/ocp/4.5-2021-03-15-063422@sha256:45f9dff539d7a43efd674bd82f95d0515f5d6c26f47eba6250001a40eab40625
Mar 19 07:53:18.279 W ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl pod has been pending longer than a minute
Mar 19 07:53:18.398 W ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl reason/Deleted
Mar 19 07:53:19.467 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx container/kube-state-metrics reason/Pulled image/registry.ci.openshift.org/ocp/4.5-2021-03-15-063422@sha256:45f9dff539d7a43efd674bd82f95d0515f5d6c26f47eba6250001a40eab40625
Mar 19 07:53:20.021 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx container/kube-state-metrics reason/Created
Mar 19 07:53:20.114 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx container/kube-state-metrics reason/Started
Mar 19 07:53:20.124 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx container/kube-rbac-proxy-main reason/Pulled image/registry.ci.openshift.org/ocp/4.5-2021-03-15-063422@sha256:1aa5bb03d0485ec2db2c7871a1eeaef83e9eabf7e9f1bc2c841cf1a759817c99
Mar 19 07:53:20.868 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx container/kube-rbac-proxy-main reason/Created
Mar 19 07:53:21.266 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx container/kube-rbac-proxy-main reason/Started
Mar 19 07:53:21.666 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx container/kube-rbac-proxy-self reason/Pulled image/registry.ci.openshift.org/ocp/4.5-2021-03-15-063422@sha256:1aa5bb03d0485ec2db2c7871a1eeaef83e9eabf7e9f1bc2c841cf1a759817c99
Mar 19 07:53:21.719 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx container/kube-rbac-proxy-main reason/Ready
Mar 19 07:53:21.719 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx container/kube-rbac-proxy-self reason/Ready
Mar 19 07:53:21.719 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx container/kube-state-metrics reason/Ready
Mar 19 07:53:22.094 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx container/kube-rbac-proxy-self reason/Created
Mar 19 07:53:22.110 I ns/openshift-monitoring pod/kube-state-metrics-8676f85877-x4d7v node/ci-op-ywb8j6hx-96e65-g97zc-worker-c-8kzbx container/kube-rbac-proxy-self reason/Started
```
I'm not familiar with these series or with why "many-to-many matching not allowed: matching labels must be unique on one side" is reported, though the following log looks suspicious:
```
Mar 19 07:53:12.667 W ns/openshift-monitoring pod/kube-state-metrics-8676f85877-lx9dj node/ci-op-ywb8j6hx-96e65-g97zc-worker-d-6j9kl invariant violation (bug): pod should not transition Running->Pending even when terminated
```
Simon, can you help me understand what the "many-to-many matching not allowed: matching labels must be unique on one side" error is about?

@Jan sure. The alert expression uses "on(namespace, poddisruptionbudget, service) kube_poddisruptionbudget_status_desired_healthy", which means that the query will fail if kube_poddisruptionbudget_status_desired_healthy returns several series for the same (namespace, poddisruptionbudget, service) tuple. It happens here because the kube-state-metrics pod has been redeployed, and for a short period of time metrics from the old and new instances coexist (they are identical except for the "instance" and "pod" labels).

Looking more closely, this happens to be a duplicate of bug 1806640, which has been fixed in 4.5 and onwards [1]. The bug exists for this job because it's a 4.5 > 4.4 downgrade!

I guess you want to close it then :)

[1] https://github.com/openshift/cluster-kube-controller-manager-operator/commit/265b5a6a56960443048a029c67bb0d061adb9e25

Thanks for the news!!!

*** This bug has been marked as a duplicate of bug 1806640 ***