Bug 1909874 - cluster monitoring operator pods failing to start
Summary: cluster monitoring operator pods failing to start
Keywords:
Status: CLOSED DUPLICATE of bug 1906836
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-21 21:32 UTC by Ben Parees
Modified: 2020-12-22 08:53 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
[sig-arch] Managed cluster should have no crashlooping pods in core namespaces over four minutes
Last Closed: 2020-12-22 08:53:49 UTC
Target Upstream Version:
Embargoed:



Description Ben Parees 2020-12-21 21:32:02 UTC
https://search.ci.openshift.org/?search=%5C%5Bsig-arch%5C%5D+Managed+cluster+should+have+no+crashlooping+pods+in+core+namespaces+over+four+minutes&maxAge=168h&context=1&type=junit&name=%5Erelease.*4.6&maxMatches=5&maxBytes=20971520&groupBy=job

This search shows a number of failures caused by monitoring pods, such as:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-ovn-4.6/1339670137233477632

fail [github.com/openshift/origin/test/extended/operators/cluster.go:151]: Expected
    <[]string | len:1, cap:1>: [
        "Pod openshift-monitoring/cluster-monitoring-operator-55554cdb4b-z46dn was pending entire time: unknown error",
    ]
to be empty


The failure appears to be related to the kube-rbac-proxy container:
{
      "name": "kube-rbac-proxy",
      "state": {
        "waiting": {
          "reason": "CreateContainerConfigError",
          "message": "container has runAsNonRoot and image has non-numeric user (nobody), cannot verify user is non-root"
        }
      },
      "lastState": {},
      "ready": false,
      "restartCount": 0,
      "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b5d3f179d92e0fca445f69c41bb9763b2f2d9d37356621953d9ab8d71e22b2c5",
      "imageID": "",
      "started": false
    }
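
For readers unfamiliar with this waiting reason, the sketch below models the kind of check that the message implies. It is a simplified illustration, not the kubelet's actual code: the function name, its parameters, and the handling of an empty image user are assumptions made for this example. The point it shows is that a symbolic user name such as "nobody" cannot be mapped to a UID without inspecting the image's filesystem, so the non-root requirement cannot be verified unless either the image declares a numeric user or the pod sets an explicit runAsUser.

```python
from typing import Optional


def verify_run_as_non_root(
    run_as_non_root: bool,
    run_as_user: Optional[int],
    image_user: str,
) -> Optional[str]:
    """Return None if the container may start, else a reason string.

    Hypothetical helper: a simplified model of a runAsNonRoot check,
    not the real kubelet implementation.
    """
    # Nothing to verify when the policy is not requested.
    if not run_as_non_root:
        return None

    # An explicit runAsUser overrides whatever the image declares.
    if run_as_user is not None:
        return "runAsUser is 0 (root)" if run_as_user == 0 else None

    # Image users may be written as "user" or "user:group"; keep the user part.
    user = image_user.split(":", 1)[0]

    # Assumption for this sketch: an image without a declared user runs as root.
    if user == "":
        return "image does not declare a user, so it would run as root"

    # A numeric user can be compared against UID 0 directly.
    if user.isascii() and user.isdigit():
        return "image will run as root" if int(user) == 0 else None

    # A symbolic name cannot be resolved to a UID from metadata alone.
    return f"image has non-numeric user ({user}), cannot verify user is non-root"


if __name__ == "__main__":
    # The situation from the container status above: symbolic user, no runAsUser.
    print(verify_run_as_non_root(True, None, "nobody"))
    # A numeric image user satisfies the check.
    print(verify_run_as_non_root(True, None, "65534"))
```

Under this model, the pod stays pending until the image user is numeric or the security context supplies a non-zero runAsUser.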


However, I also see this:
Dec 17 21:26:52.580: INFO: prometheus-k8s-1[openshift-monitoring].container[prometheus]=level=error ts=2020-12-17T21:07:51.480Z caller=main.go:290 msg="Error loading config (--config.file=/etc/prometheus/config_out/prometheus.env.yaml)" err="open /etc/prometheus/config_out/prometheus.env.yaml: no such file or directory"
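
When triaging lines like this one, it can help to split the key=value (logfmt-style) portion into fields so that the caller and err values can be compared across runs. The following is a minimal sketch using only the Python standard library; the helper name parse_logfmt and the choice to start parsing at the first "level=" are assumptions made for this example, not part of any CI tooling.

```python
import shlex
from typing import Dict


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split the key=value part of a log line into a dict.

    Hypothetical helper: skips any prefix before the first "level=" and
    relies on shlex to keep double-quoted values together.
    """
    start = line.find("level=")
    if start == -1:
        return {}
    fields: Dict[str, str] = {}
    for token in shlex.split(line[start:]):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


if __name__ == "__main__":
    sample = (
        "level=error ts=2020-12-17T21:07:51.480Z caller=main.go:290 "
        'msg="Error loading config '
        '(--config.file=/etc/prometheus/config_out/prometheus.env.yaml)" '
        'err="open /etc/prometheus/config_out/prometheus.env.yaml: '
        'no such file or directory"'
    )
    parsed = parse_logfmt(sample)
    print(parsed["caller"])
    print(parsed["err"])
```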

Comment 2 Damien Grisonnet 2020-12-22 08:53:49 UTC
I think it's safe to close this bug as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1906836.

*** This bug has been marked as a duplicate of bug 1906836 ***

