Bug 2063047
| Summary: | Configuring a full-path query log file in CMO breaks Prometheus with the latest version of the operator | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Simon Pasquier <spasquie> |
| Component: | Monitoring | Assignee: | Joao Marcal <jmarcal> |
| Status: | CLOSED ERRATA | QA Contact: | hongyan li <hongyli> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.11 | CC: | amuller, anpicker, aos-bugs, hongyli |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 10:53:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Because the root filesystem is read-only, an emptyDir volume needs to be provisioned by CMO if the query log file is a full path (as explained in the release CHANGELOG [1]). CMO should automatically add the volume and mount it at the queryLogFile's directory, except for the following edge cases:
* when the queryLogFile's directory starts with "/dev" (e.g. "/dev/stdout"), no volume is needed because this destination is already writable.
* when the queryLogFile's directory starts with "/prometheus" (e.g. "/prometheus/query.log"), no volume is needed because it is already writable (TSDB storage directory).
* when the queryLogFile's directory is "/" (e.g. "/query.log"), the value should be rejected since we don't want to mount a volume at the root location.
[1] https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.55.0

Test with payload 4.11.0-0.nightly-2022-03-18-003836:
% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: cluster-monitoring-config
namespace: openshift-monitoring
data:
config.yaml: |-
prometheusK8s:
queryLogFile: /tmp/test.log
EOF
tail /tmp/test.log
{"params":{"end":"2022-03-18T02:56:46.864Z","query":"min_over_time(prometheus_operator_managed_resources{job=\"prometheus-operator\",namespace=~\"openshift-monitoring|openshift-user-workload-monitoring\",state=\"rejected\"}[5m]) > 0","start":"2022-03-18T02:56:46.864Z","step":0},"ruleGroup":{"file":"/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-monitoring-prometheus-operator-rules-1906bb0c-447d-4527-96e9-2f9d29cb61c3.yaml","name":"prometheus-operator"},"stats":{"timings":{"evalTotalTime":0.00013654,"resultSortTime":0,"queryPreparationTime":0.000072124,"innerEvalTime":0.000054378,"execQueueTime":0.000008232,"execTotalTime":0.000149892}},"ts":"2022-03-18T02:56:46.868Z"}
{"params":{"end":"2022-03-18T02:56:47.019Z","query":"kube_pod_status_ready{condition=\"true\",namespace=\"openshift-cluster-node-tuning-operator\"} == 0","start":"2022-03-18T02:56:47.019Z","step":0},"ruleGroup":{"file":"/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-cluster-node-tuning-operator-node-tuning-operator-c9a433ca-1024-468b-82ef-c04e973d1c8e.yaml","name":"node-tuning-operator.rules"},"stats":{"timings":{"evalTotalTime":0.000189451,"resultSortTime":0,"queryPreparationTime":0.000092864,"innerEvalTime":0.000089069,"execQueueTime":0.000011035,"execTotalTime":0.000206109}},"ts":"2022-03-18T02:56:47.020Z"}
% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: cluster-monitoring-config
namespace: openshift-monitoring
data:
config.yaml: |-
prometheusK8s:
queryLogFile: /var/test.log
EOF
tail /var/test.log
{"params":{"end":"2022-03-18T02:59:53.532Z","query":"cco_credentials_requests_conditions{condition=\"InsufficientCloudCreds\"} > 0","start":"2022-03-18T02:59:53.532Z","step":0},"ruleGroup":{"file":"/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-cloud-credential-operator-cloud-credential-operator-alerts-f3c8c22c-1897-4f4f-80a3-642504067996.yaml","name":"CloudCredentialOperator"},"stats":{"timings":{"evalTotalTime":0.000089687,"resultSortTime":0,"queryPreparationTime":0.000059958,"innerEvalTime":0.00002176,"execQueueTime":0.000008778,"execTotalTime":0.000104447}},"ts":"2022-03-18T02:59:53.558Z"}
% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: cluster-monitoring-config
namespace: openshift-monitoring
data:
config.yaml: |-
prometheusK8s:
queryLogFile: /dev/test.log
EOF
% oc -n openshift-monitoring logs cluster-monitoring-operator-69d4486df9-xq4s6 cluster-monitoring-operator
-----
W0318 03:03:37.459612 1 tasks.go:71] task 4 of 14: Updating Prometheus-k8s failed: initializing Prometheus object failed: query log file can't be stored on a new file on the dev directory: invalid value for config
% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: cluster-monitoring-config
namespace: openshift-monitoring
data:
config.yaml: |-
prometheusK8s:
queryLogFile: /test.log
EOF
% oc -n openshift-monitoring logs cluster-monitoring-operator-69d4486df9-xq4s6 cluster-monitoring-operator
-----
W0318 03:07:23.079970 1 tasks.go:71] task 4 of 14: Updating Prometheus-k8s failed: initializing Prometheus object failed: query log file can't be stored on the root directory: invalid value for config
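The two rejections above (for /dev/test.log and /test.log) follow the rules listed at the top of this report. Below is a minimal sketch in Go of that kind of check; the function name and the exact /dev allow-list are assumptions for illustration, not the actual CMO source.

// Minimal sketch of the query log file validation described above;
// names and the /dev allow-list are assumptions, not the CMO code.
package main

import (
    "errors"
    "fmt"
    "path/filepath"
    "strings"
)

// validateQueryLogFile returns whether CMO would need to mount a dedicated
// emptyDir volume for the query log file, or an error if the value is rejected.
func validateQueryLogFile(queryLogFile string) (needsVolume bool, err error) {
    if queryLogFile == "" {
        return false, nil // query logging disabled
    }
    dir := filepath.Dir(queryLogFile)
    switch {
    case dir == "/":
        // e.g. /query.log: refuse to mount a volume over the container root.
        return false, errors.New("query log file can't be stored on the root directory: invalid value for config")
    case strings.HasPrefix(dir, "/dev"):
        // /dev is already writable, so no volume is needed, but only existing
        // device files make sense; a new file such as /dev/test.log is
        // rejected, as the log excerpt above shows.
        if queryLogFile != "/dev/stdout" && queryLogFile != "/dev/stderr" {
            return false, errors.New("query log file can't be stored on a new file on the dev directory: invalid value for config")
        }
        return false, nil
    case strings.HasPrefix(dir, "/prometheus"):
        // The TSDB storage directory is already writable.
        return false, nil
    default:
        // Any other absolute directory (e.g. /tmp, /var) gets an emptyDir
        // volume mounted at dir, as seen with /tmp/test.log and /var/test.log.
        return true, nil
    }
}

func main() {
    for _, f := range []string{"/tmp/test.log", "/var/test.log", "/dev/test.log", "/query.log"} {
        needsVolume, err := validateQueryLogFile(f)
        fmt.Printf("%-15s needsVolume=%-5t err=%v\n", f, needsVolume, err)
    }
}

For every other absolute directory, CMO is expected to provision the emptyDir volume automatically, which is what the /tmp and /var examples above demonstrate. The same validation applies to the user-workload Prometheus, exercised next.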
% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: user-workload-monitoring-config
namespace: openshift-user-workload-monitoring
data:
config.yaml: |
prometheus:
queryLogFile: /tmp/test.log
EOF
% oc -n openshift-user-workload-monitoring get pod
NAME READY STATUS RESTARTS AGE
prometheus-operator-5fc4b476dc-x5zc2 2/2 Running 0 4m28s
prometheus-user-workload-0 4/5 Running 0 8s
prometheus-user-workload-1 5/5 Running 0 26s
thanos-ruler-user-workload-0 3/3 Running 0 4m18s
thanos-ruler-user-workload-1 3/3 Running 0 4m18s
hongyli@hongyli-mac Downloads % oc -n openshift-user-workload-monitoring rsh prometheus-user-workload-0
sh-4.4$ tail -f /tmp/test.log
no result
When an invalid directory is configured, the error information in the log is not correct:
% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: user-workload-monitoring-config
namespace: openshift-user-workload-monitoring
data:
config.yaml: |
prometheus:
queryLogFile: /test.log
EOF
% oc -n openshift-monitoring logs cluster-monitoring-operator-69d4486df9-xq4s6 cluster-monitoring-operator
-----
E0318 07:34:42.684084 1 operator.go:537] Syncing "openshift-monitoring/cluster-monitoring-config" failed
E0318 07:34:42.684167 1 operator.go:538] sync "openshift-monitoring/cluster-monitoring-config" failed: the User Workload Configuration from "config.yaml" key in the "openshift-user-workload-monitoring/user-workload-monitoring-config" ConfigMap could not be parsed: error unmarshaling JSON: while decoding JSON: json: cannot unmarshal string into Go struct field UserWorkloadConfiguration.prometheus of type manifests.PrometheusRestrictedConfig
W0318 07:35:30.756558 1 operator.go:781] Error creating User Workload Configuration from "config.yaml" key in the "openshift-user-workload-monitoring/user-workload-monitoring-config" ConfigMap. Error: error unmarshaling JSON: while decoding JSON: json: cannot unmarshal string into Go struct field UserWorkloadConfiguration.prometheus of type manifests.PrometheusRestrictedConfig
I0318 07:35:30.756586 1 operator.go:681] ClusterOperator reconciliation failed (attempt 34), retrying.
W0318 07:35:30.756592 1 operator.go:684] Updating ClusterOperator status to failed after 34 attempts.
E0318 07:35:30.773779 1 operator.go:537] Syncing "openshift-monitoring/cluster-monitoring-config" failed
E0318 07:35:30.773809 1 operator.go:538] sync "openshift-monitoring/cluster-monitoring-config" failed: the User Workload Configuration from "config.yaml" key in the "openshift-user-workload-monitoring/user-workload-monitoring-config" ConfigMap could not be parsed: error unmarshaling JSON: while decoding JSON: json: cannot unmarshal string into Go struct field UserWorkloadConfiguration.prometheus of type manifests.PrometheusRestrictedConfig
I0318 07:38:15.576292 1 operator.go:509] Triggering an update due to ConfigMap or Secret: openshift-user-workload-monitoring/user-workload-monitoring-config
Facing two issues now:
1. When configuring a valid path for the user workload query log, I fail to see any query log, even though I ran many prometheus-example app queries by exploring the dashboard or querying 'version' directly.
2. When configuring an invalid path for the user workload query log, the error information in the log is not correct.
So I reopened the bug. There is another critical issue: configuring the query log file as follows causes the prometheus-k8s pod to crash.
% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: cluster-monitoring-config
namespace: openshift-monitoring
data:
config.yaml: |
enableUserWorkload: true
prometheusK8s:
queryLogFile: ./lhy/test.log
EOF
% oc -n openshift-monitoring get pod|grep prometheus-k8s
prometheus-k8s-0 6/6 Running 0 15m
prometheus-k8s-1 5/6 CrashLoopBackOff 5 (39s ago) 3m39s
Regarding comment 8, this should be fixed by bumping to the latest version of prometheus-operator (https://github.com/openshift/prometheus-operator/pull/162), which merged 1 hour ago. Regarding the first issue:
=================
jmarcal ~ → oc get cm user-workload-monitoring-config -o yaml | cat
apiVersion: v1
data:
config.yaml: |
prometheus:
queryLogFile: /tmp/test.log
kind: ConfigMap
metadata:
creationTimestamp: "2022-03-18T08:48:53Z"
name: user-workload-monitoring-config
namespace: openshift-user-workload-monitoring
resourceVersion: "46245"
uid: 0f341905-4ff6-43fd-a3a2-0eb5b9b67ff5
jmarcal ~ → oc rsh prometheus-user-workload-1
sh-4.4$ curl --data-urlencode 'query=up{job="alertmanager"}' 127.0.0.1:9090/api/v1/query
sh-4.4$ cat /tmp/test.log
{"httpRequest":{"clientIP":"127.0.0.1","method":"POST","path":"/api/v1/query"},"params":{"end":"2022-03-18T09:35:15.992Z","query":"up{job=\"alertmanager\"}","start":"2022-03-18T09:35:15.992Z","step":0},"stats":{"timings":{"evalTotalTime":0.000053284,"resultSortTime":0,"queryPreparationTime":0.000036362,"innerEvalTime":0.000010217,"execQueueTime":0.000013256,"execTotalTime":0.000081297}},"ts":"2022-03-18T09:35:15.993Z"}
=================
So from my side everything seems to be working as expected. From what Simon told me, for user-workload monitoring we need to hit the Prometheus API query endpoint directly, because going through Thanos Querier wouldn't trigger request logging.
Regarding the second issue:
=================
jmarcal ~ → oc get cm user-workload-monitoring-config -o yaml | cat
apiVersion: v1
data:
config.yaml: |
prometheus:
queryLogFile: /test.log
kind: ConfigMap
metadata:
creationTimestamp: "2022-03-18T08:48:53Z"
name: user-workload-monitoring-config
namespace: openshift-user-workload-monitoring
resourceVersion: "52109"
uid: 0f341905-4ff6-43fd-a3a2-0eb5b9b67ff5
jmarcal ~ → oc -n openshift-monitoring logs cluster-monitoring-operator-69d4486df9-9z7c8 -c cluster-monitoring-operator | tail
I0318 09:50:03.950181 1 tasks.go:74] ran task 9 of 14: Updating openshift-state-metrics
I0318 09:50:03.989334 1 tasks.go:74] ran task 8 of 14: Updating kube-state-metrics
I0318 09:50:04.057089 1 tasks.go:74] ran task 7 of 14: Updating node-exporter
I0318 09:50:04.168306 1 tasks.go:74] ran task 11 of 14: Updating Telemeter client
W0318 09:50:04.249201 1 tasks.go:71] task 5 of 14: Updating Prometheus-user-workload failed: initializing UserWorkload Prometheus object failed: query log file can't be stored on the root directory: invalid value for config
I0318 09:50:04.312247 1 tasks.go:74] ran task 10 of 14: Updating prometheus-adapter
I0318 09:50:04.989025 1 tasks.go:74] ran task 1 of 14: Updating user workload Prometheus Operator
I0318 09:50:05.574858 1 tasks.go:74] ran task 3 of 14: Updating Grafana
I0318 09:50:07.315837 1 tasks.go:74] ran task 12 of 14: Updating Thanos Querier
I0318 09:50:14.212797 1 tasks.go:74] ran task 6 of 14: Updating Alertmanager
=================
So from my side it seems like the YAML you used was not valid for some reason, and that caused the error message you saw.
Tested again: configuring a full-path query log file works well for user-workload-monitoring. @jamarcal, thanks for checking. Test with payload 4.11.0-0.nightly-2022-03-20-160505.
prometheus-operator's version is 0.55.0:
oc -n openshift-monitoring logs prometheus-operator-5cb86ff95c-xglgp
level=info ts=2022-03-21T02:02:52.099203159Z caller=main.go:220 msg="Starting Prometheus Operator" version="(version=0.55.0, branch=rhaos-4.11-rhel-8, revision=5d799b9)
oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: cluster-monitoring-config
namespace: openshift-monitoring
data:
config.yaml: |
enableUserWorkload: true
prometheusK8s:
queryLogFile: ./lhy/test.log
EOF
oc -n openshift-monitoring get pod |grep prometheus-k8s
prometheus-k8s-0 6/6 Running 0 46m
prometheus-k8s-1 5/6 CrashLoopBackOff 6 (5m6s ago) 10m
$ oc -n openshift-monitoring describe pod prometheus-k8s-1
Name: prometheus-k8s-1
Namespace: openshift-monitoring
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: ip-10-0-153-240.us-east-2.compute.internal/10.0.153.240
---------
Containers:
prometheus:
Container ID: cri-o://ff7b03f6a64e625d3eddb2b9fb8638a0bcbc3582aacb1a82286d57ab8dd0aeab
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:81ea1941f7a902c68c696e99afa860c41ac7fe0ab0c209e79cc2a7855cdbd5b7
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:81ea1941f7a902c68c696e99afa860c41ac7fe0ab0c209e79cc2a7855cdbd5b7
Port: <none>
Host Port: <none>
Args:
--web.console.templates=/etc/prometheus/consoles
--web.console.libraries=/etc/prometheus/console_libraries
--storage.tsdb.retention.time=15d
--config.file=/etc/prometheus/config_out/prometheus.env.yaml
--storage.tsdb.path=/prometheus
--web.enable-lifecycle
--web.external-url=https:/console-openshift-console.apps.hongyli-0321.qe.devcluster.openshift.com/monitoring
--web.route-prefix=/
--web.listen-address=127.0.0.1:9090
--web.config.file=/etc/prometheus/web_config/web-config.yaml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Message: msg="Using pod service account via in-cluster config"
ts=2022-03-21T02:59:25.993Z caller=kubernetes.go:313 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-03-21T02:59:25.994Z caller=kubernetes.go:313 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-03-21T02:59:25.995Z caller=kubernetes.go:313 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-03-21T02:59:25.996Z caller=kubernetes.go:313 level=info component="discovery manager notify" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-03-21T02:59:26.132Z caller=main.go:815 level=info msg="Stopping scrape discovery manager..."
ts=2022-03-21T02:59:26.132Z caller=main.go:829 level=info msg="Stopping notify discovery manager..."
ts=2022-03-21T02:59:26.132Z caller=main.go:851 level=info msg="Stopping scrape manager..."
ts=2022-03-21T02:59:26.132Z caller=main.go:811 level=info msg="Scrape discovery manager stopped"
ts=2022-03-21T02:59:26.133Z caller=main.go:825 level=info msg="Notify discovery manager stopped"
ts=2022-03-21T02:59:26.134Z caller=manager.go:945 level=info component="rule manager" msg="Stopping rule manager..."
ts=2022-03-21T02:59:26.134Z caller=main.go:845 level=info msg="Scrape manager stopped"
ts=2022-03-21T02:59:26.134Z caller=manager.go:955 level=info component="rule manager" msg="Rule manager stopped"
ts=2022-03-21T02:59:26.135Z caller=notifier.go:600 level=info component=notifier msg="Stopping notification manager..."
ts=2022-03-21T02:59:26.135Z caller=main.go:1071 level=info msg="Notifier
Exit Code: 1
Started: Mon, 21 Mar 2022 10:59:25 +0800
Finished: Mon, 21 Mar 2022 10:59:26 +0800
Ready: False
Restart Count: 7
Requests:
cpu: 70m
memory: 1Gi
Liveness: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl http://localhost:9090/-/healthy; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/healthy; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=6
Readiness: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=3
Startup: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=15s #success=1 #failure=60
Environment: <none>
Mounts:
/etc/pki/ca-trust/extracted/pem/ from prometheus-trusted-ca-bundle (ro)
/etc/prometheus/certs from tls-assets (ro)
/etc/prometheus/config_out from config-out (ro)
/etc/prometheus/configmaps/kubelet-serving-ca-bundle from configmap-kubelet-serving-ca-bundle (ro)
/etc/prometheus/configmaps/metrics-client-ca from configmap-metrics-client-ca (ro)
/etc/prometheus/configmaps/serving-certs-ca-bundle from configmap-serving-certs-ca-bundle (ro)
/etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
/etc/prometheus/secrets/kube-etcd-client-certs from secret-kube-etcd-client-certs (ro)
/etc/prometheus/secrets/kube-rbac-proxy from secret-kube-rbac-proxy (ro)
/etc/prometheus/secrets/metrics-client-certs from secret-metrics-client-certs (ro)
/etc/prometheus/secrets/prometheus-k8s-proxy from secret-prometheus-k8s-proxy (ro)
/etc/prometheus/secrets/prometheus-k8s-thanos-sidecar-tls from secret-prometheus-k8s-thanos-sidecar-tls (ro)
/etc/prometheus/secrets/prometheus-k8s-tls from secret-prometheus-k8s-tls (ro)
/etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml")
/prometheus from prometheus-k8s-db (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h45dn (ro)
lhy from query-log (rw)
config-reloader:
Container ID: cri-o://c180d9ed73e68e40ce64bb0571bec00f0038b5773cd9ab39c09f7a0319ab96e0
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b25becec8ff8c04d449d7133fb54b2b082f1bc779dc0a289013cbe9e9dd87db
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b25becec8ff8c04d449d7133fb54b2b082f1bc779dc0a289013cbe9e9dd87db
Port: <none>
Host Port: <none>
Command:
/bin/prometheus-config-reloader
Args:
--listen-address=localhost:8080
--reload-url=http://localhost:9090/-/reload
--config-file=/etc/prometheus/config/prometheus.yaml.gz
--config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
--watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
State: Running
Started: Mon, 21 Mar 2022 10:48:39 +0800
Ready: True
Restart Count: 0
Requests:
cpu: 1m
memory: 10Mi
Environment:
POD_NAME: prometheus-k8s-1 (v1:metadata.name)
SHARD: 0
Mounts:
/etc/prometheus/config from config (rw)
/etc/prometheus/config_out from config-out (rw)
/etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h45dn (ro)
thanos-sidecar:
Container ID: cri-o://163d59de54f18f3963a71ecad205578a1cf53d3ab23ee9c0a3093b3ec1a23ed3
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:419ae6e3ddc102d59063407ab6d287f3c5a5fcddca921b4fe5c09f515eb1a72e
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:419ae6e3ddc102d59063407ab6d287f3c5a5fcddca921b4fe5c09f515eb1a72e
Ports: 10902/TCP, 10901/TCP
Host Ports: 0/TCP, 0/TCP
Args:
sidecar
--prometheus.url=http://localhost:9090/
--tsdb.path=/prometheus
--http-address=127.0.0.1:10902
--grpc-server-tls-cert=/etc/tls/grpc/server.crt
--grpc-server-tls-key=/etc/tls/grpc/server.key
--grpc-server-tls-client-ca=/etc/tls/grpc/ca.crt
State: Running
Started: Mon, 21 Mar 2022 10:48:40 +0800
Ready: True
Restart Count: 0
Requests:
cpu: 1m
memory: 25Mi
Environment: <none>
Mounts:
/etc/tls/grpc from secret-grpc-tls (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h45dn (ro)
prometheus-proxy:
Container ID: cri-o://f74fe302e5390e163b9e205b627dec153cd14b324f215a33a8d35d8c72dbe297
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e1e0560f81cde0731eeb20f6332ee56853cc08652e2212511b02d53d1a97bc8e
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e1e0560f81cde0731eeb20f6332ee56853cc08652e2212511b02d53d1a97bc8e
Port: 9091/TCP
Host Port: 0/TCP
Args:
-provider=openshift
-https-address=:9091
-http-address=
-email-domain=*
-upstream=http://localhost:9090
-openshift-service-account=prometheus-k8s
-openshift-sar={"resource": "namespaces", "verb": "get"}
-openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}
-tls-cert=/etc/tls/private/tls.crt
-tls-key=/etc/tls/private/tls.key
-client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
-cookie-secret-file=/etc/proxy/secrets/session_secret
-openshift-ca=/etc/pki/tls/cert.pem
-openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
State: Running
Started: Mon, 21 Mar 2022 10:48:40 +0800
Ready: True
Restart Count: 0
Requests:
cpu: 1m
memory: 20Mi
Environment:
HTTP_PROXY:
HTTPS_PROXY:
NO_PROXY:
Mounts:
/etc/pki/ca-trust/extracted/pem/ from prometheus-trusted-ca-bundle (ro)
/etc/proxy/secrets from secret-prometheus-k8s-proxy (rw)
/etc/tls/private from secret-prometheus-k8s-tls (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h45dn (ro)
kube-rbac-proxy:
Container ID: cri-o://d719ca84e6a03ae493ee25aa5d1152763e2a0b3cc10980e9691b1e62530a83b6
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eae733a92bbd6fb71e26105cca172aceb5a72ccf38daf6f8115a3243edcaf840
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eae733a92bbd6fb71e26105cca172aceb5a72ccf38daf6f8115a3243edcaf840
Port: 9092/TCP
Host Port: 0/TCP
Args:
--secure-listen-address=0.0.0.0:9092
--upstream=http://127.0.0.1:9090
--allow-paths=/metrics
--config-file=/etc/kube-rbac-proxy/config.yaml
--tls-cert-file=/etc/tls/private/tls.crt
--tls-private-key-file=/etc/tls/private/tls.key
--client-ca-file=/etc/tls/client/client-ca.crt
--tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
--logtostderr=true
--v=10
--tls-min-version=VersionTLS12
State: Running
Started: Mon, 21 Mar 2022 10:48:40 +0800
Ready: True
Restart Count: 0
Requests:
cpu: 1m
memory: 15Mi
Environment: <none>
Mounts:
/etc/kube-rbac-proxy from secret-kube-rbac-proxy (rw)
/etc/tls/client from configmap-metrics-client-ca (ro)
/etc/tls/private from secret-prometheus-k8s-tls (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h45dn (ro)
kube-rbac-proxy-thanos:
Container ID: cri-o://1ca96f73d5547c7ab6a498de1625df7b96dfcd864f10b6267f1d1319ae0428fc
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eae733a92bbd6fb71e26105cca172aceb5a72ccf38daf6f8115a3243edcaf840
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eae733a92bbd6fb71e26105cca172aceb5a72ccf38daf6f8115a3243edcaf840
Port: 10902/TCP
Host Port: 0/TCP
Args:
--secure-listen-address=[$(POD_IP)]:10902
--upstream=http://127.0.0.1:10902
--tls-cert-file=/etc/tls/private/tls.crt
--tls-private-key-file=/etc/tls/private/tls.key
--client-ca-file=/etc/tls/client/client-ca.crt
--config-file=/etc/kube-rbac-proxy/config.yaml
--tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
--allow-paths=/metrics
--logtostderr=true
--tls-min-version=VersionTLS12
--client-ca-file=/etc/tls/client/client-ca.crt
State: Running
Started: Mon, 21 Mar 2022 10:48:40 +0800
Ready: True
Restart Count: 0
Requests:
cpu: 1m
memory: 10Mi
Environment:
POD_IP: (v1:status.podIP)
Mounts:
/etc/kube-rbac-proxy from secret-kube-rbac-proxy (rw)
/etc/tls/client from metrics-client-ca (ro)
/etc/tls/private from secret-prometheus-k8s-thanos-sidecar-tls (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h45dn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s
Optional: false
tls-assets:
Type: Projected (a volume that contains injected data from multiple sources)
SecretName: prometheus-k8s-tls-assets-0
SecretOptionalName: <nil>
config-out:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
prometheus-k8s-rulefiles-0:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus-k8s-rulefiles-0
Optional: false
web-config:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s-web-config
Optional: false
secret-kube-etcd-client-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kube-etcd-client-certs
Optional: false
secret-prometheus-k8s-tls:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s-tls
Optional: false
secret-prometheus-k8s-proxy:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s-proxy
Optional: false
secret-prometheus-k8s-thanos-sidecar-tls:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s-thanos-sidecar-tls
Optional: false
secret-kube-rbac-proxy:
Type: Secret (a volume populated by a Secret)
SecretName: kube-rbac-proxy
Optional: false
secret-metrics-client-certs:
Type: Secret (a volume populated by a Secret)
SecretName: metrics-client-certs
Optional: false
configmap-serving-certs-ca-bundle:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: serving-certs-ca-bundle
Optional: false
configmap-kubelet-serving-ca-bundle:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kubelet-serving-ca-bundle
Optional: false
configmap-metrics-client-ca:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: metrics-client-ca
Optional: false
prometheus-k8s-db:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
query-log:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
metrics-client-ca:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: metrics-client-ca
Optional: false
secret-grpc-tls:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s-grpc-tls-8la0v33v3r7hi
Optional: false
prometheus-trusted-ca-bundle:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus-trusted-ca-bundle-2rsonso43rc5p
Optional: true
kube-api-access-h45dn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned openshift-monitoring/prometheus-k8s-1 to ip-10-0-153-240.us-east-2.compute.internal
Normal AddedInterface 11m multus Add eth0 [10.129.2.17/23] from openshift-sdn
Normal Pulled 11m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b25becec8ff8c04d449d7133fb54b2b082f1bc779dc0a289013cbe9e9dd87db" already present on machine
Normal Created 11m kubelet Created container init-config-reloader
Normal Started 11m kubelet Started container init-config-reloader
Normal Created 11m kubelet Created container config-reloader
Normal Pulled 11m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:419ae6e3ddc102d59063407ab6d287f3c5a5fcddca921b4fe5c09f515eb1a72e" already present on machine
Normal Started 11m kubelet Started container config-reloader
Normal Pulled 11m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b25becec8ff8c04d449d7133fb54b2b082f1bc779dc0a289013cbe9e9dd87db" already present on machine
Normal Created 11m kubelet Created container thanos-sidecar
Normal Started 11m (x2 over 11m) kubelet Started container prometheus
Normal Created 11m (x2 over 11m) kubelet Created container prometheus
Normal Pulled 11m (x2 over 11m) kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:81ea1941f7a902c68c696e99afa860c41ac7fe0ab0c209e79cc2a7855cdbd5b7" already present on machine
Normal Started 11m kubelet Started container thanos-sidecar
Normal Pulled 11m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e1e0560f81cde0731eeb20f6332ee56853cc08652e2212511b02d53d1a97bc8e" already present on machine
Normal Created 11m kubelet Created container prometheus-proxy
Normal Started 11m kubelet Started container prometheus-proxy
Normal Pulled 11m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eae733a92bbd6fb71e26105cca172aceb5a72ccf38daf6f8115a3243edcaf840" already present on machine
Normal Created 11m kubelet Created container kube-rbac-proxy
Normal Started 11m kubelet Started container kube-rbac-proxy
Normal Pulled 11m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eae733a92bbd6fb71e26105cca172aceb5a72ccf38daf6f8115a3243edcaf840" already present on machine
Normal Created 11m kubelet Created container kube-rbac-proxy-thanos
Normal Started 11m kubelet Started container kube-rbac-proxy-thanos
Warning BackOff 78s (x64 over 11m) kubelet Back-off restarting failed container
There is no abnormal log in the cluster monitoring operator or the prometheus operator.
When the query log file is configured as ./lhy/test.log, the pod crash issue still occurs.

Right, I think that to be error-proof, CMO should forbid relative paths for queryLogFile (a minimal sketch of such a check follows the closing note below).

Test with payload 4.11.0-0.nightly-2022-04-01-172551, configuring a relative path:
% oc -n openshift-monitoring logs cluster-monitoring-operator-5dd6f54457-6hj7g cluster-monitoring-operator
-----
W0402 01:57:22.591131 1 tasks.go:71] task 4 of 14: Updating Prometheus-k8s failed: initializing Prometheus object failed: relative paths to query log file are not supported: invalid value for config

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:5069
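As referenced above, a minimal sketch of the relative-path rejection shown in the 4.11.0-0.nightly-2022-04-01-172551 log; the function name is hypothetical and this is not the CMO source.

package main

import (
    "errors"
    "fmt"
    "path/filepath"
)

// checkQueryLogFilePath rejects relative values such as ./lhy/test.log, which
// otherwise lead to an unpredictable mount path ("lhy" in the pod description
// above) and a crash-looping Prometheus pod.
func checkQueryLogFilePath(queryLogFile string) error {
    if queryLogFile != "" && !filepath.IsAbs(queryLogFile) {
        return errors.New("relative paths to query log file are not supported: invalid value for config")
    }
    return nil
}

func main() {
    fmt.Println(checkQueryLogFilePath("./lhy/test.log")) // rejected
    fmt.Println(checkQueryLogFilePath("/tmp/test.log"))  // nil: accepted
}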
Description of problem:
When passing the following configuration to CMO, the deployment of Prometheus fails with the latest upstream version of the operator (v0.55.0) because it enforces a read-only root filesystem.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |-
    prometheusK8s:
      queryLogFile: /tmp/test.log

Version-Release number of selected component (if applicable): 4.11

How reproducible: Always

Steps to Reproduce:
1. Apply the configmap from above
2.
3.

Actual results: CMO gets degraded.

Expected results: CMO doesn't report degraded.

Additional info:
PR bumping the Prometheus operator version to v0.55.0 => https://github.com/openshift/prometheus-operator/pull/162
Failed job => https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_prometheus-operator/162/pull-ci-openshift-prometheus-operator-master-e2e-agnostic-cmo/1502153637059629056

The same fix needs to be done for UWM Prometheus.