Description of problem:

When passing the following configuration to CMO, the deployment of Prometheus fails with the latest upstream version of the operator (v0.55.0) because it enforces a read-only root filesystem.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |-
    prometheusK8s:
      queryLogFile: /tmp/test.log

Version-Release number of selected component (if applicable): 4.11

How reproducible: Always

Steps to Reproduce:
1. Apply the ConfigMap above.

Actual results: CMO gets degraded.

Expected results: CMO doesn't report degraded.

Additional info:
PR bumping the Prometheus operator version to v0.55.0 => https://github.com/openshift/prometheus-operator/pull/162
Failed job => https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_prometheus-operator/162/pull-ci-openshift-prometheus-operator-master-e2e-agnostic-cmo/1502153637059629056

The same fix needs to be done for UWM Prometheus.
Because the root filesystem is read-only, an emptyDir volume needs to be provisioned by CMO if the query log file is a full path (as explained in the release CHANGELOG [1]). CMO should automatically add the volume and mount it at the queryLogFile's directory, except for the following edge cases (see the sketch after this list):
* when the queryLogFile's directory starts with "/dev" (e.g. "/dev/stdout"), because this destination is already writable.
* when the queryLogFile's directory starts with "/prometheus" (e.g. "/prometheus/query.log"), because it is already writable (TSDB storage directory).
* when the queryLogFile's directory is "/" (e.g. "/query.log"); this should be rejected since we don't want to mount a volume at the root location.

[1] https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.55.0
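To make the mount/reject decision concrete, here is a minimal Go sketch of the rules above. The type and function names (mountDecision, decideQueryLogMount) are illustrative only and do not come from the actual CMO code.

package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// mountDecision captures what the operator should do for a given queryLogFile
// value: reject it, leave the pod spec untouched, or add an emptyDir volume
// mounted at the file's directory.
type mountDecision struct {
	NeedsVolume bool   // true when an emptyDir volume + mount must be added
	MountPath   string // directory where the emptyDir would be mounted
}

// decideQueryLogMount mirrors the rules listed above for absolute paths.
// Relative paths are not handled here; they are discussed later in this bug.
func decideQueryLogMount(queryLogFile string) (mountDecision, error) {
	if queryLogFile == "" {
		return mountDecision{}, nil
	}
	dir := filepath.Dir(queryLogFile)
	switch {
	case dir == "/":
		// Mounting a volume over "/" would shadow the whole filesystem.
		return mountDecision{}, errors.New("query log file can't be stored on the root directory")
	case dir == "/dev" || strings.HasPrefix(dir, "/dev/"):
		// /dev is already writable (as seen below, the real operator additionally
		// rejects plain files there, e.g. /dev/test.log, and only accepts
		// destinations such as /dev/stdout).
		return mountDecision{}, nil
	case dir == "/prometheus" || strings.HasPrefix(dir, "/prometheus/"):
		// The TSDB storage directory is already writable.
		return mountDecision{}, nil
	default:
		// e.g. /tmp/test.log -> emptyDir mounted at /tmp
		return mountDecision{NeedsVolume: true, MountPath: dir}, nil
	}
}

func main() {
	for _, p := range []string{"/tmp/test.log", "/dev/stdout", "/prometheus/query.log", "/query.log"} {
		d, err := decideQueryLogMount(p)
		fmt.Printf("%-24s -> %+v err=%v\n", p, d, err)
	}
}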
Test with payload 4.11.0-0.nightly-2022-03-18-003836

% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |-
    prometheusK8s:
      queryLogFile: /tmp/test.log
EOF

tail /tmp/test.log
{"params":{"end":"2022-03-18T02:56:46.864Z","query":"min_over_time(prometheus_operator_managed_resources{job=\"prometheus-operator\",namespace=~\"openshift-monitoring|openshift-user-workload-monitoring\",state=\"rejected\"}[5m]) > 0","start":"2022-03-18T02:56:46.864Z","step":0},"ruleGroup":{"file":"/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-monitoring-prometheus-operator-rules-1906bb0c-447d-4527-96e9-2f9d29cb61c3.yaml","name":"prometheus-operator"},"stats":{"timings":{"evalTotalTime":0.00013654,"resultSortTime":0,"queryPreparationTime":0.000072124,"innerEvalTime":0.000054378,"execQueueTime":0.000008232,"execTotalTime":0.000149892}},"ts":"2022-03-18T02:56:46.868Z"}
{"params":{"end":"2022-03-18T02:56:47.019Z","query":"kube_pod_status_ready{condition=\"true\",namespace=\"openshift-cluster-node-tuning-operator\"} == 0","start":"2022-03-18T02:56:47.019Z","step":0},"ruleGroup":{"file":"/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-cluster-node-tuning-operator-node-tuning-operator-c9a433ca-1024-468b-82ef-c04e973d1c8e.yaml","name":"node-tuning-operator.rules"},"stats":{"timings":{"evalTotalTime":0.000189451,"resultSortTime":0,"queryPreparationTime":0.000092864,"innerEvalTime":0.000089069,"execQueueTime":0.000011035,"execTotalTime":0.000206109}},"ts":"2022-03-18T02:56:47.020Z"}

% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |-
    prometheusK8s:
      queryLogFile: /var/test.log
EOF

tail /var/test.log
{"params":{"end":"2022-03-18T02:59:53.532Z","query":"cco_credentials_requests_conditions{condition=\"InsufficientCloudCreds\"} > 0","start":"2022-03-18T02:59:53.532Z","step":0},"ruleGroup":{"file":"/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-cloud-credential-operator-cloud-credential-operator-alerts-f3c8c22c-1897-4f4f-80a3-642504067996.yaml","name":"CloudCredentialOperator"},"stats":{"timings":{"evalTotalTime":0.000089687,"resultSortTime":0,"queryPreparationTime":0.000059958,"innerEvalTime":0.00002176,"execQueueTime":0.000008778,"execTotalTime":0.000104447}},"ts":"2022-03-18T02:59:53.558Z"}

% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |-
    prometheusK8s:
      queryLogFile: /dev/test.log
EOF

% oc -n openshift-monitoring logs cluster-monitoring-operator-69d4486df9-xq4s6 cluster-monitoring-operator
-----
W0318 03:03:37.459612       1 tasks.go:71] task 4 of 14: Updating Prometheus-k8s failed: initializing Prometheus object failed: query log file can't be stored on a new file on the dev directory: invalid value for config

% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |-
    prometheusK8s:
      queryLogFile: /test.log
EOF

% oc -n openshift-monitoring logs cluster-monitoring-operator-69d4486df9-xq4s6 cluster-monitoring-operator
-----
W0318 03:07:23.079970       1 tasks.go:71] task 4 of 14: Updating Prometheus-k8s failed: initializing Prometheus object failed: query log file can't be stored on the root directory: invalid value for config
% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      queryLogFile: /tmp/test.log
EOF

% oc -n openshift-user-workload-monitoring get pod
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-5fc4b476dc-x5zc2   2/2     Running   0          4m28s
prometheus-user-workload-0             4/5     Running   0          8s
prometheus-user-workload-1             5/5     Running   0          26s
thanos-ruler-user-workload-0           3/3     Running   0          4m18s
thanos-ruler-user-workload-1           3/3     Running   0          4m18s

% oc -n openshift-user-workload-monitoring rsh prometheus-user-workload-0
sh-4.4$ tail -f /tmp/test.log
no result
When an invalid directory is configured, the error messages in the operator logs are not correct.

% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      queryLogFile: /test.log
EOF

% oc -n openshift-monitoring logs cluster-monitoring-operator-69d4486df9-xq4s6 cluster-monitoring-operator
-----
E0318 07:34:42.684084       1 operator.go:537] Syncing "openshift-monitoring/cluster-monitoring-config" failed
E0318 07:34:42.684167       1 operator.go:538] sync "openshift-monitoring/cluster-monitoring-config" failed: the User Workload Configuration from "config.yaml" key in the "openshift-user-workload-monitoring/user-workload-monitoring-config" ConfigMap could not be parsed: error unmarshaling JSON: while decoding JSON: json: cannot unmarshal string into Go struct field UserWorkloadConfiguration.prometheus of type manifests.PrometheusRestrictedConfig
W0318 07:35:30.756558       1 operator.go:781] Error creating User Workload Configuration from "config.yaml" key in the "openshift-user-workload-monitoring/user-workload-monitoring-config" ConfigMap. Error: error unmarshaling JSON: while decoding JSON: json: cannot unmarshal string into Go struct field UserWorkloadConfiguration.prometheus of type manifests.PrometheusRestrictedConfig
I0318 07:35:30.756586       1 operator.go:681] ClusterOperator reconciliation failed (attempt 34), retrying.
W0318 07:35:30.756592       1 operator.go:684] Updating ClusterOperator status to failed after 34 attempts.
E0318 07:35:30.773779       1 operator.go:537] Syncing "openshift-monitoring/cluster-monitoring-config" failed
E0318 07:35:30.773809       1 operator.go:538] sync "openshift-monitoring/cluster-monitoring-config" failed: the User Workload Configuration from "config.yaml" key in the "openshift-user-workload-monitoring/user-workload-monitoring-config" ConfigMap could not be parsed: error unmarshaling JSON: while decoding JSON: json: cannot unmarshal string into Go struct field UserWorkloadConfiguration.prometheus of type manifests.PrometheusRestrictedConfig
I0318 07:38:15.576292       1 operator.go:509] Triggering an update due to ConfigMap or Secret: openshift-user-workload-monitoring/user-workload-monitoring-config
Facing 2 issues now:
1. When a valid path is configured for the user workload query log, no query log is written, even though I ran many queries against the prometheus-example app by exploring dashboards or querying 'version' directly.
2. When an invalid path is configured for the user workload query log, the error messages in the logs are not correct.

So I reopened the bug.
There is another critical issue: when the query log file is configured as follows, the prometheus-k8s pod crashes.

% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusK8s:
      queryLogFile: ./lhy/test.log
EOF

% oc -n openshift-monitoring get pod | grep prometheus-k8s
prometheus-k8s-0   6/6   Running            0             15m
prometheus-k8s-1   5/6   CrashLoopBackOff   5 (39s ago)   3m39s
Regarding comment 8, this should be fixed by bumping to the latest version of prometheus-operator (https://github.com/openshift/prometheus-operator/pull/162), which merged an hour ago.
Regarding the first issue:
=================
jmarcal ~ → oc get cm user-workload-monitoring-config -o yaml | cat
apiVersion: v1
data:
  config.yaml: |
    prometheus:
      queryLogFile: /tmp/test.log
kind: ConfigMap
metadata:
  creationTimestamp: "2022-03-18T08:48:53Z"
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
  resourceVersion: "46245"
  uid: 0f341905-4ff6-43fd-a3a2-0eb5b9b67ff5

jmarcal ~ → oc rsh prometheus-user-workload-1
sh-4.4$ curl --data-urlencode 'query=up{job="alertmanager"}' 127.0.0.1:9090/api/v1/query
sh-4.4$ cat /tmp/test.log
{"httpRequest":{"clientIP":"127.0.0.1","method":"POST","path":"/api/v1/query"},"params":{"end":"2022-03-18T09:35:15.992Z","query":"up{job=\"alertmanager\"}","start":"2022-03-18T09:35:15.992Z","step":0},"stats":{"timings":{"evalTotalTime":0.000053284,"resultSortTime":0,"queryPreparationTime":0.000036362,"innerEvalTime":0.000010217,"execQueueTime":0.000013256,"execTotalTime":0.000081297}},"ts":"2022-03-18T09:35:15.993Z"}
=================

So from my side everything seems to be working as expected. From what Simon told me, for user workload we need to hit the Prometheus API query endpoint directly, because going through Thanos Querier wouldn't trigger request logging.

Regarding the second issue:
=================
jmarcal ~ → oc get cm user-workload-monitoring-config -o yaml | cat
apiVersion: v1
data:
  config.yaml: |
    prometheus:
      queryLogFile: /test.log
kind: ConfigMap
metadata:
  creationTimestamp: "2022-03-18T08:48:53Z"
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
  resourceVersion: "52109"
  uid: 0f341905-4ff6-43fd-a3a2-0eb5b9b67ff5

jmarcal ~ → oc -n openshift-monitoring logs cluster-monitoring-operator-69d4486df9-9z7c8 -c cluster-monitoring-operator | tail
I0318 09:50:03.950181       1 tasks.go:74] ran task 9 of 14: Updating openshift-state-metrics
I0318 09:50:03.989334       1 tasks.go:74] ran task 8 of 14: Updating kube-state-metrics
I0318 09:50:04.057089       1 tasks.go:74] ran task 7 of 14: Updating node-exporter
I0318 09:50:04.168306       1 tasks.go:74] ran task 11 of 14: Updating Telemeter client
W0318 09:50:04.249201       1 tasks.go:71] task 5 of 14: Updating Prometheus-user-workload failed: initializing UserWorkload Prometheus object failed: query log file can't be stored on the root directory: invalid value for config
I0318 09:50:04.312247       1 tasks.go:74] ran task 10 of 14: Updating prometheus-adapter
I0318 09:50:04.989025       1 tasks.go:74] ran task 1 of 14: Updating user workload Prometheus Operator
I0318 09:50:05.574858       1 tasks.go:74] ran task 3 of 14: Updating Grafana
I0318 09:50:07.315837       1 tasks.go:74] ran task 12 of 14: Updating Thanos Querier
I0318 09:50:14.212797       1 tasks.go:74] ran task 6 of 14: Updating Alertmanager
=================

So from my side it seems like the YAML you used was for some reason not valid, and that caused the error message you saw.
Tested again; configuring a full-path query log file works well for user-workload-monitoring. @jamarcal, thanks for checking.
Test with payload 4.11.0-0.nightly-2022-03-20-160505
prometheus-operator's version is 0.55.0

oc -n openshift-monitoring logs prometheus-operator-5cb86ff95c-xglgp
level=info ts=2022-03-21T02:02:52.099203159Z caller=main.go:220 msg="Starting Prometheus Operator" version="(version=0.55.0, branch=rhaos-4.11-rhel-8, revision=5d799b9)

oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusK8s:
      queryLogFile: ./lhy/test.log
EOF

oc -n openshift-monitoring get pod | grep prometheus-k8s
prometheus-k8s-0   6/6   Running            0              46m
prometheus-k8s-1   5/6   CrashLoopBackOff   6 (5m6s ago)   10m

$ oc -n openshift-monitoring describe pod prometheus-k8s-1
Name:                 prometheus-k8s-1
Namespace:            openshift-monitoring
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 ip-10-0-153-240.us-east-2.compute.internal/10.0.153.240
---------
Containers:
  prometheus:
    Container ID:  cri-o://ff7b03f6a64e625d3eddb2b9fb8638a0bcbc3582aacb1a82286d57ab8dd0aeab
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:81ea1941f7a902c68c696e99afa860c41ac7fe0ab0c209e79cc2a7855cdbd5b7
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:81ea1941f7a902c68c696e99afa860c41ac7fe0ab0c209e79cc2a7855cdbd5b7
    Port:          <none>
    Host Port:     <none>
    Args:
      --web.console.templates=/etc/prometheus/consoles
      --web.console.libraries=/etc/prometheus/console_libraries
      --storage.tsdb.retention.time=15d
      --config.file=/etc/prometheus/config_out/prometheus.env.yaml
      --storage.tsdb.path=/prometheus
      --web.enable-lifecycle
      --web.external-url=https:/console-openshift-console.apps.hongyli-0321.qe.devcluster.openshift.com/monitoring
      --web.route-prefix=/
      --web.listen-address=127.0.0.1:9090
      --web.config.file=/etc/prometheus/web_config/web-config.yaml
    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   msg="Using pod service account via in-cluster config"
ts=2022-03-21T02:59:25.993Z caller=kubernetes.go:313 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-03-21T02:59:25.994Z caller=kubernetes.go:313 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-03-21T02:59:25.995Z caller=kubernetes.go:313 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-03-21T02:59:25.996Z caller=kubernetes.go:313 level=info component="discovery manager notify" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2022-03-21T02:59:26.132Z caller=main.go:815 level=info msg="Stopping scrape discovery manager..."
ts=2022-03-21T02:59:26.132Z caller=main.go:829 level=info msg="Stopping notify discovery manager..."
ts=2022-03-21T02:59:26.132Z caller=main.go:851 level=info msg="Stopping scrape manager..."
ts=2022-03-21T02:59:26.132Z caller=main.go:811 level=info msg="Scrape discovery manager stopped"
ts=2022-03-21T02:59:26.133Z caller=main.go:825 level=info msg="Notify discovery manager stopped"
ts=2022-03-21T02:59:26.134Z caller=manager.go:945 level=info component="rule manager" msg="Stopping rule manager..."
ts=2022-03-21T02:59:26.134Z caller=main.go:845 level=info msg="Scrape manager stopped"
ts=2022-03-21T02:59:26.134Z caller=manager.go:955 level=info component="rule manager" msg="Rule manager stopped"
ts=2022-03-21T02:59:26.135Z caller=notifier.go:600 level=info component=notifier msg="Stopping notification manager..."
ts=2022-03-21T02:59:26.135Z caller=main.go:1071 level=info msg="Notifier
      Exit Code:    1
      Started:      Mon, 21 Mar 2022 10:59:25 +0800
      Finished:     Mon, 21 Mar 2022 10:59:26 +0800
    Ready:          False
    Restart Count:  7
    Requests:
      cpu:      70m
      memory:   1Gi
    Liveness:   exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl http://localhost:9090/-/healthy; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/healthy; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=6
    Readiness:  exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=3
    Startup:    exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=15s #success=1 #failure=60
    Environment:  <none>
    Mounts:
      /etc/pki/ca-trust/extracted/pem/ from prometheus-trusted-ca-bundle (ro)
      /etc/prometheus/certs from tls-assets (ro)
      /etc/prometheus/config_out from config-out (ro)
      /etc/prometheus/configmaps/kubelet-serving-ca-bundle from configmap-kubelet-serving-ca-bundle (ro)
      /etc/prometheus/configmaps/metrics-client-ca from configmap-metrics-client-ca (ro)
      /etc/prometheus/configmaps/serving-certs-ca-bundle from configmap-serving-certs-ca-bundle (ro)
      /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
      /etc/prometheus/secrets/kube-etcd-client-certs from secret-kube-etcd-client-certs (ro)
      /etc/prometheus/secrets/kube-rbac-proxy from secret-kube-rbac-proxy (ro)
      /etc/prometheus/secrets/metrics-client-certs from secret-metrics-client-certs (ro)
      /etc/prometheus/secrets/prometheus-k8s-proxy from secret-prometheus-k8s-proxy (ro)
      /etc/prometheus/secrets/prometheus-k8s-thanos-sidecar-tls from secret-prometheus-k8s-thanos-sidecar-tls (ro)
      /etc/prometheus/secrets/prometheus-k8s-tls from secret-prometheus-k8s-tls (ro)
      /etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml")
      /prometheus from prometheus-k8s-db (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h45dn (ro)
      lhy from query-log (rw)
  config-reloader:
    Container ID:  cri-o://c180d9ed73e68e40ce64bb0571bec00f0038b5773cd9ab39c09f7a0319ab96e0
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b25becec8ff8c04d449d7133fb54b2b082f1bc779dc0a289013cbe9e9dd87db
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b25becec8ff8c04d449d7133fb54b2b082f1bc779dc0a289013cbe9e9dd87db
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/prometheus-config-reloader
    Args:
      --listen-address=localhost:8080
      --reload-url=http://localhost:9090/-/reload
      --config-file=/etc/prometheus/config/prometheus.yaml.gz
      --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
      --watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
    State:          Running
      Started:      Mon, 21 Mar 2022 10:48:39 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     1m
      memory:  10Mi
    Environment:
      POD_NAME:  prometheus-k8s-1 (v1:metadata.name)
      SHARD:     0
    Mounts:
      /etc/prometheus/config from config (rw)
      /etc/prometheus/config_out from config-out (rw)
      /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h45dn (ro)
  thanos-sidecar:
    Container ID:  cri-o://163d59de54f18f3963a71ecad205578a1cf53d3ab23ee9c0a3093b3ec1a23ed3
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:419ae6e3ddc102d59063407ab6d287f3c5a5fcddca921b4fe5c09f515eb1a72e
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:419ae6e3ddc102d59063407ab6d287f3c5a5fcddca921b4fe5c09f515eb1a72e
    Ports:         10902/TCP, 10901/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      sidecar
      --prometheus.url=http://localhost:9090/
      --tsdb.path=/prometheus
      --http-address=127.0.0.1:10902
      --grpc-server-tls-cert=/etc/tls/grpc/server.crt
      --grpc-server-tls-key=/etc/tls/grpc/server.key
      --grpc-server-tls-client-ca=/etc/tls/grpc/ca.crt
    State:          Running
      Started:      Mon, 21 Mar 2022 10:48:40 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     1m
      memory:  25Mi
    Environment:  <none>
    Mounts:
      /etc/tls/grpc from secret-grpc-tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h45dn (ro)
  prometheus-proxy:
    Container ID:  cri-o://f74fe302e5390e163b9e205b627dec153cd14b324f215a33a8d35d8c72dbe297
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e1e0560f81cde0731eeb20f6332ee56853cc08652e2212511b02d53d1a97bc8e
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e1e0560f81cde0731eeb20f6332ee56853cc08652e2212511b02d53d1a97bc8e
    Port:          9091/TCP
    Host Port:     0/TCP
    Args:
      -provider=openshift
      -https-address=:9091
      -http-address=
      -email-domain=*
      -upstream=http://localhost:9090
      -openshift-service-account=prometheus-k8s
      -openshift-sar={"resource": "namespaces", "verb": "get"}
      -openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}
      -tls-cert=/etc/tls/private/tls.crt
      -tls-key=/etc/tls/private/tls.key
      -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
      -cookie-secret-file=/etc/proxy/secrets/session_secret
      -openshift-ca=/etc/pki/tls/cert.pem
      -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    State:          Running
      Started:      Mon, 21 Mar 2022 10:48:40 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     1m
      memory:  20Mi
    Environment:
      HTTP_PROXY:
      HTTPS_PROXY:
      NO_PROXY:
    Mounts:
      /etc/pki/ca-trust/extracted/pem/ from prometheus-trusted-ca-bundle (ro)
      /etc/proxy/secrets from secret-prometheus-k8s-proxy (rw)
      /etc/tls/private from secret-prometheus-k8s-tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h45dn (ro)
  kube-rbac-proxy:
    Container ID:  cri-o://d719ca84e6a03ae493ee25aa5d1152763e2a0b3cc10980e9691b1e62530a83b6
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eae733a92bbd6fb71e26105cca172aceb5a72ccf38daf6f8115a3243edcaf840
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eae733a92bbd6fb71e26105cca172aceb5a72ccf38daf6f8115a3243edcaf840
    Port:          9092/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:9092
      --upstream=http://127.0.0.1:9090
      --allow-paths=/metrics
      --config-file=/etc/kube-rbac-proxy/config.yaml
      --tls-cert-file=/etc/tls/private/tls.crt
      --tls-private-key-file=/etc/tls/private/tls.key
      --client-ca-file=/etc/tls/client/client-ca.crt
      --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
      --logtostderr=true
      --v=10
      --tls-min-version=VersionTLS12
    State:          Running
      Started:      Mon, 21 Mar 2022 10:48:40 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     1m
      memory:  15Mi
    Environment:  <none>
    Mounts:
      /etc/kube-rbac-proxy from secret-kube-rbac-proxy (rw)
      /etc/tls/client from configmap-metrics-client-ca (ro)
      /etc/tls/private from secret-prometheus-k8s-tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h45dn (ro)
  kube-rbac-proxy-thanos:
    Container ID:  cri-o://1ca96f73d5547c7ab6a498de1625df7b96dfcd864f10b6267f1d1319ae0428fc
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eae733a92bbd6fb71e26105cca172aceb5a72ccf38daf6f8115a3243edcaf840
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eae733a92bbd6fb71e26105cca172aceb5a72ccf38daf6f8115a3243edcaf840
    Port:          10902/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=[$(POD_IP)]:10902
      --upstream=http://127.0.0.1:10902
      --tls-cert-file=/etc/tls/private/tls.crt
      --tls-private-key-file=/etc/tls/private/tls.key
      --client-ca-file=/etc/tls/client/client-ca.crt
      --config-file=/etc/kube-rbac-proxy/config.yaml
      --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
      --allow-paths=/metrics
      --logtostderr=true
      --tls-min-version=VersionTLS12
      --client-ca-file=/etc/tls/client/client-ca.crt
    State:          Running
      Started:      Mon, 21 Mar 2022 10:48:40 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     1m
      memory:  10Mi
    Environment:
      POD_IP:   (v1:status.podIP)
    Mounts:
      /etc/kube-rbac-proxy from secret-kube-rbac-proxy (rw)
      /etc/tls/client from metrics-client-ca (ro)
      /etc/tls/private from secret-prometheus-k8s-thanos-sidecar-tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h45dn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s
    Optional:    false
  tls-assets:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          prometheus-k8s-tls-assets-0
    SecretOptionalName:  <nil>
  config-out:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  prometheus-k8s-rulefiles-0:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-k8s-rulefiles-0
    Optional:  false
  web-config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-web-config
    Optional:    false
  secret-kube-etcd-client-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-etcd-client-certs
    Optional:    false
  secret-prometheus-k8s-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-tls
    Optional:    false
  secret-prometheus-k8s-proxy:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-proxy
    Optional:    false
  secret-prometheus-k8s-thanos-sidecar-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-thanos-sidecar-tls
    Optional:    false
  secret-kube-rbac-proxy:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-rbac-proxy
    Optional:    false
  secret-metrics-client-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-client-certs
    Optional:    false
  configmap-serving-certs-ca-bundle:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      serving-certs-ca-bundle
    Optional:  false
  configmap-kubelet-serving-ca-bundle:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kubelet-serving-ca-bundle
    Optional:  false
  configmap-metrics-client-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      metrics-client-ca
    Optional:  false
  prometheus-k8s-db:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  query-log:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  metrics-client-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      metrics-client-ca
    Optional:  false
  secret-grpc-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-grpc-tls-8la0v33v3r7hi
    Optional:    false
  prometheus-trusted-ca-bundle:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-trusted-ca-bundle-2rsonso43rc5p
    Optional:  true
  kube-api-access-h45dn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age                 From               Message
  ----     ------          ---                 ----               -------
  Normal   Scheduled       11m                 default-scheduler  Successfully assigned openshift-monitoring/prometheus-k8s-1 to ip-10-0-153-240.us-east-2.compute.internal
  Normal   AddedInterface  11m                 multus             Add eth0 [10.129.2.17/23] from openshift-sdn
  Normal   Pulled          11m                 kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b25becec8ff8c04d449d7133fb54b2b082f1bc779dc0a289013cbe9e9dd87db" already present on machine
  Normal   Created         11m                 kubelet            Created container init-config-reloader
  Normal   Started         11m                 kubelet            Started container init-config-reloader
  Normal   Created         11m                 kubelet            Created container config-reloader
  Normal   Pulled          11m                 kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:419ae6e3ddc102d59063407ab6d287f3c5a5fcddca921b4fe5c09f515eb1a72e" already present on machine
  Normal   Started         11m                 kubelet            Started container config-reloader
  Normal   Pulled          11m                 kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b25becec8ff8c04d449d7133fb54b2b082f1bc779dc0a289013cbe9e9dd87db" already present on machine
  Normal   Created         11m                 kubelet            Created container thanos-sidecar
  Normal   Started         11m (x2 over 11m)   kubelet            Started container prometheus
  Normal   Created         11m (x2 over 11m)   kubelet            Created container prometheus
  Normal   Pulled          11m (x2 over 11m)   kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:81ea1941f7a902c68c696e99afa860c41ac7fe0ab0c209e79cc2a7855cdbd5b7" already present on machine
  Normal   Started         11m                 kubelet            Started container thanos-sidecar
  Normal   Pulled          11m                 kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e1e0560f81cde0731eeb20f6332ee56853cc08652e2212511b02d53d1a97bc8e" already present on machine
  Normal   Created         11m                 kubelet            Created container prometheus-proxy
  Normal   Started         11m                 kubelet            Started container prometheus-proxy
  Normal   Pulled          11m                 kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eae733a92bbd6fb71e26105cca172aceb5a72ccf38daf6f8115a3243edcaf840" already present on machine
  Normal   Created         11m                 kubelet            Created container kube-rbac-proxy
  Normal   Started         11m                 kubelet            Started container kube-rbac-proxy
  Normal   Pulled          11m                 kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eae733a92bbd6fb71e26105cca172aceb5a72ccf38daf6f8115a3243edcaf840" already present on machine
  Normal   Created         11m                 kubelet            Created container kube-rbac-proxy-thanos
  Normal   Started         11m                 kubelet            Started container kube-rbac-proxy-thanos
  Warning  BackOff         78s (x64 over 11m)  kubelet            Back-off restarting failed container

There are no abnormal logs in the cluster-monitoring-operator or prometheus-operator.
When the query log file is configured as ./lhy/test.log, the pod crash issue still occurs.
Right, I think that to be error-proof, CMO should forbid relative paths for queryLogFile.
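A minimal sketch of such a guard, assuming a Go validation step similar to the existing root/dev checks; the function name is illustrative and does not come from the actual CMO code:

package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// validateQueryLogFile is an illustrative guard, not the actual CMO code:
// relative paths are rejected so that the operator never ends up mounting an
// emptyDir at an unexpected location inside the container (for
// "./lhy/test.log" the mount path would have been the relative directory "lhy").
func validateQueryLogFile(queryLogFile string) error {
	if queryLogFile == "" {
		return nil
	}
	if !filepath.IsAbs(queryLogFile) {
		return errors.New("relative paths to query log file are not supported: invalid value for config")
	}
	return nil
}

func main() {
	fmt.Println(validateQueryLogFile("./lhy/test.log")) // rejected with an error
	fmt.Println(validateQueryLogFile("/tmp/test.log"))  // accepted, prints <nil>
}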
Test with payload 4.11.0-0.nightly-2022-04-01-172551, when configuring a relative path:

% oc -n openshift-monitoring logs cluster-monitoring-operator-5dd6f54457-6hj7g cluster-monitoring-operator
-----
W0402 01:57:22.591131       1 tasks.go:71] task 4 of 14: Updating Prometheus-k8s failed: initializing Prometheus object failed: relative paths to query log file are not supported: invalid value for config
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069