Created attachment 1699481 [details]
openshift-user-workload-monitoring dump file

Description of problem:
Enable enableUserWorkload and set logLevel for prometheusOperator, prometheus, and thanosRuler. The logLevel settings for prometheusOperator and thanosRuler do not take effect; only the prometheus setting is applied (to the prometheus-user-workload pods). For more details, see the attached dump file.

******************************
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
******************************
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheusOperator:
      logLevel: error
    prometheus:
      logLevel: warn
      retention: 48h
    thanosRuler:
      logLevel: info
******************************
# oc -n openshift-user-workload-monitoring get sts prometheus-user-workload -oyaml | grep "log.level"
        - --log.level=warn
# oc -n openshift-user-workload-monitoring get sts thanos-ruler-user-workload -oyaml | grep "log"
        terminationMessagePath: /dev/termination-log
        terminationMessagePath: /dev/termination-log
        terminationMessagePath: /dev/termination-log
# oc -n openshift-user-workload-monitoring get deploy prometheus-operator -oyaml | grep "log"
        - --logtostderr=true
        - --deny-namespaces=openshift-apiserver,openshift-apiserver-operator,openshift-authentication,openshift-authentication-operator,openshift-cloud-credential-operator,openshift-cluster-machine-approver,openshift-cluster-samples-operator,openshift-cluster-storage-operator,openshift-cluster-version,openshift-config-operator,openshift-console-operator,openshift-controller-manager,openshift-controller-manager-operator,openshift-dns,openshift-dns-operator,openshift-etcd-operator,openshift-image-registry,openshift-ingress,openshift-ingress-operator,openshift-insights,openshift-kube-apiserver,openshift-kube-apiserver-operator,openshift-kube-controller-manager,openshift-kube-controller-manager-operator,openshift-kube-scheduler,openshift-kube-scheduler-operator,openshift-kube-storage-version-migrator,openshift-kube-storage-version-migrator-operator,openshift-machine-api,openshift-machine-config-operator,openshift-marketplace,openshift-monitoring,openshift-multus,openshift-operator-lifecycle-manager,openshift-sdn,openshift-service-ca-operator,openshift-service-catalog-removed,openshift-user-workload-monitoring
        terminationMessagePath: /dev/termination-log
        - --logtostderr
        terminationMessagePath: /dev/termination-log
# oc -n openshift-user-workload-monitoring logs prometheus-user-workload-0 -c prometheus
No output, since no warn-level messages have been logged yet.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-06-30-000342

How reproducible:
always

Steps to Reproduce:
1. See the description.
2.
3.

Actual results:

Expected results:

Additional info:
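To reproduce, the user-workload ConfigMap from the description can be recreated locally and sanity-checked before applying it. The sketch below writes it to a temp file (an illustrative path); applying it with `oc apply` is the assumed next step and requires a live cluster:

```shell
#!/bin/sh
# Recreate the user-workload-monitoring-config manifest from the description.
# Applying it would be:  oc apply -f /tmp/uwm-config.yaml  (requires a cluster)
cat > /tmp/uwm-config.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheusOperator:
      logLevel: error
    prometheus:
      logLevel: warn
      retention: 48h
    thanosRuler:
      logLevel: info
EOF

# Sanity check: each of the three components should carry a logLevel key.
count=$(grep -c 'logLevel:' /tmp/uwm-config.yaml)
echo "logLevel entries: $count"
```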
# oc -n openshift-user-workload-monitoring get thanosruler user-workload -oyaml | grep logLevel
  logLevel: info
# oc -n openshift-user-workload-monitoring get deploy prometheus-operator -oyaml | grep logLevel
no result
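The checks above all reduce to grepping a rendered manifest for the log-level flag. The sketch below shows that check against a made-up sample fragment (an assumption of what a correctly injected flag would look like — in this bug, the line is missing from the thanos-ruler statefulset); against a live cluster the heredoc would be replaced by the `oc get ... -o yaml` output:

```shell
#!/bin/sh
# Illustrative sample fragment; on a real cluster you would instead pipe:
#   oc -n openshift-user-workload-monitoring get sts thanos-ruler-user-workload -o yaml
cat > /tmp/thanos-ruler-sample.yaml <<'EOF'
spec:
  template:
    spec:
      containers:
      - name: thanos-ruler
        args:
        - rule
        - --log.level=info
EOF

# Prometheus and Thanos use --log.level while prometheus-operator uses
# --log-level, so match both spellings ('--' stops grep's option parsing).
grep -E -- '--log[.-]level' /tmp/thanos-ruler-sample.yaml
```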
The logLevel setting from the thanosruler CR is not injected into the thanos-ruler-user-workload statefulset, and there may be no logLevel flag at all on the prometheus-operator deployment.
I see that logLevel: info does get set on the ThanosRuler CR instance in the openshift-user-workload-monitoring namespace:

  logLevel: info

Can you confirm that as well?
Tested with 4.6.0-0.nightly-2020-07-25-091217: set logLevel: error for prometheusOperator. "- --log-level=error" is now present in the prometheus-operator deployment, but logs at levels other than error still appear.

***************
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheusOperator:
      logLevel: error
****************
# oc -n openshift-user-workload-monitoring get deploy prometheus-operator -oyaml | grep "log-level"
        - --log-level=error
# oc -n openshift-user-workload-monitoring logs $(oc -n openshift-user-workload-monitoring get po | grep prometheus-operator | awk '{print $1}') -c prometheus-operator
ts=2020-07-27T01:28:08.920655206Z caller=main.go:217 msg="Starting Prometheus Operator version '0.40.0'."
ts=2020-07-27T01:28:08.934023544Z caller=main.go:104 msg="Starting insecure server on [::]:8080"
Created attachment 1702466 [details]
prometheus-operator deployment file
@Junqi can you explain what failed? Seems like the deployment file got log-level=error?
(In reply to Lili Cosic from comment #13)
> @Junqi can you explain what failed? Seems like the deployment file got
> log-level=error?

See the output of:

# oc -n openshift-user-workload-monitoring logs $(oc -n openshift-user-workload-monitoring get po | grep prometheus-operator | awk '{print $1}') -c prometheus-operator
ts=2020-07-27T01:28:08.920655206Z caller=main.go:217 msg="Starting Prometheus Operator version '0.40.0'."
ts=2020-07-27T01:28:08.934023544Z caller=main.go:104 msg="Starting insecure server on [::]:8080"

With log-level=error set, we would expect to see only error logs.
Those two lines are always there regardless of which log level you set, as they are the first two info messages that always need to be logged. They are expected. Setting log-level=error just means "allow error logs"; it does not deny output logged outside the level filter. Hope that makes sense?

https://github.com/coreos/prometheus-operator/blob/ad3571f1e23c51277f6522dee93919ce153d1f46/cmd/operator/main.go#L208
move to VERIFIED
(In reply to Lili Cosic from comment #15)
> Those are always there, regardless of which log level you set as its the
> first two info that always needs to be logged. It is expected to be logged.
> It just means to allow error logs, not to deny any other log levels. Hope
> that makes sense?
> https://github.com/coreos/prometheus-operator/blob/
> ad3571f1e23c51277f6522dee93919ce153d1f46/cmd/operator/main.go#L208

There is an exception for logLevel: error: if no error-level message is ever logged, there is no log output at all.

**********************
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheusOperator:
      logLevel: error
    thanosRuler:
      logLevel: error
**********************
# oc -n openshift-user-workload-monitoring logs thanos-ruler-user-workload-0 -c thanos-ruler
no result
# oc -n openshift-user-workload-monitoring logs prometheus-operator-56fcff76cd-pvlwx -c prometheus-operator
ts=2020-07-28T05:50:33.440295623Z caller=main.go:217 msg="Starting Prometheus Operator version '0.40.0'."
ts=2020-07-28T05:50:33.452148767Z caller=main.go:104 msg="Starting insecure server on [::]:8080"
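The behavior in comment 17 is consistent with plain severity filtering: with the threshold at error, info and warn messages are dropped, and if nothing at error level is ever emitted there is simply no output. A minimal shell sketch of that filtering (an analogy only, not prometheus-operator's actual go-kit logger; the operator's two startup lines are written regardless of such a filter):

```shell
#!/bin/sh
# Configured threshold, analogous to --log-level=error / --log.level=error.
LOG_LEVEL=error

# Map a level name to a numeric severity.
sev() {
  case "$1" in
    debug) echo 0 ;;
    info)  echo 1 ;;
    warn)  echo 2 ;;
    error) echo 3 ;;
    *)     echo 0 ;;
  esac
}

# Emit a log line only if its severity meets the threshold.
log() {
  if [ "$(sev "$1")" -ge "$(sev "$LOG_LEVEL")" ]; then
    echo "level=$1 msg=\"$2\""
  fi
}

log info  "Starting rule evaluation"  >  /tmp/filtered.log  # dropped: info < error
log error "rule evaluation failed"    >> /tmp/filtered.log  # kept: meets threshold
cat /tmp/filtered.log
```

With LOG_LEVEL=error only the error line survives, and with no error-level calls the file would be empty — matching the empty thanos-ruler log above.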
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196