Bug 1852846 - logLevel for prometheusOperator/thanosRuler does not work
Summary: logLevel for prometheusOperator/thanosRuler does not work
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Lili Cosic
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-01 12:32 UTC by Junqi Zhao
Modified: 2020-10-27 16:12 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:11:46 UTC
Target Upstream Version:


Attachments (Terms of Use)
openshift-user-workload-monitoring dump file (299.48 KB, application/gzip)
2020-07-01 12:32 UTC, Junqi Zhao
prometheus-operator deployment file (9.33 KB, text/plain)
2020-07-27 01:58 UTC, Junqi Zhao


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-monitoring-operator pull 875 0 None closed Bug 1852846: pkg/manifests/manifests.go: Set logLevel for prometheus Operator 2020-09-14 17:30:58 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:12:05 UTC

Description Junqi Zhao 2020-07-01 12:32:15 UTC
Created attachment 1699481 [details]
openshift-user-workload-monitoring dump file

Description of problem:
After enabling enableUserWorkload and setting logLevel for prometheusOperator/prometheus/thanosRuler, the logLevel settings for prometheusOperator and thanosRuler do not take effect; only the prometheus-user-workload pods honor the setting. For more details, see the attached dump file.
******************************
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
******************************
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheusOperator:
      logLevel: error
    prometheus:
      logLevel: warn
      retention: 48h
    thanosRuler:
      logLevel: info
******************************

# oc -n openshift-user-workload-monitoring get sts prometheus-user-workload -oyaml |  grep "log.level"
        - --log.level=warn

# oc -n openshift-user-workload-monitoring get sts  thanos-ruler-user-workload -oyaml |  grep "log"
        terminationMessagePath: /dev/termination-log
        terminationMessagePath: /dev/termination-log
        terminationMessagePath: /dev/termination-log

# oc -n openshift-user-workload-monitoring get deploy prometheus-operator -oyaml |  grep "log"
        - --logtostderr=true
        - --deny-namespaces=openshift-apiserver,openshift-apiserver-operator,openshift-authentication,openshift-authentication-operator,openshift-cloud-credential-operator,openshift-cluster-machine-approver,openshift-cluster-samples-operator,openshift-cluster-storage-operator,openshift-cluster-version,openshift-config-operator,openshift-console-operator,openshift-controller-manager,openshift-controller-manager-operator,openshift-dns,openshift-dns-operator,openshift-etcd-operator,openshift-image-registry,openshift-ingress,openshift-ingress-operator,openshift-insights,openshift-kube-apiserver,openshift-kube-apiserver-operator,openshift-kube-controller-manager,openshift-kube-controller-manager-operator,openshift-kube-scheduler,openshift-kube-scheduler-operator,openshift-kube-storage-version-migrator,openshift-kube-storage-version-migrator-operator,openshift-machine-api,openshift-machine-config-operator,openshift-marketplace,openshift-monitoring,openshift-multus,openshift-operator-lifecycle-manager,openshift-sdn,openshift-service-ca-operator,openshift-service-catalog-removed,openshift-user-workload-monitoring
        terminationMessagePath: /dev/termination-log
        - --logtostderr
        terminationMessagePath: /dev/termination-log

# oc -n openshift-user-workload-monitoring logs prometheus-user-workload-0 -c prometheus
no output, since there are no warn-level messages

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-06-30-000342

How reproducible:
always

Steps to Reproduce:
1. See the "Description of problem" section above.

Actual results:
The logLevel settings for prometheusOperator and thanosRuler are not applied to the prometheus-operator deployment or the thanos-ruler-user-workload StatefulSet; only prometheus honors its setting.

Expected results:
The logLevel settings should be applied to all three components, as they are for the prometheus-user-workload pods.

Additional info:

Comment 1 Junqi Zhao 2020-07-01 12:43:35 UTC
# oc -n openshift-user-workload-monitoring get thanosruler user-workload -oyaml | grep logLevel
  logLevel: info

#  oc -n openshift-user-workload-monitoring get deploy prometheus-operator  -oyaml | grep logLevel
no result

Comment 2 Junqi Zhao 2020-07-01 12:45:33 UTC
The logLevel setting in the thanosRuler section is not injected into the thanos-ruler-user-workload StatefulSet, and there may be no logLevel flag at all on the prometheus-operator deployment.

Comment 3 Lili Cosic 2020-07-01 13:10:40 UTC
I see that logLevel: info does get set on the ThanosRuler CR instance in the openshift-user-workload-monitoring namespace:
 
 logLevel: info

Can you confirm that as well?

Comment 11 Junqi Zhao 2020-07-27 01:56:49 UTC
Tested with 4.6.0-0.nightly-2020-07-25-091217: after setting logLevel: error for prometheusOperator, "- --log-level=error" is present in the prometheus-operator deployment, but logs whose level is not error still appear.
***************
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheusOperator:
      logLevel: error
****************
# oc -n openshift-user-workload-monitoring get deploy prometheus-operator -oyaml | grep "log-level"
        - --log-level=error

# oc -n openshift-user-workload-monitoring logs $(oc -n openshift-user-workload-monitoring get po | grep prometheus-operator | awk '{print $1}') -c prometheus-operator
ts=2020-07-27T01:28:08.920655206Z caller=main.go:217 msg="Starting Prometheus Operator version '0.40.0'."
ts=2020-07-27T01:28:08.934023544Z caller=main.go:104 msg="Starting insecure server on [::]:8080"

Comment 12 Junqi Zhao 2020-07-27 01:58:32 UTC
Created attachment 1702466 [details]
prometheus-operator deployment file

Comment 13 Lili Cosic 2020-07-27 09:08:41 UTC
@Junqi can you explain what failed? Seems like the deployment file got log-level=error?

Comment 14 Junqi Zhao 2020-07-27 09:43:14 UTC
(In reply to Lili Cosic from comment #13)
> @Junqi can you explain what failed? Seems like the deployment file got
> log-level=error?

see from
# oc -n openshift-user-workload-monitoring logs $(oc -n openshift-user-workload-monitoring get po | grep prometheus-operator | awk '{print $1}') -c prometheus-operator
ts=2020-07-27T01:28:08.920655206Z caller=main.go:217 msg="Starting Prometheus Operator version '0.40.0'."
ts=2020-07-27T01:28:08.934023544Z caller=main.go:104 msg="Starting insecure server on [::]:8080"

we should only see error logs if we set log-level=error

Comment 15 Lili Cosic 2020-07-27 09:46:44 UTC
Those lines are always there, regardless of which log level you set: they are the first two info messages, which are always logged, so seeing them is expected. Setting log-level=error allows error logs; it does not mean those startup messages are suppressed. Hope that makes sense?
https://github.com/coreos/prometheus-operator/blob/ad3571f1e23c51277f6522dee93919ce153d1f46/cmd/operator/main.go#L208
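
The behavior described above can be sketched outside the operator: a logger that emits its startup lines before the configured level takes effect will show them even when the effective level is error. This is an illustrative Python sketch using the standard logging module, not the operator's actual go-kit logging code; the message strings mirror the ones seen in this bug.

```python
import io
import logging

# Route output to an in-memory buffer so we can inspect exactly what is emitted.
buf = io.StringIO()
logger = logging.getLogger("operator-sketch")
logger.propagate = False
logger.addHandler(logging.StreamHandler(buf))
logger.setLevel(logging.INFO)

# Startup lines are emitted while the logger still allows info...
logger.info("Starting Prometheus Operator version '0.40.0'.")
logger.info("Starting insecure server on [::]:8080")

# ...then the configured level (analogous to --log-level=error) takes effect.
logger.setLevel(logging.ERROR)
logger.info("successfully synced all caches")   # suppressed
logger.error("sync failed")                     # still printed

lines = buf.getvalue().splitlines()
print(lines)
```

The two startup lines and the error line survive; the later info line is filtered out, matching what the oc logs output in this bug shows.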

Comment 16 Junqi Zhao 2020-07-27 10:45:23 UTC
move to VERIFIED

Comment 17 Junqi Zhao 2020-07-28 06:05:53 UTC
(In reply to Lili Cosic from comment #15)
> Those lines are always there, regardless of which log level you set: they
> are the first two info messages, which are always logged, so seeing them is
> expected. Setting log-level=error allows error logs; it does not mean those
> startup messages are suppressed. Hope that makes sense?
> https://github.com/coreos/prometheus-operator/blob/ad3571f1e23c51277f6522dee93919ce153d1f46/cmd/operator/main.go#L208

One exception with logLevel: error: if no error-level events occur, there is no log output at all. The thanos-ruler container shows nothing, while prometheus-operator still prints its startup messages:

**********************
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheusOperator:
      logLevel: error
    thanosRuler:
      logLevel: error
**********************
# oc -n openshift-user-workload-monitoring logs thanos-ruler-user-workload-0 -c thanos-ruler
no result

# oc -n openshift-user-workload-monitoring logs prometheus-operator-56fcff76cd-pvlwx -c prometheus-operator
ts=2020-07-28T05:50:33.440295623Z caller=main.go:217 msg="Starting Prometheus Operator version '0.40.0'."
ts=2020-07-28T05:50:33.452148767Z caller=main.go:104 msg="Starting insecure server on [::]:8080"

Comment 19 errata-xmlrpc 2020-10-27 16:11:46 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

