Description of problem:
A customer on OCP 3.7.44 reports that the logs are full of messages like:
2018-06-19T14:11:42.817885000Z I0619 14:11:42.813407 1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
2018-06-19T14:11:49.217479000Z I0619 14:11:49.216839 1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
2018-06-19T14:11:54.231636000Z I0619 14:11:54.226316 1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
I0618 11:19:51.922817 1 handlers.go:264] No metrics for container mongodb-backup in pod assistify/mongodb-backup-1528314300-dxl6l
I0618 12:01:41.487533 1 handlers.go:264] No metrics for container docker-build in pod rim-eu/rim-eu-roter-stable-105-build
I0618 12:01:41.487473 1 handlers.go:264] No metrics for container sti-build in pod wd-dev/doc-3-build
I0618 12:19:08.011249 1 handlers.go:264] No metrics for container aggregated-cms-tools-job-s0a6q in pod ecm-eu/aggregated-cms-tools-job-
The same message (for the same pod or container) gets printed up to 12,000 times, often but not always every 5 seconds; altogether the customer has more than 2 million such entries over 2 days.
The pods are short-running and have completed or been terminated before the error occurs.
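One quick way to quantify the repetition described above is to aggregate the "No metrics" lines; a minimal sketch, where the here-doc sample stands in for the real heapster log:

```shell
# Count how often each "No metrics" message repeats.
# In practice, pipe the real heapster log in instead of the sample here-doc.
grep -o 'No metrics for .*' <<'EOF' | sort | uniq -c | sort -rn
I0619 14:11:42.813407 1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
I0619 14:11:49.216839 1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
I0618 12:01:41.487473 1 handlers.go:264] No metrics for container sti-build in pod wd-dev/doc-3-build
EOF
```

The highest counts identify the pods or containers responsible for most of the log volume.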
In the logs I found some occurrences of
which should already be fixed by the latest errata, but there are 25 such entries...
No further issues are visible in the logs, and the customer uses standard settings.
Version-Release number of selected component (if applicable):
[computer@ocp-ansible]$ oc version
features: Basic-Auth GSSAPI Kerberos SPNEGO
Steps to Reproduce:
(not specified)

Actual results:
Lots of log entries as described above.

Expected results:
No unnecessary logs anymore; no such log entries.

Additional info:
Checked further errata but found no relevant bug.
Someone's trying to access Heapster (either via an HPA or the dashboard), and Heapster logs an error message when it can't find any metrics for that container/pod. If the pod is short-running or terminated, seeing that log message is somewhat expected behavior. We could kill the log message, but it can be a useful debug tool if you know what you're looking for. Does the customer know what's requesting the metrics?
*** Bug 1636453 has been marked as a duplicate of this bug. ***
As a workaround for the log issue, you can set the log level for heapster to 1 or 0. This will stop the "No metrics for container %s in pod %s/%s" messages from being logged.
To set the log level, add `--v=1` to the heapster replication controller under `template.spec.containers.command`.
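For reference, a minimal sketch of where the flag goes in the heapster replication controller (edit with `oc edit rc heapster -n openshift-infra`; the container name and existing command entries shown here are illustrative, not exact):

```yaml
spec:
  template:
    spec:
      containers:
      - name: heapster        # container name as deployed may differ
        command:
        - heapster            # keep the existing command entries as-is
        - --v=1               # lower verbosity; suppresses the "No metrics" lines
```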
Can you update us on whether the workaround suggested by Ryan is OK, or do you want to go with the code change I suggested? I am planning to close this bug if we don't have any update within a week.
It seems like the case was closed automatically because the customer did not respond.
The last comment on the case was from a Red Hatter (most probably an SA for that customer), saying that the customer will try the suggested workaround.
You can close this bug after Christian's approval ;)
I don't think we can close this ticket. I have been working with the customer and that workaround is not working.
Customer has updated the cluster and they are now on OCP v3.9.
Can you provide the heapster logs with the log level set to 1?
I asked for updated logs from heapster.
Created attachment 1554127 [details]
I'm updating with the recent logs from heapster.
*** Bug 1720246 has been marked as a duplicate of this bug. ***
Moving to MODIFIED, as the PR to configure the log verbosity has been merged: https://github.com/openshift/openshift-ansible/pull/11735
With it, customers can set the log verbosity to 0 (the default is 1) if they don't want to see these messages.
The openshift_metrics_heapster_log_verbosity Ansible parameter has been added, with a default value of 1.
Setting openshift_metrics_heapster_log_verbosity=0 reduces the verbosity.
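For example, in the openshift-ansible inventory (the `[OSEv3:vars]` group is the standard place for such variables):

```ini
[OSEv3:vars]
# Suppress the "No metrics" messages from heapster
openshift_metrics_heapster_log_verbosity=0
```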
# rpm -qa | grep ansible
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.