Description of problem:
A customer on OCP 3.7.44 reports that the logs are full of messages like:
2018-06-19T14:11:42.817885000Z I0619 14:11:42.813407 1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
2018-06-19T14:11:49.217479000Z I0619 14:11:49.216839 1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
2018-06-19T14:11:54.231636000Z I0619 14:11:54.226316 1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
I0618 11:19:51.922817 1 handlers.go:264] No metrics for container mongodb-backup in pod assistify/mongodb-backup-1528314300-dxl6l
I0618 12:01:41.487533 1 handlers.go:264] No metrics for container docker-build in pod rim-eu/rim-eu-roter-stable-105-build
I0618 12:01:41.487473 1 handlers.go:264] No metrics for container sti-build in pod wd-dev/doc-3-build
I0618 12:19:08.011249 1 handlers.go:264] No metrics for container aggregated-cms-tools-job-s0a6q in pod ecm-eu/aggregated-cms-tools-job-
The same message (for the same pod or container) gets printed up to 12,000 times, often but not always every 5 seconds; altogether the customer has more than 2 million such entries over 2 days.
The pods are short-running and have completed or been terminated before the error occurs.
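One quick way to quantify the repetition described above is to aggregate the "No metrics" lines; a minimal sketch, where the here-doc sample stands in for the real heapster log:

```shell
# Count how often each "No metrics" message repeats.
# In practice, pipe the real heapster log in instead of the sample here-doc.
grep -o 'No metrics for .*' <<'EOF' | sort | uniq -c | sort -rn
I0619 14:11:42.813407 1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
I0619 14:11:49.216839 1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
I0618 12:01:41.487473 1 handlers.go:264] No metrics for container sti-build in pod wd-dev/doc-3-build
EOF
```

The highest counts identify the pods or containers responsible for most of the log volume.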
In the logs I found some occurrences of
which should already be fixed by the latest errata, but there are 25 such entries...
No further issues are visible in the logs, and the customer uses standard settings.
Version-Release number of selected component (if applicable):
[computer@ocp-ansible]$ oc version
features: Basic-Auth GSSAPI Kerberos SPNEGO
Steps to Reproduce:
(not specified)

Actual results:
Lots of log entries as described above.

Expected results:
No unnecessary logs anymore; no such log entries.

Additional info:
Checked further errata but found no relevant bug.
Someone's trying to access Heapster (either via an HPA or the dashboard), and Heapster logs an error message when it can't find any metrics for that container/pod. If the pod is short-running or terminated, seeing that log message is somewhat expected behavior. We could kill the log message, but it can be a useful debug tool if you know what you're looking for. Does the customer know what's requesting the metrics?
*** Bug 1636453 has been marked as a duplicate of this bug. ***
As a workaround for the log issue, you can set the log level for heapster to 1 or 0. This will stop the "No metrics for container %s in pod %s/%s" messages from being logged.
To set the log level, add `--v=1` to the heapster replication controller under `template.spec.containers.command`.
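For reference, a minimal sketch of where the flag goes in the heapster replication controller (edit with `oc edit rc heapster -n openshift-infra`; the container name and existing command entries shown here are illustrative, not exact):

```yaml
spec:
  template:
    spec:
      containers:
      - name: heapster        # container name as deployed may differ
        command:
        - heapster            # keep the existing command entries as-is
        - --v=1               # lower verbosity; suppresses the "No metrics" lines
```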
Can you update us on whether the workaround suggested by Ryan is OK, or do you want to go with the code change I suggested? I am planning to close this bug if we don't have any update within a week.
It seems like the case was closed automatically because the customer did not respond.
The last comment on the case was from a Red Hatter (most probably an SA for that customer), saying that the customer will try the suggested workaround.
You can close this bug after Christian's approval ;)
I don't think we can close this ticket. I have been working with the customer and that workaround is not working.
Customer has updated the cluster and they are now on OCP v3.9.
Can you provide the heapster logs with the log level set to 1?
I asked for updated logs from heapster.
Created attachment 1554127 [details]
I'm updating with the recent logs from heapster.
*** Bug 1720246 has been marked as a duplicate of this bug. ***
Moving to MODIFIED, as the PR to configure the log verbosity has been merged: https://github.com/openshift/openshift-ansible/pull/11735
With it, customers can set the log verbosity to 0 (the default is 1) if they don't want to see these messages.
The openshift_metrics_heapster_log_verbosity Ansible parameter has been added, with a default value of 1.
Setting openshift_metrics_heapster_log_verbosity=0 reduces the verbosity.
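For example, in the openshift-ansible inventory (the `[OSEv3:vars]` group is the standard place for such variables):

```ini
[OSEv3:vars]
# Suppress the "No metrics" messages from heapster
openshift_metrics_heapster_log_verbosity=0
```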
# rpm -qa | grep ansible
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.