Bug 1593634 - OpenShift Heapster is logging a lot of "no pod" or "no container" found messages [NEEDINFO]
Summary: OpenShift Heapster is logging a lot of "no pod" or "no container" found messages
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 3.9.z
Assignee: Frederic Branczyk
QA Contact: Junqi Zhao
URL:
Whiteboard:
Duplicates: 1636453 1720246
Depends On:
Blocks:
Reported: 2018-06-21 09:26 UTC by Christian Stark
Modified: 2019-08-26 16:27 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-26 16:27:38 UTC
Target Upstream Version:
rbdiri: needinfo? (rgudimet)


Attachments
heapster_logs (7.04 KB, text/plain)
2019-04-10 08:51 UTC, Andre Costa


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1636453 None CLOSED unable to get metrics for resource cpu: no metrics returned from heapster 2019-10-08 04:44:38 UTC
Red Hat Product Errata RHBA-2019:2550 None None None 2019-08-26 16:27:42 UTC

Internal Links: 1636453

Description Christian Stark 2018-06-21 09:26:46 UTC
Description of problem:


The customer, running 3.7.44, reports that the logs are full of messages like:

2018-06-19T14:11:42.817885000Z I0619 14:11:42.813407       1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
2018-06-19T14:11:49.217479000Z I0619 14:11:49.216839       1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
2018-06-19T14:11:54.231636000Z I0619 14:11:54.226316       1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw

I0618 11:19:51.922817 1 handlers.go:264] No metrics for container mongodb-backup in pod assistify/mongodb-backup-1528314300-dxl6l
I0618 12:01:41.487533 1 handlers.go:264] No metrics for container docker-build in pod rim-eu/rim-eu-roter-stable-105-build
I0618 12:01:41.487473 1 handlers.go:264] No metrics for container sti-build in pod wd-dev/doc-3-build
I0618 12:19:08.011249 1 handlers.go:264] No metrics for container aggregated-cms-tools-job-s0a6q in pod ecm-eu/aggregated-cms-tools-job-


The same message (for the same pod or container) is printed 12000 times (often, but not always, every 5 seconds), and the customer has more than 2 million such entries in total over 2 days.
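
To see which pods and containers account for most of these entries, the log can be tallied with a short script. The following is a minimal Python sketch, not part of Heapster or OpenShift; the function name count_no_metrics and the regular expression are assumptions based only on the two message shapes quoted above.

```python
import re
from collections import Counter

# Matches both message shapes quoted in this report:
#   "No metrics for pod <namespace>/<pod>"
#   "No metrics for container <name> in pod <namespace>/<pod>"
NO_METRICS = re.compile(
    r"No metrics for (?:container (?P<container>\S+) in )?pod (?P<pod>\S+)"
)


def count_no_metrics(lines):
    """Count "No metrics" messages per (pod, container).

    The container part of the key is None for pod-level messages.
    """
    counts = Counter()
    for line in lines:
        match = NO_METRICS.search(line)
        if match:
            counts[(match.group("pod"), match.group("container"))] += 1
    return counts
```

Passing it the lines of a saved Heapster log (any iterable of strings, such as an open file) and calling .most_common(20) on the result lists the noisiest pods first.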

The pods are short-running and have already completed or terminated before the error occurs.


In the logs I also found some occurrences of
https://bugzilla.redhat.com/show_bug.cgi?id=1539830
which should be fixed by the latest errata, but there are 25 such entries...

No further issues are seen in the logs, and the customer uses the standard settings.


Version-Release number of selected component (if applicable):

[computer@ocp-ansible]$ oc version

oc v3.7.44

kubernetes v1.7.6+a08f5eeb62

features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

lots of log entries as described above

Expected results:

no unnecessary logs anymore

no log entries

Additional info:

Checked further errata but found no relevant bug.

Comment 2 Solly Ross 2018-06-21 15:15:55 UTC
Someone's trying to access Heapster (either via an HPA or the dashboard), and Heapster logs a message when it can't find any metrics for that container/pod.  If the pod is short-running or terminated, seeing that log message is somewhat expected behavior.  We could kill the log message, but it can be a useful debug tool if you know what you're looking for.  Does the customer know what's requesting the metrics?

Comment 17 Ruben Vargas Palma 2019-02-20 17:24:18 UTC
*** Bug 1636453 has been marked as a duplicate of this bug. ***

Comment 20 Ryan Howe 2019-02-21 14:20:27 UTC
As a workaround for the log issue, you can set the log level for Heapster to 1 or 0. This will stop the message "No metrics for container %s in pod %s/%s" from being logged [1].

To set the log level, add `--v=1` to the replication controller for Heapster under template.spec.containers.command.
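
The flag change can also be applied to an exported copy of the replication controller. The sketch below is a hypothetical Python helper (set_verbosity is not an OpenShift or Heapster API). It assumes the object has the usual Kubernetes ReplicationController layout, where the command list lives at spec.template.spec.containers[].command, and that the container is named heapster; both should be checked against the actual object.

```python
import copy


def set_verbosity(rc, level, container_name="heapster"):
    """Return a copy of an RC dict with --v=<level> on the named container.

    rc is the replication controller parsed from JSON or YAML.
    container_name is an assumption; adjust it to match the real object.
    """
    patched = copy.deepcopy(rc)
    for container in patched["spec"]["template"]["spec"]["containers"]:
        if container.get("name") != container_name:
            continue
        # Drop any existing --v=<n> argument, then append the requested one.
        # Other arguments (including --vmodule=...) are left untouched.
        command = [
            arg for arg in container.get("command", [])
            if not arg.startswith("--v=")
        ]
        command.append(f"--v={level}")
        container["command"] = command
    return patched
```

The helper only handles the `--v=<n>` spelling mentioned in this comment; if the existing command uses another form of the flag, it would need to be removed by hand.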

Comment 21 ravig 2019-04-02 18:46:52 UTC
@Fatima @Christian,

Can you confirm whether the workaround suggested by Ryan is OK, or do you want to go with the code change I suggested? I am planning to close this bug if we don't have any update within a week.

Comment 22 Fatima 2019-04-03 00:54:15 UTC
Hi Ravi,

It seems like the case was closed automatically because the customer did not respond.

The last comment on the case was by a Red Hatter (most probably an SA for that customer), saying that the customer will try the suggested workaround.

You can close this bug after Christian's approval ;)

Thanks,
Fatima

Comment 23 Andre Costa 2019-04-03 07:40:07 UTC
Hi,

I don't think we can close this ticket. I have been working with the customer and that workaround is not working.

The customer has updated the cluster and is now on OCP v3.9.

Thank you

Comment 24 ravig 2019-04-03 18:39:50 UTC
Andre,

Can you provide the heapster logs when log-level has been set to 1?

Comment 25 Andre Costa 2019-04-04 07:12:53 UTC
Ravig,

I have asked for the updated logs from Heapster.

Comment 26 Andre Costa 2019-04-10 08:51:12 UTC
Created attachment 1554127 [details]
heapster_logs

Comment 27 Andre Costa 2019-04-10 08:51:48 UTC
Hi,

I'm updating with the recent logs from heapster.

Comment 30 Seth Jennings 2019-06-18 13:52:44 UTC
*** Bug 1720246 has been marked as a duplicate of this bug. ***

Comment 34 Frederic Branczyk 2019-07-02 12:32:25 UTC
Moving to modified as the PR to configure the log verbosity has been merged. https://github.com/openshift/openshift-ansible/pull/11735

With it customers can set the log verbosity to 0 (default 1) if they don't want to see these messages.

Comment 36 Junqi Zhao 2019-08-20 11:23:00 UTC
The openshift_metrics_heapster_log_verbosity Ansible parameter has been added; its default value is 1.
Setting openshift_metrics_heapster_log_verbosity=0 reduces the log verbosity.

# rpm -qa | grep ansible
openshift-ansible-docs-3.9.99-1.git.0.6d3f661.el7.noarch
openshift-ansible-roles-3.9.99-1.git.0.6d3f661.el7.noarch
openshift-ansible-playbooks-3.9.99-1.git.0.6d3f661.el7.noarch
openshift-ansible-3.9.99-1.git.0.6d3f661.el7.noarch
ansible-2.4.6.0-1.el7ae.noarch

Comment 38 errata-xmlrpc 2019-08-26 16:27:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2550

