Bug 1593634

Summary: OpenShift Heapster is logging a lot of "no pod" or "no container" found messages
Product: OpenShift Container Platform
Reporter: Christian Stark <cstark>
Component: Monitoring
Assignee: Frederic Branczyk <fbranczy>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Priority: high
Version: 3.9.0
CC: alegrand, andcosta, anpicker, aos-bugs, cstark, ddelcian, erooth, fshaikh, hgomes, jforrest, jrosenta, maupadhy, mloibl, openshift-bugs-escalate, pdwyer, pkanthal, pkrupa, pyates, rbdiri, rgudimet, rhowe, rsandu, rsunog, surbania, vlaad
Target Milestone: ---
Flags: rbdiri: needinfo? (rgudimet)
Target Release: 3.9.z
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-08-26 16:27:38 UTC
Type: Bug
Attachments: heapster_logs

Description Christian Stark 2018-06-21 09:26:46 UTC
Description of problem:

A customer on 3.7.44 reports that the Heapster logs are full of messages like:

2018-06-19T14:11:42.817885000Z I0619 14:11:42.813407       1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
2018-06-19T14:11:49.217479000Z I0619 14:11:49.216839       1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw
2018-06-19T14:11:54.231636000Z I0619 14:11:54.226316       1 handlers.go:215] No metrics for pod iis-tu/context-cache-3990787087-xhbdw

I0618 11:19:51.922817 1 handlers.go:264] No metrics for container mongodb-backup in pod assistify/mongodb-backup-1528314300-dxl6l
I0618 12:01:41.487533 1 handlers.go:264] No metrics for container docker-build in pod rim-eu/rim-eu-roter-stable-105-build
I0618 12:01:41.487473 1 handlers.go:264] No metrics for container sti-build in pod wd-dev/doc-3-build
I0618 12:19:08.011249 1 handlers.go:264] No metrics for container aggregated-cms-tools-job-s0a6q in pod ecm-eu/aggregated-cms-tools-job-


The same message (for the same pod or container) gets printed up to 12,000 times, often but not always every 5 seconds; altogether the customer has more than 2 million such entries over 2 days.

The pods are short-running and have completed or been terminated before the error occurs.


In the logs I found some occurrences of
https://bugzilla.redhat.com/show_bug.cgi?id=1539830
which should be fixed by the latest errata, yet there are still 25 such entries...

No further issues are visible in the logs, and the customer uses standard settings.


Version-Release number of selected component (if applicable):

[computer@ocp-ansible]$ oc version
oc v3.7.44
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

lots of log entries as described above

Expected results:

No more unnecessary, repeated log entries for completed or terminated pods.

Additional info:

Checked further errata but found no relevant bug.

Comment 2 Solly Ross 2018-06-21 15:15:55 UTC
Someone is trying to access Heapster (either via an HPA or the dashboard), and Heapster logs an error message when it cannot find any metrics for that container or pod. If the pod is short-running or already terminated, seeing that log message is somewhat expected behavior. We could remove the log message, but it can be a useful debugging tool if you know what you're looking for. Does the customer know what is requesting the metrics?

Comment 17 Ruben Vargas Palma 2019-02-20 17:24:18 UTC
*** Bug 1636453 has been marked as a duplicate of this bug. ***

Comment 20 Ryan Howe 2019-02-21 14:20:27 UTC
As a workaround for the log issue, you can set the log level for Heapster to 1 or 0. This stops the log message "No metrics for container %s in pod %s/%s"[1] from being emitted.

To set the log level, add `--v=1` to the Heapster replication controller under `template.spec.containers.command`.
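For illustration, a minimal sketch of applying that workaround from the CLI. The `openshift-infra` namespace is where openshift_metrics deploys Heapster in OCP 3.x, but the container index in the patch path is an assumption about the default single-container pod layout; adjust to your cluster.

```shell
# Append --v=1 to the Heapster container's command so the repeated
# "No metrics ..." messages (suppressed at verbosity 1 or 0, per the
# comment above) are no longer emitted.
oc patch rc heapster -n openshift-infra --type=json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/command/-","value":"--v=1"}]'

# Recreate the pod so the replication controller picks up the new flag.
oc scale rc heapster -n openshift-infra --replicas=0
oc scale rc heapster -n openshift-infra --replicas=1
```

Editing the RC in place with `oc edit rc heapster -n openshift-infra` and appending `- --v=1` to the command list achieves the same result.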

Comment 21 ravig 2019-04-02 18:46:52 UTC
@Fatima @Christian,

Can you confirm whether the workaround suggested by Ryan is OK, or do you want to go with the code change I suggested? I am planning to close this bug if we don't have any update within a week.

Comment 22 Fatima 2019-04-03 00:54:15 UTC
Hi Ravi,

It seems the case was closed automatically because the customer did not respond.

The last comment on the case was from a Red Hatter (most probably an SA for that customer), saying the customer would try the suggested workaround.

You can close this bug after Christian's approval ;)

Thanks,
Fatima

Comment 23 Andre Costa 2019-04-03 07:40:07 UTC
Hi,

I don't think we can close this ticket. I have been working with the customer and that workaround is not working.

The customer has updated the cluster and is now on OCP v3.9.

Thank you

Comment 24 ravig 2019-04-03 18:39:50 UTC
Andre,

Can you provide the Heapster logs with the log level set to 1?

Comment 25 Andre Costa 2019-04-04 07:12:53 UTC
Ravig,

I have asked the customer for updated Heapster logs.

Comment 26 Andre Costa 2019-04-10 08:51:12 UTC
Created attachment 1554127 [details]
heapster_logs

Comment 27 Andre Costa 2019-04-10 08:51:48 UTC
Hi,

I'm updating with the recent logs from heapster.

Comment 30 Seth Jennings 2019-06-18 13:52:44 UTC
*** Bug 1720246 has been marked as a duplicate of this bug. ***

Comment 34 Frederic Branczyk 2019-07-02 12:32:25 UTC
Moving to MODIFIED, as the PR to make the log verbosity configurable has been merged: https://github.com/openshift/openshift-ansible/pull/11735

With it, customers can set the log verbosity to 0 (default 1) if they don't want to see these messages.

Comment 36 Junqi Zhao 2019-08-20 11:23:00 UTC
The openshift_metrics_heapster_log_verbosity Ansible parameter has been added; its default value is 1.
Setting openshift_metrics_heapster_log_verbosity=0 reduces the verbosity.
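For illustration, the corresponding inventory entry is a single variable, shown here under the standard [OSEv3:vars] group of an openshift-ansible inventory (a sketch; place it wherever your inventory defines cluster variables):

```ini
[OSEv3:vars]
# Reduce Heapster log verbosity so the repeated "No metrics ..." messages disappear
openshift_metrics_heapster_log_verbosity=0
```

Re-running the openshift_metrics playbook then applies the new verbosity to the Heapster replication controller.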

# rpm -qa | grep ansible
openshift-ansible-docs-3.9.99-1.git.0.6d3f661.el7.noarch
openshift-ansible-roles-3.9.99-1.git.0.6d3f661.el7.noarch
openshift-ansible-playbooks-3.9.99-1.git.0.6d3f661.el7.noarch
openshift-ansible-3.9.99-1.git.0.6d3f661.el7.noarch
ansible-2.4.6.0-1.el7ae.noarch

Comment 38 errata-xmlrpc 2019-08-26 16:27:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2550