Bug 989706 - Throttler is spamming the node platform.log with cpu utilization stats
Summary: Throttler is spamming the node platform.log with cpu utilization stats
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Fotios Lindiakos
QA Contact: libra bugs
Reported: 2013-07-29 18:25 UTC by Dan Mace
Modified: 2015-05-14 23:25 UTC

Doc Type: Bug Fix
Last Closed: 2013-08-07 22:57:54 UTC

Description Dan Mace 2013-07-29 18:25:16 UTC
Description of problem:

The new cpu throttler component is spamming the platform.log file every 5 seconds with cpu stats such as:

July 29 14:23:30 INFO Shell command 'grep -H "" */{cpu.stat,cpuacct.usage,cpu.cfs_quota_us} 2> /dev/null' ran. rc=0 out=419903911671339895750656/cpu.stat:nr_periods 2736
419903911671339895750656/cpu.stat:nr_throttled 2649
419903911671339895750656/cpu.stat:throttled_time 194878040603
419903911671339895750656/cpuacct.usage:81604710920
419903911671339895750656/cpu.cfs_quota_us:100000


This buries other entries and bloats the platform log. Either change the logging so that these stats are squelched in the platform logs by default, or introduce a new log file for them. Any other approach is fine too, so long as the platform logs are no longer spammed.
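
As an illustration of the "new log file" option, here is a minimal Python sketch. It is not the throttler's actual code (the fix was made in the origin-server code base); the logger names, file names, and the temporary directory are assumptions made for the example. The point it shows is that a dedicated logger with its own file handler, and with propagation to the parent turned off, keeps the stats out of the main log.

```python
import logging
import os
import tempfile


def make_file_logger(name, path, propagate=True):
    """Create a logger that writes INFO and above to its own file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    # With propagate=False, records stop here and never reach parent loggers.
    logger.propagate = propagate
    return logger


# Hypothetical locations; a real node would log under /var/log/openshift.
log_dir = tempfile.mkdtemp()
platform_path = os.path.join(log_dir, "platform.log")
cgroups_path = os.path.join(log_dir, "cgroups.log")

platform_log = make_file_logger("platform", platform_path)
# Child logger of "platform": without propagate=False its records would
# also be written to platform.log by the parent's handler.
cgroups_log = make_file_logger("platform.cgroups", cgroups_path, propagate=False)

platform_log.info("gear started")
cgroups_log.info("419903911671339895750656/cpuacct.usage:81604710920")
```

After this runs, only cgroups.log contains the cpuacct.usage line, and platform.log contains only the "gear started" entry.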



Comment 1 Fotios Lindiakos 2013-07-30 15:47:47 UTC
PR submitted: https://github.com/openshift/origin-server/pull/3221

Comment 2 openshift-github-bot 2013-07-31 00:31:59 UTC
Commits pushed to master at https://github.com/openshift/li

https://github.com/openshift/li/commit/5047581cc74a83d9694036a2166834327f235d74
Bug 989706: Added new logger for watchman

https://github.com/openshift/li/commit/6bf7989b37cf35e5c91f839a2591e857e23c312b
Merge pull request #1782 from fotioslindiakos/Bug989706

Merged by openshift-bot

Comment 3 openshift-github-bot 2013-07-31 00:32:01 UTC
Commits pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/7d628a525d50897c47055016bf1ba0940a5de85e
Bug 989706: Quiet extra output from Libcgroup.usage

https://github.com/openshift/origin-server/commit/d59773c8bb554dcb05584fb4b452828fb6ad18b9
Merge pull request #3221 from fotioslindiakos/Bug989706

Merged by openshift-bot

Comment 4 Meng Bo 2013-07-31 10:02:56 UTC
Checked on devenv_3591: the cgroup-related log entries are no longer written to platform.log.

However, nothing is written to the new cgroup.log.

Assigning the bug back.

Comment 5 Fotios Lindiakos 2013-07-31 16:22:56 UTC
This was a different bug I introduced with the logging change. New PR inbound: https://github.com/openshift/origin-server/pull/3242

Note: The messages about throttling will still show up in /var/log/messages from rhc-watchman.

The log information from oo_spawn (like grep commands) should appear in /var/log/openshift/cgroups.log and cgroups-trace.log (and not in platform.log and platform-trace.log).

Comment 6 Fotios Lindiakos 2013-07-31 18:06:06 UTC
There was no log data because of https://bugzilla.redhat.com/show_bug.cgi?id=990499

This PR should fix both.

Comment 7 openshift-github-bot 2013-07-31 18:07:21 UTC
Commits pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/271ba09e3292c7cecc67e9e01fc3b0ec66079c80
Bug 989706: Throttler dies if no cgroups are present

https://bugzilla.redhat.com/show_bug.cgi?id=989706

If there were no cgroups present, the `grep` command would fail.
By not expecting a 0 exit status, we can silently ignore this problem.

https://github.com/openshift/origin-server/commit/d1cea135fa91527ad1cc69fa922072803021ebfd
Merge pull request #3242 from fotioslindiakos/Bug989706

Merged by openshift-bot
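
The idea in the commit message above (do not require a 0 exit status from `grep`) can be sketched in Python as follows. This is an illustration only, not the origin-server implementation: the function name is hypothetical, and the command is written without brace expansion so that plain /bin/sh can run it. With no cgroup directories present, `grep` exits non-zero and prints nothing, and the caller simply gets an empty result instead of an error.

```python
import os
import subprocess
import tempfile

# Same three files the throttler greps, spelled out without brace expansion
# (an adaptation for this sketch so that /bin/sh can run it).
USAGE_COMMAND = 'grep -H "" */cpu.stat */cpuacct.usage */cpu.cfs_quota_us 2> /dev/null'


def read_usage_lines(cgroup_root):
    """Return grep's output lines, or an empty list if nothing could be read."""
    result = subprocess.run(
        USAGE_COMMAND,
        shell=True,
        cwd=cgroup_root,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=False,  # a non-zero exit status is tolerated, not raised
    )
    return result.stdout.splitlines()


# A directory with no cgroups: grep fails, and we get an empty list.
empty_root = tempfile.mkdtemp()
no_cgroups = read_usage_lines(empty_root)

# A directory with one (hypothetical) cgroup that has a single stat file.
one_root = tempfile.mkdtemp()
os.mkdir(os.path.join(one_root, "gear1"))
with open(os.path.join(one_root, "gear1", "cpuacct.usage"), "w") as f:
    f.write("123\n")
one_cgroup = read_usage_lines(one_root)
```

Passing `check=True` instead would raise `subprocess.CalledProcessError` in both cases, which is the kind of failure the commit describes.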

Comment 8 Meng Bo 2013-08-01 04:24:12 UTC
Checked on devenv-stage_429, issue has been fixed.

cgroup-related info is now logged to cgroups-trace.log:

# tailf /var/log/openshift/node/cgroups-trace.log.log 
tailf: cannot stat "/var/log/openshift/node/cgroups-trace.log.log": No such file or directory
[root@ip-10-245-134-221 ~]# tailf /var/log/openshift/node/cgroups-trace.log
7e534df0fa6111e2a7d312313d01852f/cpuacct.usage:18802369166
7e534df0fa6111e2a7d312313d01852f/cpu.cfs_quota_us:30000

August 01 00:21:41 INFO oo_spawn running grep -H "" */{cpu.stat,cpuacct.usage,cpu.cfs_quota_us} 2> /dev/null: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :chdir=>"/cgroup/all/openshift", :out=>#<IO:fd 12>, :err=>#<IO:fd 10>}
August 01 00:21:41 INFO oo_spawn buffer(11/) 7e534df0fa6111e2a7d312313d01852f/cpu.stat:nr_periods 744
7e534df0fa6111e2a7d312313d01852f/cpu.stat:nr_throttled 328
7e534df0fa6111e2a7d312313d01852f/cpu.stat:throttled_time 21575515452
7e534df0fa6111e2a7d312313d01852f/cpuacct.usage:20308906976
7e534df0fa6111e2a7d312313d01852f/cpu.cfs_quota_us:30000

August 01 00:21:46 INFO oo_spawn running grep -H "" */{cpu.stat,cpuacct.usage,cpu.cfs_quota_us} 2> /dev/null: {:unsetenv_others=>false, :close_others=>true, :in=>"/dev/null", :chdir=>"/cgroup/all/openshift", :out=>#<IO:fd 12>, :err=>#<IO:fd 10>}
August 01 00:21:46 INFO oo_spawn buffer(11/) 7e534df0fa6111e2a7d312313d01852f/cpu.stat:nr_periods 794
7e534df0fa6111e2a7d312313d01852f/cpu.stat:nr_throttled 378
7e534df0fa6111e2a7d312313d01852f/cpu.stat:throttled_time 24856401724
7e534df0fa6111e2a7d312313d01852f/cpuacct.usage:21827493687
7e534df0fa6111e2a7d312313d01852f/cpu.cfs_quota_us:30000
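
For readers who want to interpret the two samples above, here is a small Python sketch that parses the `grep -H` output and computes the change between the 00:21:41 and 00:21:46 readings. The function names are hypothetical, and the CFS period of 100000 microseconds is an assumption (cpu.cfs_period_us is not shown in the log), although it is consistent with nr_periods advancing by 50 in 5 seconds.

```python
SAMPLE_1 = """\
7e534df0fa6111e2a7d312313d01852f/cpu.stat:nr_periods 744
7e534df0fa6111e2a7d312313d01852f/cpu.stat:nr_throttled 328
7e534df0fa6111e2a7d312313d01852f/cpu.stat:throttled_time 21575515452
7e534df0fa6111e2a7d312313d01852f/cpuacct.usage:20308906976
7e534df0fa6111e2a7d312313d01852f/cpu.cfs_quota_us:30000
"""

SAMPLE_2 = """\
7e534df0fa6111e2a7d312313d01852f/cpu.stat:nr_periods 794
7e534df0fa6111e2a7d312313d01852f/cpu.stat:nr_throttled 378
7e534df0fa6111e2a7d312313d01852f/cpu.stat:throttled_time 24856401724
7e534df0fa6111e2a7d312313d01852f/cpuacct.usage:21827493687
7e534df0fa6111e2a7d312313d01852f/cpu.cfs_quota_us:30000
"""

ASSUMED_PERIOD_US = 100000  # assumption: not present in the log output
INTERVAL_SECONDS = 5        # gap between the two log timestamps


def parse_usage(text):
    """Turn 'cgroup/file:value' lines into {cgroup: {name: int}}."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        path, _, value = line.partition(":")
        cgroup, _, filename = path.partition("/")
        entry = stats.setdefault(cgroup, {})
        if filename == "cpu.stat":
            # cpu.stat lines carry their own key, e.g. "nr_periods 744"
            key, number = value.split()
            entry[key] = int(number)
        else:
            entry[filename] = int(value)
    return stats


def delta(before, after, key):
    """Difference of one counter between two parsed samples of a cgroup."""
    return after[key] - before[key]


gear = "7e534df0fa6111e2a7d312313d01852f"
first = parse_usage(SAMPLE_1)[gear]
second = parse_usage(SAMPLE_2)[gear]

periods = delta(first, second, "nr_periods")
throttled = delta(first, second, "nr_throttled")
throttled_fraction = throttled / periods
cpu_seconds = delta(first, second, "cpuacct.usage") / 1e9  # counter is in ns
cpu_share = cpu_seconds / INTERVAL_SECONDS
quota_share = second["cpu.cfs_quota_us"] / ASSUMED_PERIOD_US
```

On these samples the gear was throttled in all 50 periods of the interval and used about 1.52 CPU-seconds in 5 seconds, roughly 30% of one CPU, which matches the 30000 microsecond quota under the assumed period.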

Comment 9 openshift-github-bot 2013-08-01 17:20:48 UTC
Commit pushed to master at https://github.com/openshift/li

https://github.com/openshift/li/commit/51fd01dfc57893fc6945d4ba5a247890037001d0
Bug 989706: Added new logger for watchman

