This service will be undergoing maintenance at 00:00 UTC, 2017-10-23 It is expected to last about 30 minutes
Bug 991824 - watchman's cgroups-trace.log file completely fills up /var in STG
watchman's cgroups-trace.log file completely fills up /var in STG
Status: CLOSED CURRENTRELEASE
Product: OpenShift Online
Classification: Red Hat
Component: Containers (Show other bugs)
2.x
Unspecified Unspecified
medium Severity medium
: ---
: ---
Assigned To: Dan Mace
libra bugs
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-08-04 12:28 EDT by Thomas Wiest
Modified: 2015-05-14 19:25 EDT (History)
2 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-07 18:59:04 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Thomas Wiest 2013-08-04 12:28:07 EDT
Description of problem:
/var/log/openshift/node/cgroups-trace.log is completely filling up the /var partition in STG. This looks like a file used for debugging, and not something we want enabled for openshift production.

I noticed the problem last night and so I truncated the files to see how fast they grow. Within 8 hours, they've already grown to be over 3 gigs in size on some hosts.

I've tracked this down to watchman, here's a sample of what we see in that log file:

[...snip...]
August 04 00:59:48 INFO oo_spawn buffer(10/) 012d513770e04943b410a7a32d74b8fe/cpu.stat:nr_periods 0
012d513770e04943b410a7a32d74b8fe/cpu.stat:nr_throttled 0
012d513770e04943b410a7a32d74b8fe/cpu.stat:throttled_time 0
0148fcf65cb74864acf97b550e171aad/cpu.stat:nr_periods 0
0148fcf65cb74864acf97b550e171aad/cpu.stat:nr_throttled 0
0148fcf65cb74864acf97b550e171aad/cpu.stat:throttled_time 0
0188ce785dbe48f48b4a1e9c1443fd42/cpu.stat:nr_periods 0
0188ce785dbe48f48b4a1e9c1443fd42/cpu.stat:nr_throttled 0
[...snip...]


Version-Release number of selected component (if applicable):
rhc-node-1.12.6-1.el6oso.x86_64

How reproducible:
very

Steps to Reproduce:
1. create a lot of apps that are active (90 or so)
2. make sure watchman is running
3. watch the size of that log file grow quickly


Actual results:
This debug log is on by default, and there doesn't seem to be a way to turn it off.

As a result, it's filling up /var on our STG ex-nodes.

Expected results:
debug logs should either not be on by default, or there should at the very least be a way to shut them off.

The way to shut them off should be clearly documented in the release ticket.
Comment 1 Dan Mace 2013-08-05 10:14:44 EDT
The rhc-watchman logger is now configurable.

https://github.com/openshift/origin-server/pull/3285
Comment 2 openshift-github-bot 2013-08-05 13:05:10 EDT
Commit pushed to master at https://github.com/openshift/li

https://github.com/openshift/li/commit/a3e3dc9bc1d318b0df85fce239333d705febe869
Bug 991824: Make watchman logging configurable via node.conf
Comment 3 openshift-github-bot 2013-08-05 13:28:21 EDT
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/4af06e3122131b6eadea65771d128d97c3c0deaa
Bug 991824: Make watchman logging configurable via node.conf
Comment 4 Meng Bo 2013-08-06 03:15:14 EDT
Checked on devenv-stage_438, issue has been fixed.

cgroup.log and cgroup-trace.log path are configurable for now.
And can show the log with corresponding log level.

Move bug to verified.
Comment 5 Meng Bo 2013-08-06 03:33:09 EDT
Add some more details about my verification.

By default, the cgroup.log in INFO level and cgroup-trace.log in ERROR level.
After some apps created, check the cgroup-trace.log under default path.

#tailf /var/log/openshift/node/cgroup-trace.log

There is nothing generated, since there is no error for this.

Add the following lines to node.conf

WATCHMAN_CGROUPS_LOG_FILE=/root/cgroup.log
WATCHMAN_CGROUPS_LOG_LEVEL=DEBUG
WATCHMAN_CGROUPS_TRACE_LOG_FILE=/root/cgroup-trace.log
WATCHMAN_CGROUPS_TRACE_LOG_LEVEL=DEBUG

Restart libra-watchman service.

The new logs are generated under /root/ path.

And check the log contents, it will show like following in cgroup-trace.log

5200a038461eb14370000004/cpuacct.usage:0
52d9daf2fe5f11e2bdf222000a8b1dc7/cpuacct.usage:0
72ebd064fe6611e2b04722000a8b1dc7/cpuacct.usage:0
128014944579843512074240/cpu.cfs_quota_us:100000
304472782972067923361792/cpu.cfs_quota_us:100000
3366b0bcfe6611e2b04722000a8b1dc7/cpu.cfs_quota_us:100000
5200a038461eb14370000004/cpu.cfs_quota_us:200000
52d9daf2fe5f11e2bdf222000a8b1dc7/cpu.cfs_quota_us:100000
72ebd064fe6611e2b04722000a8b1dc7/cpu.cfs_quota_us:100000

Note You need to log in before you can comment on or make changes to this bug.