Bug 991824

Summary: watchman's cgroups-trace.log file completely fills up /var in STG
Product: OpenShift Online
Component: Containers
Reporter: Thomas Wiest <twiest>
Assignee: Dan Mace <dmace>
QA Contact: libra bugs <libra-bugs>
CC: bmeng, dmace
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2013-08-07 22:59:04 UTC

Description Thomas Wiest 2013-08-04 16:28:07 UTC
Description of problem:
/var/log/openshift/node/cgroups-trace.log is completely filling up the /var partition in STG. This looks like a file used for debugging, not something we want enabled in OpenShift production.

I noticed the problem last night, so I truncated the files to see how fast they grow. Within 8 hours, they had already grown to over 3 GB on some hosts.

I've tracked this down to watchman. Here's a sample of what we see in that log file:

[...snip...]
August 04 00:59:48 INFO oo_spawn buffer(10/) 012d513770e04943b410a7a32d74b8fe/cpu.stat:nr_periods 0
012d513770e04943b410a7a32d74b8fe/cpu.stat:nr_throttled 0
012d513770e04943b410a7a32d74b8fe/cpu.stat:throttled_time 0
0148fcf65cb74864acf97b550e171aad/cpu.stat:nr_periods 0
0148fcf65cb74864acf97b550e171aad/cpu.stat:nr_throttled 0
0148fcf65cb74864acf97b550e171aad/cpu.stat:throttled_time 0
0188ce785dbe48f48b4a1e9c1443fd42/cpu.stat:nr_periods 0
0188ce785dbe48f48b4a1e9c1443fd42/cpu.stat:nr_throttled 0
[...snip...]


Version-Release number of selected component (if applicable):
rhc-node-1.12.6-1.el6oso.x86_64

How reproducible:
very

Steps to Reproduce:
1. create a lot of apps that are active (90 or so)
2. make sure watchman is running
3. watch the size of that log file grow quickly
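To put a number on "quickly", the growth can be sampled directly. A minimal sketch (assumes GNU `stat`; `growth_bytes` is a hypothetical helper name, and the path in the example is the trace log reported above):

```shell
# growth_bytes FILE [SECONDS]
# Report how many bytes FILE grows over the interval (default 60 s).
growth_bytes() {
  local f="$1" secs="${2:-60}" before after
  before=$(stat -c%s "$f" 2>/dev/null || echo 0)
  sleep "$secs"
  after=$(stat -c%s "$f" 2>/dev/null || echo 0)
  echo $(( after - before ))
}

# e.g.: growth_bytes /var/log/openshift/node/cgroups-trace.log 60
```

Multiplying the per-minute figure out makes it easy to estimate how long until /var fills.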


Actual results:
This debug log is on by default, and there doesn't appear to be a way to turn it off.

As a result, it's filling up /var on our STG ex-nodes.

Expected results:
Debug logs should either be off by default, or there should at the very least be a way to shut them off.

The way to shut them off should be clearly documented in the release ticket.

Comment 1 Dan Mace 2013-08-05 14:14:44 UTC
The rhc-watchman logger is now configurable.

https://github.com/openshift/origin-server/pull/3285

Comment 2 openshift-github-bot 2013-08-05 17:05:10 UTC
Commit pushed to master at https://github.com/openshift/li

https://github.com/openshift/li/commit/a3e3dc9bc1d318b0df85fce239333d705febe869
Bug 991824: Make watchman logging configurable via node.conf

Comment 3 openshift-github-bot 2013-08-05 17:28:21 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/4af06e3122131b6eadea65771d128d97c3c0deaa
Bug 991824: Make watchman logging configurable via node.conf

Comment 4 Meng Bo 2013-08-06 07:15:14 UTC
Checked on devenv-stage_438, issue has been fixed.

The cgroup.log and cgroup-trace.log paths are now configurable, and each log records messages at its configured log level.

Moving bug to VERIFIED.

Comment 5 Meng Bo 2013-08-06 07:33:09 UTC
Adding some more details about my verification.

By default, cgroup.log is at the INFO level and cgroup-trace.log is at the ERROR level.
After creating some apps, check cgroup-trace.log under its default path.

# tailf /var/log/openshift/node/cgroup-trace.log

Nothing is generated, since no errors have occurred.

Add the following lines to node.conf:

WATCHMAN_CGROUPS_LOG_FILE=/root/cgroup.log
WATCHMAN_CGROUPS_LOG_LEVEL=DEBUG
WATCHMAN_CGROUPS_TRACE_LOG_FILE=/root/cgroup-trace.log
WATCHMAN_CGROUPS_TRACE_LOG_LEVEL=DEBUG

Restart the libra-watchman service.

The new logs are generated under /root/.
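The steps above can be wrapped in a small helper for repeat verification runs. This is only a sketch: `enable_watchman_debug` is a hypothetical name, and the stock node.conf is assumed to live at /etc/openshift/node.conf on v2 nodes (pass the path explicitly):

```shell
# enable_watchman_debug CONF_FILE
# Append the watchman logger settings from this comment to CONF_FILE.
enable_watchman_debug() {
  cat >> "$1" <<'EOF'
WATCHMAN_CGROUPS_LOG_FILE=/root/cgroup.log
WATCHMAN_CGROUPS_LOG_LEVEL=DEBUG
WATCHMAN_CGROUPS_TRACE_LOG_FILE=/root/cgroup-trace.log
WATCHMAN_CGROUPS_TRACE_LOG_LEVEL=DEBUG
EOF
}

# e.g.: enable_watchman_debug /etc/openshift/node.conf
# then restart watchman so it rereads the config:
#   service libra-watchman restart
```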

Checking the log contents, cgroup-trace.log shows entries like the following:

5200a038461eb14370000004/cpuacct.usage:0
52d9daf2fe5f11e2bdf222000a8b1dc7/cpuacct.usage:0
72ebd064fe6611e2b04722000a8b1dc7/cpuacct.usage:0
128014944579843512074240/cpu.cfs_quota_us:100000
304472782972067923361792/cpu.cfs_quota_us:100000
3366b0bcfe6611e2b04722000a8b1dc7/cpu.cfs_quota_us:100000
5200a038461eb14370000004/cpu.cfs_quota_us:200000
52d9daf2fe5f11e2bdf222000a8b1dc7/cpu.cfs_quota_us:100000
72ebd064fe6611e2b04722000a8b1dc7/cpu.cfs_quota_us:100000