Description of problem: /var/log/openshift/node/cgroups-trace.log is completely filling up the /var partition in STG. This looks like a file used for debugging, and not something we want enabled for openshift production. I noticed the problem last night and so I truncated the files to see how fast they grow. Within 8 hours, they've already grown to be over 3 gigs in size on some hosts. I've tracked this down to watchman, here's a sample of what we see in that log file: [...snip...] August 04 00:59:48 INFO oo_spawn buffer(10/) 012d513770e04943b410a7a32d74b8fe/cpu.stat:nr_periods 0 012d513770e04943b410a7a32d74b8fe/cpu.stat:nr_throttled 0 012d513770e04943b410a7a32d74b8fe/cpu.stat:throttled_time 0 0148fcf65cb74864acf97b550e171aad/cpu.stat:nr_periods 0 0148fcf65cb74864acf97b550e171aad/cpu.stat:nr_throttled 0 0148fcf65cb74864acf97b550e171aad/cpu.stat:throttled_time 0 0188ce785dbe48f48b4a1e9c1443fd42/cpu.stat:nr_periods 0 0188ce785dbe48f48b4a1e9c1443fd42/cpu.stat:nr_throttled 0 [...snip...] Version-Release number of selected component (if applicable): rhc-node-1.12.6-1.el6oso.x86_64 How reproducible: very Steps to Reproduce: 1. create a lot of apps that are active (90 or so) 2. make sure watchman is running 3. watch the size of that log file grow quickly Actual results: This debug log is on by default, and there doesn't seem to be a way to turn it off. As a result, it's filling up /var on our STG ex-nodes. Expected results: debug logs should either not be on by default, or there should at the very least be a way to shut them off. The way to shut them off should be clearly documented in the release ticket.
The rhc-watchman logger is now configurable. https://github.com/openshift/origin-server/pull/3285
Commit pushed to master at https://github.com/openshift/li https://github.com/openshift/li/commit/a3e3dc9bc1d318b0df85fce239333d705febe869 Bug 991824: Make watchman logging configurable via node.conf
Commit pushed to master at https://github.com/openshift/origin-server https://github.com/openshift/origin-server/commit/4af06e3122131b6eadea65771d128d97c3c0deaa Bug 991824: Make watchman logging configurable via node.conf
Checked on devenv-stage_438, issue has been fixed. cgroup.log and cgroup-trace.log path are configurable for now. And can show the log with corresponding log level. Move bug to verified.
Add some more details about my verification. By default, the cgroup.log in INFO level and cgroup-trace.log in ERROR level. After some apps created, check the cgroup-trace.log under default path. #tailf /var/log/openshift/node/cgroup-trace.log There is nothing generated, since there is no error for this. Add the following lines to node.conf WATCHMAN_CGROUPS_LOG_FILE=/root/cgroup.log WATCHMAN_CGROUPS_LOG_LEVEL=DEBUG WATCHMAN_CGROUPS_TRACE_LOG_FILE=/root/cgroup-trace.log WATCHMAN_CGROUPS_TRACE_LOG_LEVEL=DEBUG Restart libra-watchman service. The new logs are generated under /root/ path. And check the log contents, it will show like following in cgroup-trace.log 5200a038461eb14370000004/cpuacct.usage:0 52d9daf2fe5f11e2bdf222000a8b1dc7/cpuacct.usage:0 72ebd064fe6611e2b04722000a8b1dc7/cpuacct.usage:0 128014944579843512074240/cpu.cfs_quota_us:100000 304472782972067923361792/cpu.cfs_quota_us:100000 3366b0bcfe6611e2b04722000a8b1dc7/cpu.cfs_quota_us:100000 5200a038461eb14370000004/cpu.cfs_quota_us:200000 52d9daf2fe5f11e2bdf222000a8b1dc7/cpu.cfs_quota_us:100000 72ebd064fe6611e2b04722000a8b1dc7/cpu.cfs_quota_us:100000