Description of problem:
A number of deleted /var/log/messages files are still held open and keep occupying disk space on /var. Restarting the atomic-openshift-node service releases the occupied space.

Version-Release number of selected component (if applicable):
openshift v3.1.1.6-26-g9549be3
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

How reproducible:
Almost all online clusters have this issue; 100% reproducible.

Steps to Reproduce:
1.
2.
3.

Actual results:
[root@vm1 ~]# lsof 2>/dev/null | grep deleted | sort -k7 -n | tail -20
tuned 993 22369 root 6u REG 202,2 4096 16818306 /tmp/ffie1MOIC (deleted)
openshift 65774 114440 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 17723 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 42340 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 65686 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 65775 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 65776 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 65777 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 65778 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 65779 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 65780 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 65783 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 65831 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 65835 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 65898 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 66653 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
openshift 65774 7534 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
tuned 993 root 6u REG 202,2 4096 16818306 /tmp/ffie1MOIC (deleted)
monitor 22534 root 3w REG 202,3 407452 271 /var/log/openvswitch/ovs-vswitchd.log-20160303 (deleted)
openshift 65774 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)
[root@vm1 ~]# cd /var
[root@vm1 var]# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda3      8.0G  1.2G  6.9G  15% /var
[root@vm1 var]# du -sh
1.1G    .
[root@vm1 var]#

Expected results:

Additional info:
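The lsof listing repeats the same file once per thread of the openshift process, so the row count overstates how many files are held. The Python sketch below shows one way to collapse such output to unique files and total the space they pin. It assumes the column layout seen in this report (an optional thread ID after the PID, then USER, FD, TYPE, DEVICE, SIZE, NODE, NAME); the function name `held_by_deleted_files` is hypothetical, and the sample rows are copied from the output above.

```python
def held_by_deleted_files(lsof_lines):
    """Return {(device, inode): (size_bytes, path)} for deleted regular files.

    Assumes lsof rows shaped like the ones in this report; rows are keyed
    on (device, inode) so per-thread duplicates collapse to one entry.
    """
    held = {}
    for line in lsof_lines:
        tokens = line.split()
        # Only regular files that lsof marks as deleted are of interest.
        if "REG" not in tokens or tokens[-1] != "(deleted)":
            continue
        # Anchor on the TYPE column, because the TID column is optional.
        i = tokens.index("REG")
        device = tokens[i + 1]
        size = int(tokens[i + 2])
        inode = tokens[i + 3]
        path = " ".join(tokens[i + 4:-1])
        held[(device, inode)] = (size, path)
    return held


# Rows copied from the report (one per distinct file, plus duplicates).
sample = [
    "tuned 993 22369 root 6u REG 202,2 4096 16818306 /tmp/ffie1MOIC (deleted)",
    "openshift 65774 114440 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)",
    "openshift 65774 17723 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)",
    "tuned 993 root 6u REG 202,2 4096 16818306 /tmp/ffie1MOIC (deleted)",
    "monitor 22534 root 3w REG 202,3 407452 271 /var/log/openvswitch/ovs-vswitchd.log-20160303 (deleted)",
    "openshift 65774 root 21r REG 202,3 56708073 25167127 /var/log/messages-20160405 (deleted)",
]

held = held_by_deleted_files(sample)
total = sum(size for size, _ in held.values())
for (device, inode), (size, path) in sorted(held.items()):
    print(device, inode, size, path)
print("bytes held by deleted files:", total)
```

On a live node the input could come from `lsof 2>/dev/null | grep deleted`, as in the report.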
Maybe the cadvisor code isn't handling rotated logs well when it's checking for OOM events. We'll take a look.
While we haven't definitively proven that the cadvisor code is the one holding the files open, I posted a PR upstream that makes cadvisor handle log rotation properly, closing and reopening the file so that it can be freed. https://github.com/google/cadvisor/pull/1264
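To illustrate the close-and-reopen idea, here is a minimal Python sketch of a rotation-aware log reader. It is not the Go code from the PR; the class name `RotationAwareReader` and its methods are hypothetical. It assumes a Linux-style filesystem where rotation renames or deletes the old file and a new file is created at the same path, which it detects by comparing device and inode numbers.

```python
import os


class RotationAwareReader:
    """Follow a log file and reopen it when the path points at a new file."""

    def __init__(self, path):
        self.path = path
        self.fh = open(path, "r")

    def _rotated(self):
        # The path may be briefly missing during rotation; in that case keep
        # the old handle until a replacement file appears.
        try:
            on_disk = os.stat(self.path)
        except FileNotFoundError:
            return False
        opened = os.fstat(self.fh.fileno())
        return (on_disk.st_dev, on_disk.st_ino) != (opened.st_dev, opened.st_ino)

    def read_new_lines(self):
        lines = self.fh.readlines()
        if self._rotated():
            # Drain anything written to the old file just before rotation.
            lines += self.fh.readlines()
            # Closing drops this process's reference, so a deleted file's
            # blocks can be freed by the filesystem.
            self.fh.close()
            self.fh = open(self.path, "r")
            lines += self.fh.readlines()
        return lines

    def close(self):
        self.fh.close()
```

A caller would invoke `read_new_lines()` periodically. A reader that never performs the inode check keeps its original descriptor open indefinitely, which matches the symptom in this report.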
PR to kube has merged: https://github.com/kubernetes/kubernetes/pull/25914. OpenShift will get this in whatever rebase contains that PR.
*** Bug 1333663 has been marked as a duplicate of this bug. ***
Origin rebase is complete (https://github.com/openshift/origin/pull/8856). I verified that it contains the upstream fix from comment 3.
Sorry, looking at two bugs at once. It actually looks like this doesn't come in as part of the rebase and that the origin cadvisor dep needs to be bumped. Moving back to ASSIGNED.
This should be in the 3.3 builds now.
Tested on openshift v3.3.0.9. No deleted files are held open any more, so this bug is verified.
On node:
[root@ip-172-18-8-37 ~]# lsof 2>/dev/null | grep deleted
[root@ip-172-18-8-37 ~]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1933