Bug 1331235 - deleted /var/log/messages occupied the disk space /var
Summary: deleted /var/log/messages occupied the disk space /var
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Seth Jennings
QA Contact: DeShuai Ma
URL:
Whiteboard:
Duplicates: 1333663
Depends On:
Blocks: OSOPS_V3
Reported: 2016-04-28 04:33 UTC by Zhiwu Liu
Modified: 2017-03-08 18:26 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-27 09:31:32 UTC
Target Upstream Version:



Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:1933 | normal | SHIPPED_LIVE | Red Hat OpenShift Container Platform 3.3 Release Advisory | 2016-09-27 13:24:36 UTC

Description Zhiwu Liu 2016-04-28 04:33:26 UTC
Description of problem:
A number of deleted /var/log/messages files are still held open and continue to occupy disk space on /var.

Restarting the atomic-openshift-node service releases the occupied disk space.
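For readers unfamiliar with the mechanism: on POSIX filesystems, deleting a file only removes its name. The data blocks are reclaimed once the last open descriptor is closed, which would explain why restarting the service frees the space. The following is a minimal Python sketch of that behaviour, not code from OpenShift; the function name is hypothetical.

```python
import os
import tempfile


def unlinked_file_still_readable(payload):
    """Unlink a file while holding it open and report what is still visible."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                  # the name is gone, like a deleted rotated log
        info = os.fstat(fd)              # the inode is still reachable via the descriptor
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))
        return {
            "path_exists": os.path.exists(path),
            "size": info.st_size,
            "data": data,
        }
    finally:
        os.close(fd)                     # only now can the filesystem reclaim the blocks
```

Until the final `os.close`, the file's contents remain readable and its blocks stay allocated even though the path no longer exists.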

Version-Release number of selected component (if applicable):
openshift v3.1.1.6-26-g9549be3
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

How reproducible:
Almost all online clusters have this issue; it reproduces 100% of the time.

Steps to Reproduce:
1.
2.
3.

Actual results:
[root@vm1 ~]# lsof 2>/dev/null  | grep deleted | sort -k7 -n | tail -20
tuned        993  22369              root    6u      REG              202,2      4096   16818306 /tmp/ffie1MOIC (deleted)
openshift  65774 114440              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  17723              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  42340              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  65686              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  65775              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  65776              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  65777              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  65778              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  65779              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  65780              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  65783              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  65831              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  65835              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  65898              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774  66653              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
openshift  65774   7534              root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)
tuned        993                     root    6u      REG              202,2      4096   16818306 /tmp/ffie1MOIC (deleted)
monitor    22534                     root    3w      REG              202,3    407452        271 /var/log/openvswitch/ovs-vswitchd.log-20160303 (deleted)
openshift  65774                     root   21r      REG              202,3  56708073   25167127 /var/log/messages-20160405 (deleted)

[root@vm1 ~]# cd /var
[root@vm1 var]# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda3      8.0G  1.2G  6.9G  15% /var
[root@vm1 var]# du -sh 
1.1G	.
[root@vm1 var]# 
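The lsof output above repeats the same descriptor many times; the rows differ only in the third column, which appears to be a thread ID. A deduplicated view can be obtained by reading the descriptor symlinks under /proc directly. The sketch below is a Linux-only illustration, not part of the original report; the function name and the `proc_root` parameter are hypothetical, and seeing other users' processes would require root.

```python
import os

DELETED_SUFFIX = " (deleted)"


def find_deleted_open_files(proc_root="/proc"):
    """Map each deleted-but-open path to the set of PIDs holding it."""
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():          # only numeric entries are processes
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:                  # process exited or permission denied
            continue
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:              # descriptor closed while scanning
                continue
            if target.endswith(DELETED_SUFFIX):
                path = target[: -len(DELETED_SUFFIX)]
                holders.setdefault(path, set()).add(int(entry))
    return holders
```

Keying the result by path and PID collapses the per-thread rows into one entry per holding process.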


Expected results:


Additional info:

Comment 1 Andy Goldstein 2016-04-28 12:21:07 UTC
Maybe the cadvisor code isn't handling rotated logs well when it's checking for OOM events. We'll take a look.

Comment 2 Seth Jennings 2016-05-18 21:57:44 UTC
While we haven't definitively proven that the cadvisor code is the one holding the files open, I posted a PR upstream that makes cadvisor handle log rotation properly, closing and reopening the file so that it can be freed.

https://github.com/google/cadvisor/pull/1264
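To illustrate the general idea of "closing and reopening the file" on rotation, here is a minimal Python sketch. It is not the code from the cadvisor PR (cadvisor is written in Go), and the class and method names are hypothetical. It assumes rotation by rename, where a new file appears at the same path with a different inode.

```python
import os


class RotationAwareReader:
    """Reads new lines from a log path, reopening it after rotation."""

    def __init__(self, path):
        self.path = path
        self._file = open(path, "r")

    def _rotated(self):
        try:
            on_disk = os.stat(self.path)
        except FileNotFoundError:
            # No replacement yet; this sketch keeps the current handle.
            return False
        held = os.fstat(self._file.fileno())
        return (on_disk.st_dev, on_disk.st_ino) != (held.st_dev, held.st_ino)

    def read_new_lines(self):
        lines = self._file.readlines()       # drain the current handle first
        if self._rotated():
            self._file.close()               # release the old inode so it can be freed
            self._file = open(self.path, "r")
            lines += self._file.readlines()
        return lines

    def close(self):
        self._file.close()
```

Comparing the device and inode of the open handle with those of the path is one common way to detect rotation; a production reader would also need to decide what to do when the path is missing for an extended period.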

Comment 3 Andy Goldstein 2016-05-27 15:17:08 UTC
PR to kube has merged: https://github.com/kubernetes/kubernetes/pull/25914. OpenShift will get this in whatever rebase contains that PR.

Comment 6 Seth Jennings 2016-06-16 16:46:30 UTC
*** Bug 1333663 has been marked as a duplicate of this bug. ***

Comment 7 Seth Jennings 2016-06-16 22:05:57 UTC
Origin rebase is complete:
https://github.com/openshift/origin/pull/8856

I verified it contains the upstream fix from comment 3.

Comment 8 Seth Jennings 2016-06-16 22:11:05 UTC
Sorry, looking at two bugs at once.  It actually looks like this doesn't come in as part of the rebase and that the origin cadvisor dep needs to be bumped.  Moving back to ASSIGNED.

Comment 12 Andy Goldstein 2016-07-22 18:33:24 UTC
This should be in the 3.3 builds now.

Comment 13 DeShuai Ma 2016-07-25 06:41:39 UTC
Tested on openshift v3.3.0.9.
The deleted files no longer appear. Verifying this bug.
On the node:
[root@ip-172-18-8-37 ~]# lsof 2>/dev/null  | grep deleted
[root@ip-172-18-8-37 ~]#

Comment 15 errata-xmlrpc 2016-09-27 09:31:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933

