Bug 1817059 - Journald does not free up space due to containers keeping deleted files open
Keywords:
Status: CLOSED DUPLICATE of bug 1560358
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-25 13:48 UTC by ecrosby
Modified: 2021-11-16 22:03 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-25 15:58:33 UTC
Target Upstream Version:
Embargoed:
jcantril: needinfo-


Description ecrosby 2020-03-25 13:48:46 UTC
Journald is configured to limit its disk usage to 8G, and it reports that it is within that limit. However, services running in containers keep the rotated (deleted) journal files open, so the filesystem never releases the space those files occupy.
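
For readers unfamiliar with the mechanism, here is a minimal sketch (not taken from this report) of why a deleted file keeps consuming space while a process still holds it open. It assumes a POSIX system such as Linux; the function name and the default size are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def demo_unlinked_file_keeps_space(size_bytes=1024 * 1024):
    """Create a file, delete it while it is still open, and report its state.

    Returns (link_count, size_seen_through_fd, path_still_exists).
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(b"\0" * size_bytes)   # put some data in the file
        handle.flush()
        os.unlink(path)                    # remove the name, like journald rotation does
        info = os.fstat(handle.fileno())   # the open descriptor is still valid
        result = (info.st_nlink, info.st_size, os.path.exists(path))
    # Leaving the "with" block closes the descriptor; only then can the
    # filesystem release the blocks.
    return result


if __name__ == "__main__":
    links, size, exists = demo_unlinked_file_keeps_space()
    print(f"links={links} size={size} path_exists={exists}")
```

On Linux the link count drops to zero and the path disappears, yet the data is still readable through the descriptor. This is the state lsof marks as "(deleted)".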

[root@ocp-compute-004 ~]# journalctl --disk-usage
Archived and active journals take up 8.0G on disk.
[root@ocp-compute-004 ~]# df -h /var/log
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/rhel-var_log   15G   15G  698M  96% /var/log
[root@ocp-compute-004 ~]# du -a /var/log |sort -nr |grep -v journal|head -5
8628792	/var/log
89088	/var/log/messages
38992	/var/log/audit
8196	/var/log/audit/audit.log.4
8196	/var/log/audit/audit.log.3
[root@ocp-compute-004 ~]# journalctl --disk-usage
Archived and active journals take up 8.0G on disk.
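
The gap between the two tools can be estimated with a quick calculation. This is a rough sketch: it assumes du is reporting 1 KiB blocks (the GNU default) and treats the rounded "15G" from df -h as exact, so the result is only approximate.

```python
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib):
    """Convert a du-style 1 KiB block count to GiB."""
    return kib / KIB_PER_GIB


# Values copied from the output above.
du_total_kib = 8628792   # "du -a /var/log" total for /var/log
df_used_gib = 15         # "Used" column of "df -h /var/log" (rounded by df)

visible_gib = kib_to_gib(du_total_kib)    # space reachable through file names
hidden_gib = df_used_gib - visible_gib    # space no visible file accounts for

if __name__ == "__main__":
    print(f"visible: {visible_gib:.2f} GiB, unaccounted: {hidden_gib:.2f} GiB")
```

Roughly 8.2 GiB is visible to du, which leaves something under 7 GiB in use that no file name accounts for.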

[root@ocp-compute-004 ~]# grep deleted /tmp/lsof.out  |grep /var/log |grep -v fluentd |head -2
ruby-time  22524  22634     root   58r      REG              253,4   8388608              4230832 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
ruby-time  22524  22634     root   67r      REG              253,4   8388608              4194391 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
[root@ocp-compute-004 ~]# grep deleted /tmp/lsof.out  |grep /var/log |head -2
fluentd    22524            root   58r      REG              253,4   8388608              4230832 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
fluentd    22524            root   67r      REG              253,4   8388608              4194391 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)

[root@ocp-compute-004 ~]# grep deleted /tmp/lsof.out  |grep /var/log |grep -v fluentd |grep -v ruby
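
The grep pipelines above can be approximated by a small script that totals the deleted-but-open files per PID. This is only a sketch: the function name is hypothetical, it assumes the default lsof column layout shown above (optional TID column, no spaces in file paths), and it counts each (device, inode) pair once per PID so that thread rows do not inflate the total.

```python
from collections import defaultdict

# Sample rows copied from the lsof output above (only the first two files
# per command were shown, so the real total is larger).
SAMPLE = [
    "ruby-time  22524  22634     root   58r      REG              253,4   8388608              4230832 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)",
    "ruby-time  22524  22634     root   67r      REG              253,4   8388608              4194391 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)",
    "fluentd    22524            root   58r      REG              253,4   8388608              4230832 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)",
    "fluentd    22524            root   67r      REG              253,4   8388608              4194391 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)",
]


def held_bytes_by_pid(lsof_lines):
    """Sum SIZE/OFF of deleted-but-open files per PID from lsof text output."""
    seen = defaultdict(dict)  # pid -> {(device, inode): size}
    for line in lsof_lines:
        tokens = line.split()
        # Shortest valid row: COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME (deleted)
        if len(tokens) < 10 or tokens[-1] != "(deleted)":
            continue
        pid = tokens[1]
        # Index from the right so the optional TID column does not matter.
        device, size, inode = tokens[-5], tokens[-4], tokens[-3]
        if not (pid.isdigit() and size.isdigit()):
            continue
        seen[int(pid)][(device, inode)] = int(size)
    return {pid: sum(files.values()) for pid, files in seen.items()}


if __name__ == "__main__":
    for pid, total in held_bytes_by_pid(SAMPLE).items():
        print(f"pid {pid}: {total} bytes held by deleted files")
```

For the four sample rows this attributes 16777216 bytes (two 8 MiB journal files) to PID 22524. Running it over the full /tmp/lsof.out would give the complete figure.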

[ecrosby@ecrosby-localdomain Domino]$ oc get pods --all-namespaces -o wide |grep compute-004 |grep fluent
openshift-logging                          logging-fluentd-cfcd8                                          1/1       Running            0          15d       10.130.3.48     ocp-compute-004.fqdn       <none>
[ecrosby@ecrosby-localdomain Domino]$ oc delete pod logging-fluentd-cfcd8 -n openshift-logging
pod "logging-fluentd-cfcd8" deleted
[ecrosby@ecrosby-localdomain Domino]$ oc get pods --all-namespaces -o wide |grep compute-004 |grep fluent
openshift-logging                          logging-fluentd-dl29g                                          1/1       Running            0          17s       10.130.2.211    ocp-compute-004.fqdn       <none>

[root@ocp-compute-004 ~]# df -h /var/log
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/rhel-var_log   15G  8.3G  6.8G  56% /var/log

[root@ocp-compute-004 ~]# lsof > /tmp/lsof.good.out


[root@ocp-compute-004 ~]# grep deleted /tmp/lsof.good.out |grep /var/log
rsyslogd  128442            root   29r      REG              253,4   8388608              4251471 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
rsyslogd  128442            root  251r      REG              253,4   8388608              4251470 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
in:imjour 128442 128455     root   29r      REG              253,4   8388608              4251471 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
in:imjour 128442 128455     root  251r      REG              253,4   8388608              4251470 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
rs:main   128442 128456     root   29r      REG              253,4   8388608              4251471 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
rs:main   128442 128456     root  251r      REG              253,4   8388608              4251470 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)


[root@ocp-compute-004 ~]# rpm -qa |grep systemd
systemd-libs-219-67.el7_7.3.x86_64
oci-systemd-hook-0.2.0-1.git05e6923.el7_6.x86_64
systemd-219-67.el7_7.3.x86_64
systemd-sysv-219-67.el7_7.3.x86_64

[root@ocp-compute-004 ~]# rpm -qa |grep openshift-ansible
openshift-ansible-docs-3.11.154-2.git.0.1640c49.el7.noarch
openshift-ansible-roles-3.11.154-2.git.0.1640c49.el7.noarch
openshift-ansible-playbooks-3.11.154-2.git.0.1640c49.el7.noarch
openshift-ansible-3.11.154-2.git.0.1640c49.el7.noarch

Comment 1 ecrosby 2020-03-25 14:57:04 UTC
Similar issue. I have only identified this issue with fluentd so far. https://bugzilla.redhat.com/show_bug.cgi?id=1560358

Comment 2 Jeff Cantrill 2020-03-25 15:58:33 UTC
Closing as a duplicate. The underlying problem also depends on the fix proposed in https://bugzilla.redhat.com/show_bug.cgi?id=1812889.  The workaround is to periodically cycle (restart) fluentd [1].  I believe there is an associated KBase article that documents essentially the same procedure as [1].

[1] https://github.com/openshift/origin-aggregated-logging/blob/master/docs/troubleshooting.md#fluentd-is-holding-onto-deleted-journald-files-that-have-been-rotated

*** This bug has been marked as a duplicate of bug 1560358 ***
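
One way the "periodically cycle fluentd" workaround could be scripted is sketched below. It is not the procedure documented in [1]: the threshold value and both function names are assumptions made for illustration, and only the `oc delete pod <name> -n <namespace>` command form comes from the description above.

```python
def should_cycle_fluentd(held_bytes, threshold_bytes=1024 ** 3):
    """Decide whether enough deleted-file space is pinned to justify a restart.

    threshold_bytes is a hypothetical default (1 GiB), not a recommended value.
    """
    return held_bytes >= threshold_bytes


def delete_pod_command(pod_name, namespace="openshift-logging"):
    """Build the argument list for the command used in the description."""
    return ["oc", "delete", "pod", pod_name, "-n", namespace]


if __name__ == "__main__":
    # Example: roughly 6.7 GiB held, as estimated from the df/du gap.
    if should_cycle_fluentd(int(6.7 * 1024 ** 3)):
        print(" ".join(delete_pod_command("logging-fluentd-cfcd8")))
```

The command list could be passed to subprocess.run from a cron job or similar. The daemonset recreates the pod, as the description shows.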

Comment 3 Mike Fiedler 2020-03-27 15:43:40 UTC
The bug this was marked as a duplicate of was fixed in 3.10.z.  This bug appears to be in 3.11.  Did the fix make it into 3.11?

Comment 4 Mike 2020-06-02 11:12:48 UTC
I am still having the same problem in 3.11.98. Any updates?

