
Bug 1817059

Summary: Journald does not free up space due to containers keeping deleted files open
Product: OpenShift Container Platform
Component: Logging
Reporter: ecrosby
Assignee: Jeff Cantrill <jcantril>
QA Contact: Anping Li <anli>
Status: CLOSED DUPLICATE
Severity: medium
Priority: unspecified
Version: unspecified
CC: aos-bugs, gentilem, jcantril, mifiedle
Flags: jcantril: needinfo-
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2020-03-25 15:58:33 UTC

Description ecrosby 2020-03-25 13:48:46 UTC
Journald is configured to limit its disk usage to 8G, and it reports that it is within that limit. However, services running in containers keep the rotated (deleted) journal files open, so the filesystem cannot reclaim their space.

[root@ocp-compute-004 ~]# journalctl --disk-usage
Archived and active journals take up 8.0G on disk.
[root@ocp-compute-004 ~]# df -h /var/log
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/rhel-var_log   15G   15G  698M  96% /var/log
[root@ocp-compute-004 ~]# du -a /var/log |sort -nr |grep -v journal|head -5
8628792	/var/log
89088	/var/log/messages
38992	/var/log/audit
8196	/var/log/audit/audit.log.4
8196	/var/log/audit/audit.log.3
[root@ocp-compute-004 ~]# journalctl --disk-usage
Archived and active journals take up 8.0G on disk.
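
The gap between what df reports as used and what du can see is the space pinned by deleted files. A minimal sketch for measuring it follows. It is not part of the original report: the function name unaccounted_bytes is hypothetical, it assumes GNU coreutils (df --output, du -B1), and it assumes the path is its own mount point, as /var/log is here. Filesystem metadata also counts toward df's figure, so a small non-zero result is normal.

```shell
# Print the bytes that the filesystem counts as used but that no visible
# file accounts for. A large value suggests deleted files are still open.
# Assumes GNU coreutils and that the argument is a mount point.
unaccounted_bytes() {
    mount="${1:-/var/log}"
    # Used bytes for the filesystem; drop the header line and padding.
    used=$(df -B1 --output=used "$mount" | tail -n 1 | tr -d ' ')
    # Bytes reachable through the directory tree, staying on one filesystem.
    visible=$(du -sx -B1 "$mount" 2>/dev/null | cut -f1)
    echo $(( used - visible ))
}
```

Usage: unaccounted_bytes /var/log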

[root@ocp-compute-004 ~]# grep deleted /tmp/lsof.out  |grep /var/log |grep -v fluentd |head -2
ruby-time  22524  22634     root   58r      REG              253,4   8388608              4230832 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
ruby-time  22524  22634     root   67r      REG              253,4   8388608              4194391 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
[root@ocp-compute-004 ~]# grep deleted /tmp/lsof.out  |grep /var/log |head -2
fluentd    22524            root   58r      REG              253,4   8388608              4230832 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
fluentd    22524            root   67r      REG              253,4   8388608              4194391 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)

[root@ocp-compute-004 ~]# grep deleted /tmp/lsof.out  |grep /var/log |grep -v fluentd |grep -v ruby
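
To put a number on how much space these handles pin, the saved lsof output can be totalled. The following is a minimal sketch, not part of the original report. The function name deleted_bytes is hypothetical. It assumes the lsof column layout shown above (DEVICE, SIZE/OFF, NODE and NAME as the last columns before "(deleted)") and paths without spaces. Each device/inode pair is counted once, because lsof lists the same file again for every thread and descriptor.

```shell
# Read lsof output on stdin and print the total size in bytes of
# deleted-but-open files whose path starts with the given prefix.
# Fields are addressed from the end of the line because the optional
# TID column shifts the positions counted from the start.
deleted_bytes() {
    awk -v prefix="${1:-/var/log}" '
        $NF == "(deleted)" && index($(NF-1), prefix) == 1 {
            key = $(NF-4) SUBSEP $(NF-2)   # device + inode
            if (!(key in seen)) {
                seen[key] = 1
                total += $(NF-3)           # SIZE/OFF column
            }
        }
        END { printf "%.0f\n", total }
    '
}
```

Usage: deleted_bytes /var/log/journal < /tmp/lsof.out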

[ecrosby@ecrosby-localdomain Domino]$ oc get pods --all-namespaces -o wide |grep compute-004 |grep fluent
openshift-logging                          logging-fluentd-cfcd8                                          1/1       Running            0          15d       10.130.3.48     ocp-compute-004.fqdn       <none>
[ecrosby@ecrosby-localdomain Domino]$ oc delete pod logging-fluentd-cfcd8 -n openshift-logging
pod "logging-fluentd-cfcd8" deleted
[ecrosby@ecrosby-localdomain Domino]$ oc get pods --all-namespaces -o wide |grep compute-004 |grep fluent
openshift-logging                          logging-fluentd-dl29g                                          1/1       Running            0          17s       10.130.2.211    ocp-compute-004.fqdn       <none>

[root@ocp-compute-004 ~]# df -h /var/log
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/rhel-var_log   15G  8.3G  6.8G  56% /var/log
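
The restart above can be scripted so that it is easy to repeat on a schedule. This is a rough sketch only, not the documented workaround itself. The function names are hypothetical, and the parser assumes the column order of the "oc get pods --all-namespaces -o wide" listing shown above, where NODE is the eighth column. Other oc versions may print different columns.

```shell
# Read `oc get pods --all-namespaces -o wide` output on stdin and print
# "namespace pod" for every fluentd pod scheduled on the given node.
# Assumes NODE is column 8, as in the listing above.
fluentd_pods_on_node() {
    awk -v node="$1" '$2 ~ /fluentd/ && index($8, node) == 1 { print $1, $2 }'
}

# Delete each matching pod so that it is recreated and the old file
# handles are released, as happened in the manual steps above.
cycle_fluentd_on_node() {
    oc get pods --all-namespaces -o wide |
        fluentd_pods_on_node "$1" |
        while read -r ns pod; do
            oc delete pod "$pod" -n "$ns"
        done
}
```

Usage: cycle_fluentd_on_node ocp-compute-004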

[root@ocp-compute-004 ~]# lsof > /tmp/lsof.good.out


[root@ocp-compute-004 ~]# grep deleted /tmp/lsof.good.out |grep /var/log
rsyslogd  128442            root   29r      REG              253,4   8388608              4251471 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
rsyslogd  128442            root  251r      REG              253,4   8388608              4251470 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
in:imjour 128442 128455     root   29r      REG              253,4   8388608              4251471 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
in:imjour 128442 128455     root  251r      REG              253,4   8388608              4251470 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
rs:main   128442 128456     root   29r      REG              253,4   8388608              4251471 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)
rs:main   128442 128456     root  251r      REG              253,4   8388608              4251470 /var/log/journal/ee906aa861254910b7717373743348e3/system (deleted)


[root@ocp-compute-004 ~]# rpm -qa |grep systemd
systemd-libs-219-67.el7_7.3.x86_64
oci-systemd-hook-0.2.0-1.git05e6923.el7_6.x86_64
systemd-219-67.el7_7.3.x86_64
systemd-sysv-219-67.el7_7.3.x86_64

[root@ocp-compute-004 ~]# rpm -qa |grep openshift-ansible
openshift-ansible-docs-3.11.154-2.git.0.1640c49.el7.noarch
openshift-ansible-roles-3.11.154-2.git.0.1640c49.el7.noarch
openshift-ansible-playbooks-3.11.154-2.git.0.1640c49.el7.noarch
openshift-ansible-3.11.154-2.git.0.1640c49.el7.noarch

Comment 1 ecrosby 2020-03-25 14:57:04 UTC
This looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1560358. So far I have only identified the problem with fluentd.

Comment 2 Jeff Cantrill 2020-03-25 15:58:33 UTC
Closing this as a duplicate. The underlying issue ultimately also depends on the fix proposed in https://bugzilla.redhat.com/show_bug.cgi?id=1812889. The workaround is to periodically cycle (restart) fluentd [1]. I believe there is an associated KBase article that documents essentially the same steps as [1].

[1] https://github.com/openshift/origin-aggregated-logging/blob/master/docs/troubleshooting.md#fluentd-is-holding-onto-deleted-journald-files-that-have-been-rotated

*** This bug has been marked as a duplicate of bug 1560358 ***

Comment 3 Mike Fiedler 2020-03-27 15:43:40 UTC
The bug this was marked as a duplicate of was fixed in 3.10.z. This bug appears to be in 3.11. Did the fix make it into 3.11?

Comment 4 Mike 2020-06-02 11:12:48 UTC
I am still having the same problem in 3.11.98. Any updates?