Bug 1849557 - Rootless Podman does not properly close and remove temporary files
Summary: Rootless Podman does not properly close and remove temporary files
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: fuse-overlayfs
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: Giuseppe Scrivano
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1850661
 
Reported: 2020-06-22 08:29 UTC by Mark Kluber
Modified: 2020-11-04 03:06 UTC
CC List: 13 users

Fixed In Version: fuse-overlayfs-1.0.0-2.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1850661 (view as bug list)
Environment:
Last Closed: 2020-11-04 03:05:17 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
lsof command output (5.57 MB, application/gzip), 2020-06-23 12:32 UTC, Mark Kluber
find command output (311.73 KB, application/gzip), 2020-06-23 12:33 UTC, Mark Kluber
podman diff command output (3.93 KB, application/gzip), 2020-06-23 12:35 UTC, Mark Kluber

Description Mark Kluber 2020-06-22 08:29:56 UTC
Description of problem:
Rootless Podman keeps filling the filesystem over time until it reaches 100% usage and exhausts the open file ulimit. We have 3 running containers based on the "selenium/standalone-opera-debug:latest" image. No files are being created inside the running containers, yet every passing hour Podman consumes a few additional gigabytes of disk space according to the df command. This keeps repeating until the filesystem is completely full. If we execute the 'podman ps --size' command, the temporary Podman files are removed from disk and the filesystem clears up. However, the deleted temporary files are still visible in the 'lsof' output and marked with a 'deleted' status, meaning they are still held open by the Podman process and drive the number of open files up to the ulimit. As a final solution, a complete stop/start of all Podman containers and processes is required for functionality to return.
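
For reference, one way to observe the symptom described above from a shell (a rough sketch using the graphroot path and rootless user reported later in this bug; adjust both to your environment):

$ df -h /opt/IBM/falcon/podman/containers/storage        # watch filesystem usage grow over time
$ lsof -u falconapp021 2>/dev/null | grep -c '(deleted)'  # count files deleted on disk but still held open
$ podman ps --size                                        # frees the space on disk, but the deleted files stay open until the containers are restarted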

Version-Release number of selected component (if applicable):
1.6.4 - 12.module+el8.2.0+6669+dde598ec

How reproducible:
always

Steps to Reproduce:
1. Use a fresh install of RHEL 8.2 with stock default rootless Podman settings.
2. Create a separate filesystem for the rootless Podman container storage location and edit the '/home/<rootlessuser>/.config/containers/storage.conf' file, modifying the 'graphroot' value to point to the new filesystem (a sketch of this edit follows the steps below).
3. Run container images as the rootless user.
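
As an illustration of step 2, the relevant storage.conf edit might look roughly like this (the mount point /srv/podman-storage is a placeholder, not a value from this report, and the exact section layout can vary between containers-storage versions):

# /home/<rootlessuser>/.config/containers/storage.conf
[storage]
driver = "overlay"
# graphroot moved onto the separate, dedicated filesystem (placeholder path)
graphroot = "/srv/podman-storage"

[storage.options]
# rootless overlay requires fuse-overlayfs as the mount program
mount_program = "/usr/bin/fuse-overlayfs"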

Actual results:
Containers are running, but filesystem usage grows steadily until it eventually hits 100%, and the maximum open file ulimit is eventually reached as well.

Expected results:
The containers should run as expected without the filesystem being filled up with Podman temporary files.

Comment 1 Matthew Heon 2020-06-22 12:11:23 UTC
Are your containers generating large amounts of logs? That's the first thing that comes to mind when I think about disk space explosion.

I'll try and reproduce locally, but given we've never heard about this before, I suspect it must be something about the environment or the containers in question. Can you provide an `lsof` when the system starts to fill up, to determine what the open files count actually is? Also, further details about the containers would be greatly appreciated - can you provide the commands used to start the containers, do they have healthchecks, are you performing a large number of `podman exec` calls into the containers? Finally, the output of `podman info` from an affected user on an affected machine would be helpful.

Comment 2 Daniel Walsh 2020-06-22 16:25:45 UTC
Do you have fuse-overlayfs installed?  Are you using the vfs driver?

podman info

Comment 3 Tom Sweeney 2020-06-22 19:00:06 UTC
Assigning to Giuseppe

Comment 4 Giuseppe Scrivano 2020-06-22 19:08:45 UTC
please also show what files you have in the storage, could you show the output for `find $graphroot` ?

Comment 5 Mark Kluber 2020-06-23 12:32:38 UTC
Created attachment 1698445 [details]
lsof command output

Comment 6 Mark Kluber 2020-06-23 12:33:22 UTC
Created attachment 1698446 [details]
find command output

Comment 7 Mark Kluber 2020-06-23 12:35:16 UTC
Created attachment 1698447 [details]
podman diff command output

Comment 8 Mark Kluber 2020-06-23 12:36:01 UTC
Some additional info:
There are two container images used. The 1st one is exactly the same as selenium/standalone-firefox-debug (from the Docker Hub official repository). The 2nd one extends the 1st and additionally contains IBM Java, JMeter, and Citrix Receiver.
I am attaching the podman diff output, which does not show many changes.
The dockerfile of the base image:

##############
FROM selenium/standalone-firefox-debug:latest
LABEL authors=Falcon

USER seluser

RUN mkdir -p /home/seluser/.cache/  && \
    sudo apt -y update  && \
    sudo apt -y upgrade
##############

@Matthew Heon
- The containers are not generating any significant amount of logs.
- I am attaching lsof output from a period when the system was starting to fill up. The affected user is 'falconapp021'.
- Command used to start containers: podman run -d --name epricer -e SE_OPTS="-sessionTimeout 450" -p 4444:4444 -p 5910:5900 -v /dev/shm:/dev/shm   falcon-selenium-base
- Healthchecks are not used. 
- No podman exec calls are being made.  
- 'Podman info' output as affected user 'falconapp021':
host:
  BuildahVersion: 1.12.0-dev
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.6-1.module+el8.2.0+6368+cf16aa14.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.6, commit: 9adfe850ef954416ea5dd0438d428a60f2139473'
  Distribution:
    distribution: '"rhel"'
    version: "8.2"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 1003
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  MemFree: 3685527552
  MemTotal: 16643944448
  OCIRuntime:
    name: runc
    package: runc-1.0.0-65.rc10.module+el8.2.0+6368+cf16aa14.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 7526281216
  SwapTotal: 8589930496
  arch: amd64
  cpus: 8
  eventlogger: journald
  hostname: b03lciapp021
  kernel: 4.18.0-193.el8.x86_64
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: slirp4netns-0.4.2-3.git21fdece.module+el8.2.0+6368+cf16aa14.x86_64
    Version: |-
      slirp4netns version 0.4.2+dev
      commit: 21fdece2737dc24ffa3f01a341b8a6854f8b13b4
  uptime: 650h 51m 15.76s (Approximately 27.08 days)
registries:
  blocked: null
  insecure: null
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  ConfigFile: /home/falconapp021/.config/containers/storage.conf
  ContainerStore:
    number: 3
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-0.7.2-5.module+el8.2.0+6368+cf16aa14.x86_64
      Version: |-
        fuse-overlayfs: version 0.7.2
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  GraphRoot: /opt/IBM/falcon/podman/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 2
  RunRoot: /tmp/run-1000
  VolumePath: /opt/IBM/falcon/podman/containers/storage/volumes

@Daniel Walsh
fuse-overlayfs is used.  

@Giuseppe Scrivano
I am attaching the output of the 'find' command.

Comment 9 Giuseppe Scrivano 2020-06-23 15:20:47 UTC
thanks, the issue is already fixed upstream with https://github.com/containers/fuse-overlayfs/pull/164/

I could reproduce it locally and, after bisecting fuse-overlayfs, I can confirm that the patches in https://github.com/containers/fuse-overlayfs/pull/164/ fix it for me.

If you'd like to try with the latest fuse-overlayfs version, you can do something like:

$ wget -O /tmp/fuse-overlayfs-x86_64 https://github.com/containers/fuse-overlayfs/releases/download/v1.1.1/fuse-overlayfs-x86_64
$ chmod +x /tmp/fuse-overlayfs-x86_64
$ podman --storage-opt overlay.mount_program=/tmp/fuse-overlayfs-x86_64 run ....
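
As a follow-up note (not from the original comment): instead of passing --storage-opt on every invocation, the same override can be made persistent in the rootless user's storage.conf, roughly as sketched below, assuming the binary stays at /tmp/fuse-overlayfs-x86_64 (a location outside /tmp is safer long term, since /tmp may be cleared on reboot). Depending on the containers-storage version, the key lives under [storage.options] or [storage.options.overlay].

# ~/.config/containers/storage.conf (relevant excerpt)
[storage.options]
# point rootless overlay storage at the newer fuse-overlayfs binary (assumed path)
mount_program = "/tmp/fuse-overlayfs-x86_64"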

Comment 10 Jindrich Novy 2020-06-24 06:09:21 UTC
The fix for this issue is already present in both 8.2.1 and 8.3.0. It will be fixed once they are released.

Comment 12 Giuseppe Scrivano 2020-06-24 07:23:50 UTC
for QA, a good reproducer for the issue can be found here: https://github.com/containers/fuse-overlayfs/issues/210

Comment 13 Christoph Karl 2020-06-24 07:29:23 UTC
We think we have the same problem on RHEL 7.8 also.
> rpm -qa | grep overlayfs
fuse-overlayfs-0.7.2-6.el7_8.x86_64

Comment 14 Christoph Karl 2020-06-24 13:22:15 UTC
Yes, we have this problem on RHEL 7.8 as well.
We discovered it running a headless LibreOffice inside a ubi7 container,
but we can also reproduce it with the reproducer from comment 12.

On the very same machine both things (LibreOffice and the reproducer) work
if we change the storage driver to vfs (a sketch of that change follows this comment).

Please provide a fix for RHEL 7 as well.

Thank you
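
For completeness, a rough sketch of the vfs workaround mentioned above, set in the rootless user's storage.conf (note that vfs stores full copies of layers, so it uses substantially more disk space than overlay, and images typically have to be pulled again after switching drivers):

# ~/.config/containers/storage.conf
[storage]
driver = "vfs"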

Comment 18 Jindrich Novy 2020-06-24 16:11:13 UTC
(In reply to Christoph Karl from comment #14)
> Yes, we have this problem on RH 7.8 also.
> We discovered it running a headless LibreOffice inside an ubi7 container,
> but we can also reproduce it with the reproducer from comment 12.
> 
> On the very same machine both things (LibreOffice and the reproducer) works,
> if we change the storage driver to vfs.
> 
> Please provide a fix for RH7 also.
> 
> Thank you

Christoph, it seems it is too late for 7.8, so I targeted the fix at the 7.9 release in bug 1850661.

Thanks.

Comment 19 Daniel Walsh 2020-06-24 22:31:18 UTC
RHEL 7 will not be getting any more updates other than CVE fixes. It is in the next phase of its lifecycle. Please move to RHEL 8.

Comment 23 Mario Trangoni 2020-10-27 10:01:37 UTC
JFTR, is fuse-overlayfs-1.0.0-2.el8 going to arrive in CentOS 8.x at some point?

Comment 24 Giuseppe Scrivano 2020-10-27 10:44:26 UTC
It is part of the latest build; I am not sure how long it takes to get into CentOS.

Comment 25 Tom Sweeney 2020-10-27 20:12:21 UTC
Lokesh, do you happen to know the timing per Giuseppe's comment: https://bugzilla.redhat.com/show_bug.cgi?id=1849557#c24

Comment 26 Lokesh Mandvekar 2020-10-28 12:27:57 UTC
(In reply to Tom Sweeney from comment #25)
> Lokesh, do you happen to know the timing per Giuseppe's comment:
> https://bugzilla.redhat.com/show_bug.cgi?id=1849557#c24

Not sure honestly, I've seen it vary from days to weeks. Maybe #centos on freenode could provide more info, or you could file this bug on https://bugs.centos.org with a link to this bug.

Comment 28 errata-xmlrpc 2020-11-04 03:05:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: container-tools:rhel8 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4694

