Bug 1834974
Summary: | Podman old containers are eating up space in overlay directory | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | karan singh <karan> |
Component: | Ceph-Ansible | Assignee: | Dimitri Savineau <dsavinea> |
Status: | CLOSED ERRATA | QA Contact: | Vasishta <vashastr> |
Severity: | urgent | Docs Contact: | |
Priority: | high | ||
Version: | 4.0 | CC: | amsyedha, aschoen, assingh, bbaude, ceph-eng-bugs, dornelas, dsavinea, dwalsh, gabrioux, gmeno, hfukumot, hyelloji, jbrier, jligon, jnovy, johfulto, lsm5, mheon, mmuench, mzink, nthomas, rgowdege, rrajaram, seb, shan, tserlin, tsweeney, vereddy, vrothber, ykaul |
Target Milestone: | z1 | ||
Target Release: | 4.1 | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | ceph-ansible-4.0.25-1.el8cp, ceph-ansible-4.0.25-1.el7cp | Doc Type: | Bug Fix |
Doc Text: |
.Storage directories from old containers are removed
Previously, storage directories for old containers were not removed. This could cause high disk usage. This could be seen if you installed {storage-product}, purged it, and reinstalled it. In {storage-product} 4.1z1, storage directories for containers that are no longer being used are removed and excessive disk usage does not occur.
|
Story Points: | --- |
Clone Of: | | Environment: | 
Last Closed: | 2020-07-20 14:21:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1186913, 1816167 |
Description
karan singh
2020-05-12 19:12:40 UTC
How is Podman being invoked to remove the containers? Is Podman being used at all? It seems like there are still active overlay mounts, and it seems like the containers in question didn't exit cleanly / weren't removed by `podman rm`. I think we need a lot more information about how RHCS invokes Podman (and tears it down when being removed) to be able to diagnose this. Assigning to Matt now, but we will likely need input from RHCS.

As an example, if you look at this service file, you will find the answers to your questions:

1. podman is being managed using service files
2. start == run
3. stop == rm (prerequisite to start)

```
[root@rgw-5 ~]# cat /etc/systemd/system/ceph-radosgw@.service
[Unit]
Description=Ceph RGW
After=network.target

[Service]
EnvironmentFile=/var/lib/ceph/radosgw/ceph-%i/EnvironmentFile
ExecStartPre=-/usr/bin/podman stop ceph-rgw-rgw-5-${INST_NAME}
ExecStartPre=-/usr/bin/podman rm ceph-rgw-rgw-5-${INST_NAME}
ExecStart=/usr/bin/podman run --rm --net=host \
  --memory=191620m \
  --cpus=8 \
  -v /var/lib/ceph:/var/lib/ceph:z \
  -v /etc/ceph:/etc/ceph:z \
  -v /var/run/ceph:/var/run/ceph:z \
  -v /etc/localtime:/etc/localtime:ro \
  -v /var/log/ceph:/var/log/ceph:z \
  -v /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:z \
  -e CEPH_DAEMON=RGW \
  -e CLUSTER=ceph \
  -e RGW_NAME=rgw-5.${INST_NAME} \
  -e RGW_CIVETWEB_PORT=${INST_PORT} \
  -e CONTAINER_IMAGE=registry.redhat.io/rhceph/rhceph-4-rhel8:latest \
  --name=ceph-rgw-rgw-5-${INST_NAME} \
  registry.redhat.io/rhceph/rhceph-4-rhel8:latest
ExecStopPost=-/usr/bin/podman stop ceph-rgw-rgw-5-${INST_NAME}
Restart=always
RestartSec=10s
TimeoutStartSec=120
TimeoutStopSec=15

[Install]
WantedBy=multi-user.target
[root@rgw-5 ~]#
```

Per my output above, I do not see mounted container overlay directories when the container is not running (which is good); however, I do see extra container overlay directories under /var/lib/containers/overlay. These directories are old leftovers from earlier container runs. I even tried `podman system prune --all --volumes`, but it was not useful.

It doesn't look like you're using unit files from `podman generate systemd`, and that has produced a number of problems. This looks like a simple unit not managed by PID files, which is not a supported configuration for running Podman under systemd. Podman containers are not direct children of the Podman process; Podman starts conmon, a lightweight container monitor daemon, which double-forks to daemonize and reparent on init, and the container is started as a child of conmon. At that point, the Podman process is entirely superfluous; killing Podman, as systemd would do to stop the unit, is by no means guaranteed to kill the actual *container*. Please investigate moving to unit files generated by `podman generate systemd --new`. These are Type=forking and use PID files to ensure that conmon and the container are properly recognized and managed by systemd, amongst other fixes that let systemd properly manage the container.
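For illustration, a minimal sketch of the recommended approach, assuming an existing container named `ceph-rgw-rgw-5-rgw0` (the name is hypothetical) and a Podman release that supports `generate systemd --new`; the exact generated output varies by Podman version:

```
# Generate a unit that re-creates the container on each start and tracks it
# through conmon's PID file, so systemd manages the real container process.
podman generate systemd --new --name ceph-rgw-rgw-5-rgw0 --files

# Key directives in the generated container-ceph-rgw-rgw-5-rgw0.service
# (illustrative excerpt, not verbatim output):
#   Type=forking
#   PIDFile=%t/container-ceph-rgw-rgw-5-rgw0.pid
#   ExecStart=/usr/bin/podman run --conmon-pidfile %t/container-... ...
#   ExecStop=/usr/bin/podman stop --ignore ...
#   ExecStopPost=/usr/bin/podman rm --ignore -f ...

# Install and enable it like any other unit:
cp container-ceph-rgw-rgw-5-rgw0.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now container-ceph-rgw-rgw-5-rgw0.service
```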
Hi Guillaume / Seb, per comment 4 from Matthew Heon, can you chime in on this discussion? Are we aware that managing Podman under systemd without PID files is not a supported configuration? We have been using this all over the place in RHCS 4. I am not sure whether, in RHCS 4.1, we have plans to move to `podman generate systemd` (Type=forking) and PID files, if that's the recommended way to run Podman on RHEL. (If you are not the right person for this discussion, please add the right person to this BZ so that we can come to a conclusion.) FYI, there is one more BZ along the same lines; check out https://bugzilla.redhat.com/show_bug.cgi?id=1807440

I don't have much experience with podman, so I'm adding Dimitri as well. Dimitri, any idea?

Dan Walsh, can you comment on https://bugzilla.redhat.com/show_bug.cgi?id=1834974#c5 please?

Does `podman image prune` exist and do anything? `podman container prune`?

The issue seems to be the way your systemd unit file is configured. Valentin, could you advise?

Verified with:
Ceph version: 14.2.8-79.el8cp
ceph-ansible version: 4.0.25-1.el8cp.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3003

Seems to have introduced https://bugzilla.redhat.com/show_bug.cgi?id=1858865

The needinfo request(s) on this closed bug have been removed as they have been unresolved for 365 days.
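For reference, a minimal cleanup sketch for the leftover-storage symptom discussed above, assuming the default rootful storage location /var/lib/containers/storage and a Podman new enough to provide `system reset`; paths and commands here are illustrative, not the fix shipped in ceph-ansible:

```
# Show how much space overlay layers are currently using
du -sh /var/lib/containers/storage/overlay

# Remove all stopped containers, unused images, and unused volumes
podman system prune --all --volumes

# Last resort: wipe ALL Podman storage (containers, images, volumes).
# Only do this when nothing on the host needs to keep its containers.
podman system reset
```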