Bug 1661023 - podman broken metadata after out of disk space
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: podman
Version: 28
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Brent Baude
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-12-19 21:03 UTC by Aleksandar Kostadinov
Modified: 2019-03-15 03:35 UTC
CC: 5 users

Fixed In Version: podman-1.1.2-1.git0ad9b6b.fc29 podman-1.1.2-1.git0ad9b6b.fc28
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-10 18:23:31 UTC


Attachments (Terms of Use)
podman out of disk space (2.38 KB, text/plain)
2019-01-04 19:53 UTC, Aleksandar Kostadinov
podman out of space debug (32.74 KB, text/plain)
2019-01-08 08:40 UTC, Aleksandar Kostadinov

Description Aleksandar Kostadinov 2018-12-19 21:03:38 UTC
Description of problem:
After disk space is exhausted, podman can no longer clean up images.

Version-Release number of selected component (if applicable):
podman version 0.10.1.3

How reproducible:


Steps to Reproduce:
1. podman build ...
2. ensure disk space is exhausted during step 1
3. podman rmi image1 image2 image3

Actual results:
> $ sudo podman rmi 501848910ca8 4540553024d0 043e30f6515d 31ca89666ade ec7db351a463 0881c9c277aa
> [sudo] password for avalon: 
> A container associated with containers/storage, i.e. via Buildah, CRI-O, etc., may be associated with this image: 501848910ca8
> A container associated with containers/storage, i.e. via Buildah, CRI-O, etc., may be associated with this image: 4540553024d0
> image is in use by a container
> A container associated with containers/storage, i.e. via Buildah, CRI-O, etc., may be associated with this image: 043e30f6515d
> image is in use by a container
> A container associated with containers/storage, i.e. via Buildah, CRI-O, etc., may be associated with this image: 31ca89666ade
> image is in use by a container
> A container associated with containers/storage, i.e. via Buildah, CRI-O, etc., may be associated with this image: ec7db351a463
> image is in use by a container
> A container associated with containers/storage, i.e. via Buildah, CRI-O, etc., may be associated with this image: 0881c9c277aa
> image is in use by a container
> image is in use by a container
> $ sudo podman container ls -a
<empty output from ls>

Expected results:
images are removed

Comment 1 Brent Baude 2019-01-03 14:12:52 UTC
I am not able to reproduce this error with upstream master.  Would you be willing to build the upstream master and see if you can replicate it?  If you are using overlay, there is a bug in c/storage which I have submitted a fix for -> https://github.com/containers/storage/pull/258

Comment 2 Aleksandar Kostadinov 2019-01-03 15:18:30 UTC
Do you happen to have a build that I can install locally?

Comment 3 Brent Baude 2019-01-03 21:10:12 UTC
You could try the builds from https://copr.fedorainfracloud.org/coprs/baude/Upstream_CRIO_Family/ ?

Comment 4 Aleksandar Kostadinov 2019-01-04 09:58:14 UTC
Thank you, Brent. 

Small issue with the repos. On Fedora 29 the official package version is

> 1:0.12.1.2-1.git9551f6b.fc29

While in your repo I see

> 0.12.2-1546546584.git9ffd4806.fc29

So it doesn't install out of the box. I think if you bump the epoch to 1: or 2: in the repo it will work better.

Anyway, I installed by specifying version. My Dockerfile is:

> FROM example.com/aosqe/nextgenflex
> RUN dd if=/dev/zero of=$HOME/file_too_big bs=1M count=1500000


Build output:

> $ BUILDAH_LAYERS=false sudo podman build -f Dockerfile -t test-image --layers=false
> STEP 1: FROM example.com/aosqe/nextgenflex
> STEP 2: RUN dd if=/dev/zero of=$HOME/file_too_big bs=1M count=1500000
> ERRO[0026] read container terminal output: input/output error: input/output error 
> dd: error writing '/home/jenkins/file_too_big': No space left on device
> 11688+0 records in
> 11687+0 records out
> 12255678464 bytes (12 GB) copied, 26.6402 s, 460 MB/s
> ERRO[0028] error unmounting container: error unmounting build container "8ce62c5e29a828ef7451058bba907c87cb7d944fe12cae63cca838702e5a4948": write /var/lib/containers/storage/overlay-layers/.tmp-layers.json065046180: no space left on device 
> error building at step {Env:[OPENSHIFT_BUILD_NAME=cucushift-oc40-3 OPENSHIFT_BUILD_NAMESPACE=image-build OPENSHIFT_BUILD_COMMIT=40ef39df701547e4549f40e64fb6c9bcb326a9ed PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin container=oci HOME=/home/jenkins] Command:run Args:[dd if=/dev/zero of=$HOME/file_too_big bs=1M count=1500000] Flags:[] Attrs:map[] Message:RUN dd if=/dev/zero of=$HOME/file_too_big bs=1M count=1500000 Original:RUN dd if=/dev/zero of=$HOME/file_too_big bs=1M count=1500000}: error while running runtime: exit status 1
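The `.tmp-layers.json…` path in the unmount error suggests containers/storage saves its layer metadata by writing a temporary file and then renaming it over `layers.json`; if the temporary write itself hits ENOSPC, the rename never happens and the store can be left half-updated. A minimal Python sketch of that write-temp-then-rename pattern (the file names and cleanup behaviour are my illustration, not the actual c/storage code):

```python
import json
import os
import tempfile

def atomic_write_json(path, data):
    """Write JSON to `path` via a temp file plus rename, so readers never
    see a partially written file. If the write fails (e.g. ENOSPC), the
    partial temp file is removed so no stale .tmp-* files accumulate."""
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(prefix=".tmp-" + os.path.basename(path), dir=dirname)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except OSError:
        os.unlink(tmp)  # clean up the partial temp file before re-raising
        raise

atomic_write_json("layers.json", [{"id": "8ce62c5e", "parent": None}])
```

If the cleanup step is skipped on error, the `.tmp-layers.json065046180`-style leftovers seen in the log remain on disk and keep consuming space.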

Result:

> $ sudo podman images
> Could not get runtime: mkdir /var/lib/containers/storage/overlay/compat651067926: no space left on device

After I remove some unrelated files:

> $ sudo podman images
> REPOSITORY                                                 TAG        IMAGE ID       CREATED        SIZE
> example.com/aosqe/nextgenflex   latest     4628e5499724   25 hours ago   3.81 GB
> example.com/aosqe/nextgenflex   20190103   4628e5499724   25 hours ago   3.81 GB
> example.com/aosqe/nextgenflex   20181221   8797b76a65c8   2 weeks ago    3.81 GB
> example.com/aosqe/cucushift     oc40       7cb047d34ec1   2 weeks ago    2.6 GB

> $ sudo podman ps -a
> <nothing>

> $ du -sh ~/.local/share/containers/
> 4.0K	~/.local/share/containers/
> $ sudo du -sh /var/lib/containers/storage
> 17G  /var/lib/containers/storage
> # du -sh *
> 136K	libpod
> 4.0K	mounts
> 17G	overlay
> 156K	overlay-containers
> 88K	overlay-images
> 7.2M	overlay-layers
> 4.0K	storage.lock
> 4.0K	tmp

Space is not recoverable through `podman`; I assume the easiest way to reclaim it is `rm -rf /var/lib/containers`.

I think the current behaviour is even more frustrating because running the `podman` command gives no indication that some objects can be removed. So if the machine runs out of space, the user has no easy way to discover that there are container/image objects that could be cleaned up.

Now trying clean-up:
> # rm -rf containers
> rm: cannot remove 'containers/storage/overlay': Device or resource busy
> # podman images
> <nothing>
> # rm -rf containers
> <now dir gone without errors>

Interestingly, the above shows that running `podman` again somehow released the lock over the overlay directory.

In summary, I think it is important to have a good automatic clean-up routine for the out-of-space condition. In my experience it is easy to hit such a situation, and presently there is no user-friendly way to recover.
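Until such a routine exists, one defensive stopgap (my own suggestion, not a podman feature) is to keep a small pre-allocated spare file on the storage volume; deleting it when the disk fills gives podman enough free space to rewrite its metadata and remove images normally. A minimal sketch, with an illustrative path and size:

```shell
# Pre-allocate a 64 MB "spare" file; the path and size are illustrative only
# (on a real host it would live on the /var/lib/containers volume).
SPARE=${SPARE:-./containers-spare.bin}
dd if=/dev/zero of="$SPARE" bs=1M count=64 status=none

# When "no space left on device" hits, delete the reserve so podman has
# headroom to update its metadata, then clean up images as usual:
rm -f "$SPARE"
```

With the reserve freed, `podman rmi` should again be able to write its layer metadata instead of failing mid-update.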

Comment 5 Brent Baude 2019-01-04 15:42:32 UTC
When I try to reproduce this, I see the following:

[fedora@localhost libpod]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        983M     0  983M   0% /dev
tmpfs           997M     0  997M   0% /dev/shm
tmpfs           997M  500K  996M   1% /run
tmpfs           997M     0  997M   0% /sys/fs/cgroup
/dev/sda1       3.9G  2.2G  1.5G  60% /
tmpfs           200M  4.0K  200M   1% /run/user/1000
[fedora@localhost libpod]$ BUILDAH_LAYERS=false bin/podman build --layers=false -f /foo/Dockerfile -t test-image .
STEP 1: FROM alpine
STEP 2: RUN dd if=/dev/zero of=$HOME/file_too_big bs=1M count=1500000
dd: writing '/root/file_too_big': No space left on device
1507+0 records in
1505+1 records out
error building at step {Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] Command:run Args:[dd if=/dev/zero of=$HOME/file_too_big bs=1M count=1500000] Flags:[] Attrs:map[] Message:RUN dd if=/dev/zero of=$HOME/file_too_big bs=1M count=1500000 Original:RUN dd if=/dev/zero of=$HOME/file_too_big bs=1M count=1500000}: error while running runtime: exit status 1
[fedora@localhost libpod]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        983M     0  983M   0% /dev
tmpfs           997M     0  997M   0% /dev/shm
tmpfs           997M  500K  996M   1% /run
tmpfs           997M     0  997M   0% /sys/fs/cgroup
/dev/sda1       3.9G  2.2G  1.5G  60% /
tmpfs           200M  4.0K  200M   1% /run/user/1000

When I run it as you describe, the space still seems to be available afterwards?

Comment 6 Aleksandar Kostadinov 2019-01-04 15:49:18 UTC
Well, that's rather strange. Are you running Fedora 29? Maybe some other libraries are at fault?

Comment 8 Aleksandar Kostadinov 2019-01-04 19:53:29 UTC
Created attachment 1518536 [details]
podman out of disk space

I can still reproduce it; see the attached log for the podman version and output. My Fedora VM was updated just before I ran the test. Very strange.

Comment 9 Brent Baude 2019-01-04 20:57:22 UTC
Can you create the same log but with `podman --log-level=debug`? Maybe that will show something that helps us explain things.

Comment 10 Aleksandar Kostadinov 2019-01-08 08:40:40 UTC
Created attachment 1519149 [details]
podman out of space debug

I didn't have time yesterday. Please find it attached.

Comment 11 Fedora Update System 2019-02-27 13:30:13 UTC
podman-1.1.0-1.git006206a.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2019-2334f59273

Comment 12 Fedora Update System 2019-02-27 13:30:27 UTC
podman-1.1.0-1.git006206a.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-ead0cd452a

Comment 13 Fedora Update System 2019-02-28 18:55:35 UTC
podman-1.1.0-1.git006206a.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-2334f59273

Comment 14 Fedora Update System 2019-02-28 21:26:25 UTC
podman-1.1.0-1.git006206a.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-ead0cd452a

Comment 15 Fedora Update System 2019-03-05 19:11:12 UTC
podman-1.1.2-1.git0ad9b6b.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2019-d244a0fe3e

Comment 16 Fedora Update System 2019-03-05 19:11:25 UTC
podman-1.1.2-1.git0ad9b6b.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-5730099f0b

Comment 17 Fedora Update System 2019-03-06 15:13:00 UTC
podman-1.1.2-1.git0ad9b6b.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-d244a0fe3e

Comment 18 Fedora Update System 2019-03-06 15:57:14 UTC
podman-1.1.2-1.git0ad9b6b.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-5730099f0b

Comment 19 Fedora Update System 2019-03-10 18:23:31 UTC
podman-1.1.2-1.git0ad9b6b.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 20 Fedora Update System 2019-03-15 03:35:17 UTC
podman-1.1.2-1.git0ad9b6b.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

