Bug 1740079 - race/corruption: podman failed to launch containers
Summary: race/corruption: podman failed to launch containers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: podman
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 8.1
Assignee: Brent Baude
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1727325 1734574 1741110
 
Reported: 2019-08-12 09:15 UTC by Michele Baldessari
Modified: 2020-11-14 07:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1741110
Environment:
Last Closed: 2019-11-05 21:02:57 UTC
Type: Bug
Target Upstream Version:


Attachments
strace of podman run (407.83 KB, text/plain)
2019-08-12 09:15 UTC, Michele Baldessari
ls -lr /var/lib/containers (2.35 MB, application/gzip)
2019-08-12 09:16 UTC, Michele Baldessari
bolt db on broken node (6.03 MB, application/octet-stream)
2019-08-12 09:17 UTC, Michele Baldessari


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2019:3403  0        None      None    None     2019-11-05 21:03:21 UTC

Description Michele Baldessari 2019-08-12 09:15:57 UTC
Created attachment 1602796
strace of podman run

Description of problem:

After some destructive testing involving many reboots of a controller node, some of them via hard reset, podman got into a completely broken state: every podman run command fails, claiming that no image was provided:
# podman run -d --net=host --name=test 192.168.24.1:8787/rhosp15/openstack-haproxy:pcmklatest
must provide image ID and image name to use an image: invalid argument


This does not happen normally, and it took a few reboots to get into this state, but once the node is in it, not a single run command works:
[root@controller-0 ~]# rpm -q podman kernel runc
podman-1.0.3-1.git9d78c0c.module+el8.0.0.z+3717+fdd07b7c.x86_64
kernel-4.18.0-80.7.1.el8_0.x86_64
runc-1.0.0-55.rc5.dev.git2abd837.module+el8.0.0+3049+59fd2bba.x86_64

Note that we also reproduced this with a testing version of runc:
runc-1.0.0-60.rc8.rhaos4.2.git3cbe540.el8.x86_64

[root@controller-0 ~]# podman image inspect 41bfdd5a7361
error parsing image data "41bfdd5a7361b1ecd6233d67bd163008cb407f9098c99fb5e625f9918b1558ef": readlink /var/lib/containers/storage/overlay/l/7G7QCIMC7D5MK7NQXQC4WXJTV7: no such file or directory

Notice the uppercase link name in that path, which seems a bit suspicious?

This seems very similar to https://github.com/code-ready/crc/issues/325 ?

I am attaching the strace of the run command, the bolt db, and the ls -lR output from /var/lib/containers.
What springs to the eye is that on a working node we have:
[root@controller-1 ~]# ls -l /var/lib/containers/storage/overlay/l/ |wc -l
170

Whereas on a broken node we have:
[root@controller-0 ~]# ls /var/lib/containers/storage/overlay/l/
[root@controller-0 ~]#
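
The empty l/ directory can be cross-checked against the layer directories. The following is a minimal sketch, not taken from this report, and it rests on an assumption about the overlay driver's on-disk layout: that each layer directory under overlay/ contains a `link` file holding the short name of its symlink in l/. The function name check_overlay_links is hypothetical, and the storage path defaults to the one seen in the error above.

```shell
# Report overlay layers whose short-name symlink under l/ is missing.
# Assumption: <root>/<layer-id>/link contains the name of the entry
# that should exist as a symlink at <root>/l/<name>.
check_overlay_links() {
    root="${1:-/var/lib/containers/storage/overlay}"
    for link_file in "$root"/*/link; do
        # Skip the unexpanded glob when no layer has a link file.
        [ -f "$link_file" ] || continue
        name=$(cat "$link_file")
        layer=$(basename "$(dirname "$link_file")")
        # -L is true for a symlink even if its target is gone,
        # so this only flags entries that are absent from l/.
        if [ ! -L "$root/l/$name" ]; then
            echo "missing: l/$name (layer $layer)"
        fi
    done
}
```

Run as root with no argument to inspect the default storage location, or pass another overlay directory as the first argument. Under the stated assumption, a node in the broken state described here would be expected to list every layer, while a working node would print nothing.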

Comment 1 Michele Baldessari 2019-08-12 09:16:43 UTC
Created attachment 1602797
ls -lr /var/lib/containers

Comment 2 Michele Baldessari 2019-08-12 09:17:26 UTC
Created attachment 1602798
bolt db on broken node

Comment 20 Joy Pu 2019-09-29 08:36:35 UTC
Thanks, Michele, for your feedback. I checked the code in vendor/github.com/containers/storage/drivers/overlay/overlay.go from podman-1.4.2-5.module+el8.1.0+4240+893c1ab8.src.rpm. The patches are already included, so I am setting this to verified.

Comment 22 errata-xmlrpc 2019-11-05 21:02:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3403

