Bug 1921128

Summary: [gss][podman]Getting the error while starting container "Error: readlink /var/lib/containers/storage/overlay/l/XXX no such file or directory"
Product: Red Hat Enterprise Linux 8
Reporter: Geo Jose <gjose>
Component: podman
Assignee: Jindrich Novy <jnovy>
Status: CLOSED ERRATA
QA Contact: Alex Jia <ajia>
Severity: high
Priority: high
Version: 8.2
CC: ajia, akupczyk, bbaude, bhubbard, bniver, cchen, ceph-eng-bugs, ddarrah, dornelas, dwalsh, dzafman, gabrioux, gsitlani, jligon, jnovy, kchai, lithomas, lmiksik, lsm5, mheon, nojha, pthomas, rzarzyns, sseshasa, tsweeney, umohnani, vrothber, vumrao, ypu
Target Milestone: rc
Target Release: 8.4
Hardware: x86_64
OS: Linux
Fixed In Version: podman-3.0.1-6.el8
Doc Type: If docs needed, set a value
Clones: 1940493
Last Closed: 2021-05-18 15:34:30 UTC
Type: Bug
Bug Blocks: 1186913, 1823899, 1940493, 1952246

Description Geo Jose 2021-01-27 14:55:49 UTC
Description of problem:
 * After a reboot, starting a container fails with the error "Error: readlink /var/lib/containers/storage/overlay/l/XXX no such file or directory"

Version-Release number of selected component (if applicable):
 * RHCS 4.1
 * podmanVersion: 1.9.3
 * buildahVersion: 1.14.9
 * kernel: 4.18.0-193.28.1.el8_2.x86_64

How reproducible:
 * The issue does not reproduce every time.
 * It occurred after the node was rebooted.

Comment 4 Daniel Walsh 2021-01-27 18:10:29 UTC
Did the machine crash while committing an image? We have seen similar failures before when an image was not fully written because of a crash or shutdown.

Comment 5 Tom Sweeney 2021-01-27 19:23:24 UTC
I'll ask Valentin to look at this as he seemed to have some thoughts during scrum today.

Comment 6 Valentin Rothberg 2021-01-28 11:02:01 UTC
I think it's the same issue as https://github.com/containers/podman/issues/5986. Let's continue the conversation upstream and report the solution here.

Comment 8 Valentin Rothberg 2021-01-28 15:06:11 UTC
I have been looking at this issue and had a conversation with Dan. For now, we suggest using the mentioned workaround and re-pulling the image. We will tackle the root cause soon, but this will take time. The problem is storage corruption when the pull process is killed: if that happens within a certain time window, the storage can be left corrupted, as seen here and in the linked upstream issue.
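The readlink error in the description means a layer's short-name symlink under /var/lib/containers/storage/overlay/l/ is gone. As a minimal, read-only diagnostic sketch (not a podman command), the shell function below checks every layer for that symlink. It assumes the containers/storage overlay layout, where <layer-id>/link holds the layer's short name and l/<short-name> is a symlink to ../<layer-id>/diff; the function name check_overlay_links is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of podman: report overlay layers whose
# short-name symlink under l/ is missing. Assumes the containers/storage
# overlay layout, where <root>/<layer-id>/link contains the short name and
# <root>/l/<short-name> is a symlink to ../<layer-id>/diff.
# Read-only: it only prints findings and never modifies storage.
check_overlay_links() {
    root="${1:-/var/lib/containers/storage/overlay}"
    missing=0
    for linkfile in "$root"/*/link; do
        # Skip the unexpanded glob when no layer has a link file.
        [ -f "$linkfile" ] || continue
        layer=$(basename "$(dirname "$linkfile")")
        short=$(cat "$linkfile")
        # -L tests for the symlink itself; a missing entry is what makes
        # readlink fail with "no such file or directory".
        if [ ! -L "$root/l/$short" ]; then
            echo "missing: $root/l/$short (layer $layer)"
            missing=$((missing + 1))
        fi
    done
    echo "$missing layer(s) with a missing link"
    # Exit status 0 only when every layer still has its symlink.
    [ "$missing" -eq 0 ]
}
```

Run it as root after sourcing the file; pass a different storage root as the first argument if graphroot is not the default. A non-zero exit status means at least one layer has lost its link, which is the situation where re-pulling the image, as suggested above, applies. The function itself repairs nothing.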

Comment 11 Urvashi Mohnani 2021-03-08 19:11:29 UTC
The fix for this has made it into v3.0.1 for RHEL 8.4, moving to POST.

Comment 12 Jindrich Novy 2021-03-09 08:00:13 UTC
Confirming https://github.com/containers/storage/pull/822 is already applied in the current version of podman in 8.4.0.

Can we get a QA ack, please?

Comment 19 Daniel Walsh 2021-03-10 21:16:06 UTC
I would say containers/storage needs to be updated in buildah/podman and skopeo.

Comment 25 Alex Jia 2021-03-19 17:07:01 UTC
I can't reproduce this bug by following the steps in https://github.com/containers/podman/issues/5986.
Instead, I get a different error: 'Error: error creating container storage: the container name
"atomix-1" is already in use by "ed8ea5031a2111261fa70e56c9440aa27ada62c2a3495b2d944a14868d32bd32".
You have to remove that container to be able to reuse that name.: that name is already in use'.
At that point podman ps -a shows nothing, and podman rm -f cannot remove the container either.
The issue occurs on both podman-1.9.3-2.module+el8.2.1+6867+366c07d6 and podman-3.0.1-3.module+el8.4.0+10198+36d1d0e3;
as usual, the tests have to be run several times before hitting it.

Deploy rhel-guest-image-8.4-756.x86_64.qcow2 as a libvirt VM, then run the tests inside the VM:

[root@atomic-host-test-4109 ~]# rpm -q podman
podman-3.0.1-3.module+el8.4.0+10198+36d1d0e3.x86_64

[root@atomic-host-test-4109 ~]# podman run --rm -d --name atomix-1 -p 5679:5679 -it -v /opt/onos/config:/etc/atomix/conf -v /var/lib/atomix-1/data:/var/lib/atomix/data:Z atomix/atomix:3.1.5 --config /etc/atomix/conf/atomix-1.conf --ignore-resources --data-dir /var/lib/atomix/data --log-level WARN
ed8ea5031a2111261fa70e56c9440aa27ada62c2a3495b2d944a14868d32bd32
[root@atomic-host-test-4109 ~]# podman ps
CONTAINER ID  IMAGE                          COMMAND               CREATED        STATUS            PORTS                   NAMES
ed8ea5031a21  docker.io/atomix/atomix:3.1.5  --config /etc/ato...  4 seconds ago  Up 4 seconds ago  0.0.0.0:5679->5679/tcp  atomix-1

Destroy the running VM above, then start it again:

[root@hp-dl360g9-04 ~]# virsh destroy ajia-8.4.0
Domain ajia-8.4.0 destroyed

[root@hp-dl360g9-04 ~]# virsh start ajia-8.4.0
Domain ajia-8.4.0 started

[root@atomic-host-test-4109 ~]# podman ps -a
CONTAINER ID  IMAGE   COMMAND  CREATED  STATUS  PORTS   NAMES
[root@atomic-host-test-4109 ~]# podman run --rm -d --name atomix-1 -p 5679:5679 -it -v /opt/onos/config:/etc/atomix/conf -v /var/lib/atomix-1/data:/var/lib/atomix/data:Z atomix/atomix:3.1.5 --config /etc/atomix/conf/atomix-1.conf --ignore-resources --data-dir /var/lib/atomix/data --log-level WARN
Error: error creating container storage: the container name "atomix-1" is already in use by "ed8ea5031a2111261fa70e56c9440aa27ada62c2a3495b2d944a14868d32bd32". You have to remove that container to be able to reuse that name.: that name is already in use

[root@atomic-host-test-4109 ~]# podman ps -a
CONTAINER ID  IMAGE   COMMAND  CREATED  STATUS  PORTS   NAMES

NOTE: sometimes the atomix-1 container can be removed, even though podman ps -a shows nothing:

[root@atomic-host-test-4109 ~]# podman rm atomix-1
atomix-1
[root@atomic-host-test-4109 ~]# podman run --rm -d --name atomix-1 -p 5679:5679 -it -v /opt/onos/config:/etc/atomix/conf -v /var/lib/atomix-1/data:/var/lib/atomix/data:Z atomix/atomix:3.1.5 --config /etc/atomix/conf/atomix-1.conf --ignore-resources --data-dir /var/lib/atomix/data --log-level WARN
dd83b6ff7efd6769582349a27f57695d5caf6a37ad4cba81e6a2abbd9c2113ef
[root@atomic-host-test-4109 ~]# podman ps
CONTAINER ID  IMAGE   COMMAND  CREATED  STATUS  PORTS   NAMES
[root@atomic-host-test-4109 ~]# podman ps -a
CONTAINER ID  IMAGE   COMMAND  CREATED  STATUS  PORTS   NAMES
[root@atomic-host-test-4109 ~]# podman run --rm -d --name atomix-1 -p 5679:5679 -it -v /opt/onos/config:/etc/atomix/conf -v /var/lib/atomix-1/data:/var/lib/atomix/data:Z atomix/atomix:3.1.5 --config /etc/atomix/conf/atomix-1.conf --ignore-resources --data-dir /var/lib/atomix/data --log-level WARN
20f074d16b72a013f480b447cefa86b25436e5527d9b52773af8aafb4c41e02a
[root@atomic-host-test-4109 ~]# podman ps -a
CONTAINER ID  IMAGE                          COMMAND               CREATED        STATUS            PORTS                   NAMES
20f074d16b72  docker.io/atomix/atomix:3.1.5  --config /etc/ato...  2 seconds ago  Up 3 seconds ago  0.0.0.0:5679->5679/tcp  atomix-1

NOTE: after repeating the steps above several times, the atomix-1 container runs again.

Comment 26 Jindrich Novy 2021-03-19 18:33:47 UTC
Alex, can you please re-test with the current versions attached to the advisory? podman-3.0.1-6.module+el8.4.0+10398+842aaf04 is the actual version which has all important bits vendored in to address this. Thanks.

https://errata.devel.redhat.com/advisory/65330/builds

Comment 27 Alex Jia 2021-03-20 05:08:02 UTC
(In reply to Jindrich Novy from comment #26)
> Alex, can you please re-test with the current versions attached to the
> advisory? podman-3.0.1-6.module+el8.4.0+10398+842aaf04 is the actual version
> which has all important bits vendored in to address this. Thanks.
> 
> https://errata.devel.redhat.com/advisory/65330/builds

I ran the tests 3 times with podman-3.0.1-6.module+el8.4.0+10398+842aaf04,
following the steps in Comment 25, and I was able to start the previous
atomix-1 container again after destroying and restarting the VM.

Is this test enough for you? If so, I will close this bug as VERIFIED. Thanks!

Comment 28 Jindrich Novy 2021-03-22 07:32:36 UTC
Yes Alex, unless Geo objects, I think this is sufficient to consider this one verified.

Comment 29 Alex Jia 2021-03-22 11:39:46 UTC
(In reply to Jindrich Novy from comment #28)
> Yes Alex, unless Geo objects, I think this is sufficient to consider this
> one verified.

Thank you, Jindrich. Moving this bug to the VERIFIED state now.

Comment 32 errata-xmlrpc 2021-05-18 15:34:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: container-tools:rhel8 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1796