Bug 2055487

Summary: BUILDAH - Error locating just-written images while creating multiple containers
Product: Red Hat Enterprise Linux 8
Reporter: Carroline <cpippin>
Component: buildah
Assignee: Jindrich Novy <jnovy>
Status: CLOSED ERRATA
QA Contact: Joy Pu <ypu>
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.5
CC: arajan, ddarrah, dornelas, jnovy, mheon, pjagtap, prjagtap, pthomas, subhat, tsweeney, umohnani, ypu
Target Milestone: rc
Keywords: FastFix, Triaged, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: buildah-1.24.2-4.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2066364 2066519
Environment:
Last Closed: 2022-05-10 13:28:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2066364, 2066519

Description Carroline 2022-02-17 04:54:28 UTC
Description of problem:

Error seen while building containers:

1) The error is reproducible by pulling multiple Docker images from the Docker registry in a while loop.

//

error locating just-written image "containers-storage:[overlay@/opt/jenkins/.local/share/containers/storage+/run/user/800/containers]localhost/26:latest": image not known

//


Version-Release number of selected component (if applicable):

RHEL - 8.5
Errata applied - RHEA-2022:0352
Podman : podman-3.4.2-9.module+el8.5.0+13852+150547f7.src.rpm

How reproducible:

The error is reproducible by pulling 30 Docker images from the Docker registry in a while loop:

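#!/bin/bash
x=1
# Start from an empty image store.
podman rmi -af
while [ $x -le 30 ]
do
  echo "$x times"
  # The trailing '&' backgrounds each build, so all 30 run concurrently.
  podman build -t $x -f Dockerfile &> log$x.log &
  x=$(( $x + 1 ))
done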
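
For anyone rerunning this: below is a minimal sketch, not part of the original report, of the same loop with the background PIDs recorded so that the script waits for every build and counts the failures. The function name run_parallel_builds and the BUILD_CMD override are assumptions made for illustration (BUILD_CMD lets a stub stand in for podman in a dry run). "> file 2>&1" is used as the portable spelling of "&>".

```shell
#!/bin/bash
# Hypothetical helper: start COUNT builds in the background, then wait for
# each one and count how many exited with a non-zero status.
# BUILD_CMD is an illustrative override; it defaults to podman.
run_parallel_builds() {
  local count="$1" pids="" failed=0 x=1 pid
  while [ "$x" -le "$count" ]; do
    # Same command as the reproducer, sent to the background.
    ${BUILD_CMD:-podman} build -t "$x" -f Dockerfile > "log$x.log" 2>&1 &
    pids="$pids $!"          # remember the PID of the job just started
    x=$(( x + 1 ))
  done
  for pid in $pids; do
    # wait returns the exit status of the given background job.
    wait "$pid" || failed=$(( failed + 1 ))
  done
  echo "$failed of $count builds failed"
  [ "$failed" -eq 0 ]
}

# Example (after 'podman rmi -af'): run_parallel_builds 30
```

With this variant the builds still run concurrently, so it should still hit the race on affected versions; the only difference is that the script blocks until every build has exited and reports a count.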


Actual results:

The following error occurs during execution:

log26.log:Error: error locating just-written image "containers-storage:[overlay@/opt/jenkins/.local/share/containers/storage+/run/user/800/containers]localhost/26:latest": image not known


Expected results:

- Container images should build.


Additional info:

Associated bugs: 2008997 and 2041515

Comment 16 Tom Sweeney 2022-02-23 13:48:09 UTC
Hi Carroline,

Actually, I'll change Aditya's response slightly. I don't know if the customer wants to do this, but their script will work if they don't send the build command to the background.  So changing the line to remove the last ampersand:

 podman build -t $x -f Dockerfile &> log$x.log & 

to

 podman build -t $x -f Dockerfile &> log$x.log 

The difference is that their script won't return until all of the builds are complete, rather than returning almost instantaneously.  The script should work as they originally created it, but this change is a short-term workaround for them if they want to employ it.
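
A minimal sketch of that workaround as a complete loop, assuming the same Dockerfile and log naming as the customer's script. The function name build_sequentially and the BUILD_CMD override are hypothetical and only there so the loop can be dry-run with a stub in place of podman. "> file 2>&1" is the portable equivalent of "&>".

```shell
#!/bin/bash
# Hypothetical helper: run COUNT builds one after another in the foreground.
# BUILD_CMD is an illustrative override; it defaults to podman.
build_sequentially() {
  local count="$1" failed=0 x=1
  while [ "$x" -le "$count" ]; do
    echo "$x times"
    # No trailing '&': the loop blocks here until this build has finished,
    # so only one build at a time writes to containers-storage.
    if ! ${BUILD_CMD:-podman} build -t "$x" -f Dockerfile > "log$x.log" 2>&1; then
      echo "build $x failed, see log$x.log" >&2
      failed=$(( failed + 1 ))
    fi
    x=$(( x + 1 ))
  done
  # Succeed only if every build succeeded.
  [ "$failed" -eq 0 ]
}

# Example: build_sequentially 30
```

The trade-off is the one described above: total wall-clock time is the sum of all build times.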

t

Comment 28 Tom Sweeney 2022-03-21 23:56:25 UTC
########### Impact Statement #############

When doing multiple simultaneous builds, random builds fail to complete due to locking issues within the code.  Adcubum AG is one of our premium customers and has asked that the fix be made in RHEL 8.5.0.4.  They are using Buildah in their CI, and there is not a workaround that is acceptable for their use. Given that this will be in 8.5.0.4, we need to get this in as a Blocker for RHEL 8.6 too.
We have fixes upstream that will need to be ported: https://github.com/containers/image/pull/1480 and https://github.com/containers/storage/pull/1153.  Both of these fixes have been upstream for a couple of weeks now without an issue.  These changes create no risk to any projects other than the container tools: Podman, Buildah, and Skopeo.  The change to the image code is very small and simple.  The change to the storage code is more complex, but not outlandish.  I'd rate this as a medium-level risk for the container tools.  We do have the changes in hand, and just need the green light to backport.  I fully expect this work will be completed by ITM 31, giving us more soak/test time.

Comment 29 Tom Sweeney 2022-03-21 23:59:53 UTC
@ddarrah and @ypu can you add a qa_ack please?  @jnovy can you add a dev ack please?

Comment 31 Tom Sweeney 2022-03-24 15:52:46 UTC
PRs with fixes:  https://github.com/containers/image/pull/1503  https://github.com/containers/storage/pull/1174 https://github.com/containers/podman/pull/13623 . Setting to POST and assigning to Jindrich for any further BZ or packaging needs.

Comment 39 Joy Pu 2022-03-30 09:39:16 UTC
Can reproduce with buildah-1.24.2-2.module+el8.6.0+14488+6524fb7f.x86_64
# cat *log |grep "image not known"
error locating just-written image "containers-storage:[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.mountopt=nodev,metacopy=on]localhost/23:latest": image not known
error locating new copy of image "750037c05cfe1857e16500167c7c217658e15eb9bc6283020cfb3524c93d1240" (i.e., "containers-storage:[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.mountopt=nodev,metacopy=on]localhost/24:latest"): image not known
error locating just-written image "containers-storage:[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.mountopt=nodev,metacopy=on]localhost/7:latest": image not known

Tested with buildah-1.24.2-4.module+el8.6.0+14594+d37c7ba8.x86_64.rpm: all builds finish normally, so moving this to Verified. Details:
# cat *log |grep "image not known"
# buildah images
REPOSITORY                          TAG      IMAGE ID       CREATED       SIZE
localhost/26                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/28                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/23                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/2                         latest   750037c05cfe   5 weeks ago   159 MB
localhost/29                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/20                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/30                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/1                         latest   750037c05cfe   5 weeks ago   159 MB
localhost/22                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/12                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/19                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/8                         latest   750037c05cfe   5 weeks ago   159 MB
localhost/25                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/13                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/21                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/6                         latest   750037c05cfe   5 weeks ago   159 MB
localhost/14                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/11                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/16                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/15                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/24                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/9                         latest   750037c05cfe   5 weeks ago   159 MB
localhost/17                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/5                         latest   750037c05cfe   5 weeks ago   159 MB
localhost/7                         latest   750037c05cfe   5 weeks ago   159 MB
localhost/27                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/4                         latest   750037c05cfe   5 weeks ago   159 MB
localhost/18                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/3                         latest   750037c05cfe   5 weeks ago   159 MB
localhost/10                        latest   750037c05cfe   5 weeks ago   159 MB
localhost/first                     latest   750037c05cfe   5 weeks ago   159 MB
registry.fedoraproject.org/fedora   latest   750037c05cfe   5 weeks ago   159 MB
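
The grep check above can be wrapped in a small helper for repeated runs. This is only a sketch: the function name report_failed_builds and the optional directory argument are assumptions for illustration, and it relies on the logs being named log<N>.log as in the reproducer.

```shell
#!/bin/bash
# Hypothetical helper: list the build logs in DIR (default: current directory)
# that contain the "image not known" error.
report_failed_builds() {
  local dir="${1:-.}" failed
  # -l prints only the names of matching files; -F matches a fixed string.
  # If no log files exist, grep's complaint is suppressed and nothing matches.
  failed=$(grep -lF "image not known" "$dir"/log*.log 2>/dev/null)
  if [ -z "$failed" ]; then
    echo "no build log contains 'image not known'"
    return 0
  fi
  echo "build logs containing 'image not known':"
  echo "$failed"
  return 1
}

# Example: report_failed_builds /path/to/logs
```

A non-zero return means at least one build hit the error, which makes the helper usable as a pass/fail step in a test script.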

Comment 41 errata-xmlrpc 2022-05-10 13:28:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: container-tools:rhel8 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1762