Bug 1966872 - podman's image index corrupted during WAN emulation tests
Summary: podman's image index corrupted during WAN emulation tests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: podman
Version: 8.4
Hardware: Unspecified
OS: Unspecified
urgent
high
Target Milestone: rc
: ---
Assignee: Jindrich Novy
QA Contact: Yuhui Jiang
URL:
Whiteboard:
Depends On:
Blocks: 1972343
 
Reported: 2021-06-02 06:03 UTC by Flavio Percoco
Modified: 2023-09-15 01:14 UTC
18 users

Fixed In Version: podman-3.2.3-0.8.el8 or newer
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1972343
Environment:
Last Closed: 2021-11-09 17:38:22 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1964591 1 urgent CLOSED [master] ACM/ZTP with Wan emulation fails to start the agent service 2021-10-18 17:32:04 UTC
Red Hat Product Errata RHSA-2021:4154 0 None None None 2021-11-09 17:39:24 UTC

Internal Links: 1964591 1971630 1972063

Description Flavio Percoco 2021-06-02 06:03:31 UTC
Description of problem:

We (the Assisted Installer team) have been running WAN emulation tests that introduce network latency and simulate dropped packets, since we expect to run AI and manage deployments in such environments.

During our latest tests, `podman` failed to pull images, which left the image index corrupted. After some testing, we noticed that deleting the image and then pulling it again re-creates the index. The main issue is that, once podman's image index is corrupted, many commands stop working, including `podman pull` and `podman images`.

This issue could break a large number of OpenShift deployments, because the Assisted Installer depends on podman.

More information can be found here: https://bugzilla.redhat.com/show_bug.cgi?id=1964591


 
Version-Release number of selected component (if applicable):


```
...
This is a host being installed by the OpenShift Assisted Installer.
It will be installed from scratch during the installation.
The primary service is agent.service.  To watch its status run e.g
sudo journalctl -u agent.service
**  **  **  **  **  **  **  **  **  **  **  **  **  **  **  **  **  ** **  **  **  **  **  **  **
Last login: Wed Jun  2 02:24:10 2021 from 198.18.8.1
[core@localhost ~]$ sudo podman images
Error: error retrieving size of image "8e8afa44d4455e950ffff8a747046e49f634b64c04c0bc918306e51d87f0c627": you may need to remove the image to resolve the error: unable to determine size: error locating layer with ID "3c91e5c083b2236769c9c9ec7a05fe138ff8e52ea10e3571dd597bada74cb874": layer not known
[core@localhost ~]$ podman version
Version:      3.0.2-dev
API Version:  3.0.0
Go Version:   go1.15.7
Built:        Wed Apr  7 13:07:54 2021
OS/Arch:      linux/amd64
```

How reproducible:

It's not deterministic, but we can provide an environment where more debugging can be done.

Any guidance on which data to collect to help debug this issue is greatly appreciated.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:

In the presence of network failures, podman's index should not end up in a corrupted state. More importantly, podman should be able to recover from this state (skipping the corrupted parts) and allow new `pull` commands to work.

Additional info:

Comment 15 Daniel Walsh 2021-06-11 14:52:23 UTC
Fixed in podman 3.2

Comment 25 Valentin Rothberg 2021-06-17 06:46:40 UTC
@Tom: I posted all relevant links already and it's addressed in all relevant versions.

The PR for v3.0.1 (merged): https://github.com/containers/podman/pull/10637

The PR for v3.2 (merged): https://github.com/containers/podman/pull/10636

Comment 26 Jindrich Novy 2021-06-17 10:12:16 UTC
Valentin, do you mind merging it to master too to avoid regression? (we are currently consuming podman from master in 8.5.0 and RHEL-9 which is why your change is not visible there)

Comment 27 Valentin Rothberg 2021-06-17 12:02:40 UTC
(In reply to Jindrich Novy from comment #26)
> Valentin, do you mind merging it to master too to avoid regression? (we are
> currently consuming podman from master in 8.5.0 and RHEL-9 which is why your
> change is not visible there)

Already done (see comment #5): https://github.com/containers/common/pull/609

This version of containers/common has not been merged into Podman yet but will eventually.

Comment 28 Jindrich Novy 2021-06-17 12:56:31 UTC
We need to mark this MODIFIED and add it to advisory only after it's been vendored into podman, otherwise the fix is obviously missing in podman.

Comment 29 Valentin Rothberg 2021-06-17 13:55:54 UTC
(In reply to Jindrich Novy from comment #28)
> We need to mark this MODIFIED and add it to advisory only after it's been
> vendored into podman, otherwise the fix is obviously missing in podman.

The fix *is* in v3.0.1 (which this bug is filed against).  It is also in Podman v3.2, which is scheduled for RHEL.

The fix *is not yet* in the main branch of Podman, since it had to be fixed in containers/common, where it already is.

Hence, the fix *is* in Podman.

Comment 30 Jindrich Novy 2021-06-17 15:11:28 UTC
Valentin, note that this bug is targeted at 8.5.0, where buildah and podman are consumed from the upstream master branch at the moment, so the code is just not there :-) I can't switch this to MODIFIED and attach this bug to the advisory, as QE can't (pre)test a fix that is missing in 8.5.0.

The v3.2 branch isn't v3.2-rhel so it's not supposed to be consumed in RHEL? (Just as v3.0 is not going to RHEL but v3.0.1-rhel is.) Which branch should go to RHEL, and when, needs to be communicated to me in advance. So far I have no information about which branch will be the future 8.5; it is master for now.

Still, having this in any branch first and only later in master sounds like a regression to me.

The bug for 8.4.0.2 is #1972343, which I can commit to dist-git after 8.4.0.1 is GA; this is where the v3.0.1-rhel branch goes.

Tom, Laurie, can you please clarify whether the RHEL branches (and which of them) are ready, and whether and when I should switch to them for 8.5.0?

Thanks!

Comment 31 Laurie Friedman 2021-06-17 15:27:21 UTC
@jnovy I don't know enough about upstream and branching to answer this question.  Hopefully @tsweeney can help.  RHEL 8.5 branches are created for container-tools in git but you already know that so I don't think that answers your question.

Comment 32 Valentin Rothberg 2021-06-17 15:29:02 UTC
(In reply to Jindrich Novy from comment #30)
> Valentin, note that this bug is targeted at 8.5.0, where buildah and podman
> are consumed from the upstream master branch at the moment, so the code is
> just not there :-) I can't switch this to MODIFIED and attach this bug to the
> advisory, as QE can't (pre)test a fix that is missing in 8.5.0.

Thanks for clarifying.  In this case, we need to wait until c/common is vendored into the main branch of Podman.
 
> The v3.2 branch isn't v3.2-rhel so it's not supposed to be consumed in RHEL?

I *think* that there will be a -rhel branch at some point.  Matt will know.

> Still, having this in any branch first and only later in master sounds like a
> regression to me.

It was fixed in containers/common first.  *After* it was merged, I opened PRs for v3.0.1-rhel and did the necessary backports for v3.2.  I didn't open a vendor PR into Podman's main branch since these are happening regularly in any case and no code change in Podman was needed.  Note that the fix for v3.0.1-rhel was substantially different due to the major rewrite in the image code.

Comment 33 Tom Sweeney 2021-06-17 19:43:01 UTC
Jindrich, the fix from Valentin: https://github.com/containers/common/pull/612/files in upstream on c/common made it into the c/common v0.40.0 release.  That is being vendored into upstream Podman now with this PR: https://github.com/containers/podman/pull/10690, but it's having issues passing CI.  Once that is merged, we can pull from Podman upstream to do RHEL 8.5 and 9.0 testing with the fix in play.

Valentin, it looks like the version of c/common in the RHEL v3.2 branch is set to c/common v0.38.9, NOT v0.40.0.  In c/common v0.38.9, you had this commit https://github.com/containers/common/commit/2686c15b7b23f95af63b780d28376e5e1d8e5bf8 which was similar to, but not the same as https://github.com/containers/common/pull/612 noted in this comment: https://bugzilla.redhat.com/show_bug.cgi?id=1966872#c7.  The fix in #612 also checked for the image being nil and used different error handling.

So given that, do we need to make adjustments in v3.2 too?  The v3.2 branch will be the one used for RHEL 8.4.0.2.  It isn't named with a '-rhel' suffix yet, Jindrich; I'm not sure whether Matt is planning to add one, but I hope so, as I find it easier to track.

FYI @mheon

Comment 34 Valentin Rothberg 2021-06-17 20:22:18 UTC
(In reply to Tom Sweeney from comment #33)
> [...]
> Valentin, it looks like the version of c/common in the RHEL v3.2 branch is
> set to c/common v0.38.9, NOT v0.40.0.  

That is right. c/common v0.38 is used in v3.2, and v0.38.9 has the fixes for this BZ for v3.2.  I'm not sure why v0.40 would matter for v3.2, since we want to keep the dependencies stable.

> In c/common v0.38.9, you had this commit
> https://github.com/containers/common/commit/2686c15b7b23f95af63b780d28376e5e1d8e5bf8
> which was similar to, but not the same as
> https://github.com/containers/common/pull/612 noted in this comment:
> https://bugzilla.redhat.com/show_bug.cgi?id=1966872#c7.  The fix in #612
> also checked for the image being nil and used different error handling.
>
> So given that, do we need to make adjustments in v3.2 too?

Those are two different commits [1,2] with the second being a follow-up fix which is also mentioned in the commit message.  The two commits are in both c/common branches (main, v0.38).

Can we move on with the bug?  I am happy to answer more questions, but I am also surprised that the conversations in this BZ consumed more time than the actual fix and backports.


[1] https://github.com/containers/common/commit/28e45551d6a37d1b4a10ee4f42de305695dcdf53
[2] https://github.com/containers/common/commit/2686c15b7b23f95af63b780d28376e5e1d8e5bf8

Comment 35 Tom Sweeney 2021-06-18 15:26:52 UTC
Valentin,

Welcome to a little glimpse of my BZ herding world!  ;^)  

You're right, c/common v0.40.0 does not matter for Podman v3.2.  However, I missed the fact that your second fix was in both upstream and the c/common v0.38 branch.  I did not see it in the release notes when I first asked the question.  So given that, I think Jindrich will need to be sure to grab the c/common v0.38 branch and use that after grabbing Podman v3.2 when he's building the container-tools module for RHEL 8.4.0.2.

For the upcoming RHEL 8.5 container-tools module build that Jindrich will need to do, we need to get c/common v0.40.0 merged into upstream Podman by June 29.  There's a PR in-flight for that.

I think the confusion comes in with all the versions flying around and the numbering being somewhat similar.  Thanks for hanging in with us Valentin!

Jindrich, I think you have the info that you need for this, especially after the c/common v0.40.0 PR merges into Podman.  If not, please let me know.

Comment 36 Valentin Rothberg 2021-06-21 07:12:15 UTC
(In reply to Tom Sweeney from comment #35)
> [...] 
> So given that, I think Jindrich will need to be sure to
> grab the c/common v0.38 branch and use that after grabbing Podman v3.2 when
> he's building the container-tools module for RHEL 8.4.0.2.

I made sure of that already. v0.38 is in Podman v3.2.1.

> I think the confusion comes in with all the versions flying around and the
> numbering being somewhat similar.  Thanks for hanging in with us Valentin!

I will make sure to make the version matrix more explicit in the future.

Comment 37 Flavio Percoco 2021-07-01 09:45:14 UTC
Thank you all for the hard work here.

@jnovy would you mind letting me know when there's a package we can test?

I would like to make sure we are using the right package when doing tests for WAN emulation.

Comment 50 Tom Sweeney 2021-08-27 15:47:50 UTC
Unless @vrothber thinks otherwise, I don't think we need to create a new BZ for the case where someone is using corrupted JSON, @yujiang.  In that case, I think they get whatever they get.  It would be nice to tell them that the JSON is corrupted, but that's not always easy.

Comment 54 errata-xmlrpc 2021-11-09 17:38:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: container-tools:rhel8 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4154

Comment 55 Red Hat Bugzilla 2023-09-15 01:08:52 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

