Bug 1752599

Summary: Deadlock when pulling an image is interrupted
Product: Red Hat Enterprise Linux 8
Reporter: Matthew Heon <mheon>
Component: podman
Assignee: Jindrich Novy <jnovy>
Status: CLOSED ERRATA
QA Contact: atomic-bugs <atomic-bugs>
Severity: unspecified
Priority: unspecified
Version: 8.1
CC: dornelas, dwalsh, jligon, jnovy, kanderso, lsm5, mheon, nalin, pthomas, tsweeney, vrothber, weshen
Target Milestone: rc
Target Release: 8.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: podman-1.6.3-4.el8
Clones: 1779834
Last Closed: 2020-02-04 12:26:32 UTC
Type: Bug
Bug Blocks: 1734578, 1779834    

Description Matthew Heon 2019-09-16 18:02:22 UTC
Description of problem:

Originally filed as BZ1748682 against OpenShift 4.2.

When a `podman pull` command is interrupted while it is in certain critical sections (in the original bug, by a reboot), the containers/storage library attempts to clean up the partially completed operation when the pull is re-run. This cleanup code path can attempt to take the same lock more than once, resulting in a deadlock that prevents almost all Podman commands from running. Any other tool that uses c/storage (Buildah, CRI-O, Skopeo) will also be unable to run.
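To illustrate the failure mode, here is a minimal Go sketch, assuming a single non-reentrant `sync.Mutex` guards the store. The `store` type and its methods are hypothetical; this is not the containers/storage code or its actual fix. `TryLock` (Go 1.18+) stands in for the second `Lock()` call so the example reports the would-be deadlock instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a HYPOTHETICAL stand-in for a storage layer guarded by one
// non-reentrant lock; it is not the containers/storage implementation.
type store struct {
	mu      sync.Mutex
	partial map[string]bool // layers left behind by an interrupted pull
}

// cleanupLocked removes partial layers. The caller must already hold s.mu.
func (s *store) cleanupLocked() {
	for id := range s.partial {
		delete(s.partial, id)
	}
}

// buggyCleanup models the bug pattern: while holding s.mu it tries to take
// s.mu again. A plain second Lock() would block forever because sync.Mutex
// is not reentrant, so TryLock is used to report that instead.
func (s *store) buggyCleanup() (wouldDeadlock bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.mu.TryLock() {
		return true // a plain s.mu.Lock() here would never return
	}
	s.mu.Unlock()
	s.cleanupLocked()
	return false
}

// fixedCleanup takes the lock exactly once and only calls helpers that
// assume the lock is already held. It returns the partial layers left.
func (s *store) fixedCleanup() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cleanupLocked()
	return len(s.partial)
}

func main() {
	s := &store{partial: map[string]bool{"layer-a": true}}
	fmt.Println("buggy path would deadlock:", s.buggyCleanup())
	fmt.Println("partial layers left after fix:", s.fixedCleanup())
}
```

Taking the lock once at the entry point and calling only "already locked" helpers is one common way to avoid this class of self-deadlock; the specific change made in c/storage may differ.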

Version-Release number of selected component (if applicable):

Podman 1.4.2-stable2 (should reproduce on any released Podman)

How reproducible:

Fairly reproducible when `podman pull` is run as a systemd service at boot, but this is a race condition with a fairly slim window; it will be difficult to hit in normal use.

Steps to Reproduce:
1. Run `podman pull <image>`.
2. Interrupt that `podman pull` command while the pull is in progress (SIGKILL should work).
3. Re-run `podman pull <image>`.

Actual results:

The second `podman pull` command freezes. Until it is killed, Podman, Skopeo, and CRI-O cannot launch successfully. Killing the frozen process restores normal operation, but running the same `podman pull` again may freeze again.

Expected results:

The second `podman pull` completes normally.

Additional info:

The likelihood of triggering this is low under normal circumstances, but OpenShift found a fairly reliable way of doing it.

Comment 2 Daniel Walsh 2019-09-16 19:20:36 UTC
Is this a podman bug or a containers/storage bug?

Comment 3 Matthew Heon 2019-09-16 19:30:15 UTC
Bug is in c/storage. Fixed in 1.13.3. Will be vendored into Podman 1.6.0.

Comment 12 errata-xmlrpc 2020-02-04 12:26:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:0348