Bug 1775307

Summary: Concurrent 'podman pull/run' sometimes fails with "Error processing tar file(io: read/write on closed pipe)"
Product: Red Hat Enterprise Linux 8 Reporter: Michele Baldessari <michele>
Component: podman Assignee: Valentin Rothberg <vrothber>
Status: CLOSED ERRATA QA Contact: atomic-bugs <atomic-bugs>
Severity: high Docs Contact:
Priority: unspecified    
Version: 8.2 CC: bbaude, bdobreli, ddarrah, dornelas, dwalsh, gscrivan, jligon, jnovy, lsm5, mbasti, mheon, nalin, pthomas, toneata, tsweeney, twaugh, vrothber, weshen
Target Milestone: rc Keywords: ZStream
Target Release: 8.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: podman-1.6.4-1.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1787523 Environment:
Last Closed: 2020-04-28 15:52:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1734579, 1787523    

Comment 8 Daniel Walsh 2019-12-02 15:18:27 UTC
Matt, have you made any more progress?

Comment 9 Daniel Walsh 2019-12-02 15:19:03 UTC
Nalin, Giuseppe and Valentin, any ideas?

Comment 10 Valentin Rothberg 2019-12-02 15:20:22 UTC
Brent asked me to have a look, which I will do first thing tomorrow morning.

Comment 11 Valentin Rothberg 2019-12-03 17:02:43 UTC
@Michele, have you also seen other errors, such as `error storing blob to file \"/var/tmp/storage939391665/2\": unexpected EOF"`?

We're still analysing and debugging the issue, but given its nondeterministic nature, it's a tough ride.

Comment 13 Valentin Rothberg 2019-12-04 15:28:24 UTC
Update from my side:

- I couldn't find anything in the code that would add up to be causing this error.
- It looks like something's going south when parsing the tar files which points to golang.
- I can reproduce the issue with the podman package from the RHEL repos.
- I *cannot* reproduce when building podman on a RHEL 8.2 VM and using that.

Given that we haven't seen this error before and the code isn't (yet?) revealing an issue, I'm beginning to think that the issue is build-related. I will continue following the build-related theory and give updates.

Comment 14 Valentin Rothberg 2019-12-04 18:44:40 UTC
Another update:

It is not build-related. Eduardo and I managed to reproduce the `io: read/write on closed pipe` (error#1) on RHEL 8 machines with different podman binaries (repositories, custom build, locally built rpm).

We also encountered the error mentioned in comment#11: `error storing blob to file \"/var/tmp/storage939391665/2\": unexpected EOF"` (error#2)

The two errors happen at different stages. Error#1 happens when writing a layer to the storage backend, during the commit stage of pulling an image. Error#2 happens well before that (i.e., during PutBlob()), where individual blobs of an image are first downloaded _and_ decompressed and then written to a temp directory.

To me, error#1 points to a potential tar issue; there are known issues with golang, but those date back to go 1.10, while our builds use go 1.13. Error#2 points to a potential gzip decompression issue.

Error#2 clearly happens somewhere in github.com/klauspost/pgzip, the library we're using for parallel gzip compression. This library is API-compatible with the go standard lib, so it's easy to switch back and forth. I'm currently running the reproducer locally with a Podman binary using the stdlib and haven't hit any error after 24 iterations of the reproducer script.
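
A minimal sketch of what "switching back and forth" can look like, assuming the decompressor is chosen through a small function type. The `decompressor` type and the `stdlibGzip`, `readBlob` and `compress` helpers are hypothetical and not taken from Podman; only the standard library implementation is shown, and a pgzip-backed function of the same shape would be an assumed drop-in.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// decompressor is a hypothetical seam: any gzip implementation exposing a
// NewReader-style constructor can be wrapped into this shape.
type decompressor func(io.Reader) (io.ReadCloser, error)

// stdlibGzip wraps compress/gzip from the standard library.
func stdlibGzip(r io.Reader) (io.ReadCloser, error) {
	zr, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	return zr, nil
}

// readBlob decompresses a blob with whichever implementation is passed in.
func readBlob(open decompressor, blob []byte) ([]byte, error) {
	rc, err := open(bytes.NewReader(blob))
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return io.ReadAll(rc)
}

// compress produces gzip test data with the standard library.
func compress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	blob, err := compress([]byte("layer contents"))
	if err != nil {
		panic(err)
	}
	out, err := readBlob(stdlibGzip, blob)
	fmt.Printf("%q %v\n", out, err)
}
```

Keeping the choice behind one function value means a reproducer can be rerun against either implementation without touching the calling code.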

I will continue investigating tomorrow morning. Again, thanks a lot to Eduardo for helping to reproduce.

Comment 15 Brent Baude 2019-12-05 17:23:19 UTC
@Michele, can you please provide any and all details regarding the registry itself (versions, proxy, etc.)?

Comment 16 Valentin Rothberg 2019-12-05 17:56:44 UTC
Another update:

It is not github.com/klauspost/pgzip; we can reproduce with the standard library as well. It is also not a bug specific to golang 1.13; at least, we can reproduce with binaries built with golang 1.12 as well.

Both errors we are seeing can occur a) when a pipe is really being closed, and b) when data is corrupted or invalid in some form, causing gzip/tar processing to go south.

I will run Wireshark in the background tomorrow to collect data from the registry. As Brent mentions in comment#15, having details of the registry setup of registry-proxy.engineering.redhat.com would be hugely beneficial. FWIW, we could not reproduce this error with another registry so far.

Comment 17 Valentin Rothberg 2019-12-06 11:29:54 UTC
Can we get someone from registry-proxy.engineering.redhat.com in? I'm beginning to believe that nginx might be causing issues (e.g., proxy buffering), which could be the source of what we're seeing in this BZ. This would explain why we fail to reproduce the issue with the same images on other registries.

Comment 37 errata-xmlrpc 2020-04-28 15:52:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1650