Bug 2073498 - [4.10] GitOps ZTP container extract command hangs intermittently
Summary: [4.10] GitOps ZTP container extract command hangs intermittently
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Telco Edge
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.10.z
Assignee: Jim Ramsay
QA Contact: yliu1
URL:
Whiteboard:
Depends On: 2073439
Blocks:
Reported: 2022-04-08 15:41 UTC by Jim Ramsay
Modified: 2022-07-11 15:28 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2073439
Environment:
Last Closed: 2022-07-11 15:28:29 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift-kni cnf-features-deploy pull 1066 | 0 | None | Merged | Bug 2073498: ztp: Disable podman logging when extracting container data | 2022-04-13 18:01:17 UTC
Red Hat Product Errata | RHBA-2022:5514 | 0 | None | None | None | 2022-07-11 15:28:42 UTC

Description Jim Ramsay 2022-04-08 15:41:23 UTC
Clone for back-porting to 4.10

+++ This bug was initially created as a clone of Bug #2073439 +++

Description of problem: The documentation instructs users to extract the deployment CRs, source CRs, etc. from the container using:
mkdir -p ./out
podman run -t --rm registry-proxy.engineering.redhat.com/rh-osbs/openshift4-ztp-site-generate:v4.10.0-41 extract /home/ztp --tar | tar x -C ./out
(replace the container version with any valid 4.10 pre-release build)

Without the -t option, this command hangs on some runs. With the -t option, the command sometimes fails with this output:
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors

In these cases, removing the pipe to tar shows that the tar command inside the container's 'extract' script refuses to write its output to the terminal:
tar: Refusing to write archive contents to terminal (missing -f option?)
tar: Error is not recoverable: exiting now

Version-Release number of selected component (if applicable): 4.10


How reproducible: 100% on some environments, intermittent on others


Steps to Reproduce:
1. Run the above command repeatedly until it fails with the message "This does not look like a tar archive"
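
A minimal sketch of how the repetition in step 1 could be scripted. The helper name `repeat_until_fail`, its arguments, and the use of GNU coreutils `timeout` to turn a hang into a detectable failure are assumptions for illustration and are not part of the original report.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to MAX times, stopping at the first
# failure. A run that exceeds LIMIT is treated as a hang.
# Usage: repeat_until_fail MAX LIMIT COMMAND [ARGS...]
repeat_until_fail() {
    max="$1"; limit="$2"; shift 2
    i=1
    while [ "$i" -le "$max" ]; do
        status=0
        # GNU timeout exits with status 124 when the time limit is reached.
        timeout "$limit" "$@" || status=$?
        if [ "$status" -eq 124 ]; then
            echo "iteration $i: timed out after $limit (possible hang)" >&2
            return 1
        elif [ "$status" -ne 0 ]; then
            echo "iteration $i: failed with exit status $status" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $max iterations succeeded"
}
```

Because `timeout` runs an external command, a pipeline has to be wrapped in `sh -c`, for example `repeat_until_fail 50 120 sh -c 'podman run --rm "$IMAGE" extract /home/ztp --tar | tar x -C ./out'`, where `IMAGE` is an exported variable holding the image reference. The iteration count and time limit shown are arbitrary.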

Actual results:


Expected results: Command populates the ./out directory with content and does not return error.

--- Additional comment from Jim Ramsay on 2022-04-08 13:58:29 UTC ---

So there are potentially 2 issues here.

The first is that podman seems to sometimes hang when sending lots of data to stdout.  Reason unknown.

We added '-t' to the recommended commandline to try to fix that.  However, with '-t' tar is unhappy.

Turns out that removing '-t' and also piping tar|cat inside the container leads to a more reliable experience; doing some stress tests now to see if it's a good fix or not.

--- Additional comment from Jim Ramsay on 2022-04-08 14:12:36 UTC ---

https://github.com/containers/podman/issues/13779 contains the root cause and the probable fix.

Podman logs all stdout to journald in order to make `podman logs` work, and this can get overrun when dumping large amounts of data to stdout.

Adding `--log-driver=none` to the podman commandline disables this logging in podman, and the data flows to stdout unencumbered.
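
A sketch of the extraction command with logging disabled, wrapped in a function. It assumes the same `extract /home/ztp --tar` arguments as the command in the description. The function name `ztp_extract` and the `PODMAN` override variable are hypothetical and exist only so the function can be exercised with a stub. The `-t` option is dropped, and `-f -` is spelled out so that tar reads the archive from stdin regardless of its built-in default.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented extraction command.
# Usage: ztp_extract IMAGE [OUT_DIR]
ztp_extract() {
    image="$1"              # container image reference
    out_dir="${2:-./out}"   # destination directory (defaults to ./out)
    mkdir -p "$out_dir" || return 1
    # --log-driver=none stops podman from also sending the container's
    # stdout to its log driver. No -t: with a TTY attached, tar inside the
    # container refuses to write the archive to stdout.
    "${PODMAN:-podman}" run --log-driver=none --rm "$image" \
        extract /home/ztp --tar | tar -xf - -C "$out_dir"
}
```

For example, `ztp_extract registry-proxy.engineering.redhat.com/rh-osbs/openshift4-ztp-site-generate:v4.10.0-41 ./out` mirrors the command from the description with the new option applied.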

Makes me think of this old TV spot; perhaps other Canadians will get it: https://www.youtube.com/watch?v=upsZZ2s3xv8

Running some stress tests now to make sure it works reliably, then we should update the docs with the new commandline.

--- Additional comment from Jim Ramsay on 2022-04-08 15:09:59 UTC ---

Yes, adding '--log-driver=none' to the podman commandline makes extraction via tar over stdout reliable.

There's a PR out to master to fix the upstream docs and in-tool help output with the recommendation, and this will be backported to 4.10 as well.

aireilly: The 4.10 docs will also need to change to match.

Comment 2 yliu1 2022-05-10 12:49:08 UTC
We did not encounter the original issue in the QE environment, so this bug is marked as verified because no regression was found.

Comment 5 errata-xmlrpc 2022-07-11 15:28:29 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.22 extras update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5514

