Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2073439

Summary: GitOps ZTP container extract command hangs intermittently
Product: OpenShift Container Platform
Component: Telco Edge
Telco Edge sub component: ZTP
Version: 4.10
Target Release: 4.11.0
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Ian Miller <imiller>
Assignee: Jim Ramsay <jramsay>
QA Contact: yliu1
CC: aireilly
Doc Type: No Doc Update
Type: Bug
Clones: 2073498
Bug Blocks: 2073498
Last Closed: 2022-08-26 16:43:57 UTC

Description Ian Miller 2022-04-08 13:45:03 UTC
Description of problem: The documentation instructs users to extract the deployment CRs, source CRs, etc. from the container using:
mkdir -p ./out
podman run -t --rm registry-proxy.engineering.redhat.com/rh-osbs/openshift4-ztp-site-generate:v4.10.0-41 extract /home/ztp --tar | tar x -C ./out
(replace the container version with any valid 4.10 pre-release build)

Without the -t option, this command hangs on some runs. With the -t option, the command sometimes fails with the output:
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors

In these cases, removing the pipe shows that tar (run by the 'extract' script inside the container) is refusing to write its output:
tar: Refusing to write archive contents to terminal (missing -f option?)
tar: Error is not recoverable: exiting now

Version-Release number of selected component (if applicable): 4.10


How reproducible: 100% on some environments, intermittent on others


Steps to Reproduce:
1. Run the above command repeatedly until it fails with the message "This does not look like a tar archive".
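Step 1 can be scripted. The sketch below is a hypothetical helper, not part of the ZTP tooling: `repeat_until_fail` runs a command up to MAX times and stops at the first iteration that exits non-zero. Because the documented command is a pipeline, it has to be wrapped in `sh -c`; wrapping that in coreutils `timeout` as well turns a hang into a failure (timeout exits with status 124).

```shell
#!/bin/sh
# Hypothetical helper for reproducing intermittent failures: run a command
# up to MAX times and stop at the first iteration that exits non-zero.
# Usage: repeat_until_fail MAX command [args...]
repeat_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max iterations"
}

# Example, using the command and image tag from the description:
#   repeat_until_fail 100 timeout 300 sh -c \
#     'rm -rf ./out && mkdir -p ./out && podman run -t --rm registry-proxy.engineering.redhat.com/rh-osbs/openshift4-ztp-site-generate:v4.10.0-41 extract /home/ztp --tar | tar x -C ./out'
```

The exit status of a pipeline is that of its last command, so the "This does not look like a tar archive" failure from the receiving tar is what stops the loop.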

Actual results:


Expected results: The command populates the ./out directory with content and does not return an error.

Comment 1 Jim Ramsay 2022-04-08 13:58:29 UTC
So there are potentially 2 issues here.

The first is that podman sometimes seems to hang when sending large amounts of data to stdout. The reason is unknown.

We added '-t' to the recommended command line to try to work around that. However, with '-t', stdout inside the container is a terminal, and tar refuses to write an archive to a terminal.

It turns out that removing '-t' and also piping tar through cat (tar | cat) inside the container is more reliable; I'm running some stress tests now to see whether it's a good fix.
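To make the tar | cat workaround concrete, here is a hypothetical sketch of what such a pipe looks like in an extract step. It is NOT the real 'extract' script shipped in the ztp-site-generate image; the function name `extract_tar` and its argument are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the tar | cat workaround described above; not the
# actual 'extract' script from the container image.
# Usage: extract_tar SRC_DIR > archive.tar
extract_tar() {
    src=$1
    # Write the archive to stdout ("-f -") and pipe it through cat, so tar's
    # own stdout is always a pipe rather than the container's stdout.
    tar -cf - -C "$src" . | cat
}
```

This only changes what tar's stdout is connected to; as noted above, the recommendation still pairs it with removing '-t' on the podman side.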

Comment 2 Jim Ramsay 2022-04-08 14:12:36 UTC
https://github.com/containers/podman/issues/13779 contains the root cause and the probable fix.

Podman logs all of a container's stdout to journald so that `podman logs` works, and this logging can be overrun when a container dumps large amounts of data to stdout.

Adding `--log-driver=none` to the podman command line disables this logging in podman, and the data flows to stdout unencumbered.
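A minimal sketch of the extraction with container logging disabled, assuming the same `extract /home/ztp --tar` entrypoint as in the description. `ztp_extract` is a hypothetical wrapper name, and the image argument is whatever ztp-site-generate build you are using.

```shell
#!/bin/sh
# Sketch of the extraction with podman's container logging disabled.
# ztp_extract is a hypothetical wrapper; IMAGE is any valid
# ztp-site-generate image reference.
# Usage: ztp_extract IMAGE OUTDIR
ztp_extract() {
    image=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # --log-driver=none: stdout is not also copied to journald, so a large
    # tar stream cannot overrun the logger.
    # No -t: with a pseudo-terminal as stdout, tar refuses to write the archive.
    podman run --log-driver=none --rm "$image" extract /home/ztp --tar \
        | tar -xf - -C "$outdir"
}
```

For example, `ztp_extract registry-proxy.engineering.redhat.com/rh-osbs/openshift4-ztp-site-generate:v4.10.0-41 ./out` is the description's command with '-t' replaced by '--log-driver=none'.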

Makes me think of this old TV spot; perhaps other Canadians will get it: https://www.youtube.com/watch?v=upsZZ2s3xv8

I'm running some stress tests now to make sure it works reliably; then we should update the docs with the new command line.

Comment 3 Jim Ramsay 2022-04-08 15:09:59 UTC
Confirmed: with '--log-driver=none' added to the podman command line, extraction via tar on stdout works reliably.

There is a PR against master that updates the upstream docs and the in-tool help output with this recommendation, and it will be backported to 4.10 as well.

aireilly: The 4.10 docs will also need to change to match.

Comment 4 Jim Ramsay 2022-04-08 18:19:53 UTC
Marking as 'verified' since the actual verification will be done in the 4.10 clone of this bug: https://bugzilla.redhat.com/show_bug.cgi?id=2073498