
Bug 1877548

Summary: must-gather fails, unable to download output: No available strategies to copy
Product: OpenShift Container Platform
Reporter: Naveen Malik <nmalik>
Component: oc
Assignee: Maciej Szulik <maszulik>
Status: CLOSED DUPLICATE
QA Contact: zhou ying <yinzhou>
Severity: high
Priority: medium
Version: 4.5
CC: aos-bugs, jeder, jokerman, mfojtik, sgreene, wking
Keywords: ServiceDeliveryImpact
Target Release: 4.7.0
Flags: mfojtik: needinfo?
Hardware: Unspecified
OS: Unspecified
Whiteboard: LifecycleReset
Last Closed: 2020-12-01 08:56:27 UTC
Type: Bug

Description Naveen Malik 2020-09-09 19:41:54 UTC
Description of problem:
must-gather has failed on some of the 4.5.7 clusters I have tried to collect it from.  Short summary of the warnings and errors in the client output:

WARNING: cannot use rsync: rsync not available in container
WARNING: cannot use tar: tar not available in container
[must-gather-87vsw] OUT gather output not downloaded: No available strategies to copy.
[must-gather-87vsw] OUT
error: unable to download output from pod must-gather-87vsw: No available strategies to copy.



Version-Release number of selected component (if applicable):
4.5.7

How reproducible:
100% of the time when trying to collect output for https://bugzilla.redhat.com/show_bug.cgi?id=1876919


Steps to Reproduce:
1. SD App SRE team provisions an OSD cluster on 4.5.7
2. Wait for the failure seen in https://bugzilla.redhat.com/show_bug.cgi?id=1876919 (seems to happen maybe 30% of the time?)
3. Break glass to login as system:admin
4. oc adm must-gather

Actual results:
Fails to download any output.  Get an empty must-gather.


Expected results:
Output is downloaded.

Additional info:
If I run with --keep and then use rsh or exec, I see the following:

$ oc rsh must-gather-87vsw
error: error sending request: Post https://api.domain.whatever:6443/api/v1/namespaces/openshift-must-gather-fwtzv/pods/must-gather-87vsw/exec?command=%2Fbin%2Fsh&command=-c&command=TERM%3D%22screen.xterm-256color%22+%2Fbin%2Fsh&container=copy&stdin=true&stdout=true&tty=true: EOF

$ oc exec  must-gather-87vsw -- pwd
error: error sending request: Post https://api.domain.whatever:6443/api/v1/namespaces/openshift-must-gather-fwtzv/pods/must-gather-87vsw/exec?command=pwd&container=copy&stderr=true&stdout=true: EOF

Comment 1 Naveen Malik 2020-09-09 19:50:40 UTC
Note that I am able to run `oc cluster-info dump --all-namespaces --output-directory=somewhere`, so I can get something to file on the related BZ, but it's possibly missing things that would be useful.

Comment 3 W. Trevor King 2020-09-09 20:34:17 UTC
From comment 2's private --v=8 must gather:

I0909 16:25:59.761590   66867 util.go:26] error: error sending request: Post https://api.fastt02.i8v0.p1.openshiftapps.com:6443/api/v1/namespaces/openshift-must-gather-8d5bx/pods/must-gather-pblf7/exec?command=rsync&command=--version&container=copy&stderr=true&stdout=true: EOF
I0909 16:25:59.761599   66867 copy_multi.go:30] Error output:
WARNING: cannot use rsync: rsync not available in container

So I expect this is:

1. oc tried to talk to the API server, got an EOF.
2. oc is not distinguishing between "failed in the asking" and "successful ask confirmed no $COMMAND".

I expect the fix is either or both of:

a. Teach 'oc' to retry some reasonable number of times if it gets an EOF or other retry-able error.
b. Teach 'oc' to report "failed in the asking" errors with something that is distinct from "successful ask confirmed no $COMMAND".

And probably also fixing, via a separate bug, whatever is causing the API to EOF these execs.

Comment 4 Maciej Szulik 2020-09-21 10:42:08 UTC
Based on the previous comment, which is about improving must-gather, I'm moving this to 4.7: it is important, but it's not a 4.6 blocker.

Comment 5 Maciej Szulik 2020-10-01 14:50:03 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 6 Michal Fojtik 2020-10-21 11:12:10 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 7 Naveen Malik 2020-10-21 13:07:28 UTC
@mfojtik any update on this? I don't want it to disappear, and I see you're tagged for needinfo.

Comment 8 Michal Fojtik 2020-10-21 13:12:17 UTC
The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.

Comment 9 Maciej Szulik 2020-10-23 10:42:55 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 10 Maciej Szulik 2020-11-13 11:34:59 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 11 Michal Fojtik 2020-11-20 14:12:06 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 12 Naveen Malik 2020-11-20 18:51:14 UTC
@mfojtik still pending info and tagged upcoming sprint.  ETA on this work?  Still hitting this periodically.

Comment 13 Michal Fojtik 2020-11-20 19:12:12 UTC
The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.

Comment 14 Maciej Szulik 2020-11-23 09:47:37 UTC
Naveen, there's an open PR improving this situation in https://github.com/openshift/oc/pull/631. You can also track https://bugzilla.redhat.com/show_bug.cgi?id=1888192

Comment 15 Maciej Szulik 2020-12-01 08:56:27 UTC

*** This bug has been marked as a duplicate of bug 1888192 ***