Bug 1877548 - must-gather fails, unable to download output: No available strategies to copy [NEEDINFO]
Summary: must-gather fails, unable to download output: No available strategies to copy
Keywords:
Status: CLOSED DUPLICATE of bug 1888192
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard: LifecycleReset
Depends On:
Blocks:
 
Reported: 2020-09-09 19:41 UTC by Naveen Malik
Modified: 2020-12-01 08:56 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-01 08:56:27 UTC
Target Upstream Version:
Embargoed:
Flags: mfojtik: needinfo?



Description Naveen Malik 2020-09-09 19:41:54 UTC
Description of problem:
must-gather collection has failed on some of the 4.5.7 clusters I have tried it on.  Short summary of the warnings and errors from the client output:

WARNING: cannot use rsync: rsync not available in container
WARNING: cannot use tar: tar not available in container
[must-gather-87vsw] OUT gather output not downloaded: No available strategies to copy.
[must-gather-87vsw] OUT
error: unable to download output from pod must-gather-87vsw: No available strategies to copy.
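
For context on where those messages come from, here is a minimal, hypothetical Go sketch (illustrative names only, not the actual openshift/oc source) of the copy-strategy fallback behaviour: each strategy (rsync, then tar) is probed inside the copy container, a warning is printed for every probe that fails, and if none succeeds the download is abandoned with "No available strategies to copy."

package main

import (
	"errors"
	"fmt"
)

// copyStrategy is a hypothetical stand-in for oc's rsync/tar copy back-ends.
type copyStrategy struct {
	name      string
	available func() bool // e.g. probes `rsync --version` inside the copy container
}

// pickStrategy mirrors the observed behaviour: warn for every strategy whose
// probe fails, and give up when none succeeds.
func pickStrategy(strategies []copyStrategy) (*copyStrategy, error) {
	for i := range strategies {
		if strategies[i].available() {
			return &strategies[i], nil
		}
		fmt.Printf("WARNING: cannot use %s: %s not available in container\n",
			strategies[i].name, strategies[i].name)
	}
	return nil, errors.New("No available strategies to copy.")
}

func main() {
	// In this bug both probes "fail" because the exec request itself EOFs,
	// which the client cannot tell apart from the tool being missing.
	unavailable := func() bool { return false }
	if _, err := pickStrategy([]copyStrategy{
		{name: "rsync", available: unavailable},
		{name: "tar", available: unavailable},
	}); err != nil {
		fmt.Println("error:", err)
	}
}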



Version-Release number of selected component (if applicable):
4.5.7

How reproducible:
100%, every time I try to collect output for https://bugzilla.redhat.com/show_bug.cgi?id=1876919


Steps to Reproduce:
1. SD App SRE team provisions an OSD cluster on 4.5.7
2. Wait for the failure seen in https://bugzilla.redhat.com/show_bug.cgi?id=1876919 (seems to hit maybe 30% of clusters?)
3. Break glass to log in as system:admin
4. oc adm must-gather

Actual results:
must-gather fails to download any output; I get an empty must-gather.


Expected results:
Output is downloaded.

Additional info:
If I run must-gather with --keep and then use rsh or exec against the pod, I see the following:

$ oc rsh must-gather-87vsw
error: error sending request: Post https://api.domain.whatever:6443/api/v1/namespaces/openshift-must-gather-fwtzv/pods/must-gather-87vsw/exec?command=%2Fbin%2Fsh&command=-c&command=TERM%3D%22screen.xterm-256color%22+%2Fbin%2Fsh&container=copy&stdin=true&stdout=true&tty=true: EOF

$ oc exec  must-gather-87vsw -- pwd
error: error sending request: Post https://api.domain.whatever:6443/api/v1/namespaces/openshift-must-gather-fwtzv/pods/must-gather-87vsw/exec?command=pwd&container=copy&stderr=true&stdout=true: EOF

Comment 1 Naveen Malik 2020-09-09 19:50:40 UTC
Note that I am able to run `oc cluster-info dump --all-namespaces --output-directory=somewhere`, so I can get something to file on the related BZ, but it's possibly missing things that would be useful.

Comment 3 W. Trevor King 2020-09-09 20:34:17 UTC
From comment 2's private --v=8 must-gather log:

I0909 16:25:59.761590   66867 util.go:26] error: error sending request: Post https://api.fastt02.i8v0.p1.openshiftapps.com:6443/api/v1/namespaces/openshift-must-gather-8d5bx/pods/must-gather-pblf7/exec?command=rsync&command=--version&container=copy&stderr=true&stdout=true: EOF
I0909 16:25:59.761599   66867 copy_multi.go:30] Error output:
WARNING: cannot use rsync: rsync not available in container

So I expect this is:

1. oc tried to talk to the API server, got an EOF.
2. oc is not distinguishing between "failed in the asking" and "successful ask confirmed no $COMMAND".

I expect the fix is either or both of:

a. Teach 'oc' to retry some reasonable number of times if it gets an EOF or other retry-able error.
b. Teach 'oc' to report "failed in the asking" errors with something that is distinct from "successful ask confirmed no $COMMAND".

And probably also fixing, via a separate bug, whatever is causing the API to EOF these execs.
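
A minimal sketch of what (a) plus (b) above could look like, using hypothetical names (checkCommand, probeResult) rather than the actual oc code: transport-level errors such as EOF are retried with backoff, and only a completed ask is allowed to conclude that the command is missing from the container.

package main

import (
	"errors"
	"fmt"
	"io"
	"time"
)

// probeResult keeps "the ask itself failed" distinct from "the container
// confirmed the command is missing" (fix b).
type probeResult int

const (
	commandAvailable probeResult = iota
	commandMissing
	probeFailed
)

// checkCommand runs the probe (e.g. `rsync --version` via pod exec) up to
// maxAttempts times, retrying with backoff only on transport-level errors
// such as EOF (fix a).
func checkCommand(run func() error, maxAttempts int) (probeResult, error) {
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err := run()
		switch {
		case err == nil:
			return commandAvailable, nil
		case errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF):
			// Failed in the asking: retry instead of concluding the tool is absent.
			lastErr = err
			time.Sleep(time.Duration(attempt+1) * time.Second)
		default:
			// Successful ask, but the command is not present in the container.
			return commandMissing, err
		}
	}
	return probeFailed, fmt.Errorf("probe never completed after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	// Toy probe standing in for the exec call: EOFs twice, then succeeds.
	attempts := 0
	probe := func() error {
		attempts++
		if attempts < 3 {
			return io.EOF
		}
		return nil
	}
	result, err := checkCommand(probe, 5)
	fmt.Println(result == commandAvailable, err) // prints: true <nil>
}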

Comment 4 Maciej Szulik 2020-09-21 10:42:08 UTC
Based on the previous comment, where we're talking about improving must-gather, I'm moving this to 4.7; although it's important, it's not a 4.6 blocker.

Comment 5 Maciej Szulik 2020-10-01 14:50:03 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 6 Michal Fojtik 2020-10-21 11:12:10 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 7 Naveen Malik 2020-10-21 13:07:28 UTC
@mfojtik any update on this? I don't want it to disappear; I see you're tagged for needinfo.

Comment 8 Michal Fojtik 2020-10-21 13:12:17 UTC
The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.

Comment 9 Maciej Szulik 2020-10-23 10:42:55 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 10 Maciej Szulik 2020-11-13 11:34:59 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 11 Michal Fojtik 2020-11-20 14:12:06 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 12 Naveen Malik 2020-11-20 18:51:14 UTC
@mfojtik this is still pending info and tagged UpcomingSprint.  What's the ETA on this work?  We're still hitting this periodically.

Comment 13 Michal Fojtik 2020-11-20 19:12:12 UTC
The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.

Comment 14 Maciej Szulik 2020-11-23 09:47:37 UTC
Naveen, there's an open PR improving this situation: https://github.com/openshift/oc/pull/631. You can also track https://bugzilla.redhat.com/show_bug.cgi?id=1888192.

Comment 15 Maciej Szulik 2020-12-01 08:56:27 UTC

*** This bug has been marked as a duplicate of bug 1888192 ***

