Bug 1999891

Summary: must-gather collects backup data even when the Pod fails to be created
Product: OpenShift Container Platform Reporter: Michael Washer <mwasher>
Component: oc Assignee: Maciej Szulik <maszulik>
oc sub component: oc QA Contact: RamaKasturi <knarra>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: knarra, mfojtik, yinzhou
Version: 4.8   
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Some errors were not being printed. Consequence: must-gather output did not contain information about problems and thus made it hard to figure out what failed. Fix: Bubble errors to make them visible to the user at any stage in the must-gather run. Result: The output provided by must-gather contains more detailed information about what went wrong, when and why.
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 10:37:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Michael Washer 2021-08-31 22:58:04 UTC
Description of problem:
The must-gather Pod can fail to be created because of invalid parameters. When this happens, must-gather does not report the problem and instead attempts to collect the backup data.

Version-Release number of selected component (if applicable):
Current (OCP 4.6+)

How reproducible:
Every time

Steps to Reproduce:
1. Run `oc adm must-gather --node-name="node/abvbasib"`

Actual results:
Will attempt to collect the backup data, as if the Must-Gather Pod had failed.

Expected results:
Present an error stating that the --node-name parameter is formatted incorrectly.

Additional info:
When looking at the following snippets of code:

This is where the error is returned from the Pod creation:
https://github.com/openshift/oc/blob/master/pkg/cli/admin/mustgather/mustgather.go#L325-L328

This deferred function then runs to collect the data:
https://github.com/openshift/oc/blob/master/pkg/cli/admin/mustgather/mustgather.go#L259-L264

But here the code appears to intend that, if the issue is caused by the user arguments, the backup collection should not run:
https://github.com/openshift/oc/blob/master/pkg/cli/admin/mustgather/mustgather.go#L395-L399

This is an issue because the backup collection can take a long time and should not be needed when must-gather could simply be rerun with the correct arguments.

Comment 2 Maciej Szulik 2022-03-16 16:21:32 UTC
https://github.com/openshift/oc/pull/1013 improved the messages in must-gather to make it clear what's happening and why.

Comment 8 zhou ying 2022-06-24 08:37:53 UTC
Can't reproduce the issue now:

oc version --client
Client Version: 4.11.0-0.nightly-2022-06-24-041539
Kustomize Version: v4.5.4

oc adm must-gather --node-name="node/abvbasib"
[must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:82aa287dc11d558b9af6b261fe21b753c450649a8e8e3a7e0ef3e1440d8ec3c0
error: --node-name may not contain '/' or '%'
[root@localhost ~]# oc adm must-gather --node-name="sssdadkaadsl"
[must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:82aa287dc11d558b9af6b261fe21b753c450649a8e8e3a7e0ef3e1440d8ec3c0
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: 182312bf-b2b7-432a-a9e5-3c4e3eb42db1
ClusterVersion: Stable at "4.11.0-0.nightly-2022-06-23-153912"
ClusterOperators:
	All healthy and stable


[must-gather      ] OUT namespace/openshift-must-gather-6lqcd created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-kvxqv created
[must-gather      ] OUT namespace/openshift-must-gather-6lqcd deleted
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-kvxqv deleted


Error running must-gather collection:
    nodes "sssdadkaadsl" not found

Falling back to `oc adm inspect clusteroperators.v1.config.openshift.io` to collect basic cluster information.
....

Comment 9 errata-xmlrpc 2022-08-10 10:37:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069