1942940 – [release-4.6] must-gather improvements

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	oc
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.6.z
Assignee:	Jan Chaloupka
QA Contact:	zhou ying
Docs Contact:
URL:
Whiteboard:
Depends On:	1942938
Blocks:

Reported:	2021-03-25 11:09 UTC by Jan Chaloupka
Modified:	2021-04-27 14:21 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1942938
Environment:
Last Closed:	2021-04-27 14:20:49 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift oc pull 782	0	None	open	bug 1942940: [release-4.6] inspect clusteroperators as a backup to must-gather if it fails	2021-03-25 11:28:17 UTC
Red Hat Product Errata	RHBA-2021:1232	0	None	None	None	2021-04-27 14:21:11 UTC

Description Jan Chaloupka 2021-03-25 11:09:00 UTC

+++ This bug was initially created as a clone of Bug #1942938 +++

+++ This bug was initially created as a clone of Bug #1942935 +++

- stop trying to gather metrics and other endpoints directly from pods
  https://github.com/openshift/oc/pull/763
- prevent inspect from panic-ing if pods are missing
  https://github.com/openshift/oc/pull/762
- indicate how many bytes the hidden secret key was
  https://github.com/openshift/oc/pull/752
- inspect clusteroperators as a backup to must-gather if it fails
  https://github.com/openshift/oc/pull/749
- Add summary to oc must-gather
  https://github.com/openshift/oc/pull/738

Comment 1 Maciej Szulik 2021-03-25 13:40:54 UTC

These improvements significantly help with debugging clusters, so I'm bumping the priority of these bugs to high.

Comment 3 zhou ying 2021-04-16 05:17:16 UTC

[root@localhost ~]# ./oc version 
Client Version: 4.6.0-0.nightly-2021-04-15-200627
Server Version: 4.8.0-0.nightly-2021-04-15-030836
Kubernetes Version: v1.21.0-rc.0+65c2569
[root@localhost ~]# ./oc adm must-gather 
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c6aa63a1e62840702d5bbc7e4a3ff65e103a5cae470e2cd68d2fb1719569ab17
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: 4d3ec0f1-b1c7-4483-b5fe-af4fa5b77257
ClusterVersion: Stable at "4.8.0-0.nightly-2021-04-15-030836"
ClusterOperators:
	All healthy and stable
....

This confirms that https://github.com/openshift/oc/pull/738 (the must-gather summary) works as expected.


In the must-gather output, no endpoint data gathered directly from pods appears; only logs are present:
├── etcd-ip-10-0-200-201.us-east-2.compute.internal
│   ├── etcd
│   │   └── etcd
│   │       └── logs
│   │           ├── current.log
│   │           ├── previous.insecure.log
│   │           └── previous.log
│   ├── etcdctl
│   │   └── etcdctl
│   │       └── logs
│   │           ├── current.log
│   │           ├── previous.insecure.log
│   │           └── previous.log
│   ├── etcd-ensure-env-vars
│   │   └── etcd-ensure-env-vars
│   │       └── logs
│   │           ├── current.log
│   │           ├── previous.insecure.log
│   │           └── previous.log

This confirms that https://github.com/openshift/oc/pull/763 works as expected.



[root@localhost ~]# ./oc get secret my-secret -o yaml 
apiVersion: v1
data:
  key1: c3VwZXJzZWNyZXQ=
  key2: dG9wc2VjcmV0
...
[root@localhost ~]# cat inspect.local.82254221142416974/namespaces/zhouytse/core/secrets/my-secret.yaml 
---
apiVersion: v1
data:
  key1: MTEgYnl0ZXMgbG9uZw==
  key2: OSBieXRlcyBsb25n
kind: Secret

The inspect output replaces each secret value with its byte length, which confirms that https://github.com/openshift/oc/pull/752 works as expected.
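
The two secret dumps above can be cross-checked by decoding the base64 values. The following is a minimal sketch, not the oc implementation: the helper name `redacted_placeholder` is hypothetical, and the placeholder format "<N> bytes long" is inferred only from the values shown in this comment.

```python
import base64


def redacted_placeholder(encoded_value: str) -> str:
    """Return the base64 of '<N> bytes long', where N is the decoded length.

    Assumption: this mirrors the format seen in the inspect output above;
    it is inferred from this comment, not taken from the oc source.
    """
    raw = base64.b64decode(encoded_value)
    placeholder = f"{len(raw)} bytes long".encode()
    return base64.b64encode(placeholder).decode()


# Values copied from `oc get secret my-secret -o yaml` above.
original = {"key1": "c3VwZXJzZWNyZXQ=", "key2": "dG9wc2VjcmV0"}
# Values copied from the inspect output above.
inspected = {"key1": "MTEgYnl0ZXMgbG9uZw==", "key2": "OSBieXRlcyBsb25n"}

for key, value in original.items():
    # Each inspected value should encode only the length of the original.
    assert redacted_placeholder(value) == inspected[key]
```

The check passes for both keys, so the inspect output carries the length of each secret value but not the value itself.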

[root@localhost ~]# ./oc adm  must-gather --image='quay.io/openshift-release-dev/ocp-v4.0-art-dev:nonexist'
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev:nonexist
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: 4d3ec0f1-b1c7-4483-b5fe-af4fa5b77257
ClusterVersion: Stable at "4.8.0-0.nightly-2021-04-15-030836"
ClusterOperators:
	All healthy and stable


[must-gather      ] OUT namespace/openshift-must-gather-jz2sd created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-xbngr created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev:nonexist created
[must-gather-5cf5q] OUT gather did not start: unable to pull image: ImagePullBackOff: Back-off pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev:nonexist"
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-xbngr deleted
[must-gather      ] OUT namespace/openshift-must-gather-jz2sd deleted

....
Wrote inspect data to must-gather.local.2065602046391988706.
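
The run above exercises the fallback from https://github.com/openshift/oc/pull/749: the gather pod never started, yet inspect data was still written. In oc this happens inside `oc adm must-gather` itself. The sketch below only illustrates the same control flow from the outside, under the assumption that a non-zero exit code means the gather failed; `run_command` and `gather_with_fallback` are hypothetical helper names.

```python
import subprocess
from typing import Callable, List


def run_command(argv: List[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(argv, check=False).returncode


def gather_with_fallback(run: Callable[[List[str]], int] = run_command) -> str:
    """Try must-gather first; if it fails, inspect clusteroperators instead.

    Returns the name of the step that produced the data.
    """
    if run(["oc", "adm", "must-gather"]) == 0:
        return "must-gather"
    # Assumption: a non-zero exit code signals a failed gather.
    run(["oc", "adm", "inspect", "clusteroperators"])
    return "inspect"
```

The command runner is passed in as a parameter so the flow can be exercised without a cluster, for example with a stub that always reports failure for must-gather.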

Comment 6 errata-xmlrpc 2021-04-27 14:20:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.26 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1232
