Bug 1942938 - [release-4.7] must-gather improvements
Summary: [release-4.7] must-gather improvements
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.z
Assignee: Jan Chaloupka
QA Contact: zhou ying
URL:
Whiteboard:
Depends On: 1942935
Blocks: 1942940
Reported: 2021-03-25 11:06 UTC by Jan Chaloupka
Modified: 2021-04-20 18:53 UTC
CC List: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1942935
Cloned As: 1942940
Environment:
Last Closed: 2021-04-20 18:52:40 UTC
Target Upstream Version:




Links
Github openshift oc pull 766 (open): bug 1942938: [release-4.7] inspect clusteroperators as a backup to must-gather if it fails (last updated 2021-03-25 12:48:26 UTC)
Red Hat Product Errata RHBA-2021:1149 (last updated 2021-04-20 18:53:02 UTC)

Description Jan Chaloupka 2021-03-25 11:06:26 UTC
+++ This bug was initially created as a clone of Bug #1942935 +++

- stop trying to gather metrics and other endpoints directly from pods
  https://github.com/openshift/oc/pull/763
- prevent inspect from panicking if pods are missing
  https://github.com/openshift/oc/pull/762
- indicate how many bytes the hidden secret key was
  https://github.com/openshift/oc/pull/752
- inspect clusteroperators as a backup to must-gather if it fails
  https://github.com/openshift/oc/pull/749
- Add summary to oc must-gather
  https://github.com/openshift/oc/pull/738

Comment 1 Maciej Szulik 2021-03-25 13:40:08 UTC
These improvements significantly help with debugging clusters, so I'm bumping their priority to high.

Comment 2 zhou ying 2021-04-06 08:26:54 UTC
1) [root@localhost oc]# ./oc version 
Client Version: 4.7.99
Server Version: 4.5.37
Kubernetes Version: v1.18.3+cdb0358
[root@localhost oc]# ./oc adm must-gather 
[must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0e29a7270bff4c9ca267b5a1976d2e4db2a1e86146aca6fc7145cba5b3fc09d8
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: 2c9a1b73-9cf7-4d08-b081-d55b6c02a0c3
ClusterVersion: Stable at "4.5.37"
ClusterOperators:
	All healthy and stable
.....


The summary is now printed by oc adm must-gather.
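For illustration only, the summary block above could be assembled as in the following Python sketch. The function names, the dictionary-based inputs, and the rule for calling an operator healthy (Available=True, Progressing=False, Degraded=False) are assumptions. The wording of the non-stable and non-healthy branches is also assumed. Only the stable/healthy lines mirror the output above. The real logic lives in oc, which is written in Go, and may differ.

```python
# Hypothetical sketch: names, inputs and non-healthy wording are assumptions;
# only the stable/healthy lines reproduce the must-gather output above.

HEADER = ("When opening a support case, bugzilla, or issue please include the "
          "following summary data along with any other requested information.")


def operator_is_healthy(conditions: dict) -> bool:
    """Assumed rule: Available=True, Progressing=False, Degraded=False."""
    return (conditions.get("Available") == "True"
            and conditions.get("Progressing") == "False"
            and conditions.get("Degraded") == "False")


def format_summary(cluster_id: str, version: str, version_progressing: bool,
                   operators: dict) -> str:
    """Build the summary text from a cluster ID, version and operator conditions."""
    lines = [HEADER, f"ClusterID: {cluster_id}"]
    if version_progressing:
        lines.append(f'ClusterVersion: Progressing toward "{version}"')  # hypothetical wording
    else:
        lines.append(f'ClusterVersion: Stable at "{version}"')
    lines.append("ClusterOperators:")
    unhealthy = sorted(name for name, conds in operators.items()
                       if not operator_is_healthy(conds))
    if unhealthy:
        # Hypothetical wording for operators that fail the assumed health rule.
        lines.extend(f"\t{name} is not healthy" for name in unhealthy)
    else:
        lines.append("\tAll healthy and stable")
    return "\n".join(lines)


healthy = {"Available": "True", "Progressing": "False", "Degraded": "False"}
print(format_summary("2c9a1b73-9cf7-4d08-b081-d55b6c02a0c3", "4.5.37", False,
                     {"etcd": healthy}))
```

Keeping the health rule in its own function makes it easy to swap in a different definition of "healthy" without touching the formatting.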



2) From the must-gather output for the pods (shown here for openshift-etcd):
[root@localhost openshift-etcd]# tree pods/
pods/
├── etcd-ip-10-0-169-230.us-west-1.compute.internal
│   ├── etcd
│   │   └── etcd
│   │       └── logs
│   │           ├── current.log
│   │           ├── previous.insecure.log
│   │           └── previous.log
│   ├── etcdctl
│   │   └── etcdctl
│   │       └── logs
│   │           ├── current.log
│   │           ├── previous.insecure.log
│   │           └── previous.log
│   ├── etcd-ensure-env-vars
│   │   └── etcd-ensure-env-vars
│   │       └── logs
│   │           ├── current.log
│   │           ├── previous.insecure.log
│   │           └── previous.log


The pod directories contain only container logs and no data scraped from pod endpoints. This confirms that the change "stop trying to gather metrics and other endpoints directly from pods" works as expected.
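The following Python sketch repeats this check on a full must-gather directory. It assumes that, after the change, pod directories should hold only .log files and .yaml manifests. It reports anything else, for example a file scraped from a metrics endpoint. The helper name unexpected_pod_files and the allowed suffixes are hypothetical and not part of oc.

```python
from pathlib import Path

# Assumption: only container logs (*.log) and manifests (*.yaml) are expected.
ALLOWED_SUFFIXES = {".log", ".yaml"}


def unexpected_pod_files(pods_dir: Path) -> list:
    """Return files under pods_dir whose suffix is not in ALLOWED_SUFFIXES."""
    return sorted(
        p.relative_to(pods_dir)
        for p in pods_dir.rglob("*")
        if p.is_file() and p.suffix not in ALLOWED_SUFFIXES
    )


# Example: run from inside the namespace directory shown above.
print(unexpected_pod_files(Path("pods")))
```

An empty list means no unexpected files were found. Note that "previous.insecure.log" has the suffix ".log", so it is accepted.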


3) Check the change "indicate how many bytes the hidden secret key was":

[root@localhost ~]# ./oc get secret my-secret -o yaml 
apiVersion: v1
data:
  key1: c3VwZXJzZWNyZXQ=
  key2: dG9wc2VjcmV0
kind: Secret


[root@localhost secrets]# more my-secret.yaml 
---
apiVersion: v1
data:
  key1: MTEgYnl0ZXMgbG9uZw==
  key2: OSBieXRlcyBsb25n
kind: Secret

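Once base64-decoded, each masked value states the length of the original value. key1 ("supersecret") becomes "11 bytes long" and key2 ("topsecret") becomes "9 bytes long". So this change works as expected.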
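The masking observed above can be reproduced with a short Python sketch. It decodes each base64 value, counts its bytes, and re-encodes the string "<N> bytes long". mask_secret_data is a hypothetical helper written to match the output above. It is not oc's code.

```python
import base64


def mask_secret_data(data: dict) -> dict:
    """Replace each base64-encoded Secret value with base64("<N> bytes long")."""
    masked = {}
    for key, encoded in data.items():
        # Length of the original (decoded) secret value, in bytes.
        length = len(base64.b64decode(encoded))
        masked[key] = base64.b64encode(f"{length} bytes long".encode()).decode()
    return masked


print(mask_secret_data({"key1": "c3VwZXJzZWNyZXQ=", "key2": "dG9wc2VjcmV0"}))
# {'key1': 'MTEgYnl0ZXMgbG9uZw==', 'key2': 'OSBieXRlcyBsb25n'}
```

Only the length survives, so the gathered YAML reveals nothing about the secret content beyond its size.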

4) Check the change "inspect clusteroperators as a backup to must-gather if it fails":

 [root@localhost oc]# ./oc adm  must-gather --image='quay.io/openshift-release-dev/ocp-v4.0-art-dev:nonexist' 
[must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev:nonexist
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: 2c9a1b73-9cf7-4d08-b081-d55b6c02a0c3
ClusterVersion: Stable at "4.5.37"
ClusterOperators:
	All healthy and stable


[must-gather      ] OUT namespace/openshift-must-gather-9c7sm created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-sjltn created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev:nonexist created
[must-gather-7cggz] OUT gather did not start: unable to pull image: ImagePullBackOff: Back-off pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev:nonexist"
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-sjltn deleted
[must-gather      ] OUT namespace/openshift-must-gather-9c7sm deleted


When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: 2c9a1b73-9cf7-4d08-b081-d55b6c02a0c3
ClusterVersion: Stable at "4.5.37"
ClusterOperators:
	All healthy and stable


Gathering data for ns/openshift-config...
Gathering data for ns/openshift-config-managed...
.....

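In this case must-gather fails because the plug-in image cannot be pulled. oc then prints the summary again and falls back to the inspect command, as the "Gathering data for ns/openshift-config..." lines show.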
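The following Python wrapper sketch illustrates the control flow. It is not how oc implements the fallback; as the transcript shows, oc does this internally. The sketch runs oc adm must-gather and, if that exits non-zero, runs oc adm inspect clusteroperators. The helper names run_oc and gather_with_fallback are hypothetical. The runner is injectable so the logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable, Sequence


def run_oc(args: Sequence[str]) -> int:
    """Run an oc command and return its exit code."""
    return subprocess.run(["oc", *args]).returncode


def gather_with_fallback(run: Callable[[Sequence[str]], int] = run_oc) -> bool:
    """Return True if must-gather succeeded, False if the inspect fallback ran."""
    if run(["adm", "must-gather"]) == 0:
        return True
    # must-gather failed: collect cluster operator data with inspect instead.
    run(["adm", "inspect", "clusteroperators"])
    return False
```

Passing the runner as a parameter keeps the decision logic separate from process execution, which is what makes the fallback path testable.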

Comment 6 errata-xmlrpc 2021-04-20 18:52:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.7 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1149

