Bug 1707096 - install-gather is missing information
Summary: install-gather is missing information
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: ---
Target Release: 4.3.0
Assignee: Abhinav Dahiya
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-06 19:35 UTC by David Eads
Modified: 2019-10-21 18:00 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-21 18:00:48 UTC
Target Upstream Version:
Embargoed:


Description David Eads 2019-05-06 19:35:16 UTC
To debug install problems, the install-gather tool needs to collect the following:

1. De-conflicted container names, so that container logs are not lost. Perhaps "prettyName-ID".
2. Names and namespaces of secrets. I don't have to have the content, but we need to know what exists and what doesn't to determine which piece of the multi-stage flow is broken.
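
A minimal sketch of the "prettyName-ID" idea from item 1, assuming the gather step knows each container's name and runtime ID. The helper name `log_filename`, the 12-character ID prefix, and the `.log` suffix are illustrative assumptions, not what install-gather actually does.

```python
def log_filename(pretty_name: str, container_id: str, id_len: int = 12) -> str:
    """Return a log filename that stays unique when container names repeat.

    Hypothetical helper: the function name, the ID prefix length, and the
    ".log" suffix are assumptions made for illustration.
    """
    return f"{pretty_name}-{container_id[:id_len]}.log"


# Two containers that share one name (for example, a restarted container).
# The IDs below are made up.
containers = [
    ("etcd-member", "3f2a9c1b7d4e5f60a1b2"),
    ("etcd-member", "9b8c7d6e5f4a3b2c1d0e"),
]
filenames = [log_filename(name, cid) for name, cid in containers]
```

Naming files by container name alone would write both logs to the same path and keep only the last one; with the ID suffix, each container gets its own file.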

Comment 1 Abhinav Dahiya 2019-05-06 19:41:25 UTC
(In reply to David Eads from comment #0)

> 2. names and namespaces of secrets. I don't have to have the content, but we
> need to know what exists and what doesn't to determine which piece of the
> multi-stage flow is broken.

a) Which sources can the install-gather script use to fetch these secrets: the disk on the bootstrap node (`/opt/openshift`) or the API?

b) There have been concerns around gathering secrets. How do you recommend we collect that information without leaking them?

c) Why do you need the gather script to check whether secrets exist in order to debug? Shouldn't the container logs show which secrets a container was looking for and couldn't find?

Comment 2 David Eads 2019-05-06 19:55:05 UTC
a) Use the API, the same way you gather the other resources.

b) The sos tool uses a post-processing regex to remove secrets. must-gather elides them like this: https://github.com/openshift/must-gather/blob/master/pkg/cmd/inspect/secret.go#L75-L85
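
A rough sketch of the elision approach, assuming a secret is available as a parsed Kubernetes Secret object (a dict with `metadata`, `data`, and optionally `stringData`). This is not the must-gather code linked above; the helper name `elide_secret` and the `<elided>` placeholder are assumptions for illustration.

```python
import copy


def elide_secret(secret: dict, placeholder: str = "<elided>") -> dict:
    """Return a copy of a Secret with every value replaced by a placeholder.

    Keys, name, namespace, and other metadata are kept, so a reader can
    still see which secrets and which entries exist.
    Hypothetical helper, not the must-gather implementation.
    """
    redacted = copy.deepcopy(secret)
    for field in ("data", "stringData"):
        if redacted.get(field):
            redacted[field] = {key: placeholder for key in redacted[field]}
    return redacted


# Made-up example object; the value is not a real credential.
example = {
    "kind": "Secret",
    "metadata": {"name": "pull-secret", "namespace": "openshift-config"},
    "data": {".dockerconfigjson": "c2VjcmV0"},
}
redacted_example = elide_secret(example)
```

Working on a deep copy leaves the original object untouched, and keeping the keys preserves the "what exists and what doesn't" signal without exposing the content.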

c) It's the difference between hours and seconds in terms of knowing where a failure happens. Most of these are optional and tolerated. The behavior of the system is driven by what is available. Also keep in mind that logs are essentially streams of diffs that allow you to painstakingly rebuild the current state, using information from multiple operators and from logs which may or may not exist. Or someone can provide the output of `oc get secrets --all-namespaces <something to dump all annotations>` and save the hours of work per instance.
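
To illustrate the kind of inventory meant here, the sketch below assumes the secrets have been fetched as a Kubernetes List in JSON (for instance via `oc get secrets --all-namespaces -o json`) and reduces it to namespace, name, and annotations only. The helper name `secret_inventory` and the sample document are hypothetical.

```python
import json


def secret_inventory(secret_list: dict) -> list:
    """Return sorted (namespace, name, annotations) tuples from a Secret list.

    Only metadata is read; the "data" field is never copied into the result.
    Hypothetical helper for illustration.
    """
    rows = []
    for item in secret_list.get("items", []):
        meta = item.get("metadata", {})
        rows.append(
            (meta.get("namespace", ""), meta.get("name", ""), meta.get("annotations") or {})
        )
    return sorted(rows, key=lambda row: (row[0], row[1]))


# Made-up sample standing in for the API response.
sample = json.loads("""
{
  "kind": "List",
  "items": [
    {"metadata": {"namespace": "ns-b", "name": "serving-cert",
                  "annotations": {"example.com/owner": "operator-x"}},
     "data": {"tls.key": "c2VjcmV0"}},
    {"metadata": {"namespace": "ns-a", "name": "signing-key"},
     "data": {"key": "c2VjcmV0"}}
  ]
}
""")
inventory = secret_inventory(sample)
```

The result shows at a glance which secrets exist in which namespace, which is the information needed to tell which stage of the flow has not produced its output yet.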

Comment 4 Scott Dodson 2019-10-21 18:00:48 UTC
It looks like both of these have already been addressed.

