1707096 – install-gather is missing information

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	unspecified
Target Milestone:	---
Target Release:	4.3.0
Assignee:	Abhinav Dahiya
QA Contact:	Johnny Liu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-05-06 19:35 UTC by David Eads
Modified:	2019-10-21 18:00 UTC (History)
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-10-21 18:00:48 UTC
Target Upstream Version:
Embargoed:

Description David Eads 2019-05-06 19:35:16 UTC

To debug install problems, the install-gather tool needs to collect the following additional information:

1. de-conflicted container names so that container logs are not lost, perhaps in the form "prettyName-ID".
2. names and namespaces of secrets. I don't have to have the content, but we need to know what exists and what doesn't to determine which piece of the multi-stage flow is broken.
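
As an illustration of item 1, here is a minimal Python sketch of the "prettyName-ID" idea. It is not the install-gather implementation. The function name `log_filename`, the 12-character ID prefix, the `.log` suffix, and the sample container name and IDs are all assumptions made for the example.

```python
import re


def log_filename(pretty_name: str, container_id: str, id_chars: int = 12) -> str:
    """Build a log file name of the form '<prettyName>-<ID>.log'.

    Hypothetical helper: appending a prefix of the container ID keeps two
    containers with the same pretty name from writing to the same file.
    """
    # Replace characters that are awkward in file names with '_'.
    safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", pretty_name)
    # Assumption: a short ID prefix is unique enough on a single node.
    short_id = container_id[:id_chars]
    return f"{safe_name}-{short_id}.log"


# Two containers sharing one name (for example, before and after a restart).
# The name and IDs below are made up for illustration.
first = log_filename("kube-apiserver", "0123456789abcdef0123")
second = log_filename("kube-apiserver", "fedcba98765432100000")
print(first)
print(second)
```

With names built this way, a restarted container gets its own log file instead of overwriting the earlier one.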

Comment 1 Abhinav Dahiya 2019-05-06 19:41:25 UTC

(In reply to David Eads from comment #0)

> 2. names and namespaces of secrets. I don't have to have the content, but we
> need to know what exists and what doesn't to determine which piece of the
> multi-stage flow is broken.

a) What sources can the install-gather script use to fetch these secrets: the disk on the bootstrap node (`/opt/openshift`) or the API?

b) There have been concerns around gathering secrets. How do you recommend we collect that information without leaking them?

c) Why do you need the gather script to record whether secrets "exist" in order to debug? Shouldn't the container logs show which secrets a container was looking for and couldn't find?

Comment 2 David Eads 2019-05-06 19:55:05 UTC

a) Use the API, the same way you gather the other resources.

b) The sos tool uses a post-processing regex to remove secrets.  must-gather elides them like this: https://github.com/openshift/must-gather/blob/master/pkg/cmd/inspect/secret.go#L75-L85

c) It's the difference between hours and seconds in terms of knowing where a failure happens.  Most of these secrets are optional and their absence is tolerated.  The behavior of the system is driven by what is available.  Also keep in mind that logs are essentially streams of diffs: to rebuild the current state from them, you have to painstakingly combine information from multiple operators, using logs which may or may not exist.  Or someone can provide the output of `oc get secrets --all-namespaces <something to dump all annotations>` and save the hours of work per instance.
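
To make points b) and c) concrete, here is a minimal Python sketch that turns a Secret list in JSON form into a summary of names, namespaces, annotations, and data key names with the values elided. It is not the must-gather code linked above, and it is not what install-gather does. The function name `summarize_secrets` and the placeholder text are assumptions. The input is assumed to be a standard Kubernetes list document (`items[].metadata`, `items[].data`), such as JSON output from `oc get secrets`.

```python
import json
from typing import Any, Dict, List

# This annotation can hold a full copy of the applied manifest, which for a
# Secret includes its data, so the sketch drops its value as well.
LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def summarize_secrets(secret_list_json: str) -> List[Dict[str, Any]]:
    """Return which secrets exist without returning their contents."""
    doc = json.loads(secret_list_json)
    summary = []
    for item in doc.get("items", []):
        meta = item.get("metadata", {})
        annotations = dict(meta.get("annotations") or {})
        if LAST_APPLIED in annotations:
            annotations[LAST_APPLIED] = "<elided>"
        summary.append({
            "namespace": meta.get("namespace", ""),
            "name": meta.get("name", ""),
            "type": item.get("type", ""),
            "annotations": annotations,
            # Keep key names; replace each value with the length of its
            # base64 text so the content never reaches the archive.
            "data": {
                key: f"<elided, {len(value)} chars>"
                for key, value in (item.get("data") or {}).items()
            },
        })
    return sorted(summary, key=lambda s: (s["namespace"], s["name"]))


# Made-up input for illustration only.
sample = json.dumps({
    "kind": "List",
    "items": [{
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": "example", "namespace": "example-ns"},
        "data": {"token": "c2VjcmV0"},
    }],
})
print(json.dumps(summarize_secrets(sample), indent=2))
```

A summary like this answers "what exists and what doesn't" at a glance while keeping secret values out of the gathered archive.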

Comment 4 Scott Dodson 2019-10-21 18:00:48 UTC

It looks like both of these have already been addressed.
