Bug 1707096

Summary: install-gather is missing information
Product: OpenShift Container Platform Reporter: David Eads <deads>
Component: Installer Assignee: Abhinav Dahiya <adahiya>
Installer sub component: openshift-installer QA Contact: Johnny Liu <jialiu>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: unspecified    
Priority: medium CC: bleanhar, calfonso
Version: 4.1.0   
Target Milestone: ---   
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-21 18:00:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description David Eads 2019-05-06 19:35:16 UTC
To debug install problems, the install-gather tool needs to collect:

1. de-conflicted container names, so that container logs are not lost.  Perhaps "prettyName-ID".
2. names and namespaces of secrets. I don't have to have the content, but we need to know what exists and what doesn't to determine which piece of the multi-stage flow is broken.
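
A minimal sketch of the "prettyName-ID" idea from item 1, assuming each container is known by a human-readable name and a runtime ID. The function names, the `.log` suffix, and the 12-character ID prefix are hypothetical illustrations, not what install-gather actually does.

```python
def log_file_name(pretty_name: str, container_id: str, id_len: int = 12) -> str:
    """Build a log file name of the form "<prettyName>-<ID>.log".

    Assumption: pretty_name may contain "/" (for example a namespaced name),
    which is not usable inside a single file name, so it is replaced by "_".
    """
    safe_name = pretty_name.strip("/").replace("/", "_")
    # Appending the (shortened) container ID keeps two containers that
    # share a pretty name from writing to the same file.
    return f"{safe_name}-{container_id[:id_len]}.log"


def plan_log_files(containers):
    """Map each container ID to its de-conflicted log file name.

    `containers` is an iterable of (pretty_name, container_id) pairs.
    """
    return {cid: log_file_name(name, cid) for name, cid in containers}
```

With this naming, a restarted container that reuses the pretty name gets its own file because its ID differs. A shortened ID could in principle still collide, so passing a larger `id_len` or the full ID is the conservative choice.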

Comment 1 Abhinav Dahiya 2019-05-06 19:41:25 UTC
(In reply to David Eads from comment #0)

> 2. names and namespaces of secrets. I don't have to have the content, but we
> need to know what exists and what doesn't to determine which piece of the
> multi-stage flow is broken.

a) What sources can the install-gather script use to fetch these secrets: the disk on the bootstrap node (`/opt/openshift`) or the API?

b) There have been concerns around gathering secrets. How do you recommend we collect that information without leaking them?

c) Why do you need the gather script to record whether secrets exist in order to debug? Shouldn't the container logs show which secrets a container was looking for and couldn't find?

Comment 2 David Eads 2019-05-06 19:55:05 UTC
a) Use the API, the same way you gather the other resources.

b) The sos tool uses a post-processing regex to remove secrets.  must-gather elides them like this: https://github.com/openshift/must-gather/blob/master/pkg/cmd/inspect/secret.go#L75-L85

c) It's the difference between hours and seconds in terms of knowing where a failure happens.  Most of these are optional and tolerated.  The behavior of the system is driven by what is available.  Also keep in mind that logs are essentially streams of diffs that allow you to painstakingly rebuild current state, using information from multiple operators and from logs which may or may not exist.  Or someone can provide the output of `oc get secrets --all-namespaces <something to dump all annotations>` and save the hours of work per instance.
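
A rough sketch of the eliding approach from (b), assuming the secrets were dumped as a JSON list (for example with `oc get secrets --all-namespaces -o json`). This is an illustration only, not the must-gather code linked above; `elide_secrets` is a hypothetical name.

```python
import json

# kubectl apply stores a copy of the applied object in this annotation,
# which for a Secret can include the secret data itself.
LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def elide_secrets(secret_list: dict) -> dict:
    """Return a copy of a Secret list with every data value blanked.

    Names, namespaces, data keys and the remaining annotations are kept,
    so it is still visible which secrets exist and what they contain keys for.
    """
    elided = json.loads(json.dumps(secret_list))  # deep copy via JSON round trip
    for item in elided.get("items", []):
        for field in ("data", "stringData"):
            if item.get(field):
                # Keep the keys, drop the values.
                item[field] = {key: "" for key in item[field]}
        annotations = (item.get("metadata") or {}).get("annotations") or {}
        annotations.pop(LAST_APPLIED, None)
    return elided
```

The input is left unmodified, so the elided copy can be written to the gather archive while the original stays in memory only.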

Comment 4 Scott Dodson 2019-10-21 18:00:48 UTC
It looks like both of these have already been addressed.