1712637 – imagestreams.image.openshift.io "must-gather" not found when installer timeout during cluster initialisation

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	oc
Sub Component:
Version:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.1.z
Assignee:	Jan Chaloupka
QA Contact:	zhou ying
Docs Contact:
URL:
Whiteboard:	4.1.4
Depends On:
Blocks:

Reported:	2019-05-22 02:50 UTC by Praveen Kumar
Modified:	2022-08-29 10:14 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-07-04 09:01:22 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:1635	0	None	None	None	2019-07-04 09:01:33 UTC

Description Praveen Kumar 2019-05-22 02:50:35 UTC

Description of problem: Currently the libvirt e2e jobs are failing in CI, and must-gather errors out with `imagestreams.image.openshift.io "must-gather" not found` during log collection.

- Failure run artifacts : https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_installer/1628/pull-ci-openshift-installer-master-e2e-libvirt/463/artifacts/e2e-libvirt/ 

- https://github.com/openshift/installer/pull/1628


Version-Release number of selected component (if applicable): 


How reproducible:


Steps to Reproduce:
1. Trigger a CI job for installer repo using `/test e2e-libvirt`
2. Wait for it to run; if it fails, check the logs.

Actual results:

```
$ curl -L https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1628/pull-ci-openshift-installer-master-e2e-libvirt/464/artifacts/e2e-libvirt/container-logs/teardown.log | gzip -d -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1222  100  1222    0     0   7148      0 --:--:-- --:--:-- --:--:--  7188
Activated service account credentials for: [jenkins-ci-provisioner.gserviceaccount.com]
Updated property [core/project].
Updated property [compute/zone].
+ set +e
+ echo 'Collect all the info about clusteroperators'
Collect all the info about clusteroperators
+ LD_PRELOAD=/usr/lib64/libnss_wrapper.so
+ tee /tmp/artifacts/output-co-libvirt
+ gcloud compute --project openshift-gce-devel-ci ssh --zone us-east1-c packer@ci-op-qbcsxq1j-dee8c --command 'export KUBECONFIG=/home/$USER/clusters/installer/auth/kubeconfig && bash -ce "oc get co"'
NAME                                 VERSION                   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                 Unknown     Unknown       True       2m52s
cloud-credential                     0.0.1-2019-05-22-013339   True        False         False      33m
cluster-autoscaler                   0.0.1-2019-05-22-013339   True        False         False      31m
console                              0.0.1-2019-05-22-013339   Unknown     True          False      9m57s
dns                                  0.0.1-2019-05-22-013339   True        False         False      33m
image-registry                                                 False       True          False      2m55s
ingress                              unknown                   False       True          False      2m44s
kube-apiserver                       0.0.1-2019-05-22-013339   True        False         False      31m
kube-controller-manager              0.0.1-2019-05-22-013339   True        False         False      14m
kube-scheduler                       0.0.1-2019-05-22-013339   True        False         False      30m
machine-api                          0.0.1-2019-05-22-013339   True        False         False      33m
machine-config                       0.0.1-2019-05-22-013339   True        False         False      32m
marketplace                          0.0.1-2019-05-22-013339   True        False         False      2m10s
monitoring                                                     False       True          True       68s
network                              0.0.1-2019-05-22-013339   True        False         False      34m
node-tuning                          0.0.1-2019-05-22-013339   True        False         False      2m35s
openshift-apiserver                  0.0.1-2019-05-22-013339   True        False         False      42s
openshift-controller-manager         0.0.1-2019-05-22-013339   True        False         False      14m
openshift-samples                                              True        True          False      9m20s
operator-lifecycle-manager           0.0.1-2019-05-22-013339   True        False         False      32m
operator-lifecycle-manager-catalog   0.0.1-2019-05-22-013339   True        False         False      32m
service-ca                           0.0.1-2019-05-22-013339   True        False         False      33m
service-catalog-apiserver            0.0.1-2019-05-22-013339   True        False         False      2m48s
service-catalog-controller-manager   0.0.1-2019-05-22-013339   True        False         False      2m54s
storage                              0.0.1-2019-05-22-013339   True        False         False      2m55s
+ echo 'Run must gather on the cluster'
+ LD_PRELOAD=/usr/lib64/libnss_wrapper.so
+ gcloud compute --project openshift-gce-devel-ci ssh --zone us-east1-c packer@ci-op-qbcsxq1j-dee8c --command 'mkdir -p $HOME/must-gather && export KUBECONFIG=$HOME/clusters/installer/auth/kubeconfig && bash -ce "oc adm must-gather --dest-dir $HOME/must-gather || true"'
Run must gather on the cluster
Error from server (NotFound): imagestreams.image.openshift.io "must-gather" not found
scp everything related to installer back to pod
```


Expected results:

must-gather should collect the logs, or at least wait until that imagestream is available and then collect the logs.
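The "wait until the imagestream is available" behaviour can be approximated on the caller's side. The following is only a sketch under assumptions: `retry` and `collect_logs` are hypothetical helper names, and the attempt count and delay are arbitrary. It is not what the installer CI teardown script actually does.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the command quietly; success ends the loop immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical teardown step: wait for the imagestream (30 tries, 10 s apart),
# then run must-gather regardless so a missing imagestream is reported but
# does not abort the rest of the teardown.
collect_logs() {
  retry 30 10 oc get is must-gather -n openshift ||
    echo "must-gather imagestream still missing; trying anyway" >&2
  oc adm must-gather --dest-dir "$HOME/must-gather" || true
}
```

Waiting only helps when the imagestream eventually appears; if the API serving it never comes up, the collection step still needs another way to find the image.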


Additional info:

Comment 1 Xingxing Xia 2019-05-22 07:14:53 UTC

In a normal environment, running `oc adm must-gather` immediately after `oc delete is must-gather -n openshift` fails with "Error from server (NotFound): imagestreams.image.openshift.io "must-gather" not found". Maybe `oc adm must-gather` should be designed not to depend on the must-gather imagestream?

Comment 2 Tomáš Nožička 2019-05-22 09:16:41 UTC

I'm not sure why we need an imagestream for must-gather, given that the tool needs to run during various stages where imagestreams may not even be available. They are served from an aggregated apiserver, which may well be down. Maciej will know why that was chosen.

I'd imagine we use something along the lines of `oc adm release info ${RELEASE_IMAGE} --image-for=must-gather` internally to determine that info.

I guess you can just run that image without wrapping it in oc adm for now.
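As a rough sketch of that workaround, the image could be looked up from the release payload and passed to must-gather explicitly. The function names and the overridable `OC` variable are hypothetical, and the sketch assumes the client in use supports the `--image` flag of `oc adm must-gather`, which may not hold for every 4.1 build.

```shell
# OC can be overridden (for example with a stub) when testing; it defaults
# to the real client binary.
OC=${OC:-oc}

# Print the must-gather pull spec recorded in a release payload.
# $1: release image pull spec, supplied by the caller.
must_gather_image() {
  "$OC" adm release info "$1" --image-for=must-gather
}

# Run must-gather with an explicit image so that no imagestream lookup is
# needed. Assumes `oc adm must-gather` accepts --image.
# $1: release image pull spec, $2: destination directory.
gather_with_release_image() {
  image=$(must_gather_image "$1") || return 1
  "$OC" adm must-gather --image="$image" --dest-dir="$2"
}
```

A call would look like `gather_with_release_image "$RELEASE_IMAGE" "$HOME/must-gather"`, where `RELEASE_IMAGE` is whatever release pull spec the cluster was installed from.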

Comment 3 Maciej Szulik 2019-05-28 15:06:09 UTC

I agree with what was said before; we should tolerate some APIs not being present, given the many implications. We should report the fact but still continue the invocation.
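The "report the fact, but still continue" idea can be illustrated with a small shell sketch. This is not the code from the fix itself (that lives in the linked origin PRs); `resolve_image` and `FALLBACK_IMAGE` are hypothetical names, and the lookup command is left to the caller.

```shell
# Image used when the lookup fails. The default here is a placeholder:
# set FALLBACK_IMAGE to whatever suits the cluster being debugged.
: "${FALLBACK_IMAGE:=quay.io/openshift/origin-must-gather:latest}"

# Usage: resolve_image LOOKUP_COMMAND [ARGS...]
# Prints the image printed by LOOKUP_COMMAND. If the command fails or prints
# nothing, reports that on stderr and prints FALLBACK_IMAGE instead.
# Always returns 0, so callers can continue either way.
resolve_image() {
  if image=$("$@" 2>/dev/null) && [ -n "$image" ]; then
    echo "$image"
  else
    echo "warning: image lookup failed ($*); falling back to $FALLBACK_IMAGE" >&2
    echo "$FALLBACK_IMAGE"
  fi
}
```

Any command that prints a pull spec and exits non-zero on failure can serve as the lookup. The failure goes to stderr so that it stays visible in the logs without polluting the resolved value on stdout.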

Comment 4 Luis Sanchez 2019-05-30 19:59:50 UTC

PR https://github.com/openshift/origin/pull/22974

Comment 5 W. Trevor King 2019-05-30 22:21:27 UTC

Not actually POST for this 4.1.z bug until that gets backported to release-4.1, right?

Comment 6 Michal Fojtik 2019-06-03 09:41:13 UTC

https://github.com/openshift/origin/pull/23001

Comment 8 Xingxing Xia 2019-06-27 09:02:35 UTC

Tested in oc v4.1.4 GitCommit:"c9e4f28ff", BuildDate:"2019-06-26T20:05:55Z":
$ while true; do oc delete is must-gather -n openshift; done # The imagestream is recreated automatically, so delete it in a loop
$ oc adm must-gather
imagestreams.image.openshift.io "must-gather" not found
Using image: quay.io/openshift/origin-must-gather:latest
...

It hard-codes the "origin" image in the OCP product. Does this need to be fixed?

Comment 10 errata-xmlrpc 2019-07-04 09:01:22 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1635
