Description of problem:

Currently the libvirt e2e jobs are failing on CI, and must-gather errors out with `imagestreams.image.openshift.io "must-gather" not found` during log collection.

Failure run artifacts:
- https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_installer/1628/pull-ci-openshift-installer-master-e2e-libvirt/463/artifacts/e2e-libvirt/
- https://github.com/openshift/installer/pull/1628

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Trigger a CI job for the installer repo using `/test e2e-libvirt`
2. Wait for the job to run; if it fails, check the logs.

Actual results:
```
$ curl -L https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1628/pull-ci-openshift-installer-master-e2e-libvirt/464/artifacts/e2e-libvirt/container-logs/teardown.log | gzip -d -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1222  100  1222    0     0   7148      0 --:--:-- --:--:-- --:--:--  7188
Activated service account credentials for: [jenkins-ci-provisioner.gserviceaccount.com]
Updated property [core/project].
Updated property [compute/zone].
+ set +e
+ echo 'Collect all the info about clusteroperators'
Collect all the info about clusteroperators
+ LD_PRELOAD=/usr/lib64/libnss_wrapper.so
+ tee /tmp/artifacts/output-co-libvirt
+ gcloud compute --project openshift-gce-devel-ci ssh --zone us-east1-c packer@ci-op-qbcsxq1j-dee8c --command 'export KUBECONFIG=/home/$USER/clusters/installer/auth/kubeconfig && bash -ce "oc get co"'
NAME                                 VERSION                   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                 Unknown     Unknown       True       2m52s
cloud-credential                     0.0.1-2019-05-22-013339   True        False         False      33m
cluster-autoscaler                   0.0.1-2019-05-22-013339   True        False         False      31m
console                              0.0.1-2019-05-22-013339   Unknown     True          False      9m57s
dns                                  0.0.1-2019-05-22-013339   True        False         False      33m
image-registry                                                 False       True          False      2m55s
ingress                              unknown                   False       True          False      2m44s
kube-apiserver                       0.0.1-2019-05-22-013339   True        False         False      31m
kube-controller-manager              0.0.1-2019-05-22-013339   True        False         False      14m
kube-scheduler                       0.0.1-2019-05-22-013339   True        False         False      30m
machine-api                          0.0.1-2019-05-22-013339   True        False         False      33m
machine-config                       0.0.1-2019-05-22-013339   True        False         False      32m
marketplace                          0.0.1-2019-05-22-013339   True        False         False      2m10s
monitoring                                                     False       True          True       68s
network                              0.0.1-2019-05-22-013339   True        False         False      34m
node-tuning                          0.0.1-2019-05-22-013339   True        False         False      2m35s
openshift-apiserver                  0.0.1-2019-05-22-013339   True        False         False      42s
openshift-controller-manager         0.0.1-2019-05-22-013339   True        False         False      14m
openshift-samples                                              True        True          False      9m20s
operator-lifecycle-manager           0.0.1-2019-05-22-013339   True        False         False      32m
operator-lifecycle-manager-catalog   0.0.1-2019-05-22-013339   True        False         False      32m
service-ca                           0.0.1-2019-05-22-013339   True        False         False      33m
service-catalog-apiserver            0.0.1-2019-05-22-013339   True        False         False      2m48s
service-catalog-controller-manager   0.0.1-2019-05-22-013339   True        False         False      2m54s
storage                              0.0.1-2019-05-22-013339   True        False         False      2m55s
+ echo 'Run must gather on the cluster'
+ LD_PRELOAD=/usr/lib64/libnss_wrapper.so
+ gcloud compute --project openshift-gce-devel-ci ssh --zone us-east1-c packer@ci-op-qbcsxq1j-dee8c --command 'mkdir -p $HOME/must-gather && export KUBECONFIG=$HOME/clusters/installer/auth/kubeconfig && bash -ce "oc adm must-gather --dest-dir $HOME/must-gather || true"'
Run must gather on the cluster
Error from server (NotFound): imagestreams.image.openshift.io "must-gather" not found
scp everything related to installer back to pod
```

Expected results:
must-gather should collect the logs, or at least wait until the imagestream is available and then collect them.

Additional info:
In a normal environment, after `oc delete is must-gather -n openshift`, immediately running `oc adm must-gather` fails with `Error from server (NotFound): imagestreams.image.openshift.io "must-gather" not found`. Maybe `oc adm must-gather` should be designed not to depend on the imagestream must-gather?
I'm not sure why we need an imagestream for must-gather, given the tool needs to run during various stages where imagestreams may not even be available. They are served from an aggregated apiserver, which may itself be down. Maciej will know why that was chosen. I'd imagine we use something along the lines of `oc adm release info ${RELEASE_IMAGE} --image-for=must-gather` internally to determine that info. I guess you can just run that image directly without wrapping it in `oc adm` for now.
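The suggestion above could be sketched as a small shell helper that resolves the must-gather image from the release payload instead of the imagestream. This is only a hedged sketch: `resolve_must_gather_image` is a hypothetical function name, and the usage lines assume `RELEASE_IMAGE` is set and `oc` is logged in to a live cluster.

```shell
# Hypothetical helper: resolve the must-gather image from the release payload,
# bypassing the openshift/must-gather imagestream entirely.
resolve_must_gather_image() {
  release_image=$1
  # oc adm release info prints the pull spec for the named component image.
  oc adm release info "$release_image" --image-for=must-gather
}

# Intended usage (against a live cluster; not run here):
# img=$(resolve_must_gather_image "$RELEASE_IMAGE")
# oc adm must-gather --image="$img" --dest-dir="$HOME/must-gather"
```

`oc adm must-gather` accepts an explicit `--image`, so resolving the pull spec up front avoids the imagestream lookup that fails in the CI teardown.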
I agree with what was said before: we should tolerate some APIs not being present, since many situations can cause that. We should report the fact, but still continue the invocation.
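The "report the fact, but continue" behavior could take the form of a bounded retry in the CI teardown script. A minimal sketch under stated assumptions: `wait_for` is a hypothetical helper, not part of the existing script, and the commented usage assumes `oc` has a working kubeconfig.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay_seconds> <command...>
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the command; stop retrying as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  # Report failure to the caller but let the caller decide to continue.
  return 1
}

# Intended usage in the teardown script (against a live cluster; not run here):
# wait_for 30 10 oc get imagestream must-gather -n openshift \
#   || echo "must-gather imagestream never appeared; continuing anyway"
# oc adm must-gather --dest-dir "$HOME/must-gather" || true
```

The trailing `|| true` in the usage sketch is what keeps the teardown going even when collection fails, matching the "report but continue" suggestion.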
PR https://github.com/openshift/origin/pull/22974
Not actually POST for this 4.1.z bug until that gets backported to release-4.1, right?
https://github.com/openshift/origin/pull/23001
Tested in oc v4.1.4 (GitCommit: "c9e4f28ff", BuildDate: "2019-06-26T20:05:55Z"):
```
$ while true; do oc delete is must-gather -n openshift; done  # the imagestream is automatically recreated, hence the loop
$ oc adm must-gather
imagestreams.image.openshift.io "must-gather" not found
Using image: quay.io/openshift/origin-must-gather:latest
...
```
It hard-codes the "origin" image in the OCP product. Does this need to be fixed?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1635