A large number of metal IPI (disconnected) e2e runs were failing because the image import of openshift/must-gather:latest (part of the payload) was significantly delayed (40m), so oc fell back to a hardcoded pull spec that does not work in a disconnected environment. The utility method that does that lookup was subtly wrong: it used a utility method intended for triggers and other controllers that need the "last known image tag value WITH metadata", but in this use case the metadata is not required (the pull spec is present).

[must-gather ] OUT unable to resolve the imagestream tag openshift/must-gather:latest
                   ^ not imported yet
[must-gather ] OUT
[must-gather ] OUT Using must-gather plug-in image: registry.redhat.io/openshift4/ose-must-gather:latest
                   ^ unable to pull in disconnected environments

The bug is that neither oc debug nor oc adm must-gather needs the latest value (wait for import) or the status metadata that import provides (the image id in this case), and therefore those commands should not use that utility method.

oc adm release new had a similar problem recently, but in a different direction: it needs the latest input from the spec tag and needs to fail if it hasn't been imported (the use case for new is never "fallback to old").

Introduce a new utility method in library-go that makes the three scenarios more obvious to a user, and then use it in debug/must-gather. A follow-up next release will use it in the other locations so we can remove the old utility method.
A separate bug will be opened for "why import took 40m on metal-ipi ovn" https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-ovn-dualstack/1393181497514528768
We're still missing oc bits.
Checked with the latest oc: when the must-gather imagestream import failed, `oc adm must-gather` could still run:

[root@localhost roottest]# oc get is must-gather -n openshift -o yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    openshift.io/image.dockerRepositoryCheck: "2021-05-31T07:11:33Z"
  creationTimestamp: "2021-05-31T07:11:18Z"
  generation: 2
  name: must-gather
  namespace: openshift
  resourceVersion: "106528"
  uid: f663e6b0-6ced-455a-8da1-bd72fdc262d4
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations: null
    from:
      kind: DockerImage
      name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5
    generation: 2
    importPolicy: {}
    name: latest
    referencePolicy:
      type: Source
status:
  dockerImageRepository: image-registry.openshift-image-registry.svc:5000/openshift/must-gather
  tags:
  - conditions:
    - generation: 2
      lastTransitionTime: "2021-05-31T07:11:33Z"
      message: 'Internal error occurred: [dockerimage.image.openshift.io "ec2-3-137-199-98.us-east-2.compute.amazonaws.com:5000/ocp/release@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5" not found, quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5: Get "https://quay.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)]'
      reason: InternalError
      status: "False"
      type: ImportSuccess
    items: null
    tag: latest

[root@localhost roottest]# oc adm must-gather
[must-gather ] OUT Using must-gather plug-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: 7d3bbd53-04d2-45e8-90be-fd247b39d949
ClusterVersion: Stable at "4.8.0-0.nightly-2021-05-29-114625"
ClusterOperators:
    clusteroperator/cloud-credential is not upgradeable because Upgradeable annotation cloudcredential.openshift.io/upgradeable-to on cloudcredential.operator.openshift.io/cluster object needs updating before upgrade. See Manually Creating IAM documentation for instructions on preparing a cluster for upgrade.
[must-gather ] OUT namespace/openshift-must-gather-rswbl created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-nq6sr created
[must-gather ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5 created
For the `oc debug` command, when the tools imagestream has a failed import in its status, the debug pod could still be created from spec.from.name; there is no need to wait for the image to be imported.

[root@localhost roottest]# oc get is tools -n openshift -o yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    openshift.io/image.dockerRepositoryCheck: "2021-05-31T07:34:28Z"
  creationTimestamp: "2021-05-31T07:34:13Z"
  generation: 2
  name: tools
  namespace: openshift
  resourceVersion: "115224"
  uid: 62210ee1-5373-4916-9b09-67a080377e9b
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations: null
    from:
      kind: DockerImage
      name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5
    generation: 2
    importPolicy: {}
    name: latest
    referencePolicy:
      type: Source
status:
  dockerImageRepository: image-registry.openshift-image-registry.svc:5000/openshift/tools
  tags:
  - conditions:
    - generation: 2
      lastTransitionTime: "2021-05-31T07:34:28Z"
      message: 'Internal error occurred: [dockerimage.image.openshift.io "ec2-3-137-199-98.us-east-2.compute.amazonaws.com:5000/ocp/release@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5" not found, quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5: Get "https://quay.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)]'
      reason: InternalError
      status: "False"
      type: ImportSuccess
    items: null
    tag: latest

[root@localhost roottest]# oc get po
NAME                                          READY   STATUS              RESTARTS   AGE
ip-10-0-74-30us-east-2computeinternal-debug   0/1     ContainerCreating   0          7s

[root@localhost roottest]# oc describe po/ip-10-0-74-30us-east-2computeinternal-debug
Name:         ip-10-0-74-30us-east-2computeinternal-debug
Namespace:    zhouyt
Priority:     0
Node:         ip-10-0-74-30.us-east-2.compute.internal/10.0.74.30
Start Time:   Mon, 31 May 2021 15:34:47 +0800
Labels:       <none>
Annotations:  debug.openshift.io/source-container: container-00
              debug.openshift.io/source-resource: /v1, Resource=nodes/ip-10-0-74-30.us-east-2.compute.internal
              openshift.io/scc: node-exporter
Status:       Pending
IP:           10.0.74.30
IPs:
  IP:  10.0.74.30
Containers:
  container-00:
    Container ID:
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /host from host (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kjzwz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  host:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  Directory
  default-token-kjzwz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-kjzwz
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age   From     Message
  ----    ------   ----  ----     -------
  Normal  Pulling  16s   kubelet  Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5"
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438