Bug 1960758 - oc debug / oc adm must-gather do not require openshift/tools and openshift/must-gather to be "the newest"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Clayton Coleman
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-14 18:50 UTC by Clayton Coleman
Modified: 2021-07-27 23:08 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:08:44 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Github openshift library-go pull 1078 0 None closed Bug 1960758: New helper for oc adm must-gather / oc debug 2021-05-27 01:21:19 UTC
Github openshift oc pull 833 0 None open Bug 1960758: use recent pull spec for must-gather and debug 2021-05-27 01:21:17 UTC
Github openshift origin pull 26185 0 None open Bug 1960758: test/extended/cli/mustgather: Local-only getPluginOutputDir 2021-05-26 17:25:29 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:08:57 UTC

Description Clayton Coleman 2021-05-14 18:50:11 UTC
A large number of metal IPI (disconnected) e2e runs were failing because the image import of openshift/must-gather:latest (part of the payload) was significantly delayed (40m), so oc fell back to a hardcoded pull spec that does not work in a disconnected environment.  The utility method that does that lookup was subtly wrong: it is a method intended for triggers and other controllers that need the "last known image tag value WITH metadata", but for oc debug and oc adm must-gather the metadata is not required (the pull spec is already present).

[must-gather      ] OUT unable to resolve the imagestream tag openshift/must-gather:latest

^ not imported yet

[must-gather      ] OUT 
[must-gather      ] OUT Using must-gather plug-in image: registry.redhat.io/openshift4/ose-must-gather:latest

^ unable to pull in disconnected environments

The bug is that neither oc debug nor oc adm must-gather needs the latest value (which requires waiting for import) or the status metadata that import provides (the image ID in this case), and therefore those commands should not use that utility method.  oc adm release new had a similar problem recently, but in the opposite direction: it needs the latest input from the spec tag and must fail if that input has not been imported yet (the use case for new is never "fall back to old").

Introduce a new utility method in library-go that makes the three scenarios more obvious to callers, and then use it in debug/must-gather.  A follow-up in the next release will use it in the other locations so we can remove the old utility method.
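
For illustration only, the three scenarios described above could be separated roughly as in the Go sketch below. This is not the helper that landed in library-go: the types are simplified stand-ins for the image.openshift.io/v1 structs, and the names lastImported, recentPullSpec and newestImported are hypothetical. The lookup order in recentPullSpec (status first, then the spec tag) is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// TagReference, TagEvent and ImageStream are simplified, hypothetical
// stand-ins for the image.openshift.io/v1 types, not the real API structs.
type TagReference struct {
	Name       string
	FromKind   string // "DockerImage" when the tag points at an external pull spec
	FromName   string
	Generation int64
}

type TagEvent struct {
	PullSpec   string
	ImageID    string // metadata that only exists after a successful import
	Generation int64
}

type ImageStream struct {
	SpecTags   []TagReference
	StatusTags map[string][]TagEvent // newest event first
}

func (s *ImageStream) specTag(tag string) (TagReference, bool) {
	for _, t := range s.SpecTags {
		if t.Name == tag {
			return t, true
		}
	}
	return TagReference{}, false
}

func (s *ImageStream) latestEvent(tag string) (TagEvent, bool) {
	events := s.StatusTags[tag]
	if len(events) == 0 {
		return TagEvent{}, false
	}
	return events[0], true
}

// Scenario 1 (triggers, controllers): last known value WITH metadata.
func lastImported(s *ImageStream, tag string) (TagEvent, bool) {
	return s.latestEvent(tag)
}

// Scenario 2 (oc debug, oc adm must-gather): any usable pull spec will do,
// so fall back to the spec tag instead of waiting for the import.
func recentPullSpec(s *ImageStream, tag string) (string, bool) {
	if ev, ok := s.latestEvent(tag); ok {
		return ev.PullSpec, true
	}
	if ref, ok := s.specTag(tag); ok && ref.FromKind == "DockerImage" && ref.FromName != "" {
		return ref.FromName, true
	}
	return "", false
}

// Scenario 3 (oc adm release new): the newest spec value must already be
// imported; an older imported value is never acceptable.
func newestImported(s *ImageStream, tag string) (TagEvent, error) {
	ref, ok := s.specTag(tag)
	if !ok {
		return TagEvent{}, errors.New("tag not found in spec")
	}
	ev, ok := s.latestEvent(tag)
	if !ok || ev.Generation < ref.Generation {
		return TagEvent{}, fmt.Errorf("tag %q has not been imported yet", tag)
	}
	return ev, nil
}

func main() {
	// Spec tag is set but nothing has been imported (placeholder pull spec).
	s := &ImageStream{SpecTags: []TagReference{{
		Name: "latest", FromKind: "DockerImage",
		FromName: "quay.io/example/must-gather@sha256:abc", Generation: 2,
	}}}
	_, imported := lastImported(s, "latest")
	spec, usable := recentPullSpec(s, "latest")
	_, err := newestImported(s, "latest")
	fmt.Println(imported, spec, usable, err)
}
```

With an image stream whose import has not completed, only the second helper returns something usable, which is the behaviour oc debug and oc adm must-gather want.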

Comment 1 Clayton Coleman 2021-05-14 18:53:50 UTC
A separate bug will be opened for "why import took 40m on metal-ipi ovn" https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-ovn-dualstack/1393181497514528768

Comment 3 Maciej Szulik 2021-05-24 10:21:37 UTC
We're still missing oc bits.

Comment 5 zhou ying 2021-05-31 07:19:37 UTC
Checked with the latest oc: when the import of the must-gather imagestream failed, `oc adm must-gather` could still run:

[root@localhost roottest]# oc get is must-gather -n openshift -o yaml 
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    openshift.io/image.dockerRepositoryCheck: "2021-05-31T07:11:33Z"
  creationTimestamp: "2021-05-31T07:11:18Z"
  generation: 2
  name: must-gather
  namespace: openshift
  resourceVersion: "106528"
  uid: f663e6b0-6ced-455a-8da1-bd72fdc262d4
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations: null
    from:
      kind: DockerImage
      name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5
    generation: 2
    importPolicy: {}
    name: latest
    referencePolicy:
      type: Source
status:
  dockerImageRepository: image-registry.openshift-image-registry.svc:5000/openshift/must-gather
  tags:
  - conditions:
    - generation: 2
      lastTransitionTime: "2021-05-31T07:11:33Z"
      message: 'Internal error occurred: [dockerimage.image.openshift.io "ec2-3-137-199-98.us-east-2.compute.amazonaws.com:5000/ocp/release@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5"
        not found, quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5:
        Get "https://quay.io/v2/": net/http: request canceled while waiting for connection
        (Client.Timeout exceeded while awaiting headers)]'
      reason: InternalError
      status: "False"
      type: ImportSuccess
    items: null
    tag: latest



[root@localhost roottest]# oc adm must-gather 
[must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: 7d3bbd53-04d2-45e8-90be-fd247b39d949
ClusterVersion: Stable at "4.8.0-0.nightly-2021-05-29-114625"
ClusterOperators:
	clusteroperator/cloud-credential is not upgradeable because Upgradeable annotation cloudcredential.openshift.io/upgradeable-to on cloudcredential.operator.openshift.io/cluster object needs updating before upgrade. See Manually Creating IAM documentation for instructions on preparing a cluster for upgrade.


[must-gather      ] OUT namespace/openshift-must-gather-rswbl created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-nq6sr created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5 created

Comment 6 zhou ying 2021-05-31 07:37:19 UTC
For the `oc debug` command, when the tools imagestream has a failed import in its status, the debug pod can still be created from the spec tag's from.name. There is no need to wait for the image to be imported.
[root@localhost roottest]# oc get is tools -n openshift -o yaml 
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    openshift.io/image.dockerRepositoryCheck: "2021-05-31T07:34:28Z"
  creationTimestamp: "2021-05-31T07:34:13Z"
  generation: 2
  name: tools
  namespace: openshift
  resourceVersion: "115224"
  uid: 62210ee1-5373-4916-9b09-67a080377e9b
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations: null
    from:
      kind: DockerImage
      name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5
    generation: 2
    importPolicy: {}
    name: latest
    referencePolicy:
      type: Source
status:
  dockerImageRepository: image-registry.openshift-image-registry.svc:5000/openshift/tools
  tags:
  - conditions:
    - generation: 2
      lastTransitionTime: "2021-05-31T07:34:28Z"
      message: 'Internal error occurred: [dockerimage.image.openshift.io "ec2-3-137-199-98.us-east-2.compute.amazonaws.com:5000/ocp/release@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5"
        not found, quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5:
        Get "https://quay.io/v2/": net/http: request canceled while waiting for connection
        (Client.Timeout exceeded while awaiting headers)]'
      reason: InternalError
      status: "False"
      type: ImportSuccess
    items: null
    tag: latest



[root@localhost roottest]# oc get po 
NAME                                          READY   STATUS              RESTARTS   AGE
ip-10-0-74-30us-east-2computeinternal-debug   0/1     ContainerCreating   0          7s
[root@localhost roottest]# oc describe po/ip-10-0-74-30us-east-2computeinternal-debug
Name:         ip-10-0-74-30us-east-2computeinternal-debug
Namespace:    zhouyt
Priority:     0
Node:         ip-10-0-74-30.us-east-2.compute.internal/10.0.74.30
Start Time:   Mon, 31 May 2021 15:34:47 +0800
Labels:       <none>
Annotations:  debug.openshift.io/source-container: container-00
              debug.openshift.io/source-resource: /v1, Resource=nodes/ip-10-0-74-30.us-east-2.compute.internal
              openshift.io/scc: node-exporter
Status:       Pending
IP:           10.0.74.30
IPs:
  IP:  10.0.74.30
Containers:
  container-00:
    Container ID:  
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /host from host (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kjzwz (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  host:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  Directory
  default-token-kjzwz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-kjzwz
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age   From     Message
  ----    ------   ----  ----     -------
  Normal  Pulling  16s   kubelet  Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87958c00bf1b4dd0abd777bb9240c3f38ce139930bbbf58cfddbb59d389d7ad5"

Comment 9 errata-xmlrpc 2021-07-27 23:08:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

