Bug 2066441 - Upgrade: operator image precaching failed at extract_bundle_names when more than one catalog sources included in one policy or in same CGU
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Telco Edge
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.10.z
Assignee: Vitaly Grinberg
QA Contact: yliu1
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-21 19:02 UTC by yliu1
Modified: 2022-07-11 15:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-11 15:28:27 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift-kni cluster-group-upgrades-operator pull 136 | 0 | None | open | Fix OLM index extraction | 2022-03-29 15:42:14 UTC
Red Hat Product Errata | RHBA-2022:5514 | 0 | None | None | None | 2022-07-11 15:28:44 UTC

Description yliu1 2022-03-21 19:02:55 UTC
Description of problem:
Precaching operator images fails while parsing the index image, with the following trace:

list index out of range
Traceback (most recent call last):
  File "/opt/precache/parse_index.py", line 76, in <module>
    bundles = extract_bundle_names(args)
  File "/opt/precache/parse_index.py", line 42, in extract_bundle_names
    item[1].strip()][0].strip(f"{pkg_name}.")
IndexError: list index out of range
upgrades.pre-cache 2022-03-21T18:53:16+00:00 DEBUG extract_pull_spec failed
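The `][0]` in the traceback suggests that the failing statement takes the first element of a filtered list. The sketch below is not the code from parse_index.py; it is a hypothetical reconstruction of that shape (the `first_match` name and the `(bundle, channel)` row layout are assumptions) showing how such a lookup raises the same error when the filter matches nothing:

```python
def first_match(rows, channel):
    """Return the first bundle whose channel column equals `channel`.

    Hypothetical helper: it only mirrors the `[... item[1].strip()][0]`
    shape visible in the traceback. The (bundle, channel) row layout is
    an assumption made for illustration.
    """
    return [item[0].strip() for item in rows if channel == item[1].strip()][0]


# Illustrative data only; these rows are not taken from a real index.
rows = [("sriov-fec.v2.1.0", " stable ")]

print(first_match(rows, "stable"))  # prints "sriov-fec.v2.1.0"

try:
    # No row is on this channel, so the filtered list is empty and [0] fails.
    first_match(rows, "4.10")
except IndexError as exc:
    print(f"IndexError: {exc}")  # prints "IndexError: list index out of range"
```

Under that assumption, any package or channel that the queried index does not contain would produce exactly the `list index out of range` message seen above.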


Version-Release number of selected component (if applicable):
4.10

How reproducible:
Always

Steps to Reproduce:
1. Start a CGU containing a catalog source config policy and a subscriptions policy, with precaching enabled
2. Observe precache pod created and started on spoke
3. Check precache status

Actual results:
- Precache pod is in Error state. It failed at extract_bundle_names when parsing the index image.

Expected results:
- Operator images are successfully precached.


Additional info:

# Precache pod logs:

[kni ~]$ oc logs -n openshift-talo-pre-cache pre-cache--1-w2vlb -f 
upgrades.pre-cache 2022-03-21T18:53:04+00:00 DEBUG Release index is not specified. Release images will not be pre-cached
upgrades.pre-cache 2022-03-21T18:53:10+00:00 DEBUG registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10 image ID is a10961a010b370f50d6042ccd4882df96842533bd910959ebd62b90170b277e2
upgrades.pre-cache 2022-03-21T18:53:10+00:00 DEBUG Image mount: /var/lib/containers/storage/vfs/dir/7e6c93ca7cc20f1624d13f214f15c6eecec85489943dd4af01993385ef282b1d
time="2022-03-21T18:53:11Z" level=warning msg="\x1b[1;33mDEPRECATION NOTICE:\nSqlite-based catalogs and their related subcommands are deprecated. Support for\nthem will be removed in a future release. Please migrate your catalog workflows\nto the new file-based catalog format.\x1b[0m"
time="2022-03-21T18:53:11Z" level=info msg="export from the index" index="registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10" package="[cluster-logging local-storage-operator performance-addon-operator ptp-operator sriov-fec sriov-network-operator]"
time="2022-03-21T18:53:11Z" level=info msg="Pulling previous image registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10 to get metadata" index="registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10" package="[cluster-logging local-storage-operator performance-addon-operator ptp-operator sriov-fec sriov-network-operator]"
time="2022-03-21T18:53:11Z" level=info msg="running /host/usr/bin/podman pull registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10" index="registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10" package="[cluster-logging local-storage-operator performance-addon-operator ptp-operator sriov-fec sriov-network-operator]"
time="2022-03-21T18:53:11Z" level=info msg="running /host/usr/bin/podman pull registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10" index="registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10" package="[cluster-logging local-storage-operator performance-addon-operator ptp-operator sriov-fec sriov-network-operator]"
time="2022-03-21T18:53:11Z" level=info msg="Getting label data from previous image" index="registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10" package="[cluster-logging local-storage-operator performance-addon-operator ptp-operator sriov-fec sriov-network-operator]"
time="2022-03-21T18:53:11Z" level=info msg="running podman inspect" index="registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10" package="[cluster-logging local-storage-operator performance-addon-operator ptp-operator sriov-fec sriov-network-operator]"
time="2022-03-21T18:53:11Z" level=info msg="running podman create" index="registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10" package="[cluster-logging local-storage-operator performance-addon-operator ptp-operator sriov-fec sriov-network-operator]"
time="2022-03-21T18:53:12Z" level=info msg="running podman cp" index="registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10" package="[cluster-logging local-storage-operator performance-addon-operator ptp-operator sriov-fec sriov-network-operator]"
time="2022-03-21T18:53:13Z" level=info msg="running podman rm" index="registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10" package="[cluster-logging local-storage-operator performance-addon-operator ptp-operator sriov-fec sriov-network-operator]"
time="2022-03-21T18:53:14Z" level=info msg="Preparing to pull bundles map[\"registry.connect.redhat.com/intel/sriov-fec-operator-bundle@sha256:93f8e87c2e4856aa727f1e28d7830b234179672ace9256f91fed11adda45b436\":{\"sriov-fec\" \"2.1.0\"}]" index="registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10" package="[cluster-logging local-storage-operator performance-addon-operator ptp-operator sriov-fec sriov-network-operator]"
time="2022-03-21T18:53:14Z" level=info msg="running /host/usr/bin/podman pull registry.connect.redhat.com/intel/sriov-fec-operator-bundle@sha256:93f8e87c2e4856aa727f1e28d7830b234179672ace9256f91fed11adda45b436" img="registry.connect.redhat.com/intel/sriov-fec-operator-bundle@sha256:93f8e87c2e4856aa727f1e28d7830b234179672ace9256f91fed11adda45b436"
time="2022-03-21T18:53:14Z" level=info msg="running podman create" img="registry.connect.redhat.com/intel/sriov-fec-operator-bundle@sha256:93f8e87c2e4856aa727f1e28d7830b234179672ace9256f91fed11adda45b436"
time="2022-03-21T18:53:14Z" level=info msg="running podman cp" img="registry.connect.redhat.com/intel/sriov-fec-operator-bundle@sha256:93f8e87c2e4856aa727f1e28d7830b234179672ace9256f91fed11adda45b436"
time="2022-03-21T18:53:15Z" level=info msg="running podman rm" img="registry.connect.redhat.com/intel/sriov-fec-operator-bundle@sha256:93f8e87c2e4856aa727f1e28d7830b234179672ace9256f91fed11adda45b436"
time="2022-03-21T18:53:16Z" level=info msg="Writing package.yaml in /cache/downloaded/cluster-logging"
time="2022-03-21T18:53:16Z" level=info msg="Writing package.yaml in /cache/downloaded/local-storage-operator"
time="2022-03-21T18:53:16Z" level=info msg="Writing package.yaml in /cache/downloaded/performance-addon-operator"
time="2022-03-21T18:53:16Z" level=info msg="Writing package.yaml in /cache/downloaded/ptp-operator"
time="2022-03-21T18:53:16Z" level=info msg="Writing package.yaml in /cache/downloaded/sriov-fec"
time="2022-03-21T18:53:16Z" level=info msg="Writing package.yaml in /cache/downloaded/sriov-network-operator"
list index out of range
Traceback (most recent call last):
  File "/opt/precache/parse_index.py", line 76, in <module>
    bundles = extract_bundle_names(args)
  File "/opt/precache/parse_index.py", line 42, in extract_bundle_names
    item[1].strip()][0].strip(f"{pkg_name}.")
IndexError: list index out of range
upgrades.pre-cache 2022-03-21T18:53:16+00:00 DEBUG extract_pull_spec failed



# CGU:

[kni ~]$ oc get cgu -o yaml
apiVersion: v1
items:
- apiVersion: ran.openshift.io/v1alpha1
  kind: ClusterGroupUpgrade
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"ran.openshift.io/v1alpha1","kind":"ClusterGroupUpgrade","metadata":{"annotations":{},"name":"precache-operators","namespace":"default"},"spec":{"clusters":["ocp-edge87"],"enable":false,"managedPolicies":["du-upgrade-catsrc-policy","common-subscriptions-policy"],"preCaching":true,"remediationStrategy":{"maxConcurrency":101,"timeout":240}}}
    creationTimestamp: "2022-03-21T18:51:33Z"
    finalizers:
    - ran.openshift.io/cleanup-finalizer
    generation: 2
    name: precache-operators
    namespace: default
    resourceVersion: "7215644"
    uid: 9b55856e-da5e-4b51-8c71-9c73701961c6
  spec:
    actions:
      afterCompletion:
        deleteObjects: true
      beforeEnable: {}
    clusters:
    - ocp-edge87
    enable: false
    managedPolicies:
    - du-upgrade-catsrc-policy
    - common-subscriptions-policy
    preCaching: true
    remediationStrategy:
      maxConcurrency: 101
      timeout: 240
  status:
    computedMaxConcurrency: 1
    conditions:
    - lastTransitionTime: "2022-03-21T18:51:33Z"
      message: Precaching is not completed (required)
      reason: PrecachingRequired
      status: "False"
      type: Ready
    - lastTransitionTime: "2022-03-21T18:51:33Z"
      message: Precaching is required and not done
      reason: PrecachingNotDone
      status: "False"
      type: PrecachingDone
    - lastTransitionTime: "2022-03-21T18:51:33Z"
      message: Pre-caching spec is valid and consistent
      reason: PrecacheSpecIsWellFormed
      status: "True"
      type: PrecacheSpecValid
    managedPoliciesNs:
      common-subscriptions-policy: ztp-common
      du-upgrade-catsrc-policy: ztp-upgrade
    precaching:
      clusters:
      - ocp-edge87
      spec:
        operatorsIndexes:
        - registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/redhat-operators:v4.10
        - registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10
        operatorsPackagesAndChannels:
        - sriov-network-operator:4.10
        - ptp-operator:stable
        - performance-addon-operator:4.10
        - cluster-logging:stable
        - local-storage-operator:4.10
        - sriov-fec:stable
      status:
        ocp-edge87: UnrecoverableError
    status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Comment 1 yliu1 2022-03-21 19:38:23 UTC
Updated severity to low. The error occurred because the same policy contained two catalog sources: one for the fec operator and one for the other 5 operators.
A clearer error should probably be returned, and the documentation should guide users to put each catalog source in its own policy; otherwise the precaching code gets confused.
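As one way to picture the "better error" suggested here, the following is a minimal sketch and not the fix that was merged. `BundleNotFoundError`, `find_bundle`, and the `(bundle, channel)` row layout are hypothetical names introduced for illustration; the index and package values are copied from the logs in this report.

```python
class BundleNotFoundError(LookupError):
    """Hypothetical error: a package/channel pair is absent from an index."""


def find_bundle(rows, pkg_name, channel, index):
    """Return the first bundle on `channel`, or raise a descriptive error.

    `rows` is assumed to be an iterable of (bundle, channel) pairs that
    were extracted for `pkg_name` from `index`.
    """
    matches = [bundle.strip() for bundle, ch in rows if ch.strip() == channel]
    if not matches:
        raise BundleNotFoundError(
            f"package {pkg_name!r} has no bundle on channel {channel!r} "
            f"in index {index!r}; check that this catalog source serves "
            "the package"
        )
    return matches[0]


index = "registry.hlxcl11.lab.eng.tlv2.redhat.com:5000/olm/far-edge-sriov-fec:v4.10"

try:
    # Assumption: this index holds no rows for ptp-operator.
    find_bundle([], "ptp-operator", "stable", index)
except BundleNotFoundError as exc:
    print(exc)
```

A message of this kind names the package, channel, and index involved, which would point the user at the mixed catalog sources instead of at a bare `IndexError`.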

Comment 2 Vitaly Grinberg 2022-03-29 15:46:02 UTC
This bug was fixed together with other issues in the main branch. I'm cherry-picking all the fixes in a single PR (linked above).

Comment 4 yliu1 2022-07-04 17:32:22 UTC
Verified with the latest 4.10 TALM build. Precaching with multiple catalog sources now succeeds.

Comment 7 errata-xmlrpc 2022-07-11 15:28:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.22 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5514

