Description of problem:

This issue shows up on a disconnected environment which initially didn't have an ImageContentSourcePolicy including the mirror for operator images set. The InstallPlan initially shows the following failure:

lastTransitionTime: "2021-07-21T18:26:32Z"
message: 'unpack job not completed: Unpack pod(openshift-marketplace/8448a620ab041469e30d9ce22dc6be76a624b8834ad8af66f50105cd73kxk6m) container(pull) is pending. Reason: ImagePullBackOff, Message: Back-off pulling image "registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-sriov-network-operator-bundle@sha256:9d93eb2a6f2cf7ba466784f72cf782e8c99921b43704f06dd493364aed95ace7"'
reason: JobIncomplete
status: "True"
type: BundleLookupPending

After creating the ImageContentSourcePolicy at 2021-07-21T18:27:00Z including the correct mirror for registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-sriov-network-operator-bundle@sha256:9d93eb2a6f2cf7ba466784f72cf782e8c99921b43704f06dd493364aed95ace7 :

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  creationTimestamp: "2021-07-21T18:27:00Z"
  generation: 1
  name: redhat-internal-icsp
  resourceVersion: "2182653"
  uid: 550a03f4-5381-4014-b5aa-273adb787da2
spec:
  repositoryDigestMirrors:
  - mirrors:
    - registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000
    source: registry.redhat.io
  - mirrors:
    - registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000
    source: registry-proxy.engineering.redhat.com
  - mirrors:
    - registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000
    source: registry.stage.redhat.io
  - mirrors:
    - registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/localimages/local-release-image
    source: registry.ci.openshift.org/ocp/release

the InstallPlan doesn't progress and it eventually shows 'Job was active longer than specified deadline' at 2021-07-21T18:41:45Z

apiVersion: v1
items:
- apiVersion: operators.coreos.com/v1alpha1
  kind: InstallPlan
  metadata:
    creationTimestamp: "2021-07-21T18:26:32Z"
    generateName: install-
    generation: 1
    labels:
      operators.coreos.com/sriov-network-operator.openshift-sriov-network-operator: ""
    name: install-449sn
    namespace: openshift-sriov-network-operator
    ownerReferences:
    - apiVersion: operators.coreos.com/v1alpha1
      blockOwnerDeletion: false
      controller: false
      kind: Subscription
      name: sriov-network-operator-subscription
      uid: 2d61735d-22f1-444e-91e5-d343f4fd6c12
    resourceVersion: "2188022"
    uid: 142a8acf-53c9-45dc-a709-7e3ca7c75efe
  spec:
    approval: Automatic
    approved: true
    clusterServiceVersionNames:
    - sriov-network-operator.4.8.0-202107081650
    generation: 1
  status:
    bundleLookups:
    - catalogSourceRef:
        name: sriov-network-operator
        namespace: openshift-marketplace
      conditions:
      - message: bundle contents have not yet been persisted to installplan status
        reason: BundleNotUnpacked
        status: "True"
        type: BundleLookupNotPersisted
      - lastTransitionTime: "2021-07-21T18:26:32Z"
        message: 'unpack job not completed: Unpack pod(openshift-marketplace/8448a620ab041469e30d9ce22dc6be76a624b8834ad8af66f50105cd73kxk6m) container(pull) is pending. Reason: ImagePullBackOff, Message: Back-off pulling image "registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-sriov-network-operator-bundle@sha256:9d93eb2a6f2cf7ba466784f72cf782e8c99921b43704f06dd493364aed95ace7"'
        reason: JobIncomplete
        status: "True"
        type: BundleLookupPending
      - lastTransitionTime: "2021-07-21T18:41:45Z"
        message: Job was active longer than specified deadline
        reason: DeadlineExceeded
        status: "True"
        type: BundleLookupFailed
      identifier: sriov-network-operator.4.8.0-202107081650
      path: registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-sriov-network-operator-bundle@sha256:9d93eb2a6f2cf7ba466784f72cf782e8c99921b43704f06dd493364aed95ace7
      properties: '{"properties":[{"type":"olm.gvk","value":{"group":"sriovnetwork.openshift.io","kind":"SriovIBNetwork","version":"v1"}},{"type":"olm.gvk","value":{"group":"sriovnetwork.openshift.io","kind":"SriovNetwork","version":"v1"}},{"type":"olm.gvk","value":{"group":"sriovnetwork.openshift.io","kind":"SriovNetworkNodePolicy","version":"v1"}},{"type":"olm.gvk","value":{"group":"sriovnetwork.openshift.io","kind":"SriovNetworkNodeState","version":"v1"}},{"type":"olm.gvk","value":{"group":"sriovnetwork.openshift.io","kind":"SriovOperatorConfig","version":"v1"}},{"type":"olm.package","value":{"packageName":"sriov-network-operator","version":"4.8.0-202107081650"}}]}'
      replaces: ""
    catalogSources: []
    conditions:
    - lastTransitionTime: "2021-07-21T18:41:47Z"
      lastUpdateTime: "2021-07-21T18:41:47Z"
      message: 'Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline'
      reason: InstallCheckFailed
      status: "False"
      type: Installed
    phase: Failed
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Version-Release number of selected component (if applicable):
4.8.0-rc.3

How reproducible:
100%

Steps to Reproduce:
1. Create a catalogsource and subscription which generate an installplan that tries to pull a bundle image which is not reachable (in this case it was due to missing ICSP with the correct mirror)
2. Make the bundle image reachable

Actual results:
InstallPlan doesn't progress and stops at

lastUpdateTime: "2021-07-21T18:41:47Z"
message: 'Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline'
reason: InstallCheckFailed

Expected results:
InstallPlan retries pulling the bundle image

Additional info:
To get it progressing I had to delete the existing catalogsource pod and installplan.
Hi Marius,

This is actually expected behavior. InstallPlans are not a declarative resource; they are a definition of an execution that runs on a cluster (similar to a Job). They will perform a certain number of retries in certain specific cases, but once they exceed their retry limit (on the order of seconds) they go into a permanent failed state, and there are some failures that they cannot recover from.
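For anyone scripting around this, here is a minimal sketch of how the permanent failed state described above could be detected from an InstallPlan that has already been loaded into a Python dict (for example from the JSON or YAML output of the cluster). It relies only on fields visible in the report (status.phase, status.bundleLookups[].conditions, identifier). The function names failed_bundle_lookups and needs_recreation are hypothetical and not part of OLM, and treating "phase Failed plus a true BundleLookupFailed condition" as the signal to delete and recreate is an assumption based on the workaround in the report.

```python
from typing import Any, Dict, List


def failed_bundle_lookups(install_plan: Dict[str, Any]) -> List[str]:
    """Return identifiers of bundle lookups with a true BundleLookupFailed condition.

    Hypothetical helper: it only inspects the fields shown in the report.
    """
    failed = []
    status = install_plan.get("status") or {}
    for lookup in status.get("bundleLookups") or []:
        for cond in lookup.get("conditions") or []:
            if cond.get("type") == "BundleLookupFailed" and cond.get("status") == "True":
                failed.append(lookup.get("identifier", "<unknown>"))
                break  # one failed condition per lookup is enough
    return failed


def needs_recreation(install_plan: Dict[str, Any]) -> bool:
    """Assumption: a Failed phase plus a failed bundle lookup will not be retried."""
    status = install_plan.get("status") or {}
    return status.get("phase") == "Failed" and bool(failed_bundle_lookups(install_plan))


# Trimmed copy of the InstallPlan status from the report.
example = {
    "status": {
        "phase": "Failed",
        "bundleLookups": [
            {
                "identifier": "sriov-network-operator.4.8.0-202107081650",
                "conditions": [
                    {"type": "BundleLookupPending", "status": "True",
                     "reason": "JobIncomplete"},
                    {"type": "BundleLookupFailed", "status": "True",
                     "reason": "DeadlineExceeded"},
                ],
            }
        ],
    }
}

if __name__ == "__main__":
    print(failed_bundle_lookups(example))
    print(needs_recreation(example))
```

When needs_recreation returns True, the InstallPlan will not pick up a later fix such as a newly added ICSP on its own; per the report, deleting the existing catalogsource pod and the InstallPlan was what got the install progressing again.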