Bug 2208563

Summary: The ocs-operator uses the `Always` pull policy for images pulled by digest
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Juan Hernández <juan.hernandez>
Component: ocs-operator Assignee: Juan Hernández <juan.hernandez>
Status: CLOSED ERRATA QA Contact: Coady LaCroix <clacroix>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.12 CC: muagarwa, nigoyal, odf-bz-bot, uchapaga, wking
Target Milestone: ---   
Target Release: ODF 4.14.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.14.0-111 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-11-08 18:50:22 UTC Type: Bug

Description Juan Hernández 2023-05-19 14:48:40 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Deploying ODF in OpenShift via the official `redhat-operators` catalog source results in pods that pull images by digest and use the `Always` pull policy, in particular the `ocs-operator` pods:

$ oc get deployment -n openshift-storage ocs-operator -o json | jq -r '.spec.template.spec.containers[] | .image + " " + .imagePullPolicy'
  registry.redhat.io/odf4/ocs-rhel8-operator@sha256:246dd606caeb609501fb0739b34de2010917d66a88ceff265cbfa6711299485d Always

Version of all relevant components (if applicable):

# oc get csv -n openshift-storage ocs-operator.v4.12.2-rhodf
NAME                         DISPLAY                       VERSION        REPLACES               PHASE
ocs-operator.v4.12.2-rhodf   OpenShift Container Storage   4.12.2-rhodf   ocs-operator.v4.12.1   Succeeded

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

It impacts the ability to upgrade a cluster in a fully disconnected environment without an image registry server. With the `Always` policy the kubelet contacts the registry every time it starts a container, so in that scenario the operator will not start even if the required image has already been pulled and is available in the container storage directory of the node.

Is there any workaround available to the best of your knowledge?

The workaround is to install an image registry server in the same cluster. This introduces a dependency cycle, because the registry (Quay in our case) will most probably need this operator to be working in order to use its storage. That cycle is eventually broken by repeated reconciliations, but it generates additional noise during the upgrade and delays it.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

3

Is this issue reproducible?

Yes.

Can this issue be reproduced from the UI?

Didn't test with the UI.

If this is a regression, please provide more details to justify this:

It isn't a regression.

Steps to Reproduce:

1. Install a cluster, and then ODF.
2. Check the image pull policy of the pods.

Actual results:

The image pull policy is `Always`.

Expected results:

The image pull policy should be `IfNotPresent`.

Additional info:

This is a request to use the `IfNotPresent` image pull policy when the images are pulled by digest. That simplifies upgrades in disconnected environments, because it then becomes possible to pre-pull the image on all the nodes and perform the upgrade without having a registry server available.
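
The requested rule can be illustrated with a minimal sketch. It is only an illustration written in Python, not code from the operator; the function names `is_pulled_by_digest` and `pull_policy_for` are hypothetical, and the check assumes that a digest reference has the form `<name>@<algorithm>:<hex>`.

```python
def is_pulled_by_digest(image: str) -> bool:
    """Return True when the image reference pins a digest."""
    # Assumption: a digest reference looks like <name>@<algorithm>:<hex>,
    # while a tag reference has no "@" at all.
    _, separator, digest = image.rpartition("@")
    return bool(separator) and ":" in digest


def pull_policy_for(image: str, current_policy: str) -> str:
    """Use IfNotPresent for digest references, otherwise keep the current policy."""
    # A digest identifies immutable content, so a copy that is already on the
    # node cannot be stale and does not need to be checked against a registry.
    if is_pulled_by_digest(image):
        return "IfNotPresent"
    return current_policy
```

With this rule, the image reported above (`registry.redhat.io/odf4/ocs-rhel8-operator@sha256:...`) would get `IfNotPresent`, while a reference by tag would keep whatever policy it already had.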

Comment 2 Juan Hernández 2023-05-19 14:58:38 UTC
A possible way to address this issue would be to change the `csv-merger` tool so that it sets the image pull policy to `IfNotPresent` when the images are pulled by digest. That is implemented in this pull request:

Use IfNotPresent pull policy for images pulled by digest #2056 
https://github.com/red-hat-storage/ocs-operator/pull/2056
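
For readers who do not want to open the pull request, the idea can be sketched roughly as follows. This is not the code from the pull request: it is a Python illustration with a hypothetical function name, it assumes the cluster service version has already been loaded into a dictionary, and it only looks at regular containers under `spec.install.spec.deployments`.

```python
def use_if_not_present_for_digests(csv_doc: dict) -> int:
    """Set imagePullPolicy to IfNotPresent on containers pinned by digest.

    Modifies csv_doc in place and returns the number of containers changed.
    """
    changed = 0
    install_spec = csv_doc.get("spec", {}).get("install", {}).get("spec", {})
    for deployment in install_spec.get("deployments", []):
        pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
        for container in pod_spec.get("containers", []):
            image = container.get("image", "")
            # Assumption: digest references look like <name>@<algorithm>:<hex>.
            by_digest = "@" in image and ":" in image.rsplit("@", 1)[1]
            if by_digest and container.get("imagePullPolicy") != "IfNotPresent":
                container["imagePullPolicy"] = "IfNotPresent"
                changed += 1
    return changed


# Made-up input that mimics the relevant part of a cluster service version.
example = {
    "spec": {"install": {"spec": {"deployments": [{
        "name": "example-operator",
        "spec": {"template": {"spec": {"containers": [
            {"name": "by-digest",
             "image": "registry.example.com/repo/operator@sha256:abc123",
             "imagePullPolicy": "Always"},
            {"name": "by-tag",
             "image": "registry.example.com/repo/sidecar:latest",
             "imagePullPolicy": "Always"},
        ]}}},
    }]}}}
}
changed_count = use_if_not_present_for_digests(example)
```

Containers referenced by tag are left untouched on purpose, because a tag can move and `Always` remains the safer policy for them.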

Comment 7 Coady LaCroix 2023-09-28 19:39:55 UTC
Deployed ODF 4.14.0 and verified that the ocs-operator imagePullPolicy is `IfNotPresent`:

$ oc get deployment -n openshift-storage ocs-operator -o json | jq -r '.spec.template.spec.containers[] | .image + " " + .imagePullPolicy'
registry.redhat.io/odf4/ocs-rhel9-operator@sha256:bf72ea7fcf5b1145ef369304296bfffa1e62146f38899fb6cf63d971d13f9ec5 IfNotPresent

$ oc get csv -n openshift-storage ocs-operator.v4.14.0-139.stable
NAME                              DISPLAY                       VERSION             REPLACES   PHASE
ocs-operator.v4.14.0-139.stable   OpenShift Container Storage   4.14.0-139.stable              Succeeded

Comment 9 errata-xmlrpc 2023-11-08 18:50:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832