Bug 2208563

Summary: The ocs-operator uses the `Always` pull policy for images pulled by digest
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Juan Hernández <juan.hernandez>
Component: ocs-operator Assignee: Juan Hernández <juan.hernandez>
Status: MODIFIED --- QA Contact: Coady LaCroix <clacroix>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.12 CC: muagarwa, nigoyal, odf-bz-bot, uchapaga, wking
Target Milestone: ---   
Target Release: ODF 4.14.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Juan Hernández 2023-05-19 14:48:40 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Deploying ODF in OpenShift via the official `redhat-operators` catalog source results in pods that pull images by digest and use the `Always` pull policy, in particular the `ocs-operator` pods:

$ oc get deployment -n openshift-storage ocs-operator -o json | jq -r '.spec.template.spec.containers[] | .image + " " + .imagePullPolicy'
  registry.redhat.io/odf4/ocs-rhel8-operator@sha256:246dd606caeb609501fb0739b34de2010917d66a88ceff265cbfa6711299485d Always

Version of all relevant components (if applicable):

# oc get csv -n openshift-storage ocs-operator.v4.12.2-rhodf
NAME                         DISPLAY                       VERSION        REPLACES               PHASE
ocs-operator.v4.12.2-rhodf   OpenShift Container Storage   4.12.2-rhodf   ocs-operator.v4.12.1   Succeeded

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

It impacts the ability to upgrade a cluster in a fully disconnected environment without an image registry server. With the `Always` pull policy the kubelet contacts the registry every time a container starts, so in that scenario the operator will not start even if the required image has already been pulled and is available in the container storage directory of the node.

Is there any workaround available to the best of your knowledge?

The workaround is to install an image registry server in the same cluster. This introduces a dependency cycle, because the registry (Quay in our case) will most likely require this operator to be working in order to use its storage. That cycle is eventually broken by repeated reconciliations, but it generates additional noise during the upgrade and delays it.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

3

Is this issue reproducible?

Yes.

Can this issue reproduce from the UI?

Didn't test with the UI.

If this is a regression, please provide more details to justify this:

It isn't a regression.

Steps to Reproduce:

1. Install a cluster, and then ODF.
2. Check the image pull policy of the pods.
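
The check in step 2 can be extended to every deployment in the namespace. The following is only a sketch: it assumes the input is the parsed output of `oc get deployments -n openshift-storage -o json`, that digest references contain `@sha256:`, and the function name `digest_always_containers` is made up for illustration.

```python
def digest_always_containers(deployment_list):
    """Return (deployment, container, image) tuples for containers that
    reference their image by digest but still use the Always pull policy.

    `deployment_list` is assumed to be the dict parsed from the JSON
    printed by `oc get deployments -n <namespace> -o json`.
    """
    hits = []
    for item in deployment_list.get("items", []):
        deployment = item["metadata"]["name"]
        for container in item["spec"]["template"]["spec"]["containers"]:
            # Assumption: a digest reference looks like repo@sha256:<hex>.
            by_digest = "@sha256:" in container["image"]
            if by_digest and container.get("imagePullPolicy") == "Always":
                hits.append((deployment, container["name"], container["image"]))
    return hits
```

To use it, load the `oc` output with `json.load` and pass the resulting dict to the function; an empty list means no container combines a digest reference with `Always`.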

Actual results:

The image pull policy is `Always`.

Expected results:

The image pull policy should be `IfNotPresent`.

Additional info:

This is a request to use the `IfNotPresent` image pull policy when the images are pulled by digest. That simplifies upgrades in disconnected environments, because it then becomes possible to pre-pull the image on all the nodes and perform the upgrade without having a registry server available.

Comment 2 Juan Hernández 2023-05-19 14:58:38 UTC
A possible way to address this issue would be to change the `csv-merger` tool so that it sets the image pull policy to `IfNotPresent` when the images are pulled by digest. That is implemented in this pull request:

Use IfNotPresent pull policy for images pulled by digest #2056 
https://github.com/red-hat-storage/ocs-operator/pull/2056
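
For illustration only, the rule proposed above can be sketched as follows. This is not the code from the pull request; the function name `pull_policy_for` and the `@sha256:` test for a digest reference are assumptions made for this sketch. A digest identifies the image content exactly, so an image already present on the node cannot differ from what the registry would serve.

```python
def pull_policy_for(image, current_policy="Always"):
    """Choose the pull policy for a container image reference.

    Hypothetical helper: digest references get IfNotPresent, while
    tag references keep whatever policy they already had.
    """
    # Assumption: a digest reference looks like repo@sha256:<hex>.
    if "@sha256:" in image:
        return "IfNotPresent"
    return current_policy
```

Tag references are deliberately left alone here, because a tag can be moved to a different image and `Always` may be intentional for them.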