Bug 2009651

Summary: OUS uses wrong imagePullPolicy for graph-data initContainer
Product: OpenShift Container Platform Reporter: Andreas Bleischwitz <ableisch>
Component: OpenShift Update Service    Assignee: W. Trevor King <wking>
OpenShift Update Service sub component: operator QA Contact: Yang Yang <yanyang>
Status: CLOSED ERRATA Docs Contact: Kathryn Alexander <kalexand>
Severity: high    
Priority: medium CC: lmohanty, wking, yanyang
Version: 4.6    Keywords: Reopened
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: x86_64   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The update service deployment used pullIfNotPresent for the graph-data container. Consequence: When the configured UpdateService graphDataImage used a by-tag pullspec, and the host node contained a cached image for that pullspec, new update service pods might not notice updated graph data and could continue to serve old graph data. Fix: The graph-data container is now pullAlways. Result: New update service pods will always retrieve fresh graph data, even when graphDataImage is configured with a by-tag pullspec.
Story Points: ---
Clone Of:
: 2055460    Environment:
Last Closed: 2023-03-09 11:30:44 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 2055460    

Description Andreas Bleischwitz 2021-10-01 07:24:29 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

The current definition of an UpdateService deployment sets "IfNotPresent" as the imagePullPolicy, which prevents updated graph-data images from being used. "Always" should be used for the initContainer.

      initContainers:
        - name: graph-data
          image: 'registry.example.com/ocp4/graph-data:latest'
          resources: {}
          volumeMounts:
            - name: cincinnati-graph-data
              mountPath: /var/lib/cincinnati/graph-data
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent      <== should be Always


Version of all relevant components (if applicable):

update-service-operator.v4.6.0
registry.redhat.io/openshift-update-service/openshift-update-service-rhel8-operator@sha256:08b4fc72501e5f7dfdd779e101d98b913d77982c021998f47b3cdc0367d7e0fa
registry.redhat.io/openshift-update-service/openshift-update-service-rhel8@sha256:9748a280f2a04524da739d2f8b7d8a74c5b58170c9c40b3e40904dd8ca39fbe8

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

The initial deployment of an update service will pull a graph-data image tagged "latest", but subsequent updates to that image will not be pulled, because that tag already exists on the node. The only way to mitigate this is to use a different tag for each graph-data image, which in turn requires additional adjustments to the UpdateService CR.
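The caching behavior described above can be sketched as a small model (a simplification for illustration, with a hypothetical shouldPull helper; this is not the kubelet's actual code):

```go
package main

import "fmt"

// PullPolicy mirrors the three Kubernetes imagePullPolicy values.
type PullPolicy string

const (
	PullAlways       PullPolicy = "Always"
	PullIfNotPresent PullPolicy = "IfNotPresent"
	PullNever        PullPolicy = "Never"
)

// shouldPull models the kubelet's decision: under IfNotPresent, a tag
// already cached on the node suppresses the pull, even when the registry
// holds newer content for that same tag.
func shouldPull(policy PullPolicy, cachedOnNode bool) bool {
	switch policy {
	case PullAlways:
		return true
	case PullNever:
		return false
	default: // PullIfNotPresent
		return !cachedOnNode
	}
}

func main() {
	// graph-data:latest is already cached on the node:
	fmt.Println(shouldPull(PullIfNotPresent, true)) // false: stale image kept
	fmt.Println(shouldPull(PullAlways, true))       // true: fresh pull every time
}
```

This is exactly why rotating tags works as a mitigation: a new tag is never cached, so IfNotPresent degenerates to a pull.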

Is there any workaround available to the best of your knowledge?
Changing the graph-data image tag from "latest" to an incremental one.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?
yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install and deploy OUS according to the docs.
2. Create a graph-data image as described in the docs.
3. Verify OUS functionality; it renders the graph data current at that time.
4. Whenever an updated version of the update graph is available, update the graph-data image.
5. OUS won't make use of the updated graph-data image because <image>:latest is already on the node, even though its content changed.


Actual results:
The outdated graph-data image is used.

Expected results:
The graph-data image is pulled every time to keep the data current.

Additional info:

Comment 1 Andreas Bleischwitz 2021-10-01 07:47:25 UTC
If I'm not looking at the wrong place, the following needs to be changed:

https://github.com/openshift/cincinnati-operator/blob/f4aae3fdaaf8174f2159437f796b775e9b7efe52/controllers/new.go#L425

		Image:           instance.Spec.GraphDataImage,
		ImagePullPolicy: corev1.PullIfNotPresent,
		VolumeMounts: []corev1.VolumeMount{

to
		Image:           instance.Spec.GraphDataImage,
		ImagePullPolicy: corev1.PullAlways,
		VolumeMounts: []corev1.VolumeMount{

Or remove the line containing "ImagePullPolicy:" completely, as the default is:

// Defaults to Always if :latest tag is specified, or IfNotPresent otherwise.

Which probably is the best option for this image.
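The defaulting rule quoted above can be sketched as follows (a simplified model with a hypothetical defaultPullPolicy helper, not the actual apiserver defaulting code; real image-reference parsing also handles more edge cases):

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPullPolicy models the Kubernetes defaulting rule: Always if the
// image is untagged or tagged :latest, IfNotPresent otherwise.
func defaultPullPolicy(image string) string {
	// A digest-pinned image (name@sha256:...) is immutable, so it
	// defaults to IfNotPresent.
	if strings.Contains(image, "@") {
		return "IfNotPresent"
	}
	// Look for the tag separator only after the last "/", so a registry
	// port (host:5000/img) is not mistaken for a tag.
	lastSlash := strings.LastIndex(image, "/")
	tagSep := strings.LastIndex(image, ":")
	if tagSep <= lastSlash {
		return "Always" // no tag at all means an implicit :latest
	}
	if image[tagSep+1:] == "latest" {
		return "Always"
	}
	return "IfNotPresent"
}

func main() {
	fmt.Println(defaultPullPolicy("registry.example.com/ocp4/graph-data:latest")) // Always
	fmt.Println(defaultPullPolicy("quay.io/openshifttest/graph-data:5.0.1"))      // IfNotPresent
	fmt.Println(defaultPullPolicy("localhost:5000/graph-data"))                   // Always
}
```

Note the caveat: dropping the explicit policy only gives Always when the configured graphDataImage actually uses :latest or no tag; a pinned tag like :5.0.1 would still default to IfNotPresent.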

Comment 2 liujia 2021-10-08 01:33:56 UTC
A known issue? Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1939855.

Comment 3 W. Trevor King 2021-10-08 01:49:00 UTC
hah, yes.  I'll close this one as a dup and attach my PR to the older issue.

*** This bug has been marked as a duplicate of bug 1939855 ***

Comment 4 Lalatendu Mohanty 2022-02-17 02:46:31 UTC
As per the comment at https://bugzilla.redhat.com/show_bug.cgi?id=2048825#c4, it looks like this bug was fixed by https://github.com/openshift/cincinnati-operator/pull/133

Comment 6 Yang Yang 2023-02-06 08:44:55 UTC
Verifying on cincinnati-container-v5.0.1-3, cincinnati-operator-container-v5.0.1-3 and cincinnati-operator-bundle-container-v5.0.1-1

# oc get pod sample-856879d565-kmhfh -oyaml
...
initContainers:
  - image: quay.io/openshifttest/graph-data:5.0.1
    imagePullPolicy: Always
    name: graph-data
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000630000
...

The graph-data image has "Always" as imagePullPolicy.

Delete the UpdateService pod

# oc get pod
NAME                                      READY   STATUS    RESTARTS   AGE
sample-856879d565-hxxnr                   2/2     Running   0          2m16s
updateservice-operator-85758c57bb-977v5   1/1     Running   0          95m

# oc delete pod sample-856879d565-hxxnr
pod "sample-856879d565-hxxnr" deleted

# oc get pod
NAME                                      READY   STATUS    RESTARTS   AGE
sample-856879d565-gtmft                   2/2     Running   0          11s
updateservice-operator-85758c57bb-977v5   1/1     Running   0          96m

# oc describe pod sample-856879d565-gtmft
Events:
  Type     Reason          Age        From               Message
  ----     ------          ----       ----               -------
  Normal   Scheduled       <invalid>  default-scheduler  Successfully assigned openshift-update-service/sample-856879d565-gtmft to yanyang-0206b-mrjl7-worker-a-hqc96.c.openshift-qe.internal
  Normal   AddedInterface  <invalid>  multus             Add eth0 [10.129.2.15/23] from openshift-sdn
  Normal   Pulling         <invalid>  kubelet            Pulling image "quay.io/openshifttest/graph-data:5.0.1"
  Normal   Pulled          <invalid>  kubelet            Successfully pulled image "quay.io/openshifttest/graph-data:5.0.1" in 806.952415ms
  Normal   Created         <invalid>  kubelet            Created container graph-data
  Normal   Started         <invalid>  kubelet            Started container graph-data
  Normal   Pulled          <invalid>  kubelet            Container image "registry.redhat.io/openshift-update-service/openshift-update-service-rhel8@sha256:e1f2095b56a9c942906510a988af30b3bf9537e5de5cc247c0f8e77ce8b9fc3f" already present on machine
  Normal   Created         <invalid>  kubelet            Created container graph-builder
  Normal   Started         <invalid>  kubelet            Started container graph-builder
  Normal   Pulled          <invalid>  kubelet            Container image "registry.redhat.io/openshift-update-service/openshift-update-service-rhel8@sha256:e1f2095b56a9c942906510a988af30b3bf9537e5de5cc247c0f8e77ce8b9fc3f" already present on machine
  Normal   Created         <invalid>  kubelet            Created container policy-engine
  Normal   Started         <invalid>  kubelet            Started container policy-engine
  Warning  ProbeError      <invalid>  kubelet            Liveness probe error: Get "http://10.129.2.15:9081/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
body:
  Warning  Unhealthy  <invalid>  kubelet  Liveness probe failed: Get "http://10.129.2.15:9081/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

The new pod pulled the graph-data image. It looks good to me.

Comment 9 Yang Yang 2023-02-09 02:19:27 UTC
Based on comment 6, moving it to the verified state.

Comment 11 errata-xmlrpc 2023-03-09 11:30:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHEA: OSUS Enhancement Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:1161