Bug 2009651
| Summary: | OUS uses wrong imagePullPolicy for graph-data initContainer | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Andreas Bleischwitz <ableisch> | |
| Component: | OpenShift Update Service | Assignee: | W. Trevor King <wking> | |
| OpenShift Update Service sub component: | operator | QA Contact: | Yang Yang <yanyang> | |
| Status: | CLOSED ERRATA | Docs Contact: | Kathryn Alexander <kalexand> | |
| Severity: | high | |||
| Priority: | medium | CC: | lmohanty, wking, yanyang | |
| Version: | 4.6 | Keywords: | Reopened | |
| Target Milestone: | --- | |||
| Target Release: | 4.10.0 | |||
| Hardware: | x86_64 | |||
| OS: | All | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | Bug Fix | |
| Doc Text: | Cause: The update service deployment used pullIfNotPresent for the graph-data container. Consequence: When the configured UpdateService graphDataImage used a by-tag pullspec, and the host node contained a cached image for that pullspec, new update service pods might not notice updated graph data and could continue to serve old graph data. Fix: The graph-data container is now pullAlways. Result: New update service pods will always retrieve fresh graph data, even when graphDataImage is configured with a by-tag pullspec. | Story Points: | --- | |
| Clone Of: | ||||
| : | 2055460 (view as bug list) | Environment: | ||
| Last Closed: | 2023-03-09 11:30:44 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2055460 | |||
If I'm not looking at the wrong point, the following needs to get changed: https://github.com/openshift/cincinnati-operator/blob/f4aae3fdaaf8174f2159437f796b775e9b7efe52/controllers/new.go#L425

    Image:           instance.Spec.GraphDataImage,
    ImagePullPolicy: corev1.PullIfNotPresent,
    VolumeMounts:    []corev1.VolumeMount{

to

    Image:           instance.Spec.GraphDataImage,
    ImagePullPolicy: corev1.PullAlways,
    VolumeMounts:    []corev1.VolumeMount{

Or remove the line containing "ImagePullPolicy:" completely, as the default is:

    // Defaults to Always if :latest tag is specified, or IfNotPresent otherwise.

which is probably the best option for this image.

A known issue? Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1939855.

hah, yes. I'll close this one as a dup and attach my PR to the older issue.

*** This bug has been marked as a duplicate of bug 1939855 ***

As per the comment in https://bugzilla.redhat.com/show_bug.cgi?id=2048825#c4, it looks like this bug is fixed by https://github.com/openshift/cincinnati-operator/pull/133

Verifying on cincinnati-container-v5.0.1-3, cincinnati-operator-container-v5.0.1-3, and cincinnati-operator-bundle-container-v5.0.1-1:
# oc get pod sample-856879d565-kmhfh -oyaml
...
initContainers:
- image: quay.io/openshifttest/graph-data:5.0.1
imagePullPolicy: Always
name: graph-data
resources: {}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000630000
...
The graph-data image has "Always" as imagePullPolicy.
Delete the UpdateService pod
# oc get pod
NAME READY STATUS RESTARTS AGE
sample-856879d565-hxxnr 2/2 Running 0 2m16s
updateservice-operator-85758c57bb-977v5 1/1 Running 0 95m
# oc delete pod sample-856879d565-hxxnr
pod "sample-856879d565-hxxnr" deleted
# oc get pod
NAME READY STATUS RESTARTS AGE
sample-856879d565-gtmft 2/2 Running 0 11s
updateservice-operator-85758c57bb-977v5 1/1 Running 0 96m
# oc describe pod sample-856879d565-gtmft
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <invalid> default-scheduler Successfully assigned openshift-update-service/sample-856879d565-gtmft to yanyang-0206b-mrjl7-worker-a-hqc96.c.openshift-qe.internal
Normal AddedInterface <invalid> multus Add eth0 [10.129.2.15/23] from openshift-sdn
Normal Pulling <invalid> kubelet Pulling image "quay.io/openshifttest/graph-data:5.0.1"
Normal Pulled <invalid> kubelet Successfully pulled image "quay.io/openshifttest/graph-data:5.0.1" in 806.952415ms
Normal Created <invalid> kubelet Created container graph-data
Normal Started <invalid> kubelet Started container graph-data
Normal Pulled <invalid> kubelet Container image "registry.redhat.io/openshift-update-service/openshift-update-service-rhel8@sha256:e1f2095b56a9c942906510a988af30b3bf9537e5de5cc247c0f8e77ce8b9fc3f" already present on machine
Normal Created <invalid> kubelet Created container graph-builder
Normal Started <invalid> kubelet Started container graph-builder
Normal Pulled <invalid> kubelet Container image "registry.redhat.io/openshift-update-service/openshift-update-service-rhel8@sha256:e1f2095b56a9c942906510a988af30b3bf9537e5de5cc247c0f8e77ce8b9fc3f" already present on machine
Normal Created <invalid> kubelet Created container policy-engine
Normal Started <invalid> kubelet Started container policy-engine
Warning ProbeError <invalid> kubelet Liveness probe error: Get "http://10.129.2.15:9081/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
body:
Warning Unhealthy <invalid> kubelet Liveness probe failed: Get "http://10.129.2.15:9081/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
The new pod has graph-data pulled. It looks good to me.
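The kubelet defaulting rule quoted earlier ("Defaults to Always if :latest tag is specified, or IfNotPresent otherwise") can be sketched in Go. This helper is illustrative only, not the actual kubelet code; the simplified tag parsing also treats an untagged reference as implicitly :latest and a by-digest reference as never :latest, which matches Kubernetes' behavior:

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPullPolicy mirrors the Kubernetes defaulting rule:
// Always if the :latest tag is specified (or no tag at all),
// IfNotPresent otherwise.
func defaultPullPolicy(image string) string {
	if strings.ContainsRune(image, '@') {
		return "IfNotPresent" // pinned by digest; content cannot change
	}
	// Find the tag, if any: a ':' after the last '/', so a registry
	// port (registry.example.com:5000/...) is not mistaken for a tag.
	slash := strings.LastIndexByte(image, '/')
	colon := strings.LastIndexByte(image, ':')
	if colon <= slash {
		return "Always" // no explicit tag, implicitly :latest
	}
	if image[colon+1:] == "latest" {
		return "Always"
	}
	return "IfNotPresent"
}

func main() {
	fmt.Println(defaultPullPolicy("registry.example.com/ocp4/graph-data:latest")) // Always
	fmt.Println(defaultPullPolicy("quay.io/openshifttest/graph-data:5.0.1"))      // IfNotPresent
}
```

This is why dropping the explicit ImagePullPolicy from the operator would have been a reasonable alternative fix: a :latest graph-data pullspec would have defaulted to Always anyway, while pinned tags would keep the cheaper IfNotPresent behavior.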
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHEA: OSUS Enhancement Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:1161
Description of problem (please be detailed as possible and provide log snippets):

The current definition of an UpdateService deployment has "IfNotPresent" defined as the imagePullPolicy, which prevents updated graph-data images from being used. "Always" should be used for the initContainer.

    initContainers:
      - name: graph-data
        image: 'registry.example.com/ocp4/graph-data:latest'
        resources: {}
        volumeMounts:
          - name: cincinnati-graph-data
            mountPath: /var/lib/cincinnati/graph-data
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        imagePullPolicy: IfNotPresent  <== should be Always

Version of all relevant components (if applicable):
update-service-operator.v4.6.0
registry.redhat.io/openshift-update-service/openshift-update-service-rhel8-operator@sha256:08b4fc72501e5f7dfdd779e101d98b913d77982c021998f47b3cdc0367d7e0fa
registry.redhat.io/openshift-update-service/openshift-update-service-rhel8@sha256:9748a280f2a04524da739d2f8b7d8a74c5b58170c9c40b3e40904dd8ca39fbe8

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
The initial deployment of an update service will pull a graph-data image tagged "latest", but subsequent updates to that image will not get pulled, as the tag already exists on the node. The only way to mitigate that effect is to use different tags for the graph-data images, which in turn requires additional adjustments to the UpdateService CR.

Is there any workaround available to the best of your knowledge?
Changing the graph-data image tag from "latest" to an incremental one.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
yes

Can this issue be reproduced from the UI?
yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install and deploy OUS according to the docs.
2. Create a graph-data image as described in the docs.
3. Verify OUS functionality, rendering the graph data available at that time.
4. Whenever an updated version of the update graph is available, update the graph-data image.
5. OUS won't make use of the updated graph-data image, as <image>:latest is already present on the node, even though it changed.

Actual results:
The outdated graph-data image gets used.

Expected results:
The graph-data image is pulled every time to maintain current data.

Additional info:
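The failure mode described above can be modeled in a few lines. This is an illustrative Go sketch of a node's image cache, not kubelet code: with IfNotPresent, a by-tag pullspec whose registry content has moved to a new digest is never re-pulled, while Always picks up the new content:

```go
package main

import "fmt"

// node models a host's local image cache: pullspec -> cached digest.
type node struct {
	cache map[string]string
}

// resolve returns the digest a new container would actually run,
// given the pull policy and the digest the registry currently serves.
func (n *node) resolve(pullspec, policy, registryDigest string) string {
	if cached, ok := n.cache[pullspec]; ok && policy == "IfNotPresent" {
		return cached // cache hit: the updated tag is never noticed
	}
	n.cache[pullspec] = registryDigest // pull (Always, or cache miss)
	return registryDigest
}

func main() {
	n := &node{cache: map[string]string{}}
	const img = "registry.example.com/ocp4/graph-data:latest"

	fmt.Println(n.resolve(img, "IfNotPresent", "sha256:old")) // sha256:old (first pull)
	// The graph-data image is rebuilt and pushed under the same tag...
	fmt.Println(n.resolve(img, "IfNotPresent", "sha256:new")) // sha256:old (stale!)
	fmt.Println(n.resolve(img, "Always", "sha256:new"))       // sha256:new (fresh pull)
}
```

The middle call is exactly the reported bug: the pod restarts, the node already has an image for the pullspec, and the updated update graph is never served.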