Bug 2048825
| Summary: | Updateservice pod should be re-deployed when update graphDataImage of updateservice object | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
| Component: | OpenShift Update Service | Assignee: | Pratik Mahajan <pmahajan> |
| OpenShift Update Service sub component: | operand | QA Contact: | liujia <jiajliu> |
| Status: | CLOSED DEFERRED | Docs Contact: | Kathryn Alexander <kalexand> |
| Severity: | medium | ||
| Priority: | medium | CC: | ableisch, kkarampo, lmohanty, wking |
| Version: | 4.6 | Keywords: | NeedsTestCase |
| Target Milestone: | --- | ||
| Target Release: | 4.13.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-03-09 01:33:13 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1939855 | ||
| Bug Blocks: | |||
Huh, comment 3 is a separate use case, and should get a new bug series. For this one, I was trying to fix:

1. Install the OpenShift Update Service operator.
2. Build graph-data image v1 and push it to a registry, e.g. registry.example.com/whatever/graph-data:latest.
3. Create an UpdateService from the web console or oc CLI whose graphDataImage uses the by-tag pullspec (e.g. ...:latest).
4. Graphs pulled from the update service policyEngineURI have the v1 data.
5. Build graph-data image v2 and push it to the registry over the existing tag (e.g. registry.example.com/whatever/graph-data:latest).
6. Delete the update service pod.
7. The deployment controller will create a replacement pod.
8. Pull a graph from the update service policyEngineURI:
   a. Before this fix, the new pod might have found a locally cached v1 image for the by-tag pullspec, and returned v1 data.
   b. After this fix, the new pod will always have refetched the by-tag pullspec, and will return v2 data.

Hi Trevor,

According to the description, the issue in this bug, which was reported in #1939855, is the one in comment 3, so I think it's not fixed yet. The fixed scenario you describe in comment 4 should be the one in bz #2009651. I thought they had the same root cause, but it seems they do not. So I think we can get #2009651 un-duped and move the fix to that bug to track the issue in comment 4, and re-open this one to continue tracking this issue. hdty?

Hi Lala,

Looks like Trevor is on vacation now. Could you help check the above? The issue in bz #2048825 is not fixed yet (but it is attached to the v4.9.1 advisory https://errata.devel.redhat.com/advisory/87165). The PR should now fix the issue in bz #2009651, which was marked as a duplicate of this bug. Based on comments 3, 4, and 5, these are two issues that are better tracked in two bugs. So should we re-activate bz #2009651 and attach it to advisory #87165? And since bz #2048825 is still in the advisory too, do we plan to keep it in the v4.9.1 release scope or drop it?
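The by-tag versus by-digest distinction discussed in this thread can be illustrated with a minimal sketch. The helper name `pullspec_kind` is hypothetical and not part of the operator; it only assumes the usual convention that a by-digest pullspec contains an `@<algorithm>:<hex>` suffix, and that anything else (including a pullspec with no tag at all) resolves through a mutable tag.

```python
def pullspec_kind(pullspec: str) -> str:
    """Classify a container image pullspec as 'digest' or 'tag'.

    Hypothetical helper for illustration only. A by-digest pullspec
    pins immutable content; a by-tag pullspec (or one with no tag,
    which defaults to a tag) can point at different content over time.
    """
    # '@' only appears in a pullspec to introduce a digest, so a
    # registry port such as 'registry.example.com:5000' is not confused
    # with a tag or digest here.
    if "@" in pullspec:
        return "digest"
    return "tag"


if __name__ == "__main__":
    for spec in (
        "registry.example.com/whatever/graph-data:latest",
        "registry.example.com/whatever/graph-data@sha256:" + "0" * 64,
    ):
        print(spec, "->", pullspec_kind(spec))
```

Only pullspecs classified as `tag` are exposed to the stale-content scenarios described in the comments above.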
Although the oc-mirror docs [1,2] don't seem to talk about this, oc-mirror generates an updateService.yaml manifest [3] which seems to link the graph-data image by digest [4,5]. So folks using that oc-mirror flow to both create the graph-data image and configure their UpdateService should be using by-digest references, and won't be impacted by this by-tag bug.

[1]: https://docs.openshift.com/container-platform/4.11/installing/disconnected_install/installing-mirroring-disconnected.html
[2]: https://docs.openshift.com/container-platform/4.11/updating/updating-restricted-network-cluster.html#update-mirror-repository-oc-mirror_updating-restricted-network-cluster
[3]: https://github.com/openshift/oc-mirror/blob/159c5394ec582601b6e5a5685d8e5b09498ea24c/pkg/cli/mirror/manifests.go#L358
[4]: https://github.com/openshift/oc-mirror/blob/159c5394ec582601b6e5a5685d8e5b09498ea24c/pkg/cli/mirror/manifests.go#L258
[5]: https://github.com/openshift/oc-mirror/blob/159c5394ec582601b6e5a5685d8e5b09498ea24c/pkg/cli/mirror/cincinnati_graph_image.go#L114-L120

Re-summarizing the issue here: When graphDataImage is a by-tag pullspec, the update-service Deployment comes up with the by-tag reference, pulls whatever is fresh (since [1]) when a new pod comes up, but once pods are running, there is nothing polling or otherwise watching the registry to see if the content behind the tag has changed. Access to the registry can be complicated (mirror config, custom X.509 trust stores, network proxy config), and training new code to reliably jump through those hoops is tricky. Two possible approaches:

However, Cincinnati already has a plugin to pull the graph-data content from a registry [2] and we know Cincinnati-to-registry access works fairly well, because that's the approach we use to list the locally-available release images today. We could presumably move from the init-container approach to the dkrv2_openshift_secondary_metadata_scraper approach, and have Cincinnati polling tags for us.
If pivoting the Cincinnati approach is too much work, and we wanted a shorter-term fix, we could lean on CRI-O-to-registry access by having the update-service operator create a CronJob periodically launching Jobs with containers for each graph data image and no-op commands. It doesn't matter if the commands succeed or not. The operator could look at the pod specs to find the digests that CRI-O resolved, and inject that digest in our Deployment (actually tweaking the init-container pullspec, or just bumping an annotation, or whatever). Then the deployment controller would roll out new operand pods, and the PullAlways init containers would pick up the new content.

[1]: https://github.com/openshift/cincinnati-operator/pull/142
[2]: https://github.com/openshift/cincinnati/blob/bb32fa311b97d3e363448dd59cc5c2b924514d1f/cincinnati/src/plugins/internal/graph_builder/dkrv2_openshift_secondary_metadata_scraper/plugin.rs

OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9745
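The shorter-term CronJob idea above (resolve the digest, then bump an annotation so the deployment controller rolls out new pods) could be sketched roughly as follows. This is not the operator's implementation: the annotation key, the function name, and the assumption that the resolved image reference is available as a string ending in `@sha256:...` (for example from a probe pod's container status) are all hypothetical. The only Kubernetes behavior relied on is that changing a Deployment's pod template triggers a rollout.

```python
from typing import Optional

# Hypothetical annotation key, chosen for illustration only.
DIGEST_ANNOTATION = "example.com/graph-data-digest"


def annotation_patch(pod_template_annotations: dict,
                     resolved_image_id: str) -> Optional[dict]:
    """Return a Deployment patch if the resolved digest changed, else None.

    pod_template_annotations: current annotations on the Deployment's
        pod template (spec.template.metadata.annotations).
    resolved_image_id: image reference resolved by the container runtime
        for the probe pod, assumed to end in '@<algorithm>:<hex>'.
    """
    # Keep only the digest part, e.g. 'sha256:abc...'.
    digest = resolved_image_id.rsplit("@", 1)[-1]
    if pod_template_annotations.get(DIGEST_ANNOTATION) == digest:
        # Content behind the tag is unchanged: no rollout needed.
        return None
    # Any change to the pod template makes the deployment controller
    # roll out replacement pods, whose init containers then re-pull.
    return {
        "spec": {
            "template": {
                "metadata": {"annotations": {DIGEST_ANNOTATION: digest}}
            }
        }
    }
```

Returning `None` when the digest is unchanged keeps the periodic check from causing needless rollouts.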
Version:
cincinnati-container-v4.9.0-8
cincinnati-operator-bundle-container-v4.9.0-11
cincinnati-operator-container-v4.9.0-9

1. Install osus operator v4.9.0 on ocp v4.9

    # ./oc get po
    NAME                                      READY   STATUS    RESTARTS   AGE
    updateservice-operator-74d8998684-gvcfg   1/1     Running   0          3m43s

2. Build graph-data image v1.0.0 and push to registry

3. Create updateservice instance with graph-data image v1.0.0

    # ./oc get po
    NAME                                      READY   STATUS    RESTARTS   AGE
    test-6d4f6bc755-mrwmk                     2/2     Running   0          42s
    updateservice-operator-74d8998684-gvcfg   1/1     Running   0          13m
    # ./oc get updateservice -ojson|jq .items[].spec
    {
      "graphDataImage": "quay.io/openshifttest/cincinnati-graph-data-container:v1.0.0",
      "releases": "quay.io/openshift-release-dev/ocp-release",
      "replicas": 1
    }

4. Build graph-data image v1.0.1 and push to registry again.

5. Update updateservice from web-console or cli to refer to graphDataImage v1.0.1

    # ./oc get updateservice -ojson|jq .items[].spec
    {
      "graphDataImage": "quay.io/openshifttest/cincinnati-graph-data-container:v1.0.1",
      "releases": "quay.io/openshift-release-dev/ocp-release",
      "replicas": 1
    }

Result:
But the updateservice pod was not re-deployed.

    # ./oc get po
    NAME                                      READY   STATUS    RESTARTS   AGE
    test-6d4f6bc755-mrwmk                     2/2     Running   0          28m
    updateservice-operator-74d8998684-gvcfg   1/1     Running   0          41m
    # ./oc get deployment.apps/test -oyaml|grep -A3 initContainers:
          initContainers:
          - image: quay.io/openshifttest/cincinnati-graph-data-container:v1.0.0
            imagePullPolicy: Always
            name: graph-data
    # ./oc get po test-6d4f6bc755-mrwmk -oyaml|grep cincinnati-graph-data-container:
      - image: quay.io/openshifttest/cincinnati-graph-data-container:v1.0.0
        image: quay.io/openshifttest/cincinnati-graph-data-container:v1.0.0
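The mismatch shown in the result above (UpdateService spec at v1.0.1, Deployment init container still at v1.0.0) can be checked mechanically. The following is a minimal sketch, assuming the two objects have been loaded as dictionaries from `oc get ... -ojson` output; the function name `graph_data_drift` is hypothetical, while the field paths and the `graph-data` init container name are taken from the output in this report.

```python
from typing import Optional, Tuple


def graph_data_drift(updateservice: dict,
                     deployment: dict) -> Optional[Tuple[str, Optional[str]]]:
    """Compare the UpdateService graphDataImage with the Deployment.

    Returns None when the 'graph-data' init container already uses the
    requested image, otherwise a (wanted, actual) tuple. 'actual' is
    None if no 'graph-data' init container exists.
    """
    wanted = updateservice["spec"]["graphDataImage"]
    pod_spec = deployment["spec"]["template"]["spec"]
    for container in pod_spec.get("initContainers", []):
        if container.get("name") == "graph-data":
            actual = container.get("image")
            return None if actual == wanted else (wanted, actual)
    return (wanted, None)


if __name__ == "__main__":
    image = "quay.io/openshifttest/cincinnati-graph-data-container"
    us = {"spec": {"graphDataImage": image + ":v1.0.1"}}
    dep = {"spec": {"template": {"spec": {"initContainers": [
        {"name": "graph-data", "image": image + ":v1.0.0"},
    ]}}}}
    print(graph_data_drift(us, dep))
```

With the values from this report, the function returns the requested v1.0.1 pullspec paired with the stale v1.0.0 pullspec, which is the condition this bug describes.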