Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2048825

Summary: UpdateService pod should be re-deployed when the graphDataImage of the UpdateService object is updated
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: OpenShift Update Service
Assignee: Pratik Mahajan <pmahajan>
OpenShift Update Service sub component: operand
QA Contact: liujia <jiajliu>
Status: CLOSED DEFERRED
Docs Contact: Kathryn Alexander <kalexand>
Severity: medium
Priority: medium
CC: ableisch, kkarampo, lmohanty, wking
Version: 4.6
Keywords: NeedsTestCase
Target Milestone: ---
Target Release: 4.13.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-09 01:33:13 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1939855
Bug Blocks:

Comment 3 liujia 2022-02-09 02:46:13 UTC
Version:
cincinnati-container-v4.9.0-8
cincinnati-operator-bundle-container-v4.9.0-11
cincinnati-operator-container-v4.9.0-9

1. Install the OSUS operator v4.9.0 on OCP v4.9
# ./oc get po
NAME                                      READY   STATUS    RESTARTS   AGE
updateservice-operator-74d8998684-gvcfg   1/1     Running   0          3m43s
2. Build graph-data image v1.0.0 and push to registry

3. Create an UpdateService instance with graph-data image v1.0.0
# ./oc get po
NAME                                      READY   STATUS    RESTARTS   AGE
test-6d4f6bc755-mrwmk                     2/2     Running   0          42s
updateservice-operator-74d8998684-gvcfg   1/1     Running   0          13m
# ./oc get updateservice -ojson|jq '.items[].spec'
{
  "graphDataImage": "quay.io/openshifttest/cincinnati-graph-data-container:v1.0.0",
  "releases": "quay.io/openshift-release-dev/ocp-release",
  "replicas": 1
}
4. Build graph-data image v1.0.1 and push to registry again.
5. Update the UpdateService from the web console or CLI to refer to graphDataImage v1.0.1
# ./oc get updateservice -ojson|jq '.items[].spec'
{
  "graphDataImage": "quay.io/openshifttest/cincinnati-graph-data-container:v1.0.1",
  "releases": "quay.io/openshift-release-dev/ocp-release",
  "replicas": 1
}

Result:
The updateservice pod was not re-deployed; the Deployment and the running pod still reference v1.0.0.
# ./oc get po
NAME                                      READY   STATUS    RESTARTS   AGE
test-6d4f6bc755-mrwmk                     2/2     Running   0          28m
updateservice-operator-74d8998684-gvcfg   1/1     Running   0          41m
# ./oc get deployment.apps/test -oyaml|grep -A3 initContainers:
      initContainers:
      - image: quay.io/openshifttest/cincinnati-graph-data-container:v1.0.0
        imagePullPolicy: Always
        name: graph-data

# ./oc get po test-6d4f6bc755-mrwmk -oyaml|grep cincinnati-graph-data-container:
  - image: quay.io/openshifttest/cincinnati-graph-data-container:v1.0.0
    image: quay.io/openshifttest/cincinnati-graph-data-container:v1.0.0
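
The mismatch above can also be checked mechanically. The following is a minimal Python sketch, not operator code. It assumes the JSON shapes visible in the outputs in this comment (`spec.graphDataImage` on the UpdateService, and an init container named `graph-data` in the Deployment's pod template). The function name `graph_data_drift` is hypothetical.

```python
def graph_data_drift(updateservice, deployment, init_name="graph-data"):
    """Return (wanted, actual) if the Deployment lags the UpdateService, else None."""
    wanted = updateservice["spec"]["graphDataImage"]
    pod_spec = deployment["spec"]["template"]["spec"]
    for container in pod_spec.get("initContainers", []):
        if container.get("name") == init_name:
            actual = container.get("image")
            return None if actual == wanted else (wanted, actual)
    # A missing init container is also reported as drift.
    return (wanted, None)


# Values copied from the command output in this comment.
REPO = "quay.io/openshifttest/cincinnati-graph-data-container"
updateservice = {"spec": {"graphDataImage": REPO + ":v1.0.1"}}
deployment = {"spec": {"template": {"spec": {"initContainers": [
    {"name": "graph-data", "image": REPO + ":v1.0.0"},
]}}}}

drift = graph_data_drift(updateservice, deployment)
if drift:
    print("Deployment not updated: wanted %s, found %s" % drift)
```

The two dictionaries could instead be loaded with `json.loads` from the output of `oc get updateservice ... -ojson` and `oc get deployment.apps/test -ojson`.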

Comment 4 W. Trevor King 2022-02-12 04:28:04 UTC
Huh, comment 3 is a separate use case, and should get a new bug series.  For this one, I was trying to fix:

1. Install the OpenShift Update Service operator.
2. Build graph-data image v1 and push to registry like registry.example.com/whatever/graph-data:latest.
3. Create an UpdateService from web-console or oc CLI to refer to graphDataImage with the by-tag pullspec (e.g. ...:latest).
4. Graphs pulled from the update service policyEngineURI have the v1 data.
5. Build graph-data image v2 and push to registry over the existing tag (e.g. registry.example.com/whatever/graph-data:latest).
6. Delete the update service pod.
7. The deployment controller will create a replacement pod.
8. Pull a graph from the update service policyEngineURI:
   a. Before this fix, the new pod might have found a locally cached v1 image for the by-tag pullspec, and returned v1 data.
   b. After this fix, the new pod will always have refetched the by-tag pullspec, and will return v2 data.
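
The scenario above depends on the pullspec being by-tag, and therefore mutable. As a rough illustration (a simplified Python sketch, not a full image-reference parser), the distinction can be tested as follows. The function names are hypothetical, and the all-zero digest is a placeholder, not a real image.

```python
def is_by_digest(pullspec):
    """True when the pullspec pins content by digest (name@algorithm:hex)."""
    _, sep, digest = pullspec.partition("@")
    return bool(sep) and ":" in digest


def is_mutable(pullspec):
    """A by-tag (or untagged) pullspec can point at new content after a re-push."""
    return not is_by_digest(pullspec)


by_tag = "registry.example.com/whatever/graph-data:latest"
# Placeholder digest for illustration only.
by_digest = "registry.example.com/whatever/graph-data@sha256:" + "0" * 64

for spec in (by_tag, by_digest):
    print(spec, "-> mutable" if is_mutable(spec) else "-> pinned")
```

Only the by-tag form is affected by a stale local cache, because a by-digest pullspec always names the same content.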

Comment 5 liujia 2022-02-14 01:20:18 UTC
Hi, Trevor

Per the description of this bug, the issue reported in #1939855 is the one in comment 3, so I think it is not fixed yet. The fixed scenario you describe in comment 4 should be the one in bz #2009651. I thought they had the same root cause, but it seems they do not. So I think we can un-dupe #2009651 and move the fix to that bug to track the issue in comment 4, and re-open this one to keep tracking the issue in comment 3. What do you think?

Comment 6 liujia 2022-02-14 03:29:58 UTC
Hi Lala

Looks like Trevor is on vacation now. Could you help check the above? The issue in bz #2048825 is not fixed yet (but the bug is attached to the v4.9.1 advisory https://errata.devel.redhat.com/advisory/87165). The PR should instead fix the issue in bz #2009651, which was marked as a duplicate of this bug. Based on comments 3, 4, and 5, these are two separate issues that are better tracked in two bugs. So should we re-activate bz #2009651 and attach it to advisory #87165? And since bz #2048825 is still in the advisory too, do we plan to keep it in the v4.9.1 release scope or drop it?

Comment 9 W. Trevor King 2023-01-05 18:26:34 UTC
Re-summarizing the issue here:

When graphDataImage is a by-tag pullspec, the update-service Deployment comes up with the by-tag reference, pulls whatever is fresh (since [1]) when a new pod comes up, but once pods are running, there is nothing polling or otherwise watching the registry to see if the content behind the tag has changed.  Access to the registry can be complicated (mirror config, custom X.509 trust stores, network proxy config), and training new code to reliably jump through those hoops is tricky.

Two possible approaches:

However, Cincinnati already has a plugin to pull the graph-data content from a registry [2] and we know Cincinnati-to-registry access works fairly well, because that's the approach we use to list the locally-available release images today.  We could presumably move from the init-container approach to the dkrv2_openshift_secondary_metadata_scraper approach, and have Cincinnati polling tags for us.

If pivoting to the Cincinnati approach is too much work, and we wanted a shorter-term fix, we could lean on CRI-O-to-registry access by having the update-service operator create a CronJob periodically launching Jobs with containers for each graph data image and no-op commands.  It doesn't matter if the commands succeed or not.  The operator could look at the pod statuses to find the digests that CRI-O resolved, and inject that digest in our Deployment (actually tweaking the init-container pullspec, or just bumping an annotation, or whatever).  Then the deployment controller would roll out new operand pods, and the PullAlways init containers would pick up the new content.
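
The digest-injection step of the CronJob idea could look roughly like the Python sketch below. It is an illustration under assumptions, not operator code: it assumes the probe pod's container status carries an `imageID` of the form `name@sha256:...`, and the function names, the container name `graph-data`, and the placeholder digest are hypothetical.

```python
def resolved_digest(pod, container_name):
    """Return the digest from the named container's status imageID, or None."""
    status = pod.get("status", {})
    statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
    for container_status in statuses:
        if container_status.get("name") == container_name:
            _, sep, digest = container_status.get("imageID", "").rpartition("@")
            return digest if sep else None
    return None


def pin_pullspec(pullspec, digest):
    """Replace any tag or digest on the pullspec with the given digest."""
    name = pullspec.partition("@")[0]
    # A colon after the last slash separates a tag; an earlier one is a registry port.
    if name.rfind(":") > name.rfind("/"):
        name = name[: name.rfind(":")]
    return "%s@%s" % (name, digest)


# Hypothetical probe pod, as a Job launched by the CronJob might report it.
probe_pod = {"status": {"containerStatuses": [{
    "name": "graph-data",
    "imageID": "registry.example.com/whatever/graph-data@sha256:" + "a" * 64,
}]}}

digest = resolved_digest(probe_pod, "graph-data")
pinned = pin_pullspec("registry.example.com/whatever/graph-data:latest", digest)
print(pinned)
```

Writing `pinned` into the Deployment's init-container pullspec (or recording the digest in a pod-template annotation) changes the pod template only when the digest changes, so the deployment controller rolls out new pods only when the tag's content has actually moved.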

[1]: https://github.com/openshift/cincinnati-operator/pull/142
[2]: https://github.com/openshift/cincinnati/blob/bb32fa311b97d3e363448dd59cc5c2b924514d1f/cincinnati/src/plugins/internal/graph_builder/dkrv2_openshift_secondary_metadata_scraper/plugin.rs

Comment 10 Shiftzilla 2023-03-09 01:33:13 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9745