Bug 1756443
| Summary: | Race condition on releasing new images between "upgrade prepare" and "external-upgrade run --tags container_image_prepare" | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Paras Babbar <pbabbar> | ||||
| Component: | openshift-heat-templates | Assignee: | Jiri Stransky <jstransk> | ||||
| Status: | CLOSED EOL | QA Contact: | RHOS Maint <rhos-maint> | ||||
| Severity: | low | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 15.0 (Stein) | CC: | abishop, amoralej, athomas, dbecker, jpretori, jstransk, lbezdick, mburns, morazi, pbabbar, scollier, sgolovat, stchen | ||||
| Target Milestone: | --- | Keywords: | Triaged, ZStream | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2020-09-30 19:17:02 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1727807 | ||||||
| Attachments: |
Description
Paras Babbar
2019-09-27 16:12:29 UTC
As noted in the BZ description, this is something for DFG:DF (or maybe DFG:Upgrades?) to comment on. While cinder's container image may have been involved, the problem does not seem to be with cinder itself.

I went through the code paths today, and the initial suspicion seems correct. This is a race condition between TripleO actions and the release of new images. It can occur on both deployment and upgrade, but it is less likely during deployment because the code runs within a single command.

First, the mistral workflow (deployment [1] or update/upgrade [2]) calls an action to prepare container image parameters [3]. That action dereferences the latest images into specific (non-latest) tags and writes them into the "environments/containers-default-parameters.yaml" file [4].

Then the image-prepare task (the part which uploads images to the undercloud) runs. It is fed the same ContainerImagePrepare parameter [5] as the mistral action earlier, but it is not fed the actual dereferenced values from the yaml file created earlier [4]. So it does its own dereferencing of the latest images before uploading them to the undercloud registry.

The overcloud wants to use the images which were dereferenced by the mistral workflow, but the undercloud contains the images dereferenced by the image uploader. If some of the images were updated between the parameter generation and the image upload, the overcloud will try to fetch different images than the undercloud offers, and the deployment/upgrade can break.
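The sequence described above can be sketched as a toy model. This is not TripleO code: the `Registry` class, the function names, the `pinned` argument and the image tags are all hypothetical and exist only to show why two independent dereferences of a floating tag can disagree, and how reusing the first result would avoid the mismatch.

```python
"""Toy model of the race. Not TripleO code; every name and tag here is made up."""


class Registry:
    """A remote registry where a floating 'latest' tag points at a specific tag."""

    def __init__(self):
        self.latest = {"cinder-volume": "15.0-71"}  # hypothetical tag

    def dereference(self, image):
        # Resolve the floating tag to the specific tag it points at right now.
        return self.latest[image]

    def release(self, image, tag):
        # A new image build is published and 'latest' moves to it.
        self.latest[image] = tag


def prepare_parameters(registry, images):
    """Step 1: the tags written out for the overcloud to use."""
    return {name: registry.dereference(name) for name in images}


def upload_images(registry, images, pinned=None):
    """Step 2: the tags that end up in the undercloud registry.

    Without `pinned` this dereferences again (the behaviour described in
    the report). With `pinned` it reuses the result of step 1.
    """
    if pinned is not None:
        return dict(pinned)
    return {name: registry.dereference(name) for name in images}


def missing_on_undercloud(wanted, available):
    """Images the overcloud expects at a tag the undercloud does not hold."""
    return {n: t for n, t in wanted.items() if available.get(n) != t}


def simulate(reuse_pinned):
    registry = Registry()
    images = ["cinder-volume"]
    wanted = prepare_parameters(registry, images)
    # A new image is released between the two steps.
    registry.release("cinder-volume", "15.0-72")
    available = upload_images(
        registry, images, pinned=wanted if reuse_pinned else None
    )
    return missing_on_undercloud(wanted, available)


if __name__ == "__main__":
    print(simulate(reuse_pinned=False))  # {'cinder-volume': '15.0-71'}
    print(simulate(reuse_pinned=True))   # {}
```

In the first run the overcloud asks for a tag the undercloud never uploaded. In the second run the uploader reuses the tags from step 1, so nothing is missing regardless of when the release lands.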
[1] https://github.com/openstack/tripleo-common/blob/58abba685e65441e52a1e577baa653dac6852fcc/workbooks/deployment.yaml#L168
[2] https://github.com/openstack/tripleo-common/blob/58abba685e65441e52a1e577baa653dac6852fcc/workbooks/package_update.yaml#L27
[3] https://github.com/openstack/tripleo-common/blob/58abba685e65441e52a1e577baa653dac6852fcc/tripleo_common/actions/container_images.py#L117
[4] https://github.com/openstack/tripleo-common/blob/58abba685e65441e52a1e577baa653dac6852fcc/tripleo_common/constants.py#L180-L181
[5] https://github.com/openstack/tripleo-heat-templates/blob/668588d73e6e40adb85e53c0000c078d84624837/deployment/container-image-prepare/container-image-prepare-baremetal-ansible.j2.yaml#L117-L128

-----

I'm adding DFG:DF to the whiteboard, as the race condition is also present on deployment and the root cause lies within the image prepare mechanisms.

The workaround in both the deployment and upgrade cases is to re-run the failed command(s).

In the upgrade case the potential for the race is larger. Its scope could be reduced (made the same as during deployment) by essentially running `external-upgrade run --tags container_image_prepare` from within `upgrade prepare`, e.g. by adding a `--container-image-prepare` parameter.

Closing EOL, OSP 15 has been retired as of Sept 19