Bug 1494107
| Summary: | OSP11 -> OSP12 upgrade: libvirtd service on compute nodes gets stopped during major-upgrade-composable-steps-docker.yaml | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Marios Andreou <mandreou> |
| Status: | CLOSED ERRATA | QA Contact: | Marius Cornea <mcornea> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 12.0 (Pike) | CC: | dbecker, jschluet, mandreou, mbracho, mbultel, mburns, morazi, rhel-osp-director-maint, shardy, tvignaud |
| Target Milestone: | beta | Keywords: | Triaged |
| Target Release: | 12.0 (Pike) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-tripleo-heat-templates-7.0.3-0.20171023134947.8da5e1f.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-12-13 22:11:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1399762 | | |
| Attachments: | | | |
Description
Marius Cornea
2017-09-21 13:05:13 UTC
o/ Marius, I spent some time looking at this one. Going to mark it as triaged and add some thoughts so I can point others to it. To confirm, this should be happening on all upgrades right now and it shouldn't be confined to any one environment, right?

--> It is the deployment_steps (host_prep_tasks specifically, afaics) that are being executed on the computes, not the upgrade_tasks. There is indeed a task that stops libvirtd here: https://github.com/openstack/tripleo-heat-templates/blob/420126fd98193f755562887603f604ca5fd53175/docker/services/nova-libvirt.yaml#L288-L295

--> I think the roles_data disable_upgrade_deployment flag is being set correctly in the environment, because both computes (and no other nodes) got /root/tripleo_upgrade_node.sh delivered: https://github.com/openstack/tripleo-heat-templates/blob/420126fd98193f755562887603f604ca5fd53175/common/major_upgrade_steps.j2.yaml#L41-L57

--> I suspect the problem is here: https://github.com/openstack/tripleo-heat-templates/blob/fb54bc7901885ffb8c93c648643cab7ab70b41df/common/deploy-steps.j2#L6 but I'm not sure why, since enabled_roles should be set in https://github.com/openstack/tripleo-heat-templates/blob/fb54bc7901885ffb8c93c648643cab7ab70b41df/common/post-upgrade.j2.yaml#L3, which then just includes deploy-steps.j2 ...

Created attachment 1329633
ansible-playbook invocations from journal on compute 0 and compute 1
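The reasoning in the comment above, that the flag must be set correctly because only the computes received /root/tripleo_upgrade_node.sh, amounts to a filter over roles_data. A minimal Python sketch, assuming a simplified in-memory model: the `Role` class and `roles_receiving_upgrade_script` are hypothetical names, and the real roles_data is YAML rendered through Jinja2 templates such as common/major_upgrade_steps.j2.yaml, not Python objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Role:
    """Hypothetical, simplified stand-in for one entry in roles_data."""
    name: str
    # True means the nodes of this role are upgraded by the operator-driven
    # script instead of by the automatic upgrade steps (assumed model).
    disable_upgrade_deployment: bool = False


def roles_receiving_upgrade_script(roles: List[Role]) -> List[str]:
    """Return the roles whose nodes should get /root/tripleo_upgrade_node.sh."""
    return [r.name for r in roles if r.disable_upgrade_deployment]


roles = [
    Role("Controller"),
    Role("Compute", disable_upgrade_deployment=True),
]
print(roles_receiving_upgrade_script(roles))  # ['Compute']
```

If only Compute shows up, the flag itself is behaving, so whatever stops libvirtd must come from a code path that does not consult it.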
I think this is caused by https://review.openstack.org/#/c/502470/4/common/deploy-steps.j2

We made that change so the json files would be written to the nodes and the RoleConfig output would be generated for all roles, even when upgrade is disabled. But I missed that we'll then run host_prep_tasks even on nodes where upgrade is disabled, so we need to decouple them from the other tasks (which just write data that is later consumed by the ansible-driven upgrade).

To clarify, I think to fix this we need to decouple host_prep_tasks here:

https://github.com/openstack/tripleo-heat-templates/blob/fb54bc7901885ffb8c93c648643cab7ab70b41df/common/deploy-steps.j2#L192

That way we can make them not run on nodes where upgrade is disabled, but we need to decide whether that means they never get run on upgrade (in which case there may sometimes be tasks that exist in both host_prep_tasks and upgrade_tasks) or whether we make them run via the operator-driven upgrade script.

(In reply to Steven Hardy from comment #4)
> To clarify, I think to fix this we need to decouple host_prep_tasks here:
>
> https://github.com/openstack/tripleo-heat-templates/blob/fb54bc7901885ffb8c93c648643cab7ab70b41df/common/deploy-steps.j2#L192
>
> So we can make them not run on nodes where upgrade is disabled but we need
> to decide if that means they never get run on upgrade (in which case there
> may sometimes be tasks that exist in both host_prep_tasks and upgrade_tasks)
> or if we make them run via the operator driven upgrade script.

o/ I just posted this, wdyt?
https://review.openstack.org/507524

(In reply to marios from comment #5)
> (In reply to Steven Hardy from comment #4)
> > To clarify, I think to fix this we need to decouple host_prep_tasks here:
> >
> > https://github.com/openstack/tripleo-heat-templates/blob/fb54bc7901885ffb8c93c648643cab7ab70b41df/common/deploy-steps.j2#L192
> >
> > So we can make them not run on nodes where upgrade is disabled but we need
> > to decide if that means they never get run on upgrade (in which case there
> > may sometimes be tasks that exist in both host_prep_tasks and upgrade_tasks)
> > or if we make them run via the operator driven upgrade script.
>
> o/ I just posted this wdyt? https://review.openstack.org/507524

Thinking about it just now, I don't think that will work as is. We *do* want those to be included normally, just not on upgrade. So disable_upgrade_deployment is not the right check to make there; we need to know whether it is an upgrade. Will update the review. I think you are out today anyway, thanks shardy.

Not yet merged on Pike, so moving back to ASSIGNED and updating trackers.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
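The options weighed in this bug can be compared with a small model: host_prep_tasks riding along with the per-role data (the regression), gating them on disable_upgrade_deployment (the first patch), and gating them on whether an upgrade is running (the revised idea). This is a minimal Python sketch, not the shipped fix: `Role`, the three functions, and the task strings are hypothetical and illustrative, and in tripleo-heat-templates the real logic lives in Jinja2-rendered YAML (common/deploy-steps.j2). It follows one reading of the last technical comment.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Role:
    """Hypothetical, simplified stand-in for a roles_data entry and its tasks."""
    name: str
    disable_upgrade_deployment: bool = False
    host_prep_tasks: List[str] = field(default_factory=list)


# All three functions share one signature so they can be compared directly;
# some of them deliberately ignore one of the inputs.

def host_prep_coupled(roles: List[Role], is_upgrade: bool) -> Dict[str, List[str]]:
    # Regression: data is rendered for every role and host_prep_tasks come
    # along with it, so they run everywhere, upgrade or not.
    return {r.name: list(r.host_prep_tasks) for r in roles}


def host_prep_gated_by_flag(roles: List[Role], is_upgrade: bool) -> Dict[str, List[str]]:
    # First attempt: skip host_prep_tasks on flagged roles. is_upgrade is
    # ignored, so flagged roles also lose them on a normal deploy.
    return {r.name: [] if r.disable_upgrade_deployment else list(r.host_prep_tasks)
            for r in roles}


def host_prep_gated_by_upgrade(roles: List[Role], is_upgrade: bool) -> Dict[str, List[str]]:
    # Revised idea: include host_prep_tasks normally, skip them only while an
    # upgrade is running.
    return {r.name: [] if is_upgrade else list(r.host_prep_tasks) for r in roles}


roles = [
    Role("Controller", host_prep_tasks=["create log dirs"]),  # illustrative task
    Role("Compute", disable_upgrade_deployment=True,
         host_prep_tasks=["stop libvirtd"]),                  # illustrative task
]

for fn in (host_prep_coupled, host_prep_gated_by_flag, host_prep_gated_by_upgrade):
    for is_upgrade in (False, True):
        print(fn.__name__, "upgrade" if is_upgrade else "deploy", fn(roles, is_upgrade))
```

Running it shows the trade-off raised in the comments: the flag-based gate drops Compute's host_prep_tasks on a normal deploy as well, while the upgrade-based gate keeps them for normal runs and skips them only during the upgrade.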