Description of problem:
OSP11 -> OSP12 upgrade: libvirtd service on compute nodes gets stopped during major-upgrade-composable-steps-docker.yaml

major-upgrade-composable-steps-docker.yaml should not touch the services running on compute nodes as this role has the disable_upgrade_deployment: True set.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.0-0.20170913050524.0rc2.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11 with 3 controller + 2 compute + 3 ceph nodes
2. Run first step of the overcloud upgrade to OSP12 - major-upgrade-composable-steps-docker.yaml

#!/bin/bash
timeout 100m openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/services-docker/sahara.yaml \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
-e /home/stack/ceph-ansible-env.yaml \
-e /home/stack/docker-osp12.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml \

3. Check status of libvirtd service on compute nodes

Actual results:

(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.16 'sudo systemctl status libvirtd'
Warning: Permanently added '192.168.24.16' (ECDSA) to the list of known hosts.
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; disabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2017-09-21 12:42:29 UTC; 21min ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 19535 (code=exited, status=0/SUCCESS)

Sep 21 11:02:10 compute-1 systemd[1]: Starting Virtualization daemon...
Sep 21 11:02:10 compute-1 systemd[1]: Started Virtualization daemon.
Sep 21 12:42:29 compute-1 systemd[1]: Stopping Virtualization daemon...
Sep 21 12:42:29 compute-1 systemd[1]: Stopped Virtualization daemon.

(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.6 'sudo systemctl status libvirtd'
Warning: Permanently added '192.168.24.6' (ECDSA) to the list of known hosts.
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; disabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2017-09-21 12:42:29 UTC; 21min ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 19537 (code=exited, status=0/SUCCESS)

Sep 21 11:02:10 compute-0 systemd[1]: Starting Virtualization daemon...
Sep 21 11:02:10 compute-0 systemd[1]: Started Virtualization daemon.
Sep 21 12:42:29 compute-0 systemd[1]: Stopping Virtualization daemon...
Sep 21 12:42:29 compute-0 systemd[1]: Stopped Virtualization daemon.

Expected results:
The libvirtd service should be running as it was before running major-upgrade-composable-steps-docker.yaml:

(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.6 'sudo systemctl status libvirtd'
Warning: Permanently added '192.168.24.6' (ECDSA) to the list of known hosts.
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2017-09-21 11:02:10 UTC; 1h 3min ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 19537 (libvirtd)
   CGroup: /system.slice/libvirtd.service
           └─19537 /usr/sbin/libvirtd

Sep 21 11:02:10 compute-0 systemd[1]: Starting Virtualization daemon...
Sep 21 11:02:10 compute-0 systemd[1]: Started Virtualization daemon.

(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.16 'sudo systemctl status libvirtd'
Warning: Permanently added '192.168.24.16' (ECDSA) to the list of known hosts.
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2017-09-21 11:02:10 UTC; 1h 3min ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 19535 (libvirtd)
   CGroup: /system.slice/libvirtd.service
           └─19535 /usr/sbin/libvirtd

Sep 21 11:02:10 compute-1 systemd[1]: Starting Virtualization daemon...
Sep 21 11:02:10 compute-1 systemd[1]: Started Virtualization daemon.

Additional info:
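The check in step 3 could be scripted so it is easy to repeat after each upgrade step. The following is only a minimal sketch: check_node and the SSH_CMD override are hypothetical names introduced here, and it assumes the heat-admin user and the compute addresses shown in the output above. It relies on `systemctl is-active`, which prints the unit state (for example `active` or `inactive`).

```shell
#!/bin/bash
# Sketch only: report whether libvirtd is active on one compute node.
# check_node and SSH_CMD are hypothetical; SSH_CMD lets the ssh call be
# swapped out (e.g. for a dry run) and defaults to plain ssh.
check_node() {
    local host="$1" state
    # `systemctl is-active` exits non-zero when the unit is not active,
    # so ignore the exit status and judge by the printed state instead.
    state="$(${SSH_CMD:-ssh} "heat-admin@${host}" 'sudo systemctl is-active libvirtd' 2>/dev/null)" || true
    if [ "$state" = "active" ]; then
        echo "${host}: libvirtd active"
    else
        echo "${host}: libvirtd NOT active (${state:-unknown})"
        return 1
    fi
}
```

With the addresses from this environment it would be called as `for ip in 192.168.24.6 192.168.24.16; do check_node "$ip"; done`.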
o/ Marius spent some time looking at this one. Going to mark as triaged and adding some thoughts so I can point others to it. To confirm, this should be happening on all upgrades right now and it shouldn't be confined to any one environment right?

--> It is the deployment_steps (host_prep_tasks specifically afaics) that are being executed on the computes, not the upgrade_tasks. There is indeed a task that stops libvirtd here https://github.com/openstack/tripleo-heat-templates/blob/420126fd98193f755562887603f604ca5fd53175/docker/services/nova-libvirt.yaml#L288-L295

--> I think the roles_data disable_upgrade_deployment flag is being set correctly in the environment because both computes (and no other nodes) got the /root/tripleo_upgrade_node.sh delivered. https://github.com/openstack/tripleo-heat-templates/blob/420126fd98193f755562887603f604ca5fd53175/common/major_upgrade_steps.j2.yaml#L41-L57

--> Suspect the problem is here https://github.com/openstack/tripleo-heat-templates/blob/fb54bc7901885ffb8c93c648643cab7ab70b41df/common/deploy-steps.j2#L6 but not sure why since enabled_roles should be set https://github.com/openstack/tripleo-heat-templates/blob/fb54bc7901885ffb8c93c648643cab7ab70b41df/common/post-upgrade.j2.yaml#L3 which just then includes the deploy-steps.j2 ...
Created attachment 1329633 [details] ansible-playbook invocations from journal on compute 0 and compute 1
I think this is caused by https://review.openstack.org/#/c/502470/4/common/deploy-steps.j2

We made that change so the json files would be written to the nodes, and the RoleConfig output would be generated for all roles, even when upgrade is disabled. But I missed that we'll then run host_prep_tasks even on nodes where upgrade is disabled, so we need to decouple that from the other tasks (which just write data that is later consumed by the ansible driven upgrade).
To clarify, I think to fix this we need to decouple host_prep_tasks here:

https://github.com/openstack/tripleo-heat-templates/blob/fb54bc7901885ffb8c93c648643cab7ab70b41df/common/deploy-steps.j2#L192

So we can make them not run on nodes where upgrade is disabled but we need to decide if that means they never get run on upgrade (in which case there may sometimes be tasks that exist in both host_prep_tasks and upgrade_tasks) or if we make them run via the operator driven upgrade script.
(In reply to Steven Hardy from comment #4)
> To clarify, I think to fix this we need to decouple host_prep_tasks here:
>
> https://github.com/openstack/tripleo-heat-templates/blob/fb54bc7901885ffb8c93c648643cab7ab70b41df/common/deploy-steps.j2#L192
>
> So we can make them not run on nodes where upgrade is disabled but we need to decide if that means they never get run on upgrade (in which case there may sometimes be tasks that exist in both host_prep_tasks and upgrade_tasks) or if we make them run via the operator driven upgrade script.

o/ I just posted this wdyt? https://review.openstack.org/507524
(In reply to marios from comment #5)
> (In reply to Steven Hardy from comment #4)
> > To clarify, I think to fix this we need to decouple host_prep_tasks here:
> >
> > https://github.com/openstack/tripleo-heat-templates/blob/fb54bc7901885ffb8c93c648643cab7ab70b41df/common/deploy-steps.j2#L192
> >
> > So we can make them not run on nodes where upgrade is disabled but we need to decide if that means they never get run on upgrade (in which case there may sometimes be tasks that exist in both host_prep_tasks and upgrade_tasks) or if we make them run via the operator driven upgrade script.
>
> o/ I just posted this wdyt? https://review.openstack.org/507524

I don't think that will work as is, thinking about it just now. We *do* want those to be included normally, just not on upgrade. So disable_upgrade_deployment is not the right check to make there. We need to know if it is an upgrade. Will update the review. I think you are out today anyway.

thanks shardy
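One reading of the discussion above, sketched as a small shell function. This is not the Jinja2 from deploy-steps.j2 and not the content of the review; host_prep_decision and its two "true"/"false" arguments are hypothetical names used only to illustrate why checking disable_upgrade_deployment alone is insufficient. It assumes host_prep_tasks should be skipped only when the run is an upgrade and the role has the flag set, which is an interpretation rather than something the thread states outright.

```shell
#!/bin/bash
# Sketch only: models the gating decision discussed in the comments above.
#   $1 is_upgrade                 - "true" if this stack update is a major upgrade
#   $2 disable_upgrade_deployment - "true" if the role sets the roles_data flag
# Both argument names are illustrative assumptions.
host_prep_decision() {
    local is_upgrade="$1" disable_upgrade_deployment="$2"
    if [ "$is_upgrade" = "true" ] && [ "$disable_upgrade_deployment" = "true" ]; then
        # Upgrade run on a role the operator upgrades manually: leave it alone.
        echo "skip"
    else
        # Normal deploy, or a role that takes part in the upgrade.
        echo "run"
    fi
}
```

Under this reading a compute role with the flag set still gets host_prep_tasks on a normal deployment (`host_prep_decision false true` prints `run`), which a check on the flag alone would wrongly skip.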
Not yet merged on Pike, so moving back to ASSIGNED and updating trackers.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462