Description of problem: In the previous iteration we had a mechanism in place (https://github.com/openstack/tripleo-heat-templates/blob/master/extraconfig/tasks/tripleo_upgrade_node.sh#L61-L69) to ensure that non-controller nodes were working after the upgrade and before the converge step. This is especially critical for compute nodes, which should be able to receive VMs before the converge step. For compute nodes we also have to ensure that the RPC pin/unpin happens within the nova_compute container, using the UpgradeLevelNovaCompute parameter.
Current proposal (duplicated from the upstream bug for convenience); I have also added the reviews to the trackers above.

With the help of a utility function in https://review.openstack.org/#/c/491749/ (python-tripleoclient) we can use the upgrade_tasks playbook generated by tripleo-heat-templates at https://review.openstack.org/#/c/490848/ (note: this depends on a few of shardy's tht reviews; see the shortlog). In the upgrade-non-controller.sh script we then add downloading and executing of both the upgrade_tasks and deploy_steps playbooks, with https://review.openstack.org/#/c/490847/ (tripleo-common).

The generated playbooks look like https://paste.fedoraproject.org/paste/gUi5Ckq2qoTT~ed5kItxRw/raw (while it lasts). It seems that most of what we need for the compute and swift nodes is already in the upgrade_tasks (e.g. stopping openstack-nova-compute, which we recently had to add to tripleo_upgrade_node.sh).

Reviews:
(tripleo-common): https://review.openstack.org/#/c/490847/ "Download and run upgrade/deploy_steps_playbooks for upgrade"
  |
  |Depends-On:
  |
  -->(tripleo-heat-templates): https://review.openstack.org/#/c/490848/ "Also write an upgrade_(batch)_tasks playbook" (& see shortlog!)
        |
        |Depends-On:
        |
        -->(python-tripleoclient): https://review.openstack.org/#/c/491749/ "Adds when in upgrade_tasks playbook written by config download"
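To make the proposed order concrete (upgrade_tasks playbook first, then deploy_steps playbook, one node at a time), here is a minimal sketch. It is hypothetical and not what upgrade-non-controller.sh actually contains: the playbook file names `upgrade_steps_playbook.yaml` and `deploy_steps_playbook.yaml`, the function name `run_node_playbooks`, the inventory argument and the `ANSIBLE_PLAYBOOK` override are all assumptions made for illustration.

```shell
# Hypothetical sketch: run the generated upgrade_tasks playbook and then the
# deploy_steps playbook against a single non-controller node, stopping at the
# first failure. Playbook file names are assumptions.
# ANSIBLE_PLAYBOOK can be overridden (e.g. with "echo") for a dry run.
ANSIBLE_PLAYBOOK=${ANSIBLE_PLAYBOOK:-ansible-playbook}

run_node_playbooks() {
    local config_dir=$1   # config download dir, e.g. compute-0/tripleo-XXXXXX-config
    local inventory=$2    # inventory file or script (assumed to exist)
    local node=$3         # inventory host name/IP to limit the run to
    local playbook
    for playbook in upgrade_steps_playbook.yaml deploy_steps_playbook.yaml; do
        echo "Running ${playbook} for ${node}" >&2
        $ANSIBLE_PLAYBOOK -i "$inventory" --limit "$node" \
            "${config_dir}/${playbook}" || return 1
    done
}
```

Example (inventory path hypothetical): `run_node_playbooks compute-0/tripleo-bVAAT_-config /path/to/inventory 192.168.24.13`. Running the deploy steps only after the upgrade tasks succeed mirrors the "node must work before converge" requirement from the description.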
I also just posted https://review.openstack.org/498776, which removes the puppet config run and related workarounds from the tripleo_upgrade_node.sh script. If testing, you'll also need to apply it to your tripleo-heat-templates before running the major-upgrade-composable-steps-docker.yaml stage of the overcloud upgrade. Adding it to the trackers; for testing:

# tripleo-heat-templates: https://review.openstack.org/#/c/498776/ "Remove puppet run and workarounds from tripleo_upgrade_node.sh"
curl https://review.openstack.org/changes/498776/revisions/current/patch?download | base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1
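When several of these reviews have to be applied for testing, the `curl | base64 -d | patch` command can be wrapped in a small helper. This is a hypothetical sketch: the function names and the `GERRIT_URL` variable are made up, and it only relies on the Gerrit `/changes/<id>/revisions/current/patch?download` endpoint already used in the command, which returns the patch base64-encoded.

```shell
# Hypothetical helpers around the Gerrit "download patch" endpoint.
# The endpoint returns the patch base64-encoded, hence "base64 -d".
GERRIT_URL=${GERRIT_URL:-https://review.openstack.org}

gerrit_patch_url() {
    printf '%s/changes/%s/revisions/current/patch?download\n' "$GERRIT_URL" "$1"
}

apply_gerrit_change() {
    local change=$1 target_dir=$2 encoded
    # Download first so a failed request is not silently piped into patch.
    encoded=$(curl -sf "$(gerrit_patch_url "$change")") || return 1
    printf '%s' "$encoded" | base64 -d | sudo patch -d "$target_dir" -p1
}
```

Example: `apply_gerrit_change 498776 /usr/share/openstack-tripleo-heat-templates/`.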
So I managed to get the RoleConfig output after applying the following patch and running the deploy command with the --setup-heat-outputs option. I think we should include this step in the major-upgrade-composable-steps-docker.yaml stage so we don't have to add an extra step to the upgrade procedure.

curl -4 https://review.openstack.org/changes/495658/revisions/current/patch?download | base64 -d | sudo patch -d /usr/lib/python2.7/site-packages/ -p1 -f

#!/bin/bash
timeout 180m openstack overcloud deploy \
  --setup-heat-outputs \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --libvirt-type kvm \
  --ntp-server clock.redhat.com \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/services-docker/sahara.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/virt/network/network-environment.yaml \
  -e /home/stack/virt/hostnames.yml \
  -e /home/stack/virt/debug.yaml \
  -e /home/stack/virt/nodes_data.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
  -e /home/stack/docker-osp12.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml

After this I was able to run upgrade-non-controller.sh --upgrade compute-0, which failed with the error below.

A quick note here: /usr/bin/tripleo-ansible-inventory --list takes around 2 minutes for a basic 1 controller + 1 compute deployment, so you get the impression that the command is stuck at:

Wed Aug 30 11:03:04 EDT 2017 upgrade-non-controller.sh Starting the upgrade steps playbook run for compute-0 from compute-0/tripleo-bVAAT_-config/

In the end the playbook fails with the following error:

TASK [Ensure empty directory: emptying.]
[WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: ('2.5.0-14' in '{{ovs_version.stdout}}' or ovs_packaging_issue|succeeded) and (step == 2)
fatal: [192.168.24.13]: FAILED! => {"failed": true, "msg": "The conditional check '('2.5.0-14' in '{{ovs_version.stdout}}' or ovs_packaging_issue|succeeded) and (step == 2)' failed. The error was: error while evaluating conditional (('2.5.0-14' in '{{ovs_version.stdout}}' or ovs_packaging_issue|succeeded) and (step == 2)): 'dict object' has no attribute 'stdout'\n\nThe error appears to have been in '/home/stack/compute-0/tripleo-bVAAT_-config/Compute/upgrade_tasks.yaml': line 42, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n- block:\n - file:\n ^ here\n"}
to retry, use: --limit @/home/stack/compute-0/tripleo-bVAAT_-config/upgrade_steps_playbook.retry
This is the complete output. You can see that the 'Check openvswitch version.' task is skipped, hence the "'dict object' has no attribute 'stdout'" error for ovs_version.stdout:

PLAY [overcloud]

TASK [Gathering Facts]
ok: [192.168.24.13]

TASK [include]
included: /home/stack/compute-0/tripleo-bVAAT_-config/upgrade_steps_tasks.yaml for 192.168.24.13
included: /home/stack/compute-0/tripleo-bVAAT_-config/upgrade_steps_tasks.yaml for 192.168.24.13
included: /home/stack/compute-0/tripleo-bVAAT_-config/upgrade_steps_tasks.yaml for 192.168.24.13
included: /home/stack/compute-0/tripleo-bVAAT_-config/upgrade_steps_tasks.yaml for 192.168.24.13
included: /home/stack/compute-0/tripleo-bVAAT_-config/upgrade_steps_tasks.yaml for 192.168.24.13

TASK [include]
skipping: [192.168.24.13]

TASK [include]
included: /home/stack/compute-0/tripleo-bVAAT_-config/Compute/upgrade_tasks.yaml for 192.168.24.13

TASK [Check if neutron_ovs_agent is deployed]
changed: [192.168.24.13]

TASK [Check yum for rpm-python present]
skipping: [192.168.24.13]

TASK [Fail when rpm-python wasn't present]
skipping: [192.168.24.13]

TASK [PreUpgrade step0,validation: Check service neutron-openvswitch-agent is running]
skipping: [192.168.24.13]

TASK [Stop neutron_ovs_agent service]
skipping: [192.168.24.13]

TASK [Stop snmp service]
skipping: [192.168.24.13]

TASK [Check openvswitch version.]
skipping: [192.168.24.13]

TASK [Check openvswitch packaging.]
skipping: [192.168.24.13]

TASK [Ensure empty directory: emptying.]
[WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: ('2.5.0-14' in '{{ovs_version.stdout}}' or ovs_packaging_issue|succeeded) and (step == 2)
fatal: [192.168.24.13]: FAILED! => {"failed": true, "msg": "The conditional check '('2.5.0-14' in '{{ovs_version.stdout}}' or ovs_packaging_issue|succeeded) and (step == 2)' failed. The error was: error while evaluating conditional (('2.5.0-14' in '{{ovs_version.stdout}}' or ovs_packaging_issue|succeeded) and (step == 2)): 'dict object' has no attribute 'stdout'\n\nThe error appears to have been in '/home/stack/compute-0/tripleo-bVAAT_-config/Compute/upgrade_tasks.yaml': line 42, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n- block:\n - file:\n ^ here\n"}
to retry, use: --limit @/home/stack/compute-0/tripleo-bVAAT_-config/upgrade_steps_playbook.retry

PLAY RECAP
192.168.24.13 : ok=8 changed=1 unreachable=0 failed=1
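The [WARNING] points at the pattern behind the failure: the `when:` expression embeds `'{{ovs_version.stdout}}'`, and because the 'Check openvswitch version.' task was skipped, the registered `ovs_version` has no `stdout` key. The conventional Ansible form drops the braces and guards the skipped case, e.g. `'2.5.0-14' in ovs_version.stdout|default('')` (an illustration only, not necessarily the upstream fix). To spot other conditionals with the same pattern in a generated config directory, a hypothetical grep helper can be used; the function name is made up, and it only catches single-line `when:` entries.

```shell
# Hypothetical helper: list "when:" lines in generated task files that contain
# Jinja2 delimiters ({{ or {%), the pattern Ansible warns about above.
# Only single-line "when:" entries are matched; multi-line list-form
# conditionals are not detected. Exit status is 1 when nothing is found.
find_templated_when() {
    local config_dir=$1
    grep -rnE --include='*.yaml' 'when:.*(\{\{|\{%)' "$config_dir"
}
```

Example: `find_templated_when /home/stack/compute-0/tripleo-bVAAT_-config`.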
We also need https://review.openstack.org/#/c/499540/. mcornea++, adding it to the trackers.
Adding another review that allows the upgrade tasks to run between steps: https://review.openstack.org/#/c/499517/

I also filed a BZ for tripleo-ansible-inventory being too slow: https://bugzilla.redhat.com/show_bug.cgi?id=1487759
Remaining issues that we need to track in this bug:

- set up the RoleConfig output during major-upgrade-composable-steps so we don't have to run an additional step with the --setup-heat-outputs option

- cache the tripleo-ansible-inventory output so we don't waste 5 minutes per non-controller node waiting for tripleo-ansible-inventory
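The caching idea in the second item can be sketched as follows: run `tripleo-ansible-inventory --list` once, store its output in a file and reuse it while it is recent. This is a hypothetical sketch, not the upstream fix: the function name, the cache path, the one-hour maximum age and the `INVENTORY_CMD`/`INVENTORY_CACHE`/`INVENTORY_MAX_AGE` variables are assumptions, and it uses GNU `stat`.

```shell
# Hypothetical sketch: cache the slow dynamic inventory output in a file and
# reuse it while it is younger than INVENTORY_MAX_AGE seconds.
INVENTORY_CMD=${INVENTORY_CMD:-"/usr/bin/tripleo-ansible-inventory --list"}
INVENTORY_CACHE=${INVENTORY_CACHE:-/tmp/tripleo-inventory.json}
INVENTORY_MAX_AGE=${INVENTORY_MAX_AGE:-3600}

cached_inventory() {
    local now mtime
    now=$(date +%s)
    if [ -s "$INVENTORY_CACHE" ]; then
        mtime=$(stat -c %Y "$INVENTORY_CACHE")   # GNU stat: mtime in epoch seconds
        if [ $((now - mtime)) -lt "$INVENTORY_MAX_AGE" ]; then
            cat "$INVENTORY_CACHE"
            return 0
        fi
    fi
    # Cache missing, empty or stale: regenerate it via a temp file so a failed
    # run does not leave a truncated cache behind.
    $INVENTORY_CMD > "${INVENTORY_CACHE}.tmp" || { rm -f "${INVENTORY_CACHE}.tmp"; return 1; }
    mv "${INVENTORY_CACHE}.tmp" "$INVENTORY_CACHE"
    cat "$INVENTORY_CACHE"
}
```

The first call for the first node pays the full inventory cost; subsequent nodes in the same upgrade run read the cached file instead.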
(In reply to Marius Cornea from comment #13)
> Remaining issues that we need to track in this bug:
>
> - set up RoleConfig output during major-upgrade-composable-steps so we
> don't have to run an additional step with --setup-heat-outputs option
>
> - cache the tripleo-ansible-inventory so we don't waste 5 minutes per non
> controller node waiting for the output of tripleo-ansible-inventory

The slow inventory issue was addressed by https://review.openstack.org/#/c/501603/

In addition, we need to address upgrading non-controller nodes for split stack deployments.
The RoleConfig output issue is being tracked in bug 1490425.

Remaining issues to be addressed by this bug:

- upgrading non-controller nodes on split stack deployments
(In reply to Marius Cornea from comment #15)
> RoleConfig output issue is being tracked in bug 1490425
>
> Remaining issues to be addressed by this bug:
>
> - upgrading non controller nodes on split stack deployments

We actually have a different BZ (bug 1474697) filed for split stack deployments, so I think this bug can be moved to POST, as all the patches attached to it are merged to stable/pike.
(In reply to Marius Cornea from comment #16)
> (In reply to Marius Cornea from comment #15)
> > RoleConfig output issue is being tracked in bug 1490425
> >
> > Remaining issues to be addressed by this bug:
> >
> > - upgrading non controller nodes on split stack deployments
>
> We actually have a different BZ (bug 1474697) filed for split stack
> deployments so I think this bug can be moved to POST as all the patches
> attached to it are merged to stable/pike.

Thanks mcornea, I updated the trackers to point to stable/pike (the last two merged before pike was branched, and I checked that they are in stable/pike tripleo-heat-templates and tripleo-common for https://review.openstack.org/#/c/490848/ and https://review.openstack.org/#/c/490847/ respectively). I'll bring this up on our call later and we can move to POST.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462