Bug 1561178

Summary: FFU: provide Openstack CLI for the fast forward upgrade workflow
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: python-tripleoclientAssignee: Marios Andreou <mandreou>
Status: CLOSED ERRATA QA Contact: Marius Cornea <mcornea>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 13.0 (Queens)CC: dbecker, hbrock, jschluet, jslagle, jstransk, mandreou, mbracho, mbultel, mburns, morazi, rhel-osp-director-maint, sathlang
Target Milestone: betaKeywords: Triaged
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: python-tripleoclient-9.2.1-1.el7ost openstack-tripleo-common-8.6.1-3.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-27 13:48:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Marius Cornea 2018-03-27 20:19:55 UTC
Description of problem:

Currently the fast forward upgrade workflow is made up of a combination of running stack updates via overcloud deploy command and running several ansible-playbooks commands manually. 

This process is fragile as it is error prone(the overcloud with its workloads can reach in an undesired state if some playbook is missed or contains the wrong options). 

Moreover the entire workflow require the user to understand the internals of the upgrade mechanism in order to make sense of the steps, e.g. what each ansible playbook command does.

We should simplify this process and hide unnecessary bits behind a CLI wrapper which exposes the actions which are meaningful to the operator, e.g: assuming a basic environment with 3 controller and 2 computes, the end user should be able to do the upgrade into simple steps such as:

openstack ffwd-upgrade --role Controller ## upgrade controller nodes
openstack ffwd-upgrade --node compute-0  ## upgrade compute-0 node
openstack ffwd-upgrade --node compute-1  ## upgrade compute-1 node

As it is right now the procedure involves multiple steps which make the upgrade experience confusing and also introduces multiple points of possible manual errors. 

I am exemplifying below the steps required to upgrade at this moment:

1. Update stack outputs by appending environment files to the overcloud deploy command used for initial deployment

openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
--control-scale 3 \
--control-flavor controller \
--compute-scale 2 \
--compute-flavor compute \
--ceph-storage-scale 3 \
--ceph-storage-flavor ceph \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/docker-images.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/fast-forward-upgrade.yaml \
-e /home/stack/ffu_repos.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/config-download-environment.yaml \
-e /home/stack/ceph-ansible-env.yaml \

2. Download config and save the location to be later used with ansible-playbook commands:
openstack overcloud config download --name $stack_name

3. Generate inventory file
/usr/bin/tripleo-ansible-inventory \
   --plan $stack_name \
   --static-yaml-inventory /home/stack/tripleo-ansible-inventory-static.yaml

4. Run FFU playbook
ansible-playbook --module-path /usr/share/ansible-modules/ -i /home/stack/tripleo-ansible-inventory-static.yaml -b /home/stack/tripleo-p1woxQ-config/fast_forward_upgrade_playbook.yaml

5. Run FFU upgrade step on controller nodes
ansible-playbook --module-path /usr/share/ansible-modules/ -i /usr/bin/tripleo-ansible-inventory -b /home/stack/tripleo-p1woxQ-config/upgrade_steps_playbook.yaml --skip-tags=validation --limit=Controller

6. Run FFU deploy step on controller nodes
ansible-playbook --module-path /usr/share/ansible-modules/ -i /usr/bin/tripleo-ansible-inventory -b /home/stack/tripleo-UnqRNC-config/deploy_steps_playbook.yaml --limit=Controller

7. Run FFU upgrade step on compute-0 node
ansible-playbook --module-path /usr/share/ansible-modules/ -i /usr/bin/tripleo-ansible-inventory -b /home/stack/tripleo-p1woxQ-config/upgrade_steps_playbook.yaml --skip-tags=validation --limit=compute-0

8. Run FFU deploy step on compute-0 nodes
ansible-playbook --module-path /usr/share/ansible-modules/ -i /usr/bin/tripleo-ansible-inventory -b /home/stack/tripleo-UnqRNC-config/deploy_steps_playbook.yaml --limit=compute-0

9. Run FFU upgrade step on compute-1 node
ansible-playbook --module-path /usr/share/ansible-modules/ -i /usr/bin/tripleo-ansible-inventory -b /home/stack/tripleo-p1woxQ-config/upgrade_steps_playbook.yaml --skip-tags=validation --limit=compute-1

10. Run FFU deploy step on compute-1 nodes
ansible-playbook --module-path /usr/share/ansible-modules/ -i /usr/bin/tripleo-ansible-inventory -b /home/stack/tripleo-UnqRNC-config/deploy_steps_playbook.yaml --limit=compute-1

Comment 1 Marios Andreou 2018-03-30 15:02:45 UTC
spent some time looking into this today posted https://review.openstack.org/557936 https://review.openstack.org/557937 for discussion ... I used the sequence from comment #0 as reference, see my comment at https://review.openstack.org/#/c/557937/1//COMMIT_MSG@13

the converge is still pending will revisit next week thanks

Comment 2 Marios Andreou 2018-04-19 12:22:30 UTC
updating trackers for queens reviews

Comment 8 errata-xmlrpc 2018-06-27 13:48:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086