Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1847463

Summary: [OVN migration] Some overcloud nodes are missing in the ansible inventory file used for migration
Product: Red Hat OpenStack    Reporter: Roman Safronov <rsafrono>
Component: python-networking-ovn    Assignee: Jakub Libosvar <jlibosva>
Status: CLOSED ERRATA    QA Contact: Roman Safronov <rsafrono>
Severity: urgent    Docs Contact:
Priority: high
Version: 16.1 (Train)    CC: amcleod, apevec, jamsmith, jlibosva, jpretori, lhh, majopela, owalsh, sclewis, scohen, spower, tfreger
Target Milestone: z1    Keywords: AutomationBlocker, Regression, Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: python-networking-ovn-7.2.1-0.20200611133438.15f2281.el8ost    Doc Type: Bug Fix
Doc Text:
This update fixes a bug that caused the `generate-inventory` step to fail during in-place migration from ML2/OVS to ML2/OVN.

Note that in the Red Hat OpenStack Platform 16.1.0 (GA release), migration from ML2/OVS to ML2/OVN was not supported. As of Red Hat OpenStack Platform 16.1.1, in-place migration is supported for non-NFV deployments, with various exceptions, limitations, and requirements as described in "Migrating from ML2/OVS to ML2/OVN." [1]

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/networking_with_open_virtual_network/index#migrating-ml2ovs-to-ovn
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-08-27 15:20:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments (Description / Flags):
  tripleo ansible inventory osp16.1 / none
  tripleo ansible inventory osp16 / none
  tripleo ansible inventory osp16.1 ml2ovs / none

Description Roman Safronov 2020-06-16 12:31:36 UTC
Description of problem:
When trying to perform a migration from ml2ovs to ml2ovn using the migration tool, the ovn_migration.sh script creates a hosts_for_migration file that includes only a single controller and a single compute node, even on environments with more than one compute and controller node.

The problem happens because the output of "/usr/bin/tripleo-ansible-inventory --list" changed in osp16.1 (see the attached files created on osp16 and osp16.1).

When running the get_role_hosts function from ovn_migration.sh (see the attached script, which runs this function the same way the migration script does, i.e. get_role_hosts /tmp/ansible-inventory.txt neutron_api), we get the following:

on osp16.1
jq: error (at /tmp/ansible-inventory.txt:1): Cannot iterate over null (null)
controller-0

on osp16 
controller-0 controller-1 controller-2


Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20200611.n.0
python3-networking-ovn-migration-tool-7.2.1-0.20200611111150.18fabca.el8ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Run d/s ovn migration job on an environment with 3 controllers and 2 compute nodes, deployed by tripleo/director

Actual results:
Job creates the hosts_for_migration ansible inventory file with only 1 controller and 1 compute. Migration tasks are applied to these hosts only.

Expected results:
Job creates the hosts_for_migration ansible inventory file with all available controller and compute nodes. Migration tasks are applied to all hosts.

Additional info:

Comment 1 Roman Safronov 2020-06-16 12:37:56 UTC
Script that executes the get_role_hosts function from ovn_migration.sh.
The script should return the list of controller nodes; expected output on a setup with 3 controllers: controller-0 controller-1 controller-2

==========================================================
#!/bin/bash

get_role_hosts() {
    inventory_file=$1
    role_name=$2
    roles=`jq -r  \.$role_name\.children\[\] $inventory_file`
    for role in $roles; do
        # During the rocky cycle the format changed to have .value.hosts
        hosts=`jq -r --arg role "$role" 'to_entries[] | select(.key == $role) | .value.hosts[]' $inventory_file`
        if [[ "x$hosts" == "x" ]]; then
            # But we keep backwards compatibility with nested childrens (Queens)
            hosts=`jq -r --arg role "$role" 'to_entries[] | select(.key == $role) | .value.children[]' $inventory_file`

            for host in $hosts; do
               HOSTS="$HOSTS `jq -r --arg host "$host" 'to_entries[] | select(.key == $host) | .value.hosts[0]' $inventory_file`"
            done
        else
            HOSTS="${hosts} ${HOSTS}"
        fi
    done
    echo $HOSTS
}

source ~/stackrc
/usr/bin/tripleo-ansible-inventory --list > /tmp/ansible-inventory.txt
get_role_hosts /tmp/ansible-inventory.txt neutron_api

Comment 2 Roman Safronov 2020-06-16 12:47:21 UTC
osp16
(undercloud) [stack@undercloud-0 ~]$ jq -r  \.neutron_api\.children\[\] /tmp/ansible-inventory.txt
Controller

osp16.1
(overcloud) [stack@undercloud-0 ~]$ jq -r  \.neutron_api\.children\[\] /tmp/ansible-inventory.txt
overcloud_neutron_api


related snippet from tripleo-ansible-inventory on osp16
    "neutron_api": {
        "children": [
            "Controller"
        ],
        "vars": {
            "ansible_ssh_user": "heat-admin"
        }
    },



related snippet from tripleo-ansible-inventory on osp16.1

   "neutron_api": {
        "children": [
            "overcloud_neutron_api"
        ]
    },
    "overcloud_neutron_api": {
        "children": [
            "overcloud_Controller"
        ]
    },
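The effect of the extra nesting level on get_role_hosts can be reproduced with a short stdlib-only sketch (Python is used here for illustration; the group names come from the snippets above, and the host lists are illustrative):

```python
# Minimal inventories mirroring the snippets above. Group names are
# taken from the bug; the host lists are illustrative.
inv16 = {
    "neutron_api": {"children": ["Controller"]},
    "Controller": {"hosts": ["controller-0", "controller-1", "controller-2"]},
}
inv161 = {
    "neutron_api": {"children": ["overcloud_neutron_api"]},
    "overcloud_neutron_api": {"children": ["overcloud_Controller"]},
    "overcloud_Controller": {"hosts": ["controller-0", "controller-1", "controller-2"]},
}

def get_role_hosts(inventory, role_name):
    """Same traversal as the shell function: one level of children,
    then .hosts, with a Queens-era fallback that descends one more
    level of children but keeps only .hosts[0] of each child."""
    hosts = []
    for role in inventory.get(role_name, {}).get("children", []):
        group = inventory.get(role, {})
        if group.get("hosts"):
            hosts.extend(group["hosts"])
        else:
            for child in group.get("children", []):
                child_hosts = inventory.get(child, {}).get("hosts", [])
                if child_hosts:
                    hosts.append(child_hosts[0])
    return hosts

print(get_role_hosts(inv16, "neutron_api"))   # all three controllers
print(get_role_hosts(inv161, "neutron_api"))  # only controller-0
```

With the osp16 layout the traversal finds all hosts in one step; with the osp16.1 layout the first query finds no .hosts, the Queens fallback fires, and it keeps only the first host of the nested group, which matches the single-controller inventory reported above.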

Comment 3 Roman Safronov 2020-06-16 12:51:54 UTC
Created attachment 1697608 [details]
tripleo ansible inventory osp16.1

Comment 4 Roman Safronov 2020-06-16 12:52:23 UTC
Created attachment 1697609 [details]
tripleo ansible inventory osp16

Comment 6 Roman Safronov 2020-06-16 13:33:52 UTC
hosts_for_migration file generated on an osp16.1 environment with 3 controllers and 2 computes (missing 2 controllers and 1 compute)
====================================================================================================
[ovn-dbs]
controller-0 ansible_host=192.168.24.29 ovn_central=true ansible_ssh_user=heat-admin ansible_become=true

[ovn-controllers]
compute-0 ansible_host=192.168.24.33 ansible_ssh_user=heat-admin ansible_become=true ovn_controller=true
controller-0 ansible_host=192.168.24.29 ansible_ssh_user=heat-admin ansible_become=true ovn_controller=true


[overcloud-controllers:children]
ovn-dbs

[overcloud:children]
ovn-controllers
ovn-dbs


[overcloud:vars]
remote_user=heat-admin
public_network_name=nova
image_name=cirros
working_dir=/home/stack/ovn_migration
server_user_name=cirros
validate_migration=True
overcloud_ovn_deploy_script=/home/stack/overcloud-deploy-ovn.sh
overcloudrc=/home/stack/overcloudrc
ovn_migration_backups=/var/lib/ovn-migration-backup

[overcloud-controllers:vars]
remote_user=heat-admin
public_network_name=nova
image_name=cirros
working_dir=/home/stack/ovn_migration
server_user_name=cirros
validate_migration=True
overcloud_ovn_deploy_script=/home/stack/overcloud-deploy-ovn.sh
overcloudrc=/home/stack/overcloudrc
ovn_migration_backups=/var/lib/ovn-migration-backup

Comment 9 Roman Safronov 2020-06-22 12:57:38 UTC
Note: for OSP16.1 we can run the get_role_hosts function as follows:

get_role_hosts /tmp/ansible-inventory.txt overcloud_neutron_api   (L143 of tools/ovn_migration/tripleo_environment/ovn_migration.sh)

in this case the output is:
controller-0 controller-1 controller-2

as expected.

Comment 10 Roman Safronov 2020-06-22 13:21:41 UTC
Also, L158 should be changed for OSP16.1; it should look like:

get_role_hosts  ansible-inventory_osp16.1_ovs  overcloud_neutron_ovs_agent

in this case the output is correct (the nodes where we want to launch ovn-controller after the migration):

controller-0 controller-1 controller-2 compute-0 compute-1

Comment 11 Roman Safronov 2020-06-22 13:24:09 UTC
Created attachment 1698297 [details]
tripleo ansible inventory osp16.1 ml2ovs

Comment 12 Roman Safronov 2020-06-23 12:16:27 UTC
A possible solution is to replace L93 in ./tools/ovn_migration/tripleo_environment/ovn_migration.sh
from
roles=`jq -r  \.$role_name\.children\[\] $inventory_file`
to
roles=`jq -r  \.overcloud_$role_name\.children\[\] $inventory_file ||  jq -r  \.$role_name\.children\[\] $inventory_file`

In this case the function returns the proper lists of nodes from both the OSP16 and OSP16.1 ansible-inventory files, for both OVN and OVS.
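The intent of that replacement (try the stack-prefixed group first, then fall back to the bare group name) can be sketched stdlib-only; Python is used here for illustration, and note that this hard-codes the default 'overcloud' stack name:

```python
def get_roles(inventory, role_name, stack_name="overcloud"):
    """Sketch of the proposed lookup: try the stack-prefixed group
    first (osp16.1 layout), then fall back to the bare group name
    (osp16 layout).  Hard-coding the prefix only works when the
    overcloud stack uses the default name."""
    prefixed = inventory.get(f"{stack_name}_{role_name}", {}).get("children")
    if prefixed:
        return prefixed
    return inventory.get(role_name, {}).get("children", [])

# Illustrative inventories based on the snippets in comment 2
inv16 = {"neutron_api": {"children": ["Controller"]}}
inv161 = {
    "neutron_api": {"children": ["overcloud_neutron_api"]},
    "overcloud_neutron_api": {"children": ["overcloud_Controller"]},
}

print(get_roles(inv16, "neutron_api"))   # ['Controller']
print(get_roles(inv161, "neutron_api"))  # ['overcloud_Controller']
```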

Comment 13 Ollie Walsh 2020-06-23 15:44:36 UTC
(In reply to Roman Safronov from comment #12)
> Possible solution is to replace L93 in 
> ./tools/ovn_migration/tripleo_environment/ovn_migration.sh
> from
> roles=`jq -r  \.$role_name\.children\[\] $inventory_file`
> to 
> roles=`jq -r  \.overcloud_$role_name\.children\[\] $inventory_file ||
> jq -r  \.$role_name\.children\[\] $inventory_file`
> 
> In this case the function returns proper lists of nodes from OSP16 and
> OSP16.1  ansible-inventory file for ovn and ovs

Cannot hard-code the 'overcloud' stack name.

> get_role_hosts /tmp/ansible-inventory.txt neutron_api

I'm a bit confused by this usage. Are you trying to get the host list for a service (e.g neutron_api), or the host list for a role (e.g Controller)?

Comment 14 Roman Safronov 2020-06-23 16:01:56 UTC
(In reply to Ollie Walsh from comment #13)
> 
> Cannot hard-code the 'overcloud' stack name.
> 
> > get_role_hosts /tmp/ansible-inventory.txt neutron_api
> 
> I'm a bit confused by this usage. Are you trying to get the host list for a
> service (e.g neutron_api), or the host list for a role (e.g Controller)?

We need to get host list for a service (e.g. neutron_api).

Comments from the code:

    # We want to run ovn_dbs where neutron_api is running
    OVN_DBS=$(get_role_hosts /tmp/ansible-inventory.txt neutron_api)

    # We want to run ovn-controller where OVS agent was running before the migration
    OVN_CONTROLLERS=$(get_role_hosts /tmp/ansible-inventory.txt neutron_ovs_agent)

Comment 15 Ollie Walsh 2020-06-23 20:03:46 UTC
Can use ansible-inventory with --graph to expand the groups.

E.g:

$ tripleo-ansible-inventory --stack overcloud --static-yaml-inventory static_inventory.yaml
$ ansible-inventory -i static_inventory.yaml --graph neutron_api 
@neutron_api:
  |--@overcloud_neutron_api:
  |  |--@overcloud_Controller:
  |  |  |--overcloud-controller-0
$ ansible-inventory -i static_inventory.yaml --graph neutron_api | sed -ne 's/^[ \t|]\+--\([a-z0-9\-]\+\)$/\1/p'
overcloud-controller-0
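The sed filter above keeps only the leaf host lines: group lines start with '@' and end with ':', so they never match the bare-lowercase-name pattern. The same filter, re-expressed stdlib-only in Python for illustration and run on the graph output quoted above:

```python
import re

# --graph output copied from the example above
graph = """\
@neutron_api:
  |--@overcloud_neutron_api:
  |  |--@overcloud_Controller:
  |  |  |--overcloud-controller-0"""

# Equivalent of the sed pattern: tree connector, then a bare name of
# lowercase letters, digits, and hyphens; '@group:' lines never match.
host_re = re.compile(r"^[ \t|]+--([a-z0-9\-]+)$")

hosts = []
for line in graph.splitlines():
    m = host_re.match(line)
    if m:
        hosts.append(m.group(1))

print(hosts)  # ['overcloud-controller-0']
```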

Comment 16 Ollie Walsh 2020-06-23 20:09:54 UTC
Note ansible-inventory fails with a non-zero exit code if the group does not exist in the inventory

Comment 23 spower 2020-07-21 13:38:02 UTC
Moving to z2; this was not approved for z1, which is in Blockers Only mode. If it meets the blocker criteria for 16.1.1, please follow the blocker process.

Comment 28 Roman Safronov 2020-08-17 12:24:19 UTC
Verified on puddle RHOS-16.1-RHEL-8-20200813.n.0 with python3-networking-ovn-migration-tool-7.2.1-0.20200611133439.15f2281.el8ost.noarch
Verified that ansible inventory file for migration (hosts_for_migration) contains all relevant overcloud nodes.

Comment 33 errata-xmlrpc 2020-08-27 15:20:02 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (openstack-neutron bug fix advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3568