Bug 1653306 - Upgrade from 13 to 14 is failing with composable role
Summary: Upgrade from 13 to 14 is failing with composable role
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: RHOS Maint
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-11-26 14:01 UTC by Roee Agiman
Modified: 2018-12-12 14:24 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-06 15:25:21 UTC
Target Upstream Version:
Embargoed:



Description Roee Agiman 2018-11-26 14:01:48 UTC
Description of problem:
I have a Jenkins job that fails during the overcloud upgrade when upgrading OSP13 to OSP14 with composable roles.
We have a side issue with IR (InfraRed), which causes a failure on the UC (undercloud) upgrade and has a workaround (WA), but even with that workaround the next step fails. It seems to be related to 'merge-new-params-nic-config-script.py'.

Job output can be found here:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/neutron/job/DFG-network-neutron-upgrade-13-14_director-rhel-virthost-3cont_2comp_2net-ipv4-vxlan-composable/25/console

Version-Release number of selected component (if applicable):
OSP13

How reproducible:
100%

Steps to Reproduce:
1. Run the job using the IR master branch (which contains the WA for the UC issue).
2. See the failure at the overcloud-upgrade step and debug the results.

Actual results:
Job failing

Expected results:
Job passing

Additional info:

Comment 1 Bob Fournier 2018-11-26 14:37:30 UTC
Assigning to HardProv to look at.

Comment 2 Bob Fournier 2018-11-26 15:17:29 UTC
Roee, can we see the templates and deployment command being used? It's not clear how to access them from the failed test link. It looks like there's a mismatch between the use of deprecated_nic_config_name in roles_data.yaml and the role_name.

The error is here:
    # If deprecated_nic_config_names is set for role the deprecated name must
    # be used when loading the reference file.
    with open(OPTS.roles_data) as roles_data_file:
        roles_data = yaml.safe_load(roles_data_file)
    nic_config_name = next((x.get('deprecated_nic_config_name',
                                  OPTS.role_name.lower() + '.yaml') for x in
                            roles_data if x['name'] == OPTS.role_name))
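To make the lookup concrete, here is a minimal sketch of the same `next(...)` expression wrapped in a function. The `ROLES_DATA` list is a hypothetical, trimmed stand-in for a parsed roles_data.yaml (only the `name` and `deprecated_nic_config_name` keys, with illustrative values), and `nic_config_name_for` is a made-up helper name, not part of the upstream script.

```python
# Hypothetical, trimmed stand-in for yaml.safe_load() of roles_data.yaml.
ROLES_DATA = [
    {"name": "Controller", "deprecated_nic_config_name": "controller.yaml"},
    {"name": "Compute", "deprecated_nic_config_name": "compute.yaml"},
    {"name": "ComputeSriov"},
]


def nic_config_name_for(roles_data, role_name):
    """Mirror the script's lookup: prefer deprecated_nic_config_name,
    otherwise fall back to '<role_name lowercased>.yaml'.

    Raises StopIteration if no role in roles_data has exactly this name.
    """
    return next((x.get("deprecated_nic_config_name",
                       role_name.lower() + ".yaml")
                 for x in roles_data if x["name"] == role_name))


print(nic_config_name_for(ROLES_DATA, "Compute"))       # compute.yaml
print(nic_config_name_for(ROLES_DATA, "ComputeSriov"))  # computesriov.yaml
```

A role that defines `deprecated_nic_config_name` resolves to that file, any other role falls back to its lowercased name plus `.yaml`, and a name that matches no role raises StopIteration.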

Comment 4 Bob Fournier 2018-11-26 15:51:20 UTC
Thanks Roee. The templates at that link use only compute.yaml and controller.yaml as NIC config files:

    # Specify the relative/absolute path to the config files you want to use to override the default.
    OS::TripleO::ComputeSriov::Net::SoftwareConfig: nic-configs/compute.yaml
    OS::TripleO::Controller::Net::SoftwareConfig: nic-configs/controller.yaml
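One way to keep the merge step away from templates the deployment never references would be to derive both the template list and the role names from the resource_registry itself. The sketch below assumes the registry has already been parsed into a dict (the two entries mirror the excerpt above), and `nic_configs_in_use` is a hypothetical helper, not something infrared or TripleO provides.

```python
import re

# Parsed resource_registry entries, mirroring the environment file excerpt.
RESOURCE_REGISTRY = {
    "OS::TripleO::ComputeSriov::Net::SoftwareConfig": "nic-configs/compute.yaml",
    "OS::TripleO::Controller::Net::SoftwareConfig": "nic-configs/controller.yaml",
}

# Matches OS::TripleO::<Role>::Net::SoftwareConfig and captures <Role>.
_NIC_KEY = re.compile(r"^OS::TripleO::(?P<role>[^:]+)::Net::SoftwareConfig$")


def nic_configs_in_use(registry):
    """Return {role_name: template_path} for every NIC config the registry maps."""
    result = {}
    for key, path in registry.items():
        match = _NIC_KEY.match(key)
        if match:
            result[match.group("role")] = path
    return result


in_use = nic_configs_in_use(RESOURCE_REGISTRY)
print(sorted(in_use))  # ['ComputeSriov', 'Controller']
```

Templates such as swift-storage.yaml never appear in this mapping, so a loop over `in_use.items()` would not hand them to the merge script, and each call would get a role name taken straight from the registry key.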

However, the test fails because the script is run against the files in /home/stack/composable_roles, e.g.:
/home/stack/composable_roles/network/nic-configs//swift-storage.yaml
/home/stack/composable_roles/network/nic-configs//database_internal.yaml

This is the script failure, run against a NIC config YAML file that isn't needed for this test:
/home/stack/composable_roles/roles/nodes.yaml | awk -F '::' '{ print $3 }' );\n python /usr/share/openstack-tripleo-heat-templates/tools/merge-new-params-nic-config-script.py --tht-dir /usr/share/openstack-tripleo-heat-templates --role-name $NIC_ROLE_NAME --roles-data /home/stack/composable_roles/roles/roles_data.yaml --discard-comments yes --template /home/stack/composable_roles/network/nic-configs//swift-storage.yaml",
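Read together with the `NIC_ROLE_NAME=$( grep ...` line quoted in Comment 6, the fragment above appears to take the role name from whichever line of nodes.yaml contains the template path, keeping the third `::`-separated field. A rough Python equivalent follows; the nodes.yaml contents are an assumption (the bug never shows that file), and `role_name_for_template` is a hypothetical name.

```python
# Hypothetical nodes.yaml contents; the real file is not shown in this bug.
NODES_YAML = """\
resource_registry:
  OS::TripleO::ComputeSriov::Net::SoftwareConfig: /home/stack/composable_roles/network/nic-configs/compute.yaml
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/composable_roles/network/nic-configs/controller.yaml
"""


def role_name_for_template(nodes_text, template_path):
    """Roughly emulate `grep <path> nodes.yaml | awk -F '::' '{ print $3 }'`.

    Simplification: the path is matched as a fixed substring (grep would
    treat it as a regex). Returns the third '::' field of each matching
    line, joined by newlines, or '' when no line matches.
    """
    fields = []
    for line in nodes_text.splitlines():
        if template_path in line:
            parts = line.split("::")
            if len(parts) >= 3:
                fields.append(parts[2])
    return "\n".join(fields)


base = "/home/stack/composable_roles/network/nic-configs/"
print(repr(role_name_for_template(NODES_YAML, base + "compute.yaml")))  # 'ComputeSriov'
print(repr(role_name_for_template(NODES_YAML, base + "/swift-storage.yaml")))  # ''
```

In the second call the path does not appear in the (hypothetical) nodes.yaml, so no role name comes back at all.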

Comment 6 Harald Jensås 2018-11-26 18:11:55 UTC
The code [1] that fails in the script is:

    nic_config_name = next((x.get('deprecated_nic_config_name',
                                  OPTS.role_name.lower() + '.yaml') for x in
                            roles_data if x['name'] == OPTS.role_name))

The StopIteration exception [2] indicates that it iterated through all the roles without finding a match. So my guess is that whatever the CI job assigns to NIC_ROLE_NAME is not a role name in roles_data.
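The mechanism is easy to reproduce in isolation: `next()` on a generator that yields nothing raises StopIteration, whereas the two-argument form `next(iterator, default)` returns the default instead. A minimal illustration, with a hypothetical two-role list and an unmatched role name:

```python
# Hypothetical roles list and a role name that matches none of the entries.
roles_data = [{"name": "Controller"}, {"name": "ComputeSriov"}]
role_name = "SwiftStorage"

try:
    next(x for x in roles_data if x["name"] == role_name)
except StopIteration:
    print("StopIteration: no role named %r in roles_data" % role_name)

# With a default, next() returns it instead of raising.
match = next((x for x in roles_data if x["name"] == role_name), None)
print(match)  # None
```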

Notice that the grep command uses a double forward slash (``nic-configs//swift-storage.yaml``), i.e.:

 NIC_ROLE_NAME=$( grep /home/stack/composable_roles/network/nic-configs//swift-storage.yaml /home/stack/composable_roles/roles/nodes.yaml

Are there two forward slashes in the string you are searching for in /home/stack/composable_roles/roles/nodes.yaml? (My guess is that there are not. Maybe you can use the dirname and basename commands in the CI automation, or simply remove the extra slash that is inserted?)
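For illustration only, Python's `posixpath` module (the POSIX flavour of `os.path`) shows that both suggested fixes yield a single-slash path. The CI automation itself is shell and would use the `dirname`/`basename` commands directly; this sketch just demonstrates the effect on the path from the log.

```python
import posixpath

template = "/home/stack/composable_roles/network/nic-configs//swift-storage.yaml"

# normpath collapses the doubled separator in the middle of the path.
clean = posixpath.normpath(template)

# dirname strips the trailing slashes of the directory part, so re-joining
# it with basename also yields a single-slash path.
rebuilt = posixpath.join(posixpath.dirname(template), posixpath.basename(template))

print(clean)    # /home/stack/composable_roles/network/nic-configs/swift-storage.yaml
print(rebuilt)  # /home/stack/composable_roles/network/nic-configs/swift-storage.yaml
```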

[1] https://github.com/openstack/tripleo-heat-templates/blame/master/tools/merge-new-params-nic-config-script.py#L213-L215
[2] https://docs.python.org/2/library/exceptions.html#exceptions.StopIteration

Comment 7 Bob Fournier 2018-11-30 13:43:08 UTC
Has this been resolved via infrared?

Comment 13 Bob Fournier 2018-12-04 15:32:08 UTC
Thanks Yurii. So it seems that this particular issue isn't a bug since the role wasn't set, but it would be useful if the script generated a clear warning message instead of the "StopIteration" exception. Do you agree?

Comment 15 Bob Fournier 2018-12-06 15:25:21 UTC
I've created a bug [1] to modify merge-new-params-nic-config-script.py so that it exits gracefully and makes the missing role apparent.

I'm closing this bug because the fixes are in IR. If it's useful to keep it open to track the IR fixes, please reopen it and assign it to another DFG.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1656878
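The graceful exit requested in that follow-up bug could look roughly like the sketch below. It is an assumption, not the upstream fix: `find_nic_config_name` is a hypothetical helper and the message wording is invented. It keeps the script's lookup but passes `None` as the default to `next()`, so a missing role produces a readable error and a non-zero exit status (via `sys.exit` with a string) instead of a StopIteration traceback.

```python
import sys


def find_nic_config_name(roles_data, role_name):
    """Return the NIC config file name for role_name, or exit with a clear error.

    Same lookup as the script's next(...) expression, except that next() gets
    a default of None, so a missing role is reported instead of raising
    StopIteration.
    """
    role = next((x for x in roles_data if x["name"] == role_name), None)
    if role is None:
        known = ", ".join(x["name"] for x in roles_data)
        # sys.exit() with a string prints it to stderr and exits with status 1.
        sys.exit("ERROR: role %r not found in roles data (known roles: %s)"
                 % (role_name, known))
    return role.get("deprecated_nic_config_name", role_name.lower() + ".yaml")


# Hypothetical roles data for illustration.
ROLES_DATA = [
    {"name": "Compute", "deprecated_nic_config_name": "compute.yaml"},
    {"name": "Controller"},
]
print(find_nic_config_name(ROLES_DATA, "Compute"))     # compute.yaml
print(find_nic_config_name(ROLES_DATA, "Controller"))  # controller.yaml
```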

