Bug 1653306

Summary: Upgrade from 13 to 14 is failing with composable role
Product: Red Hat OpenStack
Reporter: Roee Agiman <ragiman>
Component: openstack-tripleo-heat-templates
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED NOTABUG
QA Contact: Gurenko Alex <agurenko>
Severity: unspecified
Priority: unspecified
Version: 13.0 (Queens)
CC: abregman, bcafarel, bfournie, hjensas, mburns, mcornea, ragiman, yprokule
Last Closed: 2018-12-06 15:25:21 UTC
Type: Bug

Description Roee Agiman 2018-11-26 14:01:48 UTC
Description of problem:
I have a Jenkins job that fails at the overcloud upgrade step when upgrading OSP13 to OSP14 with composable roles.
There is a side issue with infrared (IR) that causes the undercloud (UC) upgrade to fail, and it has a workaround (WA). Even with the WA applied, the next step fails. The failure seems to be related to 'merge-new-params-nic-config-script.py'.

Job output can be found here:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/neutron/job/DFG-network-neutron-upgrade-13-14_director-rhel-virthost-3cont_2comp_2net-ipv4-vxlan-composable/25/console

Version-Release number of selected component (if applicable):
OSP13

How reproducible:
100%

Steps to Reproduce:
1. Run the job using the IR master branch (the WA for the UC issue).
2. See the failure at overcloud-upgrade and debug the results.

Actual results:
Job failing

Expected results:
Job passing

Additional info:

Comment 1 Bob Fournier 2018-11-26 14:37:30 UTC
Assigning to HardProv to look at.

Comment 2 Bob Fournier 2018-11-26 15:17:29 UTC
Roee - can we see the templates and deployment command being used? It's not clear how to access them from the failed test link. It looks like there's a mismatch between the use of deprecated_nic_config_names in roles_data.yaml and the role_name.

The error is here:
    # If deprecated_nic_config_names is set for role the deprecated name must
    # be used when loading the reference file.    
    with open(OPTS.roles_data) as roles_data_file:
        roles_data = yaml.safe_load(roles_data_file)
    nic_config_name = next((x.get('deprecated_nic_config_name',
                                  OPTS.role_name.lower() + '.yaml') for x in
                            roles_data if x['name'] == OPTS.role_name))

Comment 4 Bob Fournier 2018-11-26 15:51:20 UTC
Thanks Roee.  The templates at that link are using just compute.yaml and controller.yaml for nic config files:

 # Specify the relative/absolute path to the config files you want to use for override the default.
   OS::TripleO::ComputeSriov::Net::SoftwareConfig: nic-configs/compute.yaml
   OS::TripleO::Controller::Net::SoftwareConfig: nic-configs/controller.yaml

However, the test is failing because the script is run against the files in /home/stack/composable_roles, e.g.:
/home/stack/composable_roles/network/nic-configs//swift-storage.yaml
/home/stack/composable_roles/network/nic-configs//database_internal.yaml

This is the script failure, which involves a nic config yaml file that isn't needed for this test:
/home/stack/composable_roles/roles/nodes.yaml | awk -F '::' '{ print $3 }' );\n python /usr/share/openstack-tripleo-heat-templates/tools/merge-new-params-nic-config-script.py --tht-dir /usr/share/openstack-tripleo-heat-templates --role-name $NIC_ROLE_NAME --roles-data /home/stack/composable_roles/roles/roles_data.yaml --discard-comments yes --template /home/stack/composable_roles/network/nic-configs//swift-storage.yaml",
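The awk stage in the command above splits each matched line on `::` and prints the third field, which becomes NIC_ROLE_NAME. Below is a minimal Python sketch of that step. It assumes nodes.yaml holds lines of the same shape as the resource registry entries quoted earlier; nodes.yaml itself is not shown in this report, and the helper name is hypothetical.

```python
def role_from_registry_line(line):
    """Mimic awk -F '::' '{ print $3 }' for a single line of input."""
    fields = line.split("::")
    # awk prints an empty string when the third field does not exist.
    return fields[2] if len(fields) > 2 else ""


# Sample line copied from the resource registry entries in this report.
line = "  OS::TripleO::ComputeSriov::Net::SoftwareConfig: nic-configs/compute.yaml"
print(role_from_registry_line(line))
```

If grep matches nothing, awk receives no input, so NIC_ROLE_NAME would end up empty and the script would be called with a role name that cannot match any entry in roles_data.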

Comment 6 Harald Jensås 2018-11-26 18:11:55 UTC
The code [1] that fails in the script is:

    nic_config_name = next((x.get('deprecated_nic_config_name',
                                  OPTS.role_name.lower() + '.yaml') for x in
                            roles_data if x['name'] == OPTS.role_name))

The StopIteration exception [2] indicates that it iterated through all the roles without finding a match. So my guess is that whatever the CI job assigns to NIC_ROLE_NAME is not a role name in roles_data.
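The behaviour can be reproduced outside the script with a small sketch. The lookup expression is the one quoted above, wrapped in a hypothetical function so it can be called with different role names; the roles_data sample is made up for illustration and is not the job's real roles_data.yaml.

```python
# Hypothetical sample standing in for the parsed roles_data.yaml.
roles_data = [
    {"name": "Controller"},
    {"name": "ComputeSriov", "deprecated_nic_config_name": "compute.yaml"},
]


def nic_config_name_for(role_name, roles_data):
    """Same lookup as in the script: next() over a generator, no default."""
    return next((x.get('deprecated_nic_config_name',
                       role_name.lower() + '.yaml') for x in
                 roles_data if x['name'] == role_name))


print(nic_config_name_for("Controller", roles_data))
print(nic_config_name_for("ComputeSriov", roles_data))

try:
    # An empty or unknown role name leaves the generator with nothing to yield.
    nic_config_name_for("", roles_data)
except StopIteration:
    print("StopIteration: no role matched")
```

Because next() is called without a default value, an exhausted generator surfaces as a bare StopIteration rather than a message naming the missing role.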

Notice that the grep command uses a double forward slash, ``nic-configs//swift-storage.yaml``, i.e.:

 NIC_ROLE_NAME=$( grep /home/stack/composable_roles/network/nic-configs//swift-storage.yaml /home/stack/composable_roles/roles/nodes.yaml

Are there two forward slashes in the string you are searching for in /home/stack/composable_roles/roles/nodes.yaml? My guess is that there are not. Maybe you can use the dirname and basename commands in the CI automation, or simply remove the additional slash that is inserted?




[1] https://github.com/openstack/tripleo-heat-templates/blame/master/tools/merge-new-params-nic-config-script.py#L213-L215
[2] https://docs.python.org/2/library/exceptions.html#exceptions.StopIteration
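The double-slash mismatch suspected in comment 6 can be illustrated with a short sketch. A plain substring test stands in for grep here, and the nodes.yaml line, including the role name in it, is an assumption made for illustration since the real file is not shown in this report.

```python
import posixpath


def matching_lines(needle, lines):
    """Return the lines that contain needle as a literal substring."""
    return [line for line in lines if needle in line]


# Hypothetical nodes.yaml line; the role name and layout are assumptions.
nodes_yaml_lines = [
    "  OS::TripleO::ObjectStorage::Net::SoftwareConfig: "
    "/home/stack/composable_roles/network/nic-configs/swift-storage.yaml",
]

# Path as passed by the CI job, with the doubled slash.
template = "/home/stack/composable_roles/network/nic-configs//swift-storage.yaml"

# The doubled slash prevents a match against the single-slash path.
print(matching_lines(template, nodes_yaml_lines))

# normpath collapses the redundant separator, so the lookup succeeds.
print(matching_lines(posixpath.normpath(template), nodes_yaml_lines))
```

Normalizing the path before searching is one option; building it without the extra slash in the CI automation, as suggested above, avoids the problem at the source.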

Comment 7 Bob Fournier 2018-11-30 13:43:08 UTC
Has this been resolved via infrared?

Comment 13 Bob Fournier 2018-12-04 15:32:08 UTC
Thanks Yurii. So it seems that this particular issue isn't a bug since the role wasn't set, but it would be useful if the script generated a clear warning message instead of the "StopIteration" exception. Do you agree?

Comment 15 Bob Fournier 2018-12-06 15:25:21 UTC
I've created a bug [1] to modify merge-new-params-nic-config-script.py to exit gracefully and make the missing role apparent.

I'm closing this bug as the fixes are in IR. If it's useful to keep this open to track the IR fixes, please reopen and assign to another DFG.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1656878
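
The graceful exit proposed in comments 13 and 15 could look roughly like the sketch below. This is not the change made for bug 1656878; the function name, the message wording, and the sample roles_data are all hypothetical.

```python
import sys


def nic_config_name_or_exit(role_name, roles_data):
    """Look up the NIC config file name, exiting with a clear message if
    the role is not present in roles_data."""
    # Passing a default to next() avoids the bare StopIteration.
    role = next((x for x in roles_data if x["name"] == role_name), None)
    if role is None:
        known = ", ".join(x["name"] for x in roles_data)
        sys.exit("Role '%s' not found in roles data (known roles: %s)"
                 % (role_name, known))
    return role.get("deprecated_nic_config_name", role_name.lower() + ".yaml")


# Hypothetical sample standing in for the parsed roles_data.yaml.
roles_data = [{"name": "Controller"}, {"name": "ComputeSriov"}]
print(nic_config_name_or_exit("Controller", roles_data))
```

Calling sys.exit with a string prints the message to stderr and ends the process with exit status 1, so a CI job would still fail, but with the missing role named in the output.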