Bug 1787592 - [OSP16]Sriov minor update fails in controllers
Summary: [OSP16]Sriov minor update fails in controllers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: z1
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: Saravanan KR
QA Contact: Candido Campos
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-01-03 14:49 UTC by Candido Campos
Modified: 2020-03-03 09:45 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200131231705.39bf6c2.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-03 09:45:05 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1859129 | 0 | None | None | None | 2020-01-10 06:32:15 UTC
OpenStack gerrit | 701882 | 0 | None | MERGED | Modify import_role to include_role for boot params service | 2021-01-22 17:53:39 UTC
OpenStack gerrit | 702344 | 0 | None | MERGED | Modify import_role to include_role for boot params service | 2021-01-22 17:53:39 UTC
Red Hat Product Errata | RHBA-2020:0655 | 0 | None | None | None | 2020-03-03 09:45:47 UTC

Description Candido Campos 2020-01-03 14:49:21 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:
Deploy OSP16 with SR-IOV, then run a minor update between these builds:
RHOS_TRUNK-16.0-RHEL-8-20191213.n.5
RHOS_TRUNK-16.0-RHEL-8-20191224.n.0


Steps to Reproduce:

for i in controller-0 controller-1 controller-2; do openstack overcloud update run --stack overcloud --playbook all --limit $i ; done

Actual results:
Fails

Expected results:
Pass

Additional info:


TASK [tuned : Enable tuned profile] ********************************************
Friday 03 January 2020  12:48:19 +0000 (0:00:00.770)       0:11:34.215 ******** 
fatal: [controller-2]: FAILED! => {"changed": true, "cmd": ["tuned-adm", "profile", "cpu-partitioning"], "delta": "0:00:00.433576", "end": "2020-01-03 12:48:20.209823", "msg": "non-zero return code", "rc": 1, "start": "2020-01-03 12:48:19.776247", "stderr": "Cannot load profile(s) 'cpu-partitioning': Assertion 'isolated_cores contains online CPU(s)' failed.", "stderr_lines": ["Cannot load profile(s) 'cpu-partitioning': Assertion 'isolated_cores contains online CPU(s)' failed."], "stdout": "", "stdout_lines": []}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
controller-2               : ok=182  changed=82   unreachable=0    failed=1    skipped=370  rescued=0    ignored=1   

Friday 03 January 2020  12:48:20 +0000 (0:00:00.994)       0:11:35.209 ******** 
=============================================================================== 

Ansible failed, check log at /var/lib/mistral/0a9a4c4e-c385-476a-bdbc-b11dfcd134ed/ansible.log.
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun [-] Exception occured while running the command: RuntimeError: Update failed with: Ansible failed, check log at /var/lib/mistral/0a9a4c4e-c385-476a-bdbc-b11dfcd134ed/ansible.log.
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun Traceback (most recent call last):
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun   File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun     super(Command, self).run(parsed_args)
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun   File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun     return super(Command, self).run(parsed_args)
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun   File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun     return_code = self.take_action(parsed_args) or 0
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun   File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_update.py", line 171, in take_action
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun     priv_key=key)
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun   File "/usr/lib/python3.6/site-packages/tripleoclient/utils.py", line 1194, in run_update_ansible_action
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun     verbosity=verbosity, extra_vars=extra_vars)
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun   File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/package_update.py", line 127, in update_ansible
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun     raise RuntimeError('Update failed with: {}'.format(payload['message']))
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun RuntimeError: Update failed with: Ansible failed, check log at /var/lib/mistral/0a9a4c4e-c385-476a-bdbc-b11dfcd134ed/ansible.log.
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun 
2020-01-03 12:48:22.105 86514 ERROR openstack [-] Update failed with: Ansible failed, check log at /var/lib/mistral/0a9a4c4e-c385-476a-bdbc-b11dfcd134ed/ansible.log.: RuntimeError: Update failed with: Ansible failed, check log at /var/lib/mistral/0a9a4c4e-c385-476a-bdbc-b11dfcd134ed/ansible.log.
2020-01-03 12:48:22.105 86514 INFO osc_lib.shell [-] END return value: 1

Comment 2 Rabi Mishra 2020-01-03 15:30:52 UTC

*** This bug has been marked as a duplicate of bug 1787459 ***

Comment 3 Eran Kuris 2020-01-07 13:55:56 UTC
(In reply to Rabi Mishra from comment #2)
> 
> *** This bug has been marked as a duplicate of bug 1787459 ***

Are you sure it's the same issue as bz 1787459?

Comment 4 Rabi Mishra 2020-01-07 15:15:27 UTC
> Are you sure it's the same issue as bz 1787459?

What makes you think it's not? The traceback and the error are the same. Also I see the last run (on 2nd Jan) of the job is green[1].

[1]https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/neutron/job/DFG-network-neutron-16_director-rhel-virthost-3cont_2comp-ipv4-vlan-sriov/lastBuild/

Comment 9 Saravanan KR 2020-01-10 06:10:12 UTC
Controller Role's tuned_profile is "throughput-performance".
ComputeSriov Role's tuned_profile is "cpu-partitioning".

The variable 'tuned_profile' with value 'cpu-partitioning' is applied to
the Controller role when import_role is used with vars:

    - import_role:
        name: tuned
      vars:
        tuned_profile: 'cpu-partitioning'

Even though the 'cpu-partitioning' profile is defined only for the ComputeSriov
role, guarded by a condition on the 'ComputeSriov' role name, using
import_role applies the variable 'tuned_profile' with value
'cpu-partitioning' to the whole PLAY. Under TripleO's role-specific
implementation, role-specific variables should apply only to the specific
TripleO role and should not affect other roles.

First, is this the expected Ansible behavior for import_role?
  https://docs.ansible.com/ansible/latest/modules/import_role_module.html#notes
According to the Ansible import_role documentation, it is: "Since
Ansible 2.7 variables defined in vars and defaults for the role are exposed at
playbook parsing time. Due to this, these variables will be accessible to
roles and tasks executed before the location of the import_role task."


For TripleO's role-specific parameter support, 'include_role' gives the
expected behavior: the variable is applied only to the included role (of the
TripleO role) and does not affect the entire PLAY. I have created a small
example to illustrate the behavior.

  - name: play1
    hosts: test
    tasks:
      - debug: var=test_role_var1
      - import_role:
          name: test-role
        vars:
          test_role_var1: 'test_var1_local'

  - name: play2
    hosts: test
    tasks:
      - debug: var=test_role_var1

In this playbook, the variable 'test_role_var1' is defined for the entire
PLAY 'play1' and does not affect 'play2'. Changing import_role to
include_role makes 'test_role_var1' defined only inside 'test-role'. The
complete sample code is available in this git repo for reference.

  https://github.com/krsacme/ansible-include-vs-import
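
For comparison, a minimal sketch of the include_role variant of play1. It
reuses the 'test-role' and 'test_role_var1' names from the playbook above and
is not copied from the repo; per the explanation above, the debug task should
no longer see the variable.

  - name: play1
    hosts: test
    tasks:
      # include_role is dynamic, so the vars below are expected to stay
      # scoped to test-role and not reach this earlier task.
      - debug: var=test_role_var1
      - include_role:
          name: test-role
        vars:
          test_role_var1: 'test_var1_local'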


But the same import_role is present in deployment too, so why does it affect
only the minor update?

A static import affects only the PLAY in which it is included; it does
not affect other PLAYs. That is the difference between deployment and minor
update.


Deployment (only relevant content of deploy_steps_playbook.yaml):

    - hosts: Controller:overcloud
      name: Overcloud deploy step tasks for step 0
      tasks:
        - import_tasks: deploy_steps_tasks_step_0.yaml
      tags:
        - step0


Minor Update (only relevant content of update_steps_playbook.yaml):

    - hosts: Controller
      name: Run update
      tasks:
        - include_tasks: update_steps_tasks.yaml
          with_sequence: start=0 end=5
          loop_control:
            loop_var: step
        - import_tasks: Controller/host_prep_tasks.yaml
          when: tripleo_role_name == 'Controller'
        - import_tasks: deploy_steps_tasks_step_0.yaml
          vars:
            step: 0
        - import_tasks: common_deploy_steps_tasks_step_1.yaml


In the deployment case, 'deploy_steps_tasks_step_0.yaml' is used in a separate
PLAY, so it affects only step0.

In the minor update case, 'deploy_steps_tasks_step_0.yaml' is used in the same
PLAY as 'host_prep_tasks.yaml', where tuned is invoked for all roles with
their respective tuned profiles (for the controller, it should be
throughput-performance). Because import_role is a static import, the variable
'tuned_profile' is imported from the ComputeSriov role inside the
'deploy_steps_tasks_step_0.yaml' file, which sets 'tuned_profile' to
'cpu-partitioning' for the whole PLAY.
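
The effect can be sketched in a single play. The file names update.yaml and
step0.yaml and the debug task are hypothetical stand-ins, not the generated
TripleO content; only the import_role task mirrors the snippet shown earlier.
Per the analysis above, the debug task on a Controller node would be expected
to see 'cpu-partitioning' even though the 'when' condition is false there.

    # update.yaml (hypothetical)
    - hosts: Controller
      tasks:
        # Stands in for host_prep_tasks.yaml, which runs before step 0.
        - debug: var=tuned_profile
        - import_tasks: step0.yaml

    # step0.yaml (hypothetical)
    - import_role:
        name: tuned
      vars:
        tuned_profile: 'cpu-partitioning'
      # The condition skips the role's tasks at run time, but the static
      # import is still processed when the playbook is parsed.
      when: tripleo_role_name == 'ComputeSriov'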

Solutions to this minor update problem:

option-1) As in deployment, separate the step0 tasks into a different play.
Though this solves the problem for now, it may cause trouble when a later
change in step0 introduces different role-specific variables.

option-2) Change import_role to include_role wherever role-specific parameters
are used. This gives the expected behavior, but may lose the advantages of
static import.
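
A rough sketch of option-1, splitting the update play shown earlier into
separate plays. The names of the second and third plays are made up for
illustration, and this is not the generated update_steps_playbook.yaml.

    - hosts: Controller
      name: Run update
      tasks:
        - include_tasks: update_steps_tasks.yaml
          with_sequence: start=0 end=5
          loop_control:
            loop_var: step
        - import_tasks: Controller/host_prep_tasks.yaml
          when: tripleo_role_name == 'Controller'

    # Hypothetical separate play, so that variables exposed by the static
    # imports in step 0 stay out of the play above.
    - hosts: Controller
      name: Run deploy step 0 during update
      tasks:
        - import_tasks: deploy_steps_tasks_step_0.yaml
          vars:
            step: 0

    - hosts: Controller
      name: Run common deploy step 1 during update
      tasks:
        - import_tasks: common_deploy_steps_tasks_step_1.yaml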

I believe include_role is the appropriate solution for handling TripleO's
role-specific parameters. I will raise a bug upstream to continue the
discussion and settle on the solution.
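
A minimal sketch of what option-2 could look like for the tuned task. The
'when' condition is illustrative (modelled on the tripleo_role_name check in
the update playbook above), and this is not the merged upstream patch.

    - include_role:
        name: tuned
      vars:
        # With a dynamic include, this value is expected to apply only to
        # the tuned role as invoked here, not to the rest of the play.
        tuned_profile: 'cpu-partitioning'
      when: tripleo_role_name == 'ComputeSriov'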

Comment 15 Alex McLeod 2020-02-19 12:39:18 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.

Comment 18 errata-xmlrpc 2020-03-03 09:45:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0655

