Bug 1787592
| Summary: | [OSP16]Sriov minor update fails in controllers | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Candido Campos <ccamposr> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Saravanan KR <skramaja> |
| Status: | CLOSED ERRATA | QA Contact: | Candido Campos <ccamposr> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 16.0 (Train) | CC: | cfontain, ekuris, kfida, mburns, ramishra, sclewis, skramaja, supadhya |
| Target Milestone: | z1 | Keywords: | Reopened, Triaged |
| Target Release: | 16.0 (Train on RHEL 8.1) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-heat-templates-11.3.2-0.20200131231705.39bf6c2.el8ost | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-03-03 09:45:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
*** This bug has been marked as a duplicate of bug 1787459 ***

(In reply to Rabi Mishra from comment #2)
> *** This bug has been marked as a duplicate of bug 1787459 ***
Are you sure it's the same issue as bz 1787459?

> Are you sure it's the same issue as bz 1787459?
What makes you think it's not? The traceback and the error are the same. Also I see the last run (on 2nd Jan) of the job is green[1].
[1] https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/neutron/job/DFG-network-neutron-16_director-rhel-virthost-3cont_2comp-ipv4-vlan-sriov/lastBuild/

Controller Role's tuned_profile is "throughput-performance".
ComputeSriov Role's tuned_profile is "cpu-partitioning".
Variable 'tuned_profile' with value 'cpu-partitioning' is applied to
Controller Role, when import_role is used with vars.
- import_role:
    name: tuned
  vars:
    tuned_profile: 'cpu-partitioning'
Even though the 'cpu-partitioning' profile is defined only for the
ComputeSriov role, under a condition on the 'ComputeSriov' role name, using
import_role causes the variable 'tuned_profile' with value 'cpu-partitioning'
to be applied to the whole PLAY. As per TripleO's Role-specific
implementation, Role-specific variables should be applied only to the
specific TripleO Role and should not affect the other Roles.
Firstly, is this the expected Ansible behavior for import_role?
https://docs.ansible.com/ansible/latest/modules/import_role_module.html#notes
As per the ansible import_role documentation, the behavior is expected. "Since
Ansible 2.7 variables defined in vars and defaults for the role are exposed at
playbook parsing time. Due to this, these variables will be accessible to
roles and tasks executed before the location of the import_role task."
For TripleO's Role-specific parameter support, using 'include_role' gives the
expected behavior of applying the variable only to the included role (of the
TripleO Role) without affecting the entire PLAY. I have created a small
gist to illustrate the behavior.
- name: play1
  hosts: test
  tasks:
    - debug: var=test_role_var1
    - import_role:
        name: test-role
      vars:
        test_role_var1: 'test_var1_local'
- name: play2
  hosts: test
  tasks:
    - debug: var=test_role_var1
In this playbook, the variable 'test_role_var1' is defined for the entire
PLAY 'play1' and it does not affect 'play2'. Changing import_role to
include_role makes the variable 'test_role_var1' defined only inside
'test-role'. The whole sample code is available in this git repo for
better understanding.
https://github.com/krsacme/ansible-include-vs-import
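For comparison, the include_role variant of 'play1' might look like the sketch below. It reuses the 'test' host group and 'test-role' from the sample above; the trailing debug task is an addition for illustration, and the expectations in the comments restate the behavior described in this comment rather than output captured from a run.

```yaml
- name: play1
  hosts: test
  tasks:
    # include_role is dynamic: it is processed only when the task runs,
    # so the variable is expected to be undefined at this point.
    - debug: var=test_role_var1

    - include_role:
        name: test-role
      vars:
        test_role_var1: 'test_var1_local'

    # Added for illustration: task-level vars on include_role are passed to
    # the included role only, so the variable is expected to be undefined
    # again once the role has finished.
    - debug: var=test_role_var1
```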
But the same import_role is present in deployment too, so why does it affect
only the minor update?
The static import affects only the PLAY where it is included; it does
not affect other PLAYs. That is the difference between deployment and minor
update.
Deployment (only relevant content of deploy_steps_playbook.yaml):
- hosts: Controller:overcloud
  name: Overcloud deploy step tasks for step 0
  tasks:
    - import_tasks: deploy_steps_tasks_step_0.yaml
  tags:
    - step0
Minor Update (only relevant content of update_steps_playbook.yaml):
- hosts: Controller
  name: Run update
  tasks:
    - include_tasks: update_steps_tasks.yaml
      with_sequence: start=0 end=5
      loop_control:
        loop_var: step
    - import_tasks: Controller/host_prep_tasks.yaml
      when: tripleo_role_name == 'Controller'
    - import_tasks: deploy_steps_tasks_step_0.yaml
      vars:
        step: 0
    - import_tasks: common_deploy_steps_tasks_step_1.yaml
In the case of deployment, 'deploy_steps_tasks_step_0.yaml' is used in a
separate PLAY, so it affects only step0.
In the case of minor update, 'deploy_steps_tasks_step_0.yaml' is used in the
same PLAY as 'host_prep_tasks.yaml', where tuned is invoked for all the roles
with their respective tuned profile (in the case of the controller, it should
be throughput-performance). Because import_role is a static import, the
variable 'tuned_profile' is imported from the ComputeSriov role inside the
'deploy_steps_tasks_step_0.yaml' file, which causes 'tuned_profile' to be set
to 'cpu-partitioning' for the whole PLAY.
Solution to this minor update problem:
option-1) As in deployment, separate the step0 tasks out into a different
PLAY. Though this will solve the problem for now, it may cause trouble when a
change in step0 adds support for different Role-specific variables.
option-2) Move all import_role to include_role wherever Role-specific
parameters are used. This gives the expected behavior, but may lose the
advantages of static import.
I believe include_role is the appropriate solution to handle TripleO's
Role-specific parameters. I will raise a bug upstream to continue the
discussion and conclude the solution.
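A minimal sketch of option-2, assuming the ComputeSriov tuned invocation looks like the import_role snippet shown earlier in this comment. The task name is hypothetical, and this is not the patch that was eventually merged.

```yaml
# Hypothetical replacement for the ComputeSriov tuned invocation.
- name: Apply the Role-specific tuned profile
  include_role:
    name: tuned
  vars:
    # Passed only to the included role, not to the rest of the PLAY.
    tuned_profile: 'cpu-partitioning'
  when: tripleo_role_name == 'ComputeSriov'
```

The trade-off is that include_role is dynamic: tasks inside the role are not shown by --list-tasks, and tags set on the include apply to the include task itself rather than to the role's tasks unless they are passed with 'apply'.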
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:0655
Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:
deploy osp16 with sriov
minor update between:
RHOS_TRUNK-16.0-RHEL-8-20191213.n.5
RHOS_TRUNK-16.0-RHEL-8-20191224.n.0

Steps to Reproduce:
for i in controller-0 controller-1 controller-2; do openstack overcloud update run --stack overcloud --playbook all --limit $i ; done

Actual results:
Fails

Expected results:
Pass

Additional info:
TASK [tuned : Enable tuned profile] ********************************************
Friday 03 January 2020 12:48:19 +0000 (0:00:00.770) 0:11:34.215 ********
fatal: [controller-2]: FAILED! => {"changed": true, "cmd": ["tuned-adm", "profile", "cpu-partitioning"], "delta": "0:00:00.433576", "end": "2020-01-03 12:48:20.209823", "msg": "non-zero return code", "rc": 1, "start": "2020-01-03 12:48:19.776247", "stderr": "Cannot load profile(s) 'cpu-partitioning': Assertion 'isolated_cores contains online CPU(s)' failed.", "stderr_lines": ["Cannot load profile(s) 'cpu-partitioning': Assertion 'isolated_cores contains online CPU(s)' failed."], "stdout": "", "stdout_lines": []}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
controller-2 : ok=182 changed=82 unreachable=0 failed=1 skipped=370 rescued=0 ignored=1

Friday 03 January 2020 12:48:20 +0000 (0:00:00.994) 0:11:35.209 ********
===============================================================================
Ansible failed, check log at /var/lib/mistral/0a9a4c4e-c385-476a-bdbc-b11dfcd134ed/ansible.log.
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun [-] Exception occured while running the command: RuntimeError: Update failed with: Ansible failed, check log at /var/lib/mistral/0a9a4c4e-c385-476a-bdbc-b11dfcd134ed/ansible.log.
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun Traceback (most recent call last):
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun super(Command, self).run(parsed_args)
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun return super(Command, self).run(parsed_args)
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun return_code = self.take_action(parsed_args) or 0
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_update.py", line 171, in take_action
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun priv_key=key)
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun File "/usr/lib/python3.6/site-packages/tripleoclient/utils.py", line 1194, in run_update_ansible_action
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun verbosity=verbosity, extra_vars=extra_vars)
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/package_update.py", line 127, in update_ansible
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun raise RuntimeError('Update failed with: {}'.format(payload['message']))
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun RuntimeError: Update failed with: Ansible failed, check log at /var/lib/mistral/0a9a4c4e-c385-476a-bdbc-b11dfcd134ed/ansible.log.
2020-01-03 12:48:22.097 86514 ERROR tripleoclient.v1.overcloud_update.MinorUpdateRun
2020-01-03 12:48:22.105 86514 ERROR openstack [-] Update failed with: Ansible failed, check log at /var/lib/mistral/0a9a4c4e-c385-476a-bdbc-b11dfcd134ed/ansible.log.: RuntimeError: Update failed with: Ansible failed, check log at /var/lib/mistral/0a9a4c4e-c385-476a-bdbc-b11dfcd134ed/ansible.log.
2020-01-03 12:48:22.105 86514 INFO osc_lib.shell [-] END return value: 1