Bug 2224177
| Summary: | [RFE] overcloud node provision doesn't execute network configuration in parallel | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Keigo Noha <knoha> |
| Component: | python-tripleoclient | Assignee: | OSP Team <rhos-maint> |
| Status: | NEW --- | QA Contact: | David Rosenfeld <drosenfe> |
| Severity: | low | Docs Contact: | |
| Priority: | low | ||
| Version: | 17.1 (Wallaby) | CC: | astupnik, hbrock, hjensas, jslagle, mburns, sbaker |
| Target Milestone: | --- | Keywords: | RFE, Triaged |
| Target Release: | --- | Flags: | hjensas: needinfo? (knoha) |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| Bug Blocks: | 2222869 | ||
I think this bug is invalid? The 'cli-overcloud-node-network-config.yaml' playbook runs outside the role loop, without any limit on which roles it executes on, so Ansible will run it in parallel on all nodes in all roles. However, the growvols play and any extra playbooks added by the operator do run per role, with per-role "extra_vars".

https://opendev.org/openstack/python-tripleoclient/src/branch/stable/wallaby/tripleoclient/utils.py#L2837-L2872

~~~
def run_role_playbooks(self, working_dir, roles_file_dir, roles,
                       network_config=True):
    inventory_file = os.path.join(working_dir,
                                  'tripleo-ansible-inventory.yaml')
    with open(inventory_file, 'r') as f:
        inventory = yaml.safe_load(f.read())

    growvols_play = 'cli-overcloud-node-growvols.yaml'
    growvols_path = rel_or_abs_path_role_playbook(
        constants.ANSIBLE_TRIPLEO_PLAYBOOKS, growvols_play)

    # Pre-Network Config
    for role in roles:
        if role.get('count', 1) == 0:
            continue

        role_playbooks = []

        for x in role.get('ansible_playbooks', []):
            role_playbooks.append(x['playbook'])

            run_role_playbook(self, inventory, roles_file_dir, x['playbook'],
                              limit_hosts=role['name'],
                              extra_vars=x.get('extra_vars', {}))

        if growvols_path not in role_playbooks:
            # growvols was not run with custom extra_vars, run it with defaults
            run_role_playbook(self, inventory,
                              constants.ANSIBLE_TRIPLEO_PLAYBOOKS,
                              growvols_play,
                              limit_hosts=role['name'])

    if network_config:
        # Network Config
        run_role_playbook(self, inventory, constants.ANSIBLE_TRIPLEO_PLAYBOOKS,
                          'cli-overcloud-node-network-config.yaml')
~~~
Description of problem:
The current python-tripleoclient executes OS deployment in parallel, according to the concurrency option.

~~~
class ProvisionNode(command.Command):
    :
    def take_action(self, parsed_args):
        :
        extra_vars = {
            "stack_name": parsed_args.stack,
            "baremetal_deployment": roles,
            "baremetal_deployed_path": output_path,
            "ssh_public_keys": ssh_key,
            "ssh_private_key_file": key,
            "ssh_user_name": parsed_args.overcloud_ssh_user,
            "node_timeout": parsed_args.timeout,
            "concurrency": parsed_args.concurrency,
            "manage_network_ports": True,
            "configure_networking": parsed_args.network_config,
            "working_dir": working_dir,
            "templates": parsed_args.templates,
            "overwrite": overwrite,
        }

        with oooutils.TempDirs() as tmp:
            oooutils.run_ansible_playbook(
                playbook='cli-overcloud-node-provision.yaml',
                inventory='localhost,',
                workdir=tmp,
                playbook_dir=constants.ANSIBLE_TRIPLEO_PLAYBOOKS,
                verbosity=oooutils.playbook_verbosity(self=self),
                extra_vars=extra_vars,
            )
~~~

However, the later code, which configures networking, is run per role.

~~~
oooutils.run_role_playbooks(self, working_dir, roles_file_dir,
                            roles, parsed_args.network_config)
~~~

A spine-leaf environment will have many custom roles for the leaves, so with this implementation the execution time grows in direct proportion to the number of roles.

To reduce the node provisioning time, can we run this process in parallel, with a limit on the number of parallel executions at a time to prevent resource starvation?
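
The request for bounded parallelism across roles can be pictured as a small worker pool. The following is only a sketch under stated assumptions, not tripleoclient code: `run_roles_in_parallel`, `run_for_role` and `max_parallel` are hypothetical names introduced for illustration, and `run_for_role` stands in for whatever per-role work would be invoked (for example, a per-role playbook run).

```python
from concurrent.futures import ThreadPoolExecutor


def run_roles_in_parallel(roles, run_for_role, max_parallel=4):
    """Call run_for_role(role) for each active role, at most max_parallel at once.

    Hypothetical helper for illustration only. Returns a dict mapping role
    name to the callable's return value. An exception raised by
    run_for_role propagates to the caller.
    """
    # Skip roles whose count is 0, mirroring the check in run_role_playbooks.
    active = [r for r in roles if r.get('count', 1) != 0]

    results = {}
    # The pool size caps how many roles are processed at the same time.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {r['name']: pool.submit(run_for_role, r) for r in active}
        for name, future in futures.items():
            # result() blocks until that role finishes and re-raises failures.
            results[name] = future.result()
    return results


if __name__ == '__main__':
    # Example roles; the names are made up for this sketch.
    example_roles = [
        {'name': 'ComputeLeaf0', 'count': 2},
        {'name': 'ComputeLeaf1', 'count': 0},  # skipped
        {'name': 'ComputeLeaf2'},
    ]
    print(run_roles_in_parallel(example_roles,
                                lambda role: role.get('count', 1),
                                max_parallel=2))
```

Threads are used here on the assumption that each per-role run mostly waits on an external process, so the cap on workers is what limits resource use. Whether the per-role playbook runs are actually safe to execute concurrently (shared inventory, working directory, logging) is not established by anything in this report and would need to be checked.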