Bug 2224177

Summary: [RFE]overcloud node provision doesn't execute network configuration in parallel
Product: Red Hat OpenStack
Reporter: Keigo Noha <knoha>
Component: python-tripleoclient
Assignee: OSP Team <rhos-maint>
Status: NEW
QA Contact: David Rosenfeld <drosenfe>
Severity: low
Priority: low
Version: 17.1 (Wallaby)
CC: astupnik, hbrock, hjensas, jslagle, mburns, sbaker
Target Milestone: ---
Keywords: RFE, Triaged
Target Release: ---
Flags: hjensas: needinfo? (knoha)
Hardware: x86_64
OS: Linux
Type: Bug
Bug Blocks: 2222869    

Description Keigo Noha 2023-07-20 06:57:55 UTC
Description of problem:
The current python-tripleoclient deploys the OS on nodes in parallel, according to the `concurrency` option:
~~~
class ProvisionNode(command.Command):
    # ...
    def take_action(self, parsed_args):
        # ...
        extra_vars = {
            "stack_name": parsed_args.stack,
            "baremetal_deployment": roles,
            "baremetal_deployed_path": output_path,
            "ssh_public_keys": ssh_key,
            "ssh_private_key_file": key,
            "ssh_user_name": parsed_args.overcloud_ssh_user,
            "node_timeout": parsed_args.timeout,
            "concurrency": parsed_args.concurrency,
            "manage_network_ports": True,
            "configure_networking": parsed_args.network_config,
            "working_dir": working_dir,
            "templates": parsed_args.templates,
            "overwrite": overwrite,
        }

        with oooutils.TempDirs() as tmp:
            oooutils.run_ansible_playbook(
                playbook='cli-overcloud-node-provision.yaml',
                inventory='localhost,',
                workdir=tmp,
                playbook_dir=constants.ANSIBLE_TRIPLEO_PLAYBOOKS,
                verbosity=oooutils.playbook_verbosity(self=self),
                extra_vars=extra_vars,
            )
~~~

However, the later code, which configures networking, runs once per role:
~~~
        oooutils.run_role_playbooks(self, working_dir, roles_file_dir,
                                    roles, parsed_args.network_config)
~~~

A spine-leaf environment typically has many custom roles for its leaves.
With this implementation, execution time grows in direct proportion to the number of roles.
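
As a rough illustration of that scaling, the toy model below compares sequential per-role execution with a bounded-parallel schedule (greedy list scheduling with at most `limit` roles in flight). The per-role durations are hypothetical numbers, not measurements, and the model ignores contention between concurrent runs.

```python
import heapq


def sequential_time(durations):
    """Wall time when every role is processed one after another."""
    return sum(durations)


def bounded_parallel_time(durations, limit):
    """Wall time when at most `limit` roles are processed at once.

    Roles start in list order; each goes to whichever worker slot frees
    up first (greedy list scheduling). Toy model only.
    """
    if limit < 1:
        raise ValueError("limit must be >= 1")
    # Min-heap holding the time at which each worker slot becomes free.
    slots = [0.0] * min(limit, len(durations))
    heapq.heapify(slots)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(slots)
        end = start + d
        finish = max(finish, end)
        heapq.heappush(slots, end)
    return finish


if __name__ == "__main__":
    # Hypothetical per-role durations (minutes) for eight leaf roles.
    durations = [6, 6, 5, 7, 6, 5, 6, 7]
    print(sequential_time(durations))           # 48
    print(bounded_parallel_time(durations, 4))  # 14.0
```

With a limit of 1 the bounded schedule degenerates to the sequential one, so the cap trades wall time against how many runs compete for undercloud resources at once.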

To reduce node provisioning time, could this process run in parallel, with a cap on the number of concurrent executions to prevent resource starvation?

Comment 2 Harald Jensås 2023-08-15 09:18:21 UTC
I think this bug is invalid?
The 'cli-overcloud-node-network-config.yaml' playbook runs outside the role loop, without any limit on which roles it targets, so Ansible runs it in parallel on all nodes in all roles.
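
This can be made concrete with a deliberately simplified model of Ansible's default linear strategy, where each task runs on at most `forks` hosts at a time: one play targeting every host needs about ceil(H / forks) waves per task, while one run per role needs the sum of ceil(h_r / forks) over the roles, plus per-run startup overhead that the model ignores. The host counts and `forks` value are hypothetical, and equal per-host task durations are assumed.

```python
import math


def waves_single_run(hosts_per_role, forks):
    """Waves per task when one play targets all hosts of all roles."""
    return math.ceil(sum(hosts_per_role) / forks)


def waves_per_role_runs(hosts_per_role, forks):
    """Waves per task when each role gets its own run, one after another."""
    return sum(math.ceil(h / forks) for h in hosts_per_role)


if __name__ == "__main__":
    # Hypothetical: six leaf roles with three nodes each, forks = 25.
    hosts = [3] * 6
    print(waves_single_run(hosts, 25))     # 1
    print(waves_per_role_runs(hosts, 25))  # 6
```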

However, the growvols play and any extra playbooks added by the operator do run per role, with per-role "extra_vars".


https://opendev.org/openstack/python-tripleoclient/src/branch/stable/wallaby/tripleoclient/utils.py#L2837-L2872

~~~
def run_role_playbooks(self, working_dir, roles_file_dir, roles,
                       network_config=True):
    inventory_file = os.path.join(working_dir,
                                  'tripleo-ansible-inventory.yaml')
    with open(inventory_file, 'r') as f:
        inventory = yaml.safe_load(f.read())

    growvols_play = 'cli-overcloud-node-growvols.yaml'
    growvols_path = rel_or_abs_path_role_playbook(
        constants.ANSIBLE_TRIPLEO_PLAYBOOKS, growvols_play)

    # Pre-Network Config
    for role in roles:
        if role.get('count', 1) == 0:
            continue

        role_playbooks = []

        for x in role.get('ansible_playbooks', []):
            role_playbooks.append(x['playbook'])

            run_role_playbook(self, inventory, roles_file_dir, x['playbook'],
                              limit_hosts=role['name'],
                              extra_vars=x.get('extra_vars', {}))

        if growvols_path not in role_playbooks:
            # growvols was not run with custom extra_vars, run it with defaults
            run_role_playbook(self, inventory,
                              constants.ANSIBLE_TRIPLEO_PLAYBOOKS,
                              growvols_play,
                              limit_hosts=role['name'])

    if network_config:
        # Network Config
        run_role_playbook(self, inventory, constants.ANSIBLE_TRIPLEO_PLAYBOOKS,
                          'cli-overcloud-node-network-config.yaml')
~~~
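
To see which playbook runs `run_role_playbooks` issues, the hypothetical helper `plan_role_playbooks` below mirrors its control flow but only records each run as a `(playbook, limit_hosts, extra_vars)` tuple instead of calling Ansible. It simplifies one detail: the default growvols run is recorded under `growvols_path` rather than the bare playbook name. The resulting list shows the split: growvols and operator playbooks produce one run per role, while network config is a single run with no host limit.

```python
NETWORK_CONFIG_PLAY = "cli-overcloud-node-network-config.yaml"


def plan_role_playbooks(roles, growvols_path, network_config=True):
    """Hypothetical dry-run of run_role_playbooks' control flow.

    Returns (playbook, limit_hosts, extra_vars) tuples in execution order;
    limit_hosts=None means the play targets every host in the inventory.
    """
    runs = []
    for role in roles:
        if role.get("count", 1) == 0:
            continue  # roles with no nodes are skipped entirely
        role_playbooks = []
        for entry in role.get("ansible_playbooks", []):
            role_playbooks.append(entry["playbook"])
            runs.append((entry["playbook"], role["name"],
                         entry.get("extra_vars", {})))
        if growvols_path not in role_playbooks:
            # growvols not listed for this role: run it with default vars
            runs.append((growvols_path, role["name"], {}))
    if network_config:
        # One unlimited run covering all roles at once
        runs.append((NETWORK_CONFIG_PLAY, None, {}))
    return runs


if __name__ == "__main__":
    growvols = "/hypothetical/path/cli-overcloud-node-growvols.yaml"
    roles = [
        {"name": "Controller", "count": 3},
        {"name": "ComputeLeaf1", "count": 0},
        {"name": "ComputeLeaf2", "count": 2,
         "ansible_playbooks": [{"playbook": growvols,
                                "extra_vars": {"example_var": "value"}}]},
    ]
    for run in plan_role_playbooks(roles, growvols):
        print(run)
```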