Bug 1897890
| Summary: | OSP16.1 config-download does not scale with increasing number of roles | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Uemit Seren <uemit.seren> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Alex Schultz <aschultz> |
| Status: | CLOSED ERRATA | QA Contact: | David Rosenfeld <drosenfe> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.1 (Train) | CC: | aschultz, astupnik, jhajyahy, jpretori, marjones, mburns, pweeks, schhabdi, slinaber |
| Target Milestone: | beta | Keywords: | Triaged |
| Target Release: | 16.2 (Train on RHEL 8.4) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-heat-templates-11.3.2-2.20201217112250.a0330d2.el8ost | Doc Type: | Enhancement |
| Doc Text: | This enhancement improves the efficiency, performance, and execution time of deployment and update tasks for environments with a large number of roles. The logging output of the deployment process has been improved to include task IDs for better tracking of specific task executions, which can occur at different times. You can use the task IDs to correlate timing and execution when you troubleshoot executions. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-09-15 07:09:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Uemit Seren
2020-11-15 11:09:59 UTC
We're aware of why this happens but the fix likely won't be available until 16.2.

Hi Alex, thanks for the information. Is there an ETA for OSP 16.2, and will it be upstream Train or a newer release? I couldn't find any information regarding the OSP roadmap beyond OSP 16.1.

From a tripleo standpoint, it'll be based on what's available in stable/train on a newer version of RHEL8. The issue here is the execution strategy of the deployment as part of ansible, which we have addressed in Ussuri onward, so we'll need to backport it to train and do additional testing.

I saw your blog post: https://www.redhat.com/en/blog/faster-deployments-red-hat-openstack-platform-deployment-ansible-strategy-plugins

Wouldn't the tripleo_free strategy still benefit from doing separate plays for each role instead of skipping the hosts via a when clause? Or was this benchmarked/profiled, and there is no significant runtime decrease when combining the tripleo_free strategy with separate plays?

No, it would not, because plays cannot be executed in parallel, whereas tasks can. If you split apart the plays, you have to parallelize the ansible execution, which is more complicated. With the tasks within a specific play being run in parallel and not limited in their execution order, we get something closer to what was invoked under heat, because it's similar to the previous Deployment phases. There is a specific alignment of the execution that has to occur across an entire cloud, which is where the plays contribute. What we have right now, prior to the usage of tripleo_free, is that we're running deployment tasks for roles in serial, so while role 1 is executing, all the other roles are idle. The switch to tripleo_free allows the play to continue on the non-targeted roles to the tasks they need.
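The difference between running roles in serial and letting every role proceed to its own tasks can be illustrated with a toy timing model. This is only a sketch: the role names and durations below are hypothetical, not measurements from this bug, and the functions do not call ansible or any tripleo code.

```python
# Toy model only: role names and durations are hypothetical, not measurements.
ROLE_TASK_SECONDS = {"Controller": 300, "Compute": 120, "CephStorage": 180}


def serial_by_role(durations):
    """Roles take turns: while one role runs its tasks, every other role idles."""
    return sum(durations.values())


def free_within_play(durations):
    """Each role starts its own tasks immediately; the play ends with the slowest role."""
    return max(durations.values())


if __name__ == "__main__":
    print("serial by role:", serial_by_role(ROLE_TASK_SECONDS), "s")
    print("free strategy: ", free_within_play(ROLE_TASK_SECONDS), "s")
```

In this simplified model the serial total grows with every role that has work to do, while the free-strategy total is bounded by the slowest role.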
The issue with trying to backport this is that it requires big UX changes to allow end users to track what is going on, which is why we're targeting 16.2 if possible instead of making such a giant shift in 16.1.4 or a later version. Specifically, the more roles you have, the more overhead time the following sections add as currently written:

https://github.com/openstack/tripleo-heat-templates/blob/0fdaaf51ea9c97d89781e691ffcf2666fdde8ab5/common/deploy-steps.j2#L586-L597
https://github.com/openstack/tripleo-heat-templates/blob/0fdaaf51ea9c97d89781e691ffcf2666fdde8ab5/common/deploy-steps.j2#L672-L676

When tripleo_free is used, all the nodes get to their tasks for execution as they hit this code, rather than having to wait for all the previous roles to finish before they start executing. All nodes then stop proceeding with the deployment until the play is finished, and then they start to execute in order. This is similar to how the heat deployment execution process used to occur, where all nodes would run their "step 1" tasks at the same time regardless of roles, but the whole process waited until "step 1" was fully complete before moving on to the next step.

Ah I see. Thanks for the detailed explanation.

Deployed job: DFG-df-deployment-16.2-virthost-3cont_1comp_3ceph_3db_2net_3msg-yes_UC_SSL-no_OC_SSL-ceph-ipv4-geneve-RHELOSP-31889, which has six composable roles.

Did a stack update and recorded:
Elapsed Time: 0:24:47.198112

Appended six unused roles (CountDefault: 0) to the end of roles_data.yaml.

Did a stack update and recorded:
Elapsed Time: 0:24:55.587699

Increased time was not seen with unused composable roles.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:3483 |
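The step-wise behavior described in the thread above (all nodes run their "step 1" tasks at the same time, and the next step starts only once the current one is fully complete) can be sketched as a small timing model. Everything here is an assumption made for illustration: the step numbers, role names, and durations are hypothetical, and the functions are not part of tripleo or ansible.

```python
# Toy model only: step numbers, role names, and durations are hypothetical.
STEP_SECONDS = {
    1: {"Controller": 60, "Compute": 20, "CephStorage": 40},
    2: {"Controller": 90, "Compute": 30, "CephStorage": 10},
}


def total_with_step_barrier(steps):
    """Roles run a step concurrently; the next step waits for the slowest role."""
    return sum(max(per_role.values()) for per_role in steps.values())


def total_fully_serial(steps):
    """Baseline: every role's tasks in every step run one after another."""
    return sum(sum(per_role.values()) for per_role in steps.values())


if __name__ == "__main__":
    print("with step barrier:", total_with_step_barrier(STEP_SECONDS), "s")
    print("fully serial:     ", total_fully_serial(STEP_SECONDS), "s")
```

Under these assumptions each step costs only as much as its slowest role, so adding a role that has nothing to do in a step does not lengthen that step.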