Bug 1667257
| Summary: | [rhos-prio] When creating ~600 security group rules using heat, with more than 1 heat-engine thread, neutron server is getting hammered and it eventually timeouts | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | David Vallee Delisle <dvd> | ||||
| Component: | openstack-heat | Assignee: | Zane Bitter <zbitter> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | Ronnie Rasouli <rrasouli> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 10.0 (Newton) | CC: | bhaley, mburns, njohnston, ramishra, sbaker, shardy, zbitter | ||||
| Target Milestone: | async | Keywords: | Triaged, ZStream | ||||
| Target Release: | 10.0 (Newton) | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | 1665239 | Environment: | |||||
| Last Closed: | 2019-05-30 16:01:00 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 1665239, 1776950 | ||||||
| Bug Blocks: | |||||||
| Attachments: |
Description
David Vallee Delisle
2019-01-17 21:04:43 UTC
Setting the number of workers to 1 is not an appropriate workaround for the customer. It took 1h41m to create ~600 rules with 1 worker, and the customer has 4000 rules to create. It would take more than 7h. They need to create this stack on multiple deployments. Blocking other tenants from working with heat for 7h every time they need to create or update that stack is not acceptable.

There is a patch in Ocata that would serve to slow down the creation a bit by adding a 1s delay before starting each resource that is a root of the dependency graph (i.e. doesn't depend on anything): https://review.openstack.org/445355

However, with 4000 rules to create and even the initial ones taking 1-2s, this would probably be insufficient anyway.

The best way to solve this, especially in OSP 10, would be in the template itself. Hopefully with 4000 rules the template is being autogenerated and not constructed by hand?

Start by picking a batch size, n. For the first n resources, you don't need to do anything. For the next n, add a depends_on in each to make it require (a different) one of the first n. For the third n, make each depend on one from the second n, and so on. This will ensure that only n resources are being created in parallel. e.g. for n=10:

    a b c d e f g h i j
    ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
    | | | | | | | | | |   (depends_on)
    k l m n o p q r s t
    ^ ^ ^ ^ ^ ^
    | | | | | |           (depends_on)
    u v w x y z

You can then tune the batch size to whatever works without overwhelming neutron. Each worker can process more than one resource (it switches between greenthreads while waiting for I/O), so with 144 workers it's likely that it's trying to do substantially all of the resources at roughly the same time. I would start by generating a template with a batch size of, say, 20 and seeing how it goes from there.

Created attachment 1521555 [details]
Script to fix up template
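
For illustration, the batching scheme described above (each resource after the first n gets a depends_on pointing at the resource n positions earlier) could be sketched in Python roughly as follows. This is not the attached script, whose contents are not reproduced here. It assumes the template's resources section has already been loaded into an ordered dict; the function name add_batch_dependencies, the placeholder resource names and the resource type used in the example are all hypothetical.

```python
import copy


def add_batch_dependencies(resources, batch_size):
    """Return a copy of `resources` in which the resource at position i
    (for i >= batch_size) also depends on the resource at position
    i - batch_size, so at most `batch_size` are in progress at once.

    Hypothetical helper for illustration; `resources` is assumed to be
    the `resources` mapping of a Heat template loaded as a dict.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    names = list(resources)  # dicts preserve insertion order
    batched = copy.deepcopy(resources)  # leave the input untouched
    for i in range(batch_size, len(names)):
        deps = batched[names[i]].get("depends_on", [])
        if isinstance(deps, str):
            # depends_on may be a single name or a list of names
            deps = [deps]
        previous = names[i - batch_size]
        if previous not in deps:
            deps = deps + [previous]
        batched[names[i]]["depends_on"] = deps
    return batched


# Placeholder data: 26 resources named res_a .. res_z, as in the diagram.
# The resource type here is only an example.
resources = {
    "res_" + chr(ord("a") + i): {
        "type": "OS::Neutron::SecurityGroup",
        "properties": {},
    }
    for i in range(26)
}
batched = add_batch_dependencies(resources, batch_size=10)
```

With a batch size of 10, res_k ends up depending on res_a and res_u on res_k, matching the diagram. Loading the real template and writing it back out (for example with a YAML library) is left out of this sketch.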
Oops, I should have been updating the neutron bug, 1665239, will do that now.

The neutron bug is waiting for QA now. https://bugzilla.redhat.com/show_bug.cgi?id=1665239

Fixed in neutron.