+++ This bug was initially created as a clone of Bug #1665239 +++ Customer is trying to create a lot of security group rules and at some point, neutron fails with "StaleDataError: UPDATE statement on table 'standardattributes' expected to update 1 row(s); 0 were matched." We also see a buildup in the time it takes neutron server to reply to POST /v2.0/security-group-rules.json, the first ones take less than a second to complete, up to more than 200s and this is when we start seeing 500s instead of 201s. In my lab, with 3 workers running (one on each controllers), I saw up to 12k requests on /v2.0/extensions.json per minute. On my customer's lab running with 144 workers (across 3 controllers), they see up to 166k requests on that endpoint, every minute [2]. We suspect that this is either a symptom, or cause, but that it's related to the issue. When disable heat-engine on 2 out of 3 controllers, setting the num_engine_worker to 1 on the remaining controller, incrementing the stack_action_timeout and token expiration timeout, we are able to complete the creation of a 600 security group rule stack. We see that the POSTs to the neutron api are taking an average of 0-2s to complete, and never more than 10s. Side issue, when creating a security group rule stack that failed, and when we try to update it later, heat is failing when trying to create "duplicate rules". We would expect that heat would take an inventory of what's currently created and don't try to create them again, or would simply not fail when a dupe is found. [1] ~~~ 2019-01-15 19:47:19.580 988415 INFO neutron.wsgi [req-a08cf5a3-f78c-4db7-a695-12fa6c633178 <SNIP> "POST /v2.0/security-groups.json HTTP/1.1" 201 1585 0.177781 2019-01-15 19:47:19.815 988415 INFO neutron.wsgi [req-ffcf0c33-92f7-433f-956f-83447192e780 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 692 0.209263 2019-01-15 19:47:20.317 988415 INFO neutron.wsgi [req-9ba6afe2-da48-4383-83d7-bfb234c319e5 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 691 0.204312 2019-01-15 19:51:32.691 988415 INFO neutron.wsgi [req-7803947e-50ad-44c9-b11b-117bf7c6f85f <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 16.791251 2019-01-15 19:51:36.343 988415 INFO neutron.wsgi [req-b9d4092a-26c6-4480-a00d-d3f0ec4ac9c8 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 14.068940 2019-01-15 19:51:36.698 988415 INFO neutron.wsgi [req-7afc5146-c460-40dc-a087-d1f276271427 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 1.517953 2019-01-15 19:51:49.328 988415 INFO neutron.wsgi [req-d4de311b-71d4-43c6-bd6c-0361aba957ae <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 706 9.610927 2019-01-15 19:51:50.892 988415 INFO neutron.wsgi [req-25eb98b2-f7aa-45f0-bf9c-bcb754dac184 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 702 14.876030 2019-01-15 19:51:58.667 988415 INFO neutron.wsgi [req-d538e36f-db02-4f2c-b92c-2d522804aba5 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 14.241387 2019-01-15 19:52:01.714 988415 INFO neutron.wsgi [req-0db87c71-4d02-42d1-9921-e31c5ff1a05f <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 702 14.407799 2019-01-15 19:52:10.841 988415 INFO neutron.wsgi [req-da9f14f9-9c45-42a4-8ac1-3330440648f2 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 26.159587 2019-01-15 19:52:16.303 988415 INFO neutron.wsgi [req-96e571bf-6b34-425b-8f7e-5e036eb1f820 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 703 15.903058 2019-01-15 19:52:16.654 988415 INFO neutron.wsgi [req-e6a94f99-52cc-4209-9407-4c79f6ded4f4 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 706 13.617583 2019-01-15 19:52:29.603 988415 INFO neutron.wsgi [req-767ff22e-1501-429f-8456-f79136741563 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 701 16.726857 2019-01-15 19:52:36.672 988415 INFO neutron.wsgi [req-277c428a-c2b4-4306-9f8a-93ec6c330f2d <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 706 26.011657 2019-01-15 19:52:39.039 988415 INFO neutron.wsgi [req-d046d305-a858-4197-9aff-d1aa92cc5013 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 705 18.620860 2019-01-15 19:52:39.129 988415 INFO neutron.wsgi [req-1387dd6d-4b7d-4ea1-a2d3-a8025691339d <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 703 21.609913 2019-01-15 19:52:45.177 988415 INFO neutron.wsgi [req-e423bac0-d3a3-4bd5-80a7-26ecc5acf992 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 20.025870 2019-01-15 19:52:46.028 988415 INFO neutron.wsgi [req-1627776b-69cc-450b-af0f-6a1cd46b3156 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 703 22.658502 2019-01-15 19:52:51.165 988415 INFO neutron.wsgi [req-2302153e-7e74-453a-92a5-977d5fe8241c <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 705 22.737595 2019-01-15 19:52:51.180 988415 INFO neutron.wsgi [req-39c624e5-7561-4991-9b52-b1158c35e36d <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 18.844666 2019-01-15 19:52:54.547 988415 INFO neutron.wsgi [req-cfac5a7e-2f77-41af-9b4e-2a9eaf5834ca <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 703 24.258442 2019-01-15 19:52:54.662 988415 INFO neutron.wsgi [req-5f3307a8-c245-4072-8d37-8515602819c9 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 706 15.109443 2019-01-15 19:52:59.831 988415 INFO neutron.wsgi [req-b0a3583f-87bc-4fd8-8b50-2b5f85733b82 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 705 15.903552 2019-01-15 19:53:06.112 988415 INFO neutron.wsgi [req-434d319b-d064-4867-b649-7ce8476472d7 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 703 35.665496 2019-01-15 19:53:06.530 988415 INFO neutron.wsgi [req-b5698a80-8bd1-40d0-aa65-9aa7b036d60d <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 17.874464 2019-01-15 19:53:06.886 988415 INFO neutron.wsgi [req-d1b19d79-3162-4c22-879b-e9f6a0d0be48 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 14.455133 2019-01-15 19:53:15.236 988415 INFO neutron.wsgi [req-b58bd74a-6b79-4509-b248-c0bfd4495851 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 16.102678 2019-01-15 19:53:18.277 988415 INFO neutron.wsgi [req-e4cc92f1-1a9d-48e2-885c-635a6b229150 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 701 45.369257 2019-01-15 19:53:20.133 988415 INFO neutron.wsgi [req-0050eec3-436e-402f-9329-7c6e17646f05 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 706 14.844351 2019-01-15 19:53:20.989 988415 INFO neutron.wsgi [req-27b1329d-be50-4d50-9388-1a3e0625ba40 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 705 44.393658 2019-01-15 19:53:27.791 988415 INFO neutron.wsgi [req-8e27c892-1cff-4612-b049-5bd7a7f89d28 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 705 21.656269 2019-01-15 19:53:32.453 988415 INFO neutron.wsgi [req-650d4517-e9d1-41a2-b6ad-5608f9c700c0 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 20.101678 2019-01-15 19:53:34.202 988415 INFO neutron.wsgi [req-6c6ba3fb-ce0c-4b97-999c-5c830821d6ec <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 15.787255 2019-01-15 19:53:42.130 988415 INFO neutron.wsgi [req-b671cc75-6ea5-445b-bbd6-5ada6fea7be3 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 707 54.972306 2019-01-15 19:53:45.845 988415 INFO neutron.wsgi [req-e08068d7-db37-4400-b306-74f0f1172125 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 703 23.388320 2019-01-15 19:53:58.979 988415 INFO neutron.wsgi [req-086f5e3d-1b75-4fd5-a06c-fc14471fcb2c <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 30.182090 2019-01-15 19:54:01.465 988415 INFO neutron.wsgi [req-d38479ce-841d-48be-afe2-d5a2ffa3c606 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 705 54.739885 2019-01-15 19:54:10.876 988415 INFO neutron.wsgi [req-30837d47-30ec-48d7-af7a-a395406c6445 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 705 39.601803 2019-01-15 19:54:25.617 988415 INFO neutron.wsgi [req-ecf947f2-398b-490f-a5d0-cc57215cb78e <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 702 44.332826 2019-01-15 19:54:40.077 988415 INFO neutron.wsgi [req-3921fd53-45a1-441d-ade4-dd608dc151d3 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 703 47.553737 2019-01-15 19:54:52.544 988415 INFO neutron.wsgi [req-7c719f4f-72dd-40ec-bc77-ddc839183408 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 500 344 154.134486 2019-01-15 19:54:53.516 988415 INFO neutron.wsgi [req-e9707c8b-d868-4c88-89ed-6930b62958cc <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 706 28.462749 2019-01-15 19:55:14.696 988415 INFO neutron.wsgi [req-a5c3b87b-bec0-411c-a32b-d91f63daa7ae <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 118.773054 2019-01-15 19:55:30.893 988415 INFO neutron.wsgi [req-d3a9210a-510a-4d5b-b795-182d4800ab32 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 703 102.303439 2019-01-15 19:55:47.641 988415 INFO neutron.wsgi [req-b5bdf2f7-2521-40af-8618-f4b11c892d6a <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 705 106.655262 2019-01-15 19:55:56.359 988415 INFO neutron.wsgi [req-2530ec41-7ebb-465e-9f29-5d33728c1095 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 500 344 188.876560 2019-01-15 19:56:02.326 988415 INFO neutron.wsgi [req-7c456e85-9c7d-4098-aab3-f41196a6b136 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 704 107.206438 2019-01-15 19:56:16.989 988415 INFO neutron.wsgi [req-80a1b4e1-8f5e-4a43-8a8f-d8f323169fc1 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 702 143.618252 2019-01-15 19:56:18.348 988415 INFO neutron.wsgi [req-bb2b5c17-7e00-4a2f-8319-59054a91cb65 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 500 344 199.180581 2019-01-15 19:56:28.786 988415 INFO neutron.wsgi [req-bb911917-cde4-4dbc-b84c-dc7f0d15d990 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 703 183.594454 2019-01-15 19:56:34.992 988415 INFO neutron.wsgi [req-700ea889-3c41-427e-894e-c5ead319dda3 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 500 344 207.171425 2019-01-15 19:56:41.027 988415 INFO neutron.wsgi [req-ddb79694-a729-4c5e-872f-6d41fc69bfa4 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 703 193.279332 2019-01-15 19:56:47.757 988415 INFO neutron.wsgi [req-330def83-1180-4a03-af6e-61c6666e314b <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 500 344 215.920990 2019-01-15 19:56:49.348 988415 INFO neutron.wsgi [req-472aa58f-d6af-44c5-bc9f-c0e870b252c8 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 500 344 213.843158 2019-01-15 19:56:50.166 988415 INFO neutron.wsgi [req-330597d3-1e03-4718-acc7-7816edab26a5 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 201 703 137.999861 2019-01-15 19:56:50.194 988415 INFO neutron.wsgi [req-76d9497e-64be-46f8-9817-e372be743f9b <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 500 344 211.362064 2019-01-15 19:56:54.332 988415 INFO neutron.wsgi [req-a65f78ed-0043-458e-b914-1cf809416190 <SNIP> "POST /v2.0/security-group-rules.json HTTP/1.1" 500 344 215.126256 ~~~ [2] ~~~ 37167 2019-01-07 20:22 166220 2019-01-07 20:23 107487 2019-01-07 20:24 518 2019-01-07 20:25 7577 2019-01-07 20:26 3884 2019-01-07 20:27 372 2019-01-07 20:28 471 2019-01-07 20:32 1067 2019-01-07 20:33 147207 2019-01-07 20:34 158324 2019-01-07 20:35 1747 2019-01-07 20:36 817 2019-01-07 20:37 68 2019-01-07 20:38 1283 2019-01-07 20:39 ~~~
Setting the number of worker to 1 is not an appropriate workaround for the customer. It took 1h41m to create ~600 rules with 1 worker, customer has 4000 rules to create. It would take more than 7h. They need to create this stack on multiple deployments. Blocking other tenants from working with heat for 7h, every time they need to create or update that stack is not acceptable.
There is a patch in Ocata that would serve to slow down the creation a bit by adding a 1s delay before starting each resource that is a root of the dependency graph (i.e. doesn't depend on anything): https://review.openstack.org/445355 However, with 4000 rules to create and even the initial ones taking 1-2s, this would probably be insufficient anyway. The best way to solve this, especially in OSP 10, would be in the template itself. Hopefully with 4000 rules the template is being autogenerated and not constructed by hand? Start by picking a batch size, n. For the first n resources, you don't need to do anything. For the next n, add a depends_on in each to make it require (a different) one of the first n. For the third n, make each depend on one from the second n, and so on. This will ensure that only n resources are being created in parallel. e.g. for n=10: a b c d e f g h i j ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ | | | | | | | | | | (depends_on) k l m n o p q r s t ^ ^ ^ ^ ^ ^ | | | | | | (depends_on) u v w x y z You can then tune the batch size to whatever works without overwhelming neutron. Each worker can process more than one resource (it switches between greenthreads while waiting for I/O), so with 144 workers it's likely that it's trying to do substantially all of the resources at roughly the same time. I would start by generating a template with a batch sixe of say 20 and seeing how it goes from there.
Created attachment 1521555 [details] Script to fix up template
Oops, I should have been updating the neutron bug, 1665239, will do that now.
The neutron bug is waiting for QA now. https://bugzilla.redhat.com/show_bug.cgi?id=1665239
Fixed in neutron.