Created attachment 1875624
OCP density heavy 120 node NB database.

Description of problem:

In a large-scale deployment, e.g., during a density-heavy OpenShift scale test running a cluster of 120 nodes, 10K pods, and 10K load balancers applied to all node logical switches and routers, northd spends a large amount of time processing and generating the logical flows that implement load balancing in the logical router pipeline.

With the attached database, focusing on a single load balancer that corresponds to an OCP service (Service_2626e963-load-cluster-preupgrade-20220405-942/cluster-density-942-5_TCP_cluster):

$ ovn-nbctl list load_balancer Service_2626e963-load-cluster-preupgrade-20220405-942/cluster-density-942-5_TCP_cluster
_uuid               : 3de1afff-e606-41d2-b3ab-43499b8690c8
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="2626e963-load-cluster-preupgrade-20220405-942/cluster-density-942-5"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_2626e963-load-cluster-preupgrade-20220405-942/cluster-density-942-5_TCP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"172.30.46.142:443"="10.153.5.214:8443,10.160.2.79:8443", "172.30.46.142:80"="10.153.5.214:8080,10.160.2.79:8080"}

Checking the flows corresponding to one of the VIPs:

  table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 172.30.46.142 && tcp && reg9[16..31] == 443 && ct_mark.natted == 1), action=(flags.force_snat_for_lb = 1; next;)
  table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 172.30.46.142 && tcp && reg9[16..31] == 443), action=(flags.force_snat_for_lb = 1; ct_lb_mark(backends=10.153.5.214:8443,10.160.2.79:8443);)

These flows are almost generic: the only per-datapath component is the "flags.force_snat_for_lb = 1" action.
It turns out that these flows eventually get "merged" for all gateway routers and are applied to a single datapath group:

$ ovn-sbctl list logical_flow 24b39d1b
_uuid               : 24b39d1b-b069-4412-9474-a7dd0e453f0b
actions             : "flags.force_snat_for_lb = 1; next;"
controller_meter    : []
external_ids        : {source="northd.c:10066", stage-hint="3de1afff", stage-name=lr_in_dnat}
logical_datapath    : []
logical_dp_group    : 285bc43c-85f0-4b5c-803e-172e31508efd
match               : "ct.est && ip4 && reg0 == 172.30.46.142 && tcp && reg9[16..31] == 443 && ct_mark.natted == 1"
pipeline            : ingress
priority            : 120
table_id            : 6
tags                : {}
hash                : 0

$ ovn-sbctl list logical_dp_group 285bc43c-85f0-4b5c-803e-172e31508efd
_uuid               : 285bc43c-85f0-4b5c-803e-172e31508efd
datapaths           : [00723364-3e2e-46c1-9a40-1f47518807dd, 0111c64e-1d61-4d1d-a06b-56a1e8b4a75d, 016e88f0-fbf7-4bbf-95f7-6aaf17f57db6, 039ee72e-81d3-455f-bacb-abfba4d05b52, 0484aa49-b094-45d0-aa3f-892ed43bae15, 06e5650e-a0f1-48ff-a322-149061889f6a, 091e838d-2b17-4494-af51-e2f8fbb64ace, 09a781b9-3bd5-43f2-809a-39a23449a17b, 0cfe261e-4daa-46c9-bcb8-7ba16d0f94ae, 0eb81599-ce87-4e94-80de-e3ed4126b9e5, 0fa05a39-2128-46ce-ae32-439caea2bd83, 1080922b-4bd5-4880-8916-0dbacc970d6f, 154501bc-c45c-4641-9aa7-87565b8d6f13, 17602971-6f43-4746-879f-e4358e9809aa, 177a3e92-43a7-4ae1-ade3-ac254d20d590, 19a40adf-df52-4935-b260-aade13e92279, 1b018cc6-9f52-46b1-a725-fb55657d6b11, 1b8f62b6-b992-434f-8567-d64317c0a5e3, 1bd4835e-15c2-4b40-a071-8aae676c2c08, 1c95248b-2743-402a-8ab1-cf45264e7f26, 1cc46e8f-1b8b-45f6-8a50-e9b4cb46b5b2, 1df171b8-9bfc-4f38-a29a-f826c4c3c73e, 24224513-dc87-42f0-a217-d6989fd56cef, 28c135bc-5060-4690-8283-958b3ea57177, 2aaabbf4-407f-41c7-adfb-3dde6fa03e53, 2baa6583-9ab1-4895-a826-1065e5db4771, 2fadb9ed-d176-4c13-a9b6-c254ff21e251, 306c9fcd-8eb6-49de-9c11-07ed54e3756b, 32047b58-ec28-441d-b4c2-d9dde4090637, 3394d089-4040-47d0-8266-5153be572342, 33c12226-7eb9-44be-9dcc-1f6c76c7e21d, 3698e96d-8dc3-4546-b419-eb74ff971b10, 3e5aef45-b22a-49e5-af78-6191216f7c5b, 3fc799ed-0c0e-47b8-81ed-aea8cb12112f, 4031722a-ae25-4403-94bd-d7f814568280, 406d2464-c346-4ff9-96ce-b816a033fdf3, 408769b0-aa7f-4510-a93c-e9a9919763ac, 41995cc9-009d-46dc-97cc-0c74edaaa0f4, 4523648a-c188-4a90-a7aa-79ae3ad6dba7, 45a41e03-e8de-460c-9dde-1c830347b8cd, 466a0d6e-efbe-41b2-82d2-858b4ea9901a, 47f54b54-6586-4e8b-8c96-d3c51d14bb6a, 4944a146-657e-4bda-8579-63ed5b7a9ac2, 4b76831b-bd35-4c61-9fed-053859bd98f5, 529893a6-f3db-4638-bcec-21efd84502db, 52a2201b-f50d-4ac6-a772-704d9b3cefaf, 5b6b79cf-660e-4790-ad25-24da82547c67, 5c6fa68e-4fd3-4fc9-829d-b0590f5038d2, 5cafa9fb-ef88-4f12-a262-65c3e8aefb5e, 5cc09a20-b115-47fe-973c-94c3557607fc, 5fa47239-a223-415b-b1a4-6c82dfe8e77d, 5fe138a8-9260-4ae3-80fa-0b4cc84e3eac, 6062fec6-2fea-4ba1-8b02-4257f57fbe13, 61efa618-e581-407e-8de0-4dc938906d2e, 63385570-0db4-4e83-947a-5f082821595f, 65c35109-79e6-4de9-9503-dd67e01e742e, 6760e784-41ee-4196-a02a-6514c50d4a69, 69347667-53a5-49ea-987f-c07cd1cb4f72, 6adfc7d6-badc-4201-9d37-a91ee321bcd1, 6bddaacf-6415-4112-b728-5dba8b8df8f2, 6c64bad5-d7f9-48f8-bff8-fb1cab6e3e2a, 6ffe6d4c-b6b2-4abd-90ea-65b5b6e59d19, 709ea29d-3205-4c96-bc9f-c60c7f97a061, 72437a2a-2c31-4777-a48b-e0c68cd8b149, 7314e15b-4fc0-4df3-8e23-45a104a5c50b, 73299c27-cf69-4323-8bd4-4abf78d77ad4, 7648f21d-f516-4ecf-9f83-032e2370ac38, 7832c190-7170-406f-bbe6-136d6922ec2e, 794cbe74-3d49-4465-9f06-194db10bdb1b, 7c0133eb-49b2-4b0e-8d3a-cb3ba1b6742d, 7d73aa68-f42d-4d08-967d-97f74473aae5, 7e32e54b-bb90-4014-bd03-c190ed3bdf42, 7e92d834-9136-4bd5-9ab3-67e2b9b6141f, 7f0b1aa9-f1de-4888-a184-85c3cff1cacf, 7f2469f1-618f-4b20-ae36-8cbf30b02cd9, 80445208-7d01-4e5e-9acb-a11b44bc3ffc, 805e0655-807b-4421-8a75-b762d453c600, 84d5f2f9-702d-477f-912a-38ad3d988efb, 85a588b4-ffff-4fdf-8854-b35d7fd20e52, 8715984e-a603-459f-a390-2d5480a19d88, 87fc6c6a-a95c-4bd0-b5fd-633458640a08, 889ac01b-6ae7-4663-84a2-673eaff5edaf, 88f45b38-433b-44b9-bbfe-694251f39bfa, 89150e36-08b5-4870-80a9-93a9737980f5, 904e98bf-870c-4266-91c7-8acb3a37940b, 90bf59f1-307f-43ab-b876-c44e6e7aeff9, 91204e01-9915-4966-8a2c-343888220209, 9581678f-6a44-498c-bd8c-054f2978d176, 988ee049-cc00-48b7-9f20-65c8ba466457, 9d8dc2f3-ef50-4dab-b313-11745ac02f99, 9dcb933a-3abe-4071-828c-c2394bc3d0dd, 9e2b88af-95fd-42ab-97e5-2dbcf5f2f6e7, a3289f9b-c21d-4bcd-8064-efa772328c43, a528d4d1-5534-4c8b-b7fd-ccf60dfbbcf6, a860e919-fadd-440b-a433-0974a706878e, a8f959de-ef15-44a7-9eb6-1fbb194689ac, adf9a254-1073-4665-b942-418afca79c33, b59b8867-b856-43bc-b042-085506e5f79b, b885c73e-e678-419d-99a8-3ab11a8f7ad4, bb065da6-44ab-4d52-baba-2104c17ceabd, bb47566a-670d-460d-b095-a8932a865461, c2aa77c8-6f9f-4575-a4dc-c4860fbe48a6, c382b3c4-793d-42a7-92a5-5f8842964069, c4f453a6-918d-4098-b05b-49faa5ac398a, c76b58ea-6d90-45d9-9e1a-7e248b235039, cc9d8595-faa2-4664-9ac0-0e45de62a89a, cd359af0-bce9-4e41-89c5-d527db64b412, d2a8c164-8e05-493d-acd6-04f78a0e4ebf, d3a72fbe-05ee-4ac5-ba9e-8d27f9d92fd9, d9196167-f375-4a98-9925-2337ce78f87f, db0fb7c1-455c-4444-885b-cd7211c101dc, db1bd0dd-4a6f-4180-b70b-f098f4dcdfb0, dc58f2a0-e7b2-4a72-a1d7-3a956cd488e5, df0473ef-7bb5-4c03-8b0b-d2d51f92641b, df1f19eb-fa07-4086-9cbc-cb136c0df953, e6b5aeef-6fb8-46c8-b5b0-bc54d07a76db, eb660488-4fa0-4a6f-a847-4d21f20b8b76, ef8c1276-96d5-4ad9-a0de-e65da539af42, f24d89bf-86b2-4f5d-847b-1df0619f351b, f2759e84-7c45-49e4-a8ea-17a7064e871d, f7c08540-9d72-4775-8c29-0a9d33baa58b, fa3c4a96-ef23-461a-84ea-4ca19c706edd, fa55205a-0549-4de6-829c-dad29dd5e4e6, fa6abd41-c99c-4f4f-849f-4eb36000535d, fb0aa3d1-8a45-4275-a879-d912fafa18d7, fb1755c6-1235-4595-98ce-be7bbe3c0ec4, fe8781af-fbe1-4c4c-aa50-1f58abdd0afe]

The problem is that the code in ovn-northd that generates these flows does not take into account that multiple routers may share the same load balancer and force_snat configuration. This makes it inefficient, because the same work is repeated for every router.
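To illustrate why the per-router work is redundant, here is a minimal standalone sketch in C. It is not northd code: build_action() and count_distinct() are hypothetical helpers that only mimic prepending the force_snat flag to a shared, precomputed action and then counting how many different strings come out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ACTION_MAX 256

/* Hypothetical helper: builds a per-router action as an optional
 * force_snat prefix followed by the shared, precomputed action.
 * Returns what snprintf() returns. */
static int
build_action(char *buf, size_t size, bool force_snat,
             const char *shared_action)
{
    assert(buf != NULL && shared_action != NULL);
    return snprintf(buf, size, "%s%s",
                    force_snat ? "flags.force_snat_for_lb = 1; " : "",
                    shared_action);
}

/* Hypothetical helper: counts distinct strings among n actions.
 * Quadratic on purpose, it is only meant for illustration. */
static size_t
count_distinct(char actions[][ACTION_MAX], size_t n)
{
    size_t distinct = 0;

    for (size_t i = 0; i < n; i++) {
        bool dup = false;

        for (size_t j = 0; j < i && !dup; j++) {
            dup = strcmp(actions[i], actions[j]) == 0;
        }
        if (!dup) {
            distinct++;
        }
    }
    return distinct;
}
```

When every router shares the same force_snat setting, all generated strings are identical, so only one of them ever needed to be built.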
Measuring how long it takes to build these flows on a test machine, we see that:

(1) Prepending "flags.force_snat_for_lb = 1" to 'action' (which is precomputed for all routers) inside the per-logical-router loop adds up to ~300ms out of the total of ~4000ms needed to compute logical flows.
https://github.com/ovn-org/ovn/blob/34f29acdfe216899bbdb51a74859af62b0c75d6c/northd/northd.c#L9983

(2) For routers that don't have "distributed gateway ports" (gateway routers), the 'new_match_p' and 'est_match_p' strings are actually the ones that were precomputed outside the loop.
https://github.com/ovn-org/ovn/blob/34f29acdfe216899bbdb51a74859af62b0c75d6c/northd/northd.c#L9959
In such cases, calling ovn_lflow_add_with_hint() and ovn_lflow_add_with_hint__() for every router on which the LB is applied is inefficient and adds up to ~1500ms out of the total of ~4000ms needed to compute logical flows.

A potential solution, which might duplicate some code but should perform better, is to first walk the list of routers on which a LB is applied and partition it into:

a) gateway routers (od->n_l3dgw_ports == 0) with snat_type == SKIP_SNAT
b) gateway routers (od->n_l3dgw_ports == 0) with snat_type == FORCE_SNAT
c) gateway routers (od->n_l3dgw_ports == 0) with snat_type == NO_FORCE_SNAT
d) non-gateway routers.

We can then write dedicated functions to generate the LB-VIP related logical flows for each of the 4 sub-cases above. For cases a)-c), because the match and action can be precomputed for all applicable routers, we could just use ovn_lflow_add_at_with_hash() combined with ovn_dp_group_add_with_reference(), like we currently do for load balancers in the logical switch pipeline.
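The partitioning step could look roughly like the standalone C sketch below. It is not taken from northd or from the upstream series: struct router, enum lb_bucket, classify_router(), partition_routers() and flow_builds_after_partition() are hypothetical names, and only the n_l3dgw_ports and snat_type fields mentioned above are modelled. The cost model (one flow build per non-empty gateway bucket, one per non-gateway router) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the northd types. */
enum lb_snat_type { NO_FORCE_SNAT, FORCE_SNAT, SKIP_SNAT };

struct router {
    size_t n_l3dgw_ports;         /* 0 means gateway router. */
    enum lb_snat_type snat_type;  /* SNAT mode for the LB on this router. */
};

enum lb_bucket {
    GW_SKIP_SNAT,       /* case a) */
    GW_FORCE_SNAT,      /* case b) */
    GW_NO_FORCE_SNAT,   /* case c) */
    NON_GW,             /* case d) */
    N_BUCKETS
};

/* Maps one router to one of the four sub-cases. */
static enum lb_bucket
classify_router(const struct router *od)
{
    assert(od != NULL);
    if (od->n_l3dgw_ports != 0) {
        return NON_GW;
    }
    switch (od->snat_type) {
    case SKIP_SNAT:
        return GW_SKIP_SNAT;
    case FORCE_SNAT:
        return GW_FORCE_SNAT;
    case NO_FORCE_SNAT:
    default:
        return GW_NO_FORCE_SNAT;
    }
}

/* Walks the routers once and counts how many fall into each bucket. */
static void
partition_routers(const struct router *routers, size_t n_routers,
                  size_t counts[N_BUCKETS])
{
    for (size_t b = 0; b < N_BUCKETS; b++) {
        counts[b] = 0;
    }
    for (size_t i = 0; i < n_routers; i++) {
        counts[classify_router(&routers[i])]++;
    }
}

/* Assumed cost model: each non-empty gateway bucket needs a single flow
 * build shared by all its routers, while every non-gateway router still
 * needs its own. */
static size_t
flow_builds_after_partition(const size_t counts[N_BUCKETS])
{
    size_t builds = counts[NON_GW];

    for (size_t b = GW_SKIP_SNAT; b < NON_GW; b++) {
        if (counts[b] > 0) {
            builds++;
        }
    }
    return builds;
}
```

Under this model, a deployment in which all routers using the LB are gateway routers with the same snat_type needs a single flow build for that LB instead of one per router.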
upstream series: https://patchwork.ozlabs.org/project/ovn/list/?series=298225