Created attachment 1702209 [details]
ovn log files

Description of problem:

sb-db cluster node's memory consumption grows to 12G for 100 nodes, 9K pods, and around 1800 services:

```
ovnkube-master-hbvrv sbdb 40m 12393Mi
```

The sb-db cluster was running with an election timer of 30 seconds. The node's memory consumption grows consistently (a sketch of the sampling loop follows below):

```
ovnkube-master-hbvrv sbdb 2m 832Mi
Tue Jul 21 19:48:15 UTC 2020
Tue Jul 21 20:59:06 UTC 2020
ovnkube-master-hbvrv sbdb 46m 958Mi
Tue Jul 21 21:02:30 UTC 2020
ovnkube-master-hbvrv sbdb 45m 1184Mi
Tue Jul 21 21:05:54 UTC 2020
ovnkube-master-hbvrv sbdb 34m 1841Mi
Tue Jul 21 21:23:02 UTC 2020
ovnkube-master-hbvrv sbdb 30m 4062Mi
Tue Jul 21 21:43:51 UTC 2020
ovnkube-master-hbvrv sbdb 34m 5663Mi
Tue Jul 21 22:01:24 UTC 2020
ovnkube-master-hbvrv sbdb 36m 7393Mi
Tue Jul 21 22:19:09 UTC 2020
ovnkube-master-hbvrv sbdb 578m 9147Mi
Tue Jul 21 22:33:31 UTC 2020
ovnkube-master-hbvrv sbdb 12m 9128Mi
Tue Jul 21 22:40:44 UTC 2020
ovnkube-master-hbvrv sbdb 16m 9466Mi
Tue Jul 21 22:47:57 UTC 2020
ovnkube-master-hbvrv sbdb 6m 9992Mi
Tue Jul 21 22:55:10 UTC 2020
ovnkube-master-hbvrv sbdb 822m 10616Mi
Tue Jul 21 22:58:46 UTC 2020
ovnkube-master-hbvrv sbdb 282m 9973Mi
Tue Jul 21 23:02:27 UTC 2020
ovnkube-master-hbvrv sbdb 44m 10458Mi
Tue Jul 21 23:20:29 UTC 2020
ovnkube-master-hbvrv sbdb 154m 9974Mi
Tue Jul 21 23:24:14 UTC 2020
ovnkube-master-hbvrv sbdb 429m 10631Mi
Tue Jul 21 23:27:49 UTC 2020
ovnkube-master-hbvrv sbdb 576m 9974Mi
Tue Jul 21 23:31:24 UTC 2020
ovnkube-master-hbvrv sbdb 92m 10745Mi
Tue Jul 21 23:34:58 UTC 2020
ovnkube-master-hbvrv sbdb 994m 9994Mi
Tue Jul 21 23:52:52 UTC 2020
ovnkube-master-hbvrv sbdb 492m 11579Mi
Tue Jul 21 23:56:25 UTC 2020
ovnkube-master-hbvrv sbdb 30m 9975Mi
Tue Jul 21 23:59:58 UTC 2020
ovnkube-master-hbvrv sbdb 5m 9975Mi
Wed Jul 22 00:03:42 UTC 2020
Wed Jul 22 00:07:44 UTC 2020
ovnkube-master-hbvrv sbdb 131m 10809Mi
Wed Jul 22 00:11:19 UTC 2020
ovnkube-master-hbvrv sbdb 668m 11464Mi
Wed Jul 22 00:14:56 UTC 2020
ovnkube-master-hbvrv sbdb 132m 10809Mi
ovnkube-master-hbvrv sbdb 338m 12156Mi
Wed Jul 22 00:46:52 UTC 2020
ovnkube-master-hbvrv sbdb 8m 11492Mi
Wed Jul 22 00:50:24 UTC 2020
ovnkube-master-hbvrv sbdb 80m 10667Mi
Wed Jul 22 01:15:06 UTC 2020
ovnkube-master-hbvrv sbdb 97m 10667Mi
Wed Jul 22 01:18:38 UTC 2020
ovnkube-master-hbvrv sbdb 48m 10667Mi
Wed Jul 22 01:22:10 UTC 2020
ovnkube-master-hbvrv sbdb 671m 11490Mi
Wed Jul 22 01:25:42 UTC 2020
ovnkube-master-hbvrv sbdb 18m 11489Mi
Wed Jul 22 01:29:14 UTC 2020
ovnkube-master-hbvrv sbdb 87m 10976Mi
Wed Jul 22 02:18:36 UTC 2020
ovnkube-master-hbvrv sbdb 35m 11419Mi
Wed Jul 22 04:39:43 UTC 2020
Wed Jul 22 04:43:45 UTC 2020
ovnkube-master-hbvrv sbdb 15m 11419Mi
Wed Jul 22 04:47:16 UTC 2020
Wed Jul 22 04:51:18 UTC 2020
ovnkube-master-hbvrv sbdb 1715m 3319Mi
Wed Jul 22 04:54:50 UTC 2020
ovnkube-master-hbvrv sbdb 991m 15412Mi
Wed Jul 22 04:58:21 UTC 2020
ovnkube-master-hbvrv sbdb 0m 3999Mi
Wed Jul 22 05:01:52 UTC 2020
ovnkube-master-hbvrv sbdb 0m 0Mi
Wed Jul 22 05:05:24 UTC 2020
```

At one point memory consumption reached 15G. It looks like the oom_killer kills the sb-db based on its oom_score, which corrupts the db, and two of the three nodes then fail to restart because of the corrupt db.

This memory bloating, plus the nb-db memory bloating on the master node, causes OOM for other components running on the node where the master pods are scheduled. That results in failures to provision pods, and the CNI APIs time out. Must-gather collection fails too, so I collected logs from the specific pods.
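For the record, a minimal sketch of the kind of loop that produces the samples above (the pod name is from this report; the namespace and interval are assumptions, not taken from the original setup):

```
# Hypothetical sampling loop; namespace and interval are assumptions.
while true; do
    date -u
    oc -n openshift-ovn-kubernetes adm top pod ovnkube-master-hbvrv --containers \
        | awk '$2 == "sbdb" {print $1, $2, $3, $4}'
    sleep 210
done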
All the logs (ovnkube-master logs, ovnkube-node logs, memory growth logs for all ovn-kubernetes components, cluster status, etc.) are attached.

Version-Release number of selected component (if applicable):
ovn4.5.rc7
openvswitch2.13-2.13.0-29.el7fdp.x86_64
ovn2.13-2.13.0-31.el7fdp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install a cluster using 4.5.0-rc.7
2. Deploy a 100-node cluster
3. Run cluster-density (MasterVertical) with at least 1000 namespaces

Actual results:
Pod networking configuration fails, or the pod annotation request fails when the API server is killed because of the memory bloating.

Expected results:
Pod creation should not fail.

Additional info:
This bug is related to the following bugzilla, which tracks the nb-db memory issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1855408
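For anyone reproducing this, the RAFT state and election timer of the sb-db cluster can be inspected from inside an ovnkube-master pod. A hedged sketch (the control-socket path is an assumption and varies by release):

```
# Show RAFT role, term, election timer, and log state for the SB database.
ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound

# Raise the election timer (in milliseconds). Must be run on the leader,
# and the value can only be roughly doubled per invocation, so reaching a
# large value takes several calls.
ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 30000
```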
Anil, will this still be a problem after https://github.com/ovn-org/ovn-kubernetes/pull/1711 lands?
Over to Numan as he's working on a patch to reduce the number of flows for each reject ACL.
Hi Anil, is it possible to attach the OVN north db file?
Hi Numan, I observed this issue in parallel with the issue reported in bug https://bugzilla.redhat.com/show_bug.cgi?id=1855408. The logs are uploaded here: https://drive.google.com/file/d/18dIf6qNP3IQvQOlVAOVjH6ppZuF-jKoV/view?usp=sharing
Submitted the patches for review - https://patchwork.ozlabs.org/project/ovn/list/?submitter=77669 - which reduce the number of lflows in the sb db. With the db attached to this bz and with OVN master, ovn-northd crashes on my laptop with 16 GB of memory. With OVN master plus the above patches, ovn-northd didn't crash. These patches should certainly help with ovn-northd memory usage.
Without the patches, the number of logical flows with the attached db is 1869383; with the patches it is 667979.
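For reference, a hedged sketch of how such a count can be taken on a live deployment (this assumes the default ovn-sbctl socket is reachable, e.g. from inside an ovnkube-master pod; it is not necessarily how the numbers above were produced):

```
# Count rows in the SB Logical_Flow table; each record prints one "_uuid" line.
# --no-leader-only lets the query run against any raft member.
ovn-sbctl --no-leader-only --columns=_uuid list Logical_Flow | grep -c '_uuid'
```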
Thanks Numan. Similarly, on another scale setup I found tons of lflows per service hairpin. I saw 434000 of these:

```
table=6 (ls_in_acl ), priority=2000 , match=(((!ct.trk || !ct.est || (ct.est && ct_label.blocked == 1))) && ip4 && (ip4.dst==172.30.0.83 && tcp && tcp.dst==6443)), action=(reg0 = 0; icmp4 { eth.dst <-> eth.src; ip4.dst <-> ip4.src; outport <-> inport; next(pipeline=egress,table=5); };)
table=6 (ls_in_acl ), priority=2000 , match=(((!ct.trk || !ct.est || (ct.est && ct_label.blocked == 1))) && ip4 && (ip4.dst==172.30.1.1 && tcp && tcp.dst==80)), action=(reg0 = 0; icmp4 { eth.dst <-> eth.src; ip4.dst <-> ip4.src; outport <-> inport; next(pipeline=egress,table=5); }
```

They look like hairpin flows to me. Can you confirm what they are? Could we cut those down? The action always seems to be the same, but the match criteria hit each service.
(In reply to Tim Rozet from comment #7)
> Thanks Numan. Similarly, on another scale setup I found tons of lflows per
> service hairpin. I saw 434000 of these:
> [...]
> They look like hairpin flows to me. Can you confirm what they are? Could we
> cut those down? The action always seems to be the same, but the match
> criteria hit each service.

Yes. These are hairpin flows. I'm also looking into cutting these flows down if possible.
(In reply to Numan Siddique from comment #8)
> (In reply to Tim Rozet from comment #7)
> > They look like hairpin flows to me. Can you confirm what they are? Could we
> > cut those down?
>
> Yes. These are hairpin flows. I'm also looking into cutting these flows down
> if possible.

Correction: the flows you listed here are not hairpin flows, but lflows added for reject ACL actions. There definitely are many hairpin flows, though: the attached db has around 250000 of them. I'm working on reducing these lflows.
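To see where the lflows concentrate, one way is to bucket the lflow-list output by pipeline stage. A sketch; the stage-name regex is an assumption based on the `table=N (stage_name)` format shown in comment #7:

```
# Dump all SB logical flows, then count them per pipeline stage.
# The reject-ACL flows land in ls_in_acl/ls_out_acl; outsized buckets
# point at the flow classes worth reducing.
ovn-sbctl --no-leader-only lflow-list > /tmp/lflows.txt
grep -oE '\(l[sr]_[a-z0-9_]+' /tmp/lflows.txt | sort | uniq -c | sort -rn | head
```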
The patches to address hairpin flows are submitted for review - https://bugzilla.redhat.com/show_bug.cgi?id=1859924

There is another BZ for the hairpin flows - https://bugzilla.redhat.com/show_bug.cgi?id=1833373
(In reply to Numan Siddique from comment #13)
> The patches to address hairpin flows are submitted for review -
> https://bugzilla.redhat.com/show_bug.cgi?id=1859924

this one - https://patchwork.ozlabs.org/project/ovn/list/?series=209175

> There is another BZ for the hairpin flows -
> https://bugzilla.redhat.com/show_bug.cgi?id=1833373
*** Bug 1885713 has been marked as a duplicate of this bug. ***
*** Bug 1884049 has been marked as a duplicate of this bug. ***
*** Bug 1891002 has been marked as a duplicate of this bug. ***
Adding the testblocker flag, per email from Dustin on Fri, Dec 4th, 4:16 pm.
Per comment 27 the issue is fixed in ovn2.13-20.12.0-1 and later.
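A trivial way to verify a host has the fixed build (assuming rpm-based FDP packaging):

```
# Expect ovn2.13-20.12.0-1 or newer.
rpm -q ovn2.13
```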
Fix shipped in FDP 21.A in ovn2.13-20.12.0-1
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days