2022-07-07T17:42:48.798533966Z I0707 17:42:48.798447 15 trace.go:205] Trace[17745800]: "List" url:/api/v1/pods,user-agent:ip-10-0-149-185/ovnkube@bd4f2094aeb5 (linux/amd64) kubernetes/,audit-id:6c114e16-15ac-4920-b8ed-a01cbd710909,client:10.0.196.114,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/2.0 (07-Jul-2022 17:42:36.762) (total time: 12035ms):
2022-07-07T17:42:48.889007528Z I0707 17:42:48.888869 15 trace.go:205] Trace[859805328]: "List" url:/api/v1/pods,user-agent:ip-10-0-149-185/ovnkube@bd4f2094aeb5 (linux/amd64) kubernetes/,audit-id:49ad7f2b-9c09-4d0b-9068-4e6fa606a4f8,client:10.0.196.114,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/2.0 (07-Jul-2022 17:42:36.761) (total time: 12127ms):
2022-07-07T17:42:49.272283449Z I0707 17:42:49.272192 15 trace.go:205] Trace[1792712204]: "List" url:/api/v1/pods,user-agent:ip-10-0-149-185/ovnkube@bd4f2094aeb5 (linux/amd64) kubernetes/,audit-id:e0269f80-38be-4c03-8317-6b63fd798ceb,client:10.0.196.114,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/2.0 (07-Jul-2022 17:42:36.723) (total time: 12548ms):
2022-07-07T17:42:49.596468418Z I0707 17:42:49.596368 15 trace.go:205] Trace[1248373466]: "List" url:/api/v1/pods,user-agent:ip-10-0-149-185/ovnkube@bd4f2094aeb5 (linux/amd64) kubernetes/,audit-id:9fb86c6a-4879-42de-bcce-c987b1c77931,client:10.0.196.114,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/2.0 (07-Jul-2022 17:42:36.763) (total time: 12832ms):
2022-07-07T17:42:49.771518271Z I0707 17:42:49.756590 15 trace.go:205] Trace[235168718]: "List" url:/api/v1/pods,user-agent:ip-10-0-149-185/ovnkube@bd4f2094aeb5 (linux/amd64) kubernetes/,audit-id:ea512964-5969-4612-b013-e9c63a4e72e0,client:10.0.196.114,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/2.0 (07-Jul-2022 17:42:36.751) (total time: 13005ms):

ovnkube master would list pods directly from the API server (not from a shared informer cache) whenever a node add/update event happened, and, worse, was not listing from the apiserver's watch cache either (which would require setting ResourceVersion: "0" in the ListOptions), so each List was served by a quorum read from etcd. This places a lot of unnecessary load on the apiserver and etcd.
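To make the distinction concrete, here is a minimal client-go sketch (not the actual ovnkube code) contrasting the three List patterns mentioned above: a direct etcd-backed List, a List served from the apiserver's watch cache via ResourceVersion: "0", and a read from a local shared informer cache. The kubeconfig handling is simplified for the sketch.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig location (simplified for this sketch).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// Pattern described in the bug: a direct List with no ResourceVersion.
	// The apiserver answers with a quorum read against etcd, so issuing this
	// on every node add/update event is expensive.
	fromEtcd, err := clientset.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("etcd-backed list: %d pods\n", len(fromEtcd.Items))

	// Cheaper: ResourceVersion "0" lets the apiserver answer from its watch
	// cache instead of etcd, at the cost of possibly slightly stale data.
	fromWatchCache, err := clientset.CoreV1().Pods("").List(ctx, metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("apiserver watch-cache list: %d pods\n", len(fromWatchCache.Items))

	// Cheapest for repeated reads: list from a local shared informer cache,
	// which is kept current by a single watch and never re-hits the apiserver
	// for each event.
	factory := informers.NewSharedInformerFactory(clientset, 0)
	podLister := factory.Core().V1().Pods().Lister()
	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	fromInformer, err := podLister.List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Printf("informer-cache list: %d pods\n", len(fromInformer))
}

The fix amounts to moving from the first pattern toward the cached reads, which is what removes the repeated full pod Lists seen in the traces above.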
Verified on 4.12.0-0.nightly-2022-08-15-150248.

Comparison of 4.11.rc1 vs 4.12.0-0.nightly-2022-08-15-150248, cluster-density workload with 1500 iterations on 120 nodes on AWS:

4.11.rc1
- api-server CPU regularly spikes to 10 cores, max 19GB RSS memory
- etcd CPU regularly spikes to 1.5 cores, max 3GB RSS memory

4.12.0-0.nightly-2022-08-15-150248
- api-server CPU regularly spikes to 2 cores, max 12GB memory
- etcd CPU regularly spikes to 0.7 cores, max 1.4GB RSS memory

The cluster-density workload succeeded on both versions, with a significant reduction in CPU/memory on 4.12 with this fix.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399