2022-07-07T17:42:48.798533966Z I0707 17:42:48.798447 15 trace.go:205] Trace[17745800]: "List" url:/api/v1/pods,user-agent:ip-10-0-149-185/ovnkube@bd4f2094aeb5 (linux/amd64) kubernetes/,audit-id:6c114e16-15ac-4920-b8ed-a01cbd710909,client:10.0.196.114,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/2.0 (07-Jul-2022 17:42:36.762) (total time: 12035ms):
2022-07-07T17:42:48.889007528Z I0707 17:42:48.888869 15 trace.go:205] Trace[859805328]: "List" url:/api/v1/pods,user-agent:ip-10-0-149-185/ovnkube@bd4f2094aeb5 (linux/amd64) kubernetes/,audit-id:49ad7f2b-9c09-4d0b-9068-4e6fa606a4f8,client:10.0.196.114,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/2.0 (07-Jul-2022 17:42:36.761) (total time: 12127ms):
2022-07-07T17:42:49.272283449Z I0707 17:42:49.272192 15 trace.go:205] Trace[1792712204]: "List" url:/api/v1/pods,user-agent:ip-10-0-149-185/ovnkube@bd4f2094aeb5 (linux/amd64) kubernetes/,audit-id:e0269f80-38be-4c03-8317-6b63fd798ceb,client:10.0.196.114,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/2.0 (07-Jul-2022 17:42:36.723) (total time: 12548ms):
2022-07-07T17:42:49.596468418Z I0707 17:42:49.596368 15 trace.go:205] Trace[1248373466]: "List" url:/api/v1/pods,user-agent:ip-10-0-149-185/ovnkube@bd4f2094aeb5 (linux/amd64) kubernetes/,audit-id:9fb86c6a-4879-42de-bcce-c987b1c77931,client:10.0.196.114,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/2.0 (07-Jul-2022 17:42:36.763) (total time: 12832ms):
2022-07-07T17:42:49.771518271Z I0707 17:42:49.756590 15 trace.go:205] Trace[235168718]: "List" url:/api/v1/pods,user-agent:ip-10-0-149-185/ovnkube@bd4f2094aeb5 (linux/amd64) kubernetes/,audit-id:ea512964-5969-4612-b013-e9c63a4e72e0,client:10.0.196.114,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/2.0 (07-Jul-2022 17:42:36.751) (total time: 13005ms):

ovnkube master would list pods directly from the API server (not from a shared informer cache) whenever a node add/update event happened, and, worse, was not listing from the apiserver's watch cache either (which would require setting ResourceVersion: "0" in the ListOptions), so each List was served by a quorum read from etcd. This places a lot of unnecessary load on the apiserver and etcd.
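To make the distinction concrete, here is a minimal client-go sketch (not the actual ovnkube code) contrasting the three List patterns mentioned above: a direct etcd-backed List, a List served from the apiserver's watch cache via ResourceVersion: "0", and a read from a local shared informer cache. The kubeconfig handling is simplified for the sketch.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig location (simplified for this sketch).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// Pattern described in the bug: a direct List with no ResourceVersion.
	// The apiserver answers with a quorum read against etcd, so issuing this
	// on every node add/update event is expensive.
	fromEtcd, err := clientset.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("etcd-backed list: %d pods\n", len(fromEtcd.Items))

	// Cheaper: ResourceVersion "0" lets the apiserver answer from its watch
	// cache instead of etcd, at the cost of possibly slightly stale data.
	fromWatchCache, err := clientset.CoreV1().Pods("").List(ctx, metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("apiserver watch-cache list: %d pods\n", len(fromWatchCache.Items))

	// Cheapest for repeated reads: list from a local shared informer cache,
	// which is kept current by a single watch and never re-hits the apiserver
	// for each event.
	factory := informers.NewSharedInformerFactory(clientset, 0)
	podLister := factory.Core().V1().Pods().Lister()
	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	fromInformer, err := podLister.List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Printf("informer-cache list: %d pods\n", len(fromInformer))
}

The fix amounts to moving from the first pattern toward the cached reads, which is what removes the repeated full pod Lists seen in the traces above.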
Verified on 4.12.0-0.nightly-2022-08-15-150248.

Comparison of 4.11.rc1 vs 4.12.0-0.nightly-2022-08-15-150248, cluster-density workload with 1500 iterations on 120 nodes on AWS:

4.11.rc1
- api-server CPU regularly spikes to 10 cores, max 19GB RSS memory
- etcd CPU regularly spikes to 1.5 cores, max 3GB RSS memory

4.12.0-0.nightly-2022-08-15-150248
- api-server CPU regularly spikes to 2 cores, max 12GB memory
- etcd CPU regularly spikes to 0.7 cores, max 1.4GB RSS memory

The cluster-density workload succeeded on both versions, with a significant reduction in CPU/memory on 4.12 with this fix.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399