TL;DR: @hewang and @jmaxwell - My testbed can reproduce the drops. I do not know enough about the customer's testbed to replicate the environment exactly, but on my testbed, with the same iperf3 params (iperf3 -6 -u -b200m -ll256), I see packet drops. I applied various tunings incrementally. The test with a 30-pod deployment churn shows packet drops regardless of the tuning.

With OCP 4.10.18 (https://docs.google.com/spreadsheets/d/1v9-pc2cu25DXbbaHAytplWs7lLM68OTz6V4LdCMEyI4/edit#gid=1826112686):
- Intra-node: DROP
- Inter-node: DROP

With OCP 4.11.2 (https://docs.google.com/spreadsheets/d/1v9-pc2cu25DXbbaHAytplWs7lLM68OTz6V4LdCMEyI4/edit#gid=2064496500):
- Intra-node: NO drop <====================== Only scenario with NO drop
- Inter-node: DROP

Tunings applied (a rough sketch follows this list):
- Disable Prometheus
- Increase the RX NIC ring size
- Apply sysctl socket memory and backlog params
- Change ovs-vswitchd to sched_rt
- Other scheduling/renice tweaks not documented in the above Google Sheets
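For reference, a minimal sketch of how tunings like these could be applied on a node. The interface name, ring size, sysctl values, and RT priority below are placeholders for illustration, not the exact settings used in the tests or the sheets above:

    # Placeholder NIC name; substitute the node's actual interface.
    IFACE=ens1f0

    # Increase the RX ring size (check the hardware maximum with -g first).
    ethtool -g "$IFACE"
    ethtool -G "$IFACE" rx 4096

    # Bump socket receive memory and the per-CPU backlog queue (example values).
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.rmem_default=16777216
    sysctl -w net.core.netdev_max_backlog=8192

    # Move ovs-vswitchd to a real-time scheduling class (SCHED_RR here);
    # pidof may return more than one PID (monitor process), so loop over them.
    for pid in $(pidof ovs-vswitchd); do
        chrt -r -p 10 "$pid"
    done

    # (Disabling Prometheus is an OCP cluster-monitoring config change and is
    # not shown here.)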
Patch backported: https://gitlab.cee.redhat.com/nst/openvswitch/openvswitch2.16/-/commit/ce553c99e2f9b8b3784ac66a759c363528c67c8b
* Tue Oct 11 2022 Aaron Conole <aconole> - 2.16.0-103
- netdev-linux: Skip some internal kernel stats gathering. [RH git: ce553c99e2] (#2118848)

  For netdev_linux_update_via_netlink(), hint to the kernel that we do not
  need it to gather netlink internal stats when we want to update the netlink
  flags, as those stats are not rendered within OVS.

  Background: ovs-vswitchd can spend quite a bit of time blocked by the kernel
  during netlink calls, especially on systems with many cores. This time is
  dominated by the kernel-side internal stats gathering mechanism in netlink,
  specifically:
    inet6_fill_link_af
      inet6_fill_ifla6_attrs
        __snmp6_fill_stats64

  In Linux 4.4+, there exists a hint for netlink requests not to trigger the
  ipv6 stats gathering mechanism, which greatly reduces the amount of time
  that ovs-vswitchd is on CPU.

  Testing and Results: Tested booting 320 VMs and measuring OVS utilization
  with perf record, then visualized into a flamegraph, using a patched version
  of OVS 2.14.2. Calls under bridge_run() seem to get hit the worst by this
  issue.
    Before: bridge_run() == 11.3% of samples
    After:  bridge_run() == 3.4% of samples

  Note that there are at least two observed netlink calls under bridge_run()
  that are still kernel-stats heavy after this patch:

  Call 1:
    bridge_run -> netdev_run -> route_table_run -> route_table_reset ->
    ovs_router_insert -> ovs_router_insert__ -> get_src_addr ->
    netdev_get_addr_list -> netdev_linux_get_addr_list -> getifaddrs
  Since the actual netlink call is coming from getifaddrs() in glibc, fixing
  this would likely involve either duplicating glibc code in the OVS source or
  patching glibc.

  Call 2:
    bridge_run -> iface_refresh_stats -> netdev_get_stats ->
    netdev_linux_get_stats -> get_stats_via_netlink
  This does use netlink-based stats; however, it isn't immediately clear
  whether just dropping the stats from inet6_fill_link_af would impact
  anything or not. Given that this call is more intermittent, it's of lesser
  concern.

  Acked-by: Greg Smith <gasmith>
  Signed-off-by: Jon Kohler <jon>
  Signed-off-by: Ilya Maximets <i.maximets>
  Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2118848
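For anyone who wants to repeat the measurement described above, a rough sketch of the perf record / flamegraph workflow. The sampling duration, output file names, and the FlameGraph script location are assumptions, not details taken from the original test:

    # Sample ovs-vswitchd call stacks for 60 seconds (duration is arbitrary).
    perf record -g -p "$(pidof -s ovs-vswitchd)" -o ovs.perf.data -- sleep 60

    # Fold the stacks and render an SVG flamegraph with the FlameGraph scripts
    # (assumed to be cloned from https://github.com/brendangregg/FlameGraph).
    perf script -i ovs.perf.data \
        | ./FlameGraph/stackcollapse-perf.pl \
        | ./FlameGraph/flamegraph.pl > ovs-flamegraph.svg

    # Compare the width of bridge_run() and the kernel stats paths
    # (inet6_fill_link_af / __snmp6_fill_stats64) before and after the patch.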
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (openvswitch2.16 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:7390