Bug 1944220

Summary: [OVN-SCALE] ovn-controller: pinctrl_run takes 30% of CPU time without pinctrl config related changes
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Dumitru Ceara <dceara>
Component: ovn2.13
Assignee: lorenzo bianconi <lorenzo.bianconi>
Status: CLOSED CURRENTRELEASE
QA Contact: Ehsan Elahi <eelahi>
Severity: medium
Priority: medium
Version: FDP 20.H
CC: ctrautma, jiji, jishi, lorenzo.bianconi, ralongi
Last Closed: 2023-03-13 07:21:28 UTC
Type: Bug
Attachments:
OVN NB database. (flags: none)

Description Dumitru Ceara 2021-03-29 14:38:26 UTC
Created attachment 1767379
OVN NB database.

Description of problem:

With the attached OVN NB database, extracted from a scale test run, and with the following interfaces bound on a single-node OVN deployment:

lports=(lp_17.1.0.9 lp_17.1.0.10 lp_17.1.0.11 lp_17.1.0.12 lp_17.1.0.13 lp_17.1.0.14 lp_17.1.0.15 lp_17.1.0.16 lp_17.1.0.17 lp_17.1.0.18)
for lp in "${lports[@]}"; do
    ovs-vsctl add-port br-int "$lp" \
        -- set interface "$lp" type=internal \
        -- set interface "$lp" external_ids:iface-id="$lp"
done

To avoid SB/OVS disconnects, also increase the probe timeouts:
ovn-sbctl set connection . inactivity_probe=180000
ovs-vsctl set open . external_ids:ovn-openflow-probe-interval=180
ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000

Perf reports:
-   31.82%     3.53%  ovn-controller  ovn-controller  [.] pinctrl_run
   - 28.29% pinctrl_run
      + 17.33% smap_get_bool
      + 8.46% datapath_is_switch
        0.93% __strcasecmp_l_avx

This happens even though no pinctrl-related configuration has actually changed.

Based on initial analysis, the following functions seem to contribute to the CPU usage:
- prepare_ipv6_prefixd() iterates over all local datapath peer ports and checks whether ipv6_prefix_delegation is enabled.  It would probably be beneficial to maintain a list of port bindings with options.ipv6_prefix_delegation=true.
- prepare_ipv6_ras() iterates over all local datapath peer ports and checks whether ipv6_ra_send_periodic is enabled.  It would probably be beneficial to maintain a list of port bindings with options.ipv6_ra_send_periodic=true.
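
To illustrate the idea behind both suggestions, here is a minimal, self-contained C sketch. It is not ovn-controller code and not the fix that was eventually applied: struct port_binding, struct ra_port_cache, cache_update(), cache_remove() and count_full_scan() are hypothetical names made up for this example, and the boolean field merely stands in for the options lookup that currently happens via smap_get_bool(). The point is only that the cache is touched when a port binding changes, so the per-iteration work depends on the number of ports with the option enabled rather than on the total number of ports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 1024

/* Hypothetical, simplified stand-in for a Southbound Port_Binding row. */
struct port_binding {
    const char *logical_port;
    bool ipv6_ra_send_periodic;   /* models options:ipv6_ra_send_periodic */
};

/* Hypothetical cache holding only the ports that have the option enabled. */
struct ra_port_cache {
    const struct port_binding *ports[MAX_PORTS];
    size_t n;
};

/* Drop a port from the cache, if present (order is not preserved). */
void cache_remove(struct ra_port_cache *c, const struct port_binding *pb)
{
    for (size_t i = 0; i < c->n; i++) {
        if (c->ports[i] == pb) {
            c->ports[i] = c->ports[--c->n];
            return;
        }
    }
}

/* Call when a port binding is created or its options change. */
void cache_update(struct ra_port_cache *c, const struct port_binding *pb)
{
    cache_remove(c, pb);
    if (pb->ipv6_ra_send_periodic) {
        assert(c->n < MAX_PORTS);
        c->ports[c->n++] = pb;
    }
}

/* Baseline: what a full scan on every main-loop iteration amounts to. */
size_t count_full_scan(const struct port_binding *pbs, size_t n)
{
    size_t enabled = 0;
    for (size_t i = 0; i < n; i++) {
        if (pbs[i].ipv6_ra_send_periodic) {
            enabled++;
        }
    }
    return enabled;
}
```

With such a cache, a prepare step would walk cache.ports[0..n) instead of every local datapath peer port. The same pattern would apply to ipv6_prefix_delegation with a second cache. How the cache is kept in sync with Southbound database changes is left out here and would have to follow ovn-controller's own change-tracking mechanisms.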