Bug 1280435 - 'ip addr flush' much slower than in RHEL6?
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: iproute
Hardware: All Linux
Priority: medium Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Jakub Sitnicki
QA Contact: BaseOS QE Security Team
Depends On:
Reported: 2015-11-11 12:51 EST by Phil Sutter
Modified: 2018-04-19 08:26 EDT (History)
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
ip_addr_flush_reproducer.sh (558 bytes, application/x-shellscript)
2016-01-18 07:58 EST, Phil Sutter
perf record for promote_secondaries=1 (10.39 MB, application/octet-stream)
2016-04-21 07:17 EDT, Phil Sutter
perf record for promote_secondaries=0 (2.45 MB, application/octet-stream)
2016-04-21 07:18 EDT, Phil Sutter

Description Phil Sutter 2015-11-11 12:51:02 EST
This might be a regression: during testing, flushing a large number of addresses (~40k) from an interface took much longer on RHEL7 than the same operation on RHEL6.
Comment 3 Phil Sutter 2016-01-18 07:58 EST
Created attachment 1115847
ip_addr_flush_reproducer.sh
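The attached script is not reproduced in this report. As a rough illustration of how a test with ~40k addresses could be set up, here is a minimal sketch that generates input for `ip -batch`. The interface name `dummy0`, the 10.0.0.0/16 range and the helper name `batch_lines` are assumptions made for this example, not details taken from the attachment.

```python
import ipaddress


def batch_lines(count=40000, dev="dummy0", base="10.0.0.0", prefix=16):
    """Return `ip -batch` lines that add `count` addresses to `dev`.

    All addresses share one subnet (assumed /16), so the first one becomes
    the primary address and the rest are secondaries of it.
    """
    start = ipaddress.IPv4Address(base)
    # Skip the network address itself and hand out the next `count` hosts.
    return [
        f"addr add {start + offset}/{prefix} dev {dev}"
        for offset in range(1, count + 1)
    ]
```

Writing `"\n".join(batch_lines())` to a file such as `addrs.batch` gives input for `ip -batch addrs.batch`, run as root after `ip link add dummy0 type dummy`. The flush can then be timed with `time ip addr flush dev dummy0`.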
Comment 7 Jaroslav Aster 2016-04-14 11:11:35 EDT
Hi Phil,

I can confirm it. Tested on all rhel-6 architectures and x86_64 rhel-7. There is a huge performance regression between rhel-6 and rhel-7.
Comment 10 Phil Sutter 2016-04-15 06:51:24 EDT
I ran a few more tests:

RHEL7, upstream kernel, RHEL iproute: 0m53.384s
RHEL7, upstream kernel, upstream iproute: 1m2.577s
RHEL6, RHEL6 kernel, RHEL iproute: 0m8.199s
RHEL6, RHEL6 kernel, upstream iproute: 0m7.594s

So this very much looks like a kernel issue; at least the iproute version seems unrelated.
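To put a number on the gap, the timings above can be converted to seconds and compared per iproute build. This is only a small sketch; the helper names `to_seconds` and `slowdown` are made up for the example, and the figures are copied from the four runs listed above.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style value such as '1m2.577s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised duration: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Timings copied from the test runs above, keyed by (release, iproute build).
RUNS = {
    ("RHEL7", "RHEL iproute"): "0m53.384s",
    ("RHEL7", "upstream iproute"): "1m2.577s",
    ("RHEL6", "RHEL iproute"): "0m8.199s",
    ("RHEL6", "upstream iproute"): "0m7.594s",
}


def slowdown(iproute):
    """Return the RHEL7 run time divided by the RHEL6 run time."""
    rhel7 = to_seconds(RUNS[("RHEL7", iproute)])
    rhel6 = to_seconds(RUNS[("RHEL6", iproute)])
    return rhel7 / rhel6
```

With these figures the RHEL7 runs come out about 6.5 times slower with the RHEL iproute and about 8.2 times slower with upstream iproute, so swapping the iproute build does not close the gap.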

As the output in comment 4 shows, iproute flushes many more addresses on RHEL7 than on RHEL6. This reminded me of the 'promote_secondaries' sysctl setting, which is indeed disabled in RHEL6 and enabled in RHEL7. Running the test again on RHEL7 with promote_secondaries disabled reduces the run time, but produces a new error:

flushing all ip addresses
Failed to send flush request: No buffer space available
Flush terminated

real	0m31.644s
user	0m0.000s
sys	0m31.305s

This needs further investigation. It may also be worth finding a way to disable promote_secondaries temporarily while flushing an interface, since it hinders the operation.
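One possible shape for such a temporary switch, sketched from user space rather than inside iproute: a context manager that writes 0 to the sysctl files and restores the previous values afterwards. The name `promote_secondaries_disabled` and the interface `dummy0` are assumptions made for this example. As far as I know, the kernel treats the setting as enabled when either the `all` entry or the per-interface entry is 1, which is why the sketch touches both; that should be verified before relying on it.

```python
import contextlib
import os


@contextlib.contextmanager
def promote_secondaries_disabled(names=("all", "dummy0"),
                                 root="/proc/sys/net/ipv4/conf"):
    """Set promote_secondaries to 0 for each entry, restoring it on exit.

    `root` is a parameter only so the logic can be tried on a scratch
    directory; on a real system it needs root privileges.
    """
    saved = {}
    try:
        for name in names:
            path = os.path.join(root, name, "promote_secondaries")
            with open(path) as handle:
                saved[path] = handle.read().strip()
            with open(path, "w") as handle:
                handle.write("0\n")
        yield saved
    finally:
        # Put back whatever was there before, even if the body failed.
        for path, value in saved.items():
            with open(path, "w") as handle:
                handle.write(value + "\n")
```

The flush would then run inside the `with` block, for example through `subprocess.run(["ip", "addr", "flush", "dev", "dummy0"])`.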
Comment 11 Phil Sutter 2016-04-21 07:15:33 EDT
Regarding the error message printed with promote_secondaries=0:

recv() in rtnl_send_check() sets errno to ENOBUFS. This call is just an early check for errors, so the error originates in the kernel. Further evidence of this is that the error message does not appear with an upstream kernel.

Despite what one might think, the flush completes in both cases, so this is again rather a cosmetic issue.
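To illustrate why ENOBUFS from such an early check can be treated as a warning rather than a failure, here is a minimal sketch of a non-blocking peek on a socket. It is not the libnetlink code: the function name `early_error_check` and the choice of `MSG_DONTWAIT | MSG_PEEK` are assumptions made for this example.

```python
import errno
import os
import socket


def early_error_check(sock, bufsize=8192):
    """Peek for an early reply without blocking.

    Returns (data, warning). ENOBUFS is reported as a warning string
    instead of an exception, because it only says that the receive
    buffer overflowed, not that the request itself failed.
    """
    try:
        data = sock.recv(bufsize, socket.MSG_DONTWAIT | socket.MSG_PEEK)
        return data, None
    except BlockingIOError:
        # Nothing queued yet: no early error to report.
        return b"", None
    except OSError as exc:
        if exc.errno == errno.ENOBUFS:
            return b"", os.strerror(exc.errno)
        raise
```

A caller could print the warning and carry on, which matches the observation that the flush completes despite the message.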

As for the performance issue, I have created perf records for promote_secondaries on and off with the RHEL kernel. Preliminary analysis has not yielded a result yet. Address deletion is evidently quite complicated because of the routing table adjustments it requires.
Comment 12 Phil Sutter 2016-04-21 07:17 EDT
Created attachment 1149429
perf record for promote_secondaries=1
Comment 13 Phil Sutter 2016-04-21 07:18 EDT
Created attachment 1149430
perf record for promote_secondaries=0
Comment 14 Phil Sutter 2016-08-04 08:44:04 EDT
Removing devel_ack+ since it is still unclear where the performance regression comes from, and whether it can be fixed or simply has to be accepted as a side effect of the increased complexity of the RHEL7 kernel compared to RHEL6.
Comment 17 Jakub Sitnicki 2018-03-23 12:12:24 EDT
Thanks to Phil for providing a reproducer.

Gave it a run on RHEL7 and FC27 VMs. With 40k addresses the flush took:

* 1m2.566s on RHEL7, 3.10.0-861.el7, iproute-4.11.0-14.el7
* 0m15.713s on FC27, 4.15.7-300.fc27, iproute-4.15.0-1.fc27

So the latest(ish) stable kernel is not as fast as RHEL6 (based on Phil's report here; I haven't tested it yet), but much better than RHEL7.

Also, on Fedora the flush happens in just 2 rounds, just as on RHEL6, and not all the addresses are accounted for.

Will look into what has changed upstream that could have brought the performance back.
