Red Hat Bugzilla – Bug 1280435
'ip addr flush' much slower than in RHEL6?
Last modified: 2018-04-19 08:26:11 EDT
This might be a regression: during testing, flushing many (~40k) addresses from an interface took much longer on RHEL7 than on RHEL6.
Created attachment 1115847 [details]
I can confirm it. Tested on all rhel-6 architectures and x86_64 rhel-7. There is a huge performance regression between rhel-6 and rhel-7.
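For anyone without access to the attachment, a setup of the kind described above could be approximated with a batch file for `ip -batch`. This is only a sketch, not the original reproducer: the device name `dummy0`, the 10.0.0.0/16 network and the helper name `batch_lines` are assumptions made for illustration.

```python
import ipaddress
from itertools import islice


def batch_lines(dev="dummy0", network="10.0.0.0/16", count=40000):
    """Return 'ip -batch' lines that add `count` addresses from one subnet.

    All names and defaults here are illustrative assumptions, not taken
    from the attached reproducer.
    """
    net = ipaddress.ip_network(network)
    hosts = list(islice(net.hosts(), count))
    if len(hosts) < count:
        raise ValueError(f"{network} has fewer than {count} usable hosts")
    # Every address shares the same prefix, so all but the first one
    # end up as secondary addresses on the interface.
    return [f"addr add {host}/{net.prefixlen} dev {dev}" for host in hosts]


# Possible usage (as root, on a scratch machine):
#   write "\n".join(batch_lines()) to addrs.batch, then run
#   ip link add dummy0 type dummy
#   ip -batch addrs.batch
#   time ip addr flush dev dummy0
```

Keeping all addresses in a single subnet is deliberate here, since secondary addresses only exist relative to a primary in the same subnet.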
I ran a few more tests:
RHEL7, upstream kernel, RHEL iproute: 0m53.384s
RHEL7, upstream kernel, upstream iproute: 1m2.577s
RHEL6, RHEL6 kernel, RHEL iproute: 0m8.199s
RHEL6, RHEL6 kernel, upstream iproute: 0m7.594s
So this very much looks like a kernel issue; at least the iproute version seems unrelated.
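To put numbers on the comparison, the `time` values above can be converted to seconds and compared per iproute variant. A small sketch; the helper name `real_seconds` and the dictionary layout are made up for this example, the timings are the ones listed above.

```python
import re


def real_seconds(text):
    """Convert a shell `time` value such as '1m2.577s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Timings from the test runs above, keyed by (system, iproute variant).
timings = {
    ("RHEL7", "RHEL iproute"): "0m53.384s",
    ("RHEL7", "upstream iproute"): "1m2.577s",
    ("RHEL6", "RHEL iproute"): "0m8.199s",
    ("RHEL6", "upstream iproute"): "0m7.594s",
}

for variant in ("RHEL iproute", "upstream iproute"):
    slow = real_seconds(timings[("RHEL7", variant)])
    fast = real_seconds(timings[("RHEL6", variant)])
    print(f"{variant}: RHEL7 run took {slow / fast:.1f}x as long as RHEL6")
```

With either iproute build the RHEL7 run takes several times as long (roughly 6.5x for the RHEL iproute pair), which is consistent with the iproute version not being the deciding factor.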
As the output in comment 4 shows, iproute flushes many more addresses on RHEL7 than on RHEL6. This reminded me of the 'promote_secondaries' sysctl setting, which is indeed disabled in RHEL6 and enabled in RHEL7. Running the test again on RHEL7 with promote_secondaries disabled reduces the run time, but shows a new error:
flushing all ip addresses
Failed to send flush request: No buffer space available
This needs further investigation, as does a possible way to temporarily disable promote_secondaries while flushing the interface, since it hinders the operation.
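One way such a temporary override could look from user space is a small wrapper that writes the sysctl file and restores the previous value afterwards. This is a sketch under assumptions, not something iproute does: the helper name `sysctl_override` is invented, and the path in the usage comment assumes the usual /proc/sys/net/ipv4/conf/<dev>/ layout with a hypothetical device `eth0`.

```python
import contextlib
from pathlib import Path


@contextlib.contextmanager
def sysctl_override(path, value):
    """Temporarily write `value` to a sysctl file, then restore the old value.

    Hypothetical helper for illustration; needs root for real /proc/sys files.
    """
    sysctl = Path(path)
    old = sysctl.read_text().strip()
    sysctl.write_text(f"{value}\n")
    try:
        yield old
    finally:
        # Restore even if the wrapped operation fails.
        sysctl.write_text(f"{old}\n")


# Possible usage (assumed path and device name, requires `import subprocess`):
# with sysctl_override("/proc/sys/net/ipv4/conf/eth0/promote_secondaries", 0):
#     subprocess.run(["ip", "addr", "flush", "dev", "eth0"], check=True)
```

Two caveats: as far as I can tell the kernel combines the per-interface value with conf/all, so both may have to be 0 for the override to take effect, and any other address removal happening on the interface during that window would see the changed behaviour as well.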
Regarding the error message printed with promote_secondaries=0:
recv() in rtnl_send_check() sets errno to ENOBUFS. This call is just an early check for errors, so the error comes from the kernel. Further evidence of this is that the error message does not appear with the upstream kernel.
Despite what one might think, the flush completes in both cases, so this is again rather a cosmetic issue.
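The early check mentioned above boils down to a non-blocking peek at the socket right after sending. A rough Python rendering of that pattern, for illustration only: it covers just the errno part (the real function also inspects a peeked reply for netlink error messages), and the function name `peek_immediate_error` is made up.

```python
import errno
import socket


def peek_immediate_error(sock, bufsize=1024):
    """Peek at a socket without blocking and report an immediate errno.

    Returns None if nothing is pending (EAGAIN) or if data is available,
    otherwise the errno raised by recv(), e.g. errno.ENOBUFS.
    """
    try:
        # MSG_PEEK leaves any pending message queued for the real reader.
        sock.recv(bufsize, socket.MSG_DONTWAIT | socket.MSG_PEEK)
    except BlockingIOError:
        return None  # nothing queued yet, not an error
    except OSError as exc:
        return exc.errno
    return None


# Possible handling in a caller:
# if peek_immediate_error(nl_sock) == errno.ENOBUFS:
#     print("Failed to send flush request: No buffer space available")
```

On a netlink socket, ENOBUFS from a receive call generally means the kernel could not queue a message for the socket, which fits the observation that the requests themselves were still processed.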
As for the performance issue, I have created perf records for promote_secondaries on/off on the RHEL kernel. Preliminary analysis has not yielded a result yet, though. Address deletion is evidently quite complicated due to the routing table adjustments it requires.
Created attachment 1149429 [details]
perf record for promote_secondaries=1
Created attachment 1149430 [details]
perf record for promote_secondaries=0
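For context on what the two settings mean: per the kernel's sysctl documentation, with promote_secondaries enabled, removing a primary address promotes a corresponding secondary instead of removing all of them. The toy model below only illustrates that documented rule. It says nothing about the kernel implementation or about the order in which `ip addr flush` sends its requests, and both function names are invented.

```python
def delete_address(addrs, victim, promote_secondaries):
    """Toy model of one address deletion within a single subnet.

    `addrs` is ordered: addrs[0] is the primary, the rest are secondaries.
    Returns the list of addresses left after the deletion.
    """
    if victim not in addrs:
        raise ValueError(f"{victim} is not configured")
    if victim != addrs[0]:
        # Deleting a secondary removes just that address.
        return [a for a in addrs if a != victim]
    if promote_secondaries:
        # The next secondary takes over as primary.
        return addrs[1:]
    # Without promotion, the secondaries go away with their primary.
    return []


def deletions_to_empty(addrs, promote_secondaries):
    """Count explicit deletions needed when always removing the primary."""
    count = 0
    while addrs:
        addrs = delete_address(addrs, addrs[0], promote_secondaries)
        count += 1
    return count


addrs = [f"10.0.0.{i}" for i in range(1, 6)]
print(deletions_to_empty(addrs, promote_secondaries=True))
print(deletions_to_empty(addrs, promote_secondaries=False))
```

In this model, removing the primary first needs one explicit deletion per address with promotion enabled, but only a single one with it disabled. That would at least be in line with iproute reporting many more flushed addresses on RHEL7.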
Removing devel_ack+ since it is still unclear where the performance regression comes from and whether it can be fixed or simply has to be accepted as a side effect of increased complexity in the RHEL7 kernel over RHEL6.
Thanks to Phil for providing a reproducer.
Gave it a run on RHEL7 and FC27 VMs. With 40k addresses the flush took:
* 1m2.566s on RHEL7, 3.10.0-861.el7, iproute-4.11.0-14.el7
* 0m15.713s on FC27, 4.15.7-300.fc27, iproute-4.15.0-1.fc27
So the latest(ish) stable kernel is not as fast as RHEL6 (based on Phil's report here; I haven't tested it yet), but much better than RHEL7.
Also, on Fedora the flush happens in just 2 rounds, just as on RHEL6, and not all the addresses are accounted for.
Will look at what has changed upstream that could have brought back the performance.