Bug 1582317
Summary: | subscription-manager takes too long to gather facts if host has hundreds of NICs | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Francisco Garcia <fgarciad> |
Component: | subscription-manager | Assignee: | Jiri Hnidek <jhnidek> |
Status: | CLOSED ERRATA | QA Contact: | Red Hat subscription-manager QE Team <rhsm-qe> |
Severity: | low | Docs Contact: | |
Priority: | medium | ||
Version: | 7.5 | CC: | csnyder, fgarciad, jhnidek, jsefler, khowell, lbalhar, molasaga, redakkan, skallesh, sshtein |
Target Milestone: | rc | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-08-06 12:57:25 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Francisco Garcia
2018-05-24 20:50:30 UTC
I did investigation of this issue on Fedora29 and the latest Centos7. I can got following results * 250 number of dummy NICs ... 2 seconds * 500 number of dummy NICs ... 13 seconds * 750 number of dummy NICs ... 30 seconds Following code is very quick for 750 NICs (result withing 1 second): import ethtool info = ethtool.get_interfaces_info(ethtool.get_devices()) Following code takes about 5 seconds for 750 dummy NICs: import ethtool ipv4_addrs = [] interfaces_info = ethtool.get_interfaces_info(ethtool.get_devices()) for info in interfaces_info: ipv4_addrs.append(info.get_ipv4_addresses()) Adding following code increases total time to 10 seconds. info.get_ipv6_addresses() info.mac_address When we try to gather similar information using: "ip addr sho type dummy", then we get similar set information within one second. We spent about 20 seconds (when 750 NICs is used) somewhere else. It requires further investigation. I also tried different Python module called 'netifaces' for gathering information about NICs. Here is testing code: import netifaces addrs = [] interfaces = netifaces.interfaces() for interface in interfaces: addrs.append(netifaces.ifaddresses(interface)) Here is result for 750 dummy NICs: [root@localhost ~]# time python ./test_netifaces.py real 0m5.924s user 0m1.292s sys 0m4.632s Compared to ip command for 750 NICs: [root@localhost ~]# time ip addr show type dummy > /dev/null real 0m0.024s user 0m0.011s sys 0m0.013s I am investigating it and here are my results. I am using VM with Fedora 29, the latest python-ethtool and Python 3.7.2. I have 750 dummy NICs with one IPv4 and one IPv6 address each.
It seems that there is some cache for the whole interface because the function which is called first takes 5 times longer than the later one.
For example:
> import ethtool
>
> ipv4_addrs = []
> ipv6_addrs = []
> mac_addrs = []
>
> interfaces_info = ethtool.get_interfaces_info(ethtool.get_devices())
>
> for info in interfaces_info:
> ipv4_addrs.append(info.get_ipv4_addresses())
> ipv6_addrs.append(info.get_ipv6_addresses())
> mac_addrs.append(info.mac_address)
takes ~ 31 seconds with this cProfile output:
$ python3 -m cProfile -s tottime script3.py
4056 function calls in 31.623 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
752 15.673 0.021 15.673 0.021 {method 'get_ipv4_addresses' of 'ethtool.etherinfo' objects}
1 12.357 12.357 31.623 31.623 script3.py:1(<module>)
752 3.562 0.005 3.562 0.005 {method 'get_ipv6_addresses' of 'ethtool.etherinfo' objects}
When I switch the get_ipv4_addresses() call and the get_ipv6_addresses() call, the overall time is the same but the cProfile output is
$ python3 -m cProfile -s tottime script4.py
4056 function calls in 33.377 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
752 16.527 0.022 16.527 0.022 {method 'get_ipv6_addresses' of 'ethtool.etherinfo' objects}
1 13.169 13.169 33.377 33.377 script4.py:1(<module>)
752 3.660 0.005 3.660 0.005 {method 'get_ipv4_addresses' of 'ethtool.etherinfo' objects}
As Jan pointed out on IRC, it seems that all the tools using netlink C interface has the same performance problem with that many NICs.
I am not sure how I can help more.
Can we filter the interfaces that we gather? IMHO it should do the trick. get_devices() is quite quick, so if we filter get_devices output before we pass it to get_interfaces_info, we should be able to save a lot of time. What do you think? Will it be acceptable? You can also use get_active_devices(). But it depends on what you need. Do you need all the NICs or just the default one for Ipv4 and IPv6 traffic? In Satellite we have a configuration property that defines which interfaces would be ignored when facts are uploaded into Satellite. It would be really nice if such a setting would be exposed somehow by the subscription manager. Moving to MODIFIED as the fix has been merged upstream and will be included when we rebase from upstream for RHEL 7.7. Shimon, If that is something that you would like, please file a separate RFE for it. We will consider it along with other work if that is done but it will not be included as a part of the fix for this bug. Reproducing the issue with old build: ==================================== # subscription-manager version server type: This system is currently not registered. subscription management server: 2.3.17-1 subscription management rules: 5.35 subscription-manager: 1.20.10-1.el7 # for i in $(seq 1 4000); do ip link add name foo-veth-$i type dummy ; done # time ip link show 3840: foo-veth-3836: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether c6:47:00:e6:9e:aa brd ff:ff:ff:ff:ff:ff 3584: foo-veth-3580: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether f2:b2:99:a5:7b:1c brd ff:ff:ff:ff:ff:ff <snip> [root@intel-piketon-01 ~]# time subscription-manager facts cpu.core(s)_per_socket: 2 cpu.cpu(s): 4 cpu.cpu_socket(s): 1 cpu.thread(s)_per_core: 2 cpu.topology_source: kernel /sys cpu sibling lists distribution.id: Maipo distribution.name: Red Hat Enterprise Linux Server distribution.version: 7.5 distribution.version.modifier: beta dmi.baseboard.manufacturer: Intel dmi.baseboard.product_name: To be filled by O.E.M. dmi.baseboard.serial_number: To be filled by O.E.M. dmi.baseboard.version: To be filled by O.E.M. <snip> system.default_locale: en_IN.UTF-8 uname.machine: x86_64 uname.nodename: intel-piketon-01.khw1.lab.eng.bos.redhat.com uname.release: 3.10.0-843.el7.x86_64 uname.sysname: Linux uname.version: #1 SMP Wed Jan 31 12:09:22 UTC 2018 virt.host_type: Not Applicable virt.is_guest: False real 10m35.751s --> Notice it takes 10min to get the facts list user 1m27.319s sys 9m7.906s # time subscription-manager register Registering to: subscription.rhsm.redhat.com:443/subscription Username: QualityAssurance Password: The system has been registered with ID: 58bb738d-cedd-4d80-9fb6-5bbba07559b8 The registered system name is: intel-piketon-01.khw1.lab.eng.bos.redhat.com real 21m26.672s --> 21 mins to register with system user 2m51.058s sys 17m58.110s Demonstrating the improved results after the fix: ================================================= # rpm -qa subscription-manager --changelog | grep 1582317 - 1582317: Do not collect hardware facts twice; ENT-653 (jhnidek) # subscription-manager version server type: Red Hat Subscription Management subscription management server: 2.3.17-1 subscription management rules: 5.35 subscription-manager: 1.24.13-1.el7 # for i in $(seq 1 4000); do ip link add name foo-veth-$i type dummy ; done # time ip link show 3840: foo-veth-3837: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 2e:4d:4c:28:03:10 brd ff:ff:ff:ff:ff:ff 3584: foo-veth-3581: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether a6:35:06:b5:ad:32 brd ff:ff:ff:ff:ff:ff <snip> real 0m0.163s user 0m0.016s sys 0m0.076s # time subscription-manager facts cpu.core(s)_per_socket: 1 cpu.cpu(s): 2 cpu.cpu_socket(s): 2 <snip> real 3m15.214s ---> with the fix, its now takes 3 mins to get the facts list user 0m32.209s sys 2m35.002s # time subscription-manager register Registering to: subscription.rhsm.redhat.com:443/subscription Username: QualityAssurance Password: The system has been registered with ID: 2dd4137e-cf0d-4ccb-ba66-9b6875dba9c9 The registered system name is: kvm-01-guest17.lab.eng.rdu2.redhat.com real 6m54.865s --> and registration takes 6mins user 1m5.246s sys 4m47.287s As demonstrated above , the subscription manager response have improved with the fix. Moving the bug to verified based on the observations. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2008 |