Bug 1268382
| Summary: | virsh iface-list --all takes a very long time to complete on host with large number of interfaces | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Laine Stump <laine> | |
| Component: | netcf | Assignee: | Laine Stump <laine> | |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 7.2 | CC: | dyuan, jishao, jsuchane, laine, mzhan | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | netcf-0.2.8-2.el7 | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1268384 (view as bug list) | Environment: | ||
| Last Closed: | 2016-11-04 07:52:02 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1288337 | |||
# rpm -q libvirt netcf
libvirt-1.2.17-13.el7.x86_64
netcf-0.2.8-1.el7.x86_64
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz
Stepping: 3
CPU MHz: 3563.375
BogoMIPS: 6385.71
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-3
# free -g
total used free shared buff/cache available
Mem: 7 3 1 0 1 2
Swap: 0 0 0
0. prepare xml file for define iface
# cat iface-X.xml
<interface type='ethernet' name='ethX'>
<start mode='none'/>
<protocol family='ipv4'>
<dhcp peerdns='yes'/>
</protocol>
<protocol family='ipv6'>
<autoconf/>
</protocol>
</interface>
# for i in {1..512};do sed -e s/X/$i/ iface-X.xml > iface-$i.xml;done
1. Define 512 iface, check cpu usage of libvirtd process
# time for i in {1..512};do virsh iface-define iface-$i.xml;done
Interface eth1 defined from iface-1.xml
...
Interface eth512 defined from iface-512.xml
real 0m50.221s
user 0m2.868s
sys 0m1.623s
During define interface, the usage of libvirtd will go up to about 80%.
# top -i
Tasks: 211 total, 2 running, 209 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.7 us, 0.4 sy, 0.0 ni, 96.6 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 7734908 total, 1646404 free, 4355384 used, 1733120 buff/cache
KiB Swap: 511996 total, 511996 free, 0 used. 2758688 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9248 root 20 0 1153524 224748 13756 S 81.2 2.9 21:52.09 libvirtd
2. Undefine the 512 iface, check cpu usage of libvirtd process
# time for i in {1..512};do virsh iface-undefine eth$i;done
....
Interface eth512 undefined
real 0m31.393s
user 0m2.828s
sys 0m1.551s
During undefine interface, the usage of libvirtd will go up to about 90%,
$ top
top - 16:33:04 up 6:16, 10 users, load average: 0.93, 0.59, 0.41
Tasks: 211 total, 1 running, 210 sleeping, 0 stopped, 0 zombie
%Cpu(s): 24.6 us, 1.8 sy, 0.0 ni, 73.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 64.3/7734908 [ ]
KiB Swap: 0.0/511996 [ ]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9248 root 20 0 1153524 220356 13756 S 93.0 2.8 23:05.67 /usr/sbin/libvirtd
3. Get iface list, check cpu usage of libvirtd process
# time virsh iface-list
Name State MAC Address
---------------------------------------------------
eth0 active a0:48:1c:86:67:9c
lo active 00:00:00:00:00:00
real 0m6.143s
user 0m0.006s
sys 0m0.003s
cpu usage of libvirtd process will go up to about 90%
$ top
top - 17:18:09 up 7:01, 9 users, load average: 0.54, 0.46, 0.31
Tasks: 212 total, 1 running, 211 sleeping, 0 stopped, 0 zombie
%Cpu(s): 22.0 us, 5.0 sy, 0.0 ni, 72.8 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 7734908 total, 1987444 free, 4000940 used, 1746524 buff/cache
KiB Swap: 511996 total, 511996 free, 0 used. 3119156 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
# time virsh iface-list --all
Name State MAC Address
---------------------------------------------------
eno3 inactive
eth0 active a0:48:1c:86:67:9c
eth1 inactive
...
lo active 00:00:00:00:00:00
real 0m14.882s
user 0m0.040s
sys 0m0.032s
cpu usage of libvirtd process will go up to about 90%+
$ top
top - 17:18:55 up 7:01, 9 users, load average: 0.47, 0.44, 0.32
Tasks: 212 total, 1 running, 211 sleeping, 0 stopped, 0 zombie
%Cpu(s): 22.8 us, 4.3 sy, 0.0 ni, 72.6 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 7734908 total, 1971972 free, 4012564 used, 1750372 buff/cache
KiB Swap: 511996 total, 511996 free, 0 used. 3103720 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9248 root 20 0 1153524 220780 13924 S 92.4 2.9 27:20.41 libvirtd
4. Get interface xml, check cpu usage of libvirtd process
# time for i in {1..512};do virsh iface-dumpxml eth$i;done
...
<interface type='ethernet' name='eth512'>
<start mode='none'/>
<protocol family='ipv4'>
<dhcp peerdns='yes'/>
</protocol>
<protocol family='ipv6'>
<autoconf/>
</protocol>
</interface>
real 1m25.222s
user 0m2.965s
sys 0m1.702s
cpu usage of libvirtd process will go up to more than 90%.
$ top
top - 17:29:26 up 7:12, 9 users, load average: 0.76, 0.43, 0.32
Tasks: 210 total, 1 running, 209 sleeping, 0 stopped, 0 zombie
%Cpu(s): 25.1 us, 1.7 sy, 0.0 ni, 73.0 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 7734908 total, 1958280 free, 4021692 used, 1754936 buff/cache
KiB Swap: 511996 total, 511996 free, 0 used. 3090340 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9248 root 20 0 1153524 221568 13924 S 94.3 2.9 29:50.62 libvirtd
Hi laine,
I try to verify the issue and get result as below
As we can see that in the step2 virsh iface-list takes 1.128s > 1 second.
in step2 virsh iface-list --all just takes 0.712s which is much lower than 1.128s
Is it right for virsh iface-list ?
# rpm -q libvirt
libvirt-2.0.0-2.el7.x86_64
# rpm -q netcf
netcf-0.2.8-2.el7.x86_64
1.# time for i in {1..512};do virsh iface-define iface-$i.xml;done
real 0m52.020s
user 0m2.748s
sys 0m1.661s
2. # time virsh iface-list
Name State MAC Address
---------------------------------------------------
eth0 active 00:23:24:80:99:c6
lo active 00:00:00:00:00:00
real 0m1.128s
user 0m0.004s
sys 0m0.004s
3. # time virsh iface-list --all
....
real 0m0.712s
user 0m0.022s
sys 0m0.021s
4. # time for i in {1..512};do virsh iface-dumpxml eth$i;done
...
real 0m6.209s
user 0m2.719s
sys 0m1.639s
5. # time for i in {1..512};do virsh iface-undefine eth$i;done
real 0m26.912s
user 0m2.640s
sys 0m1.587s
(In reply to Jingjing Shao from comment #5) > As we can see that in the step2 virsh iface-list takes 1.128s > 1 second. > in step2 virsh iface-list --all just takes 0.712s which is much lower than > 1.128s > Is it right for virsh iface-list ? > > 2. # time virsh iface-list > Name State MAC Address > --------------------------------------------------- > eth0 active 00:23:24:80:99:c6 > lo active 00:00:00:00:00:00 > > real 0m1.128s > user 0m0.004s > sys 0m0.004s > > > 3. # time virsh iface-list --all > .... > real 0m0.712s > user 0m0.022s > sys 0m0.021s I guess you're asking if it's normal that "virsh iface-list" takes longer than "virsh iface-list --all", and the answer is "yes, that is expected". The reason is that when you run "virsh iface-list --all", a single call is made to netcf's ncf_list_interfaces(), which returns all interfaces on the host, but a call to plain "virsh iface-list" will first call ncf_list_interfaces(), and then spend extra time filtering that complete list to eliminate all interfaces that aren't active. The important thing to look for in determining if the bug has been fixed is the relative time for "virsh iface-list --all" on the host with the older netcf vs. the host with the newer netcf. (I did this just now on my RHEL7 system with 514 bridge+vlan combinations, and this time went from 1m14.334s down to 0m1.833s). (In reply to Laine Stump from comment #6) > (In reply to Jingjing Shao from comment #5) > > > As we can see that in the step2 virsh iface-list takes 1.128s > 1 second. > > in step2 virsh iface-list --all just takes 0.712s which is much lower than > > 1.128s > > Is it right for virsh iface-list ? > > > > > 2. # time virsh iface-list > > Name State MAC Address > > --------------------------------------------------- > > eth0 active 00:23:24:80:99:c6 > > lo active 00:00:00:00:00:00 > > > > real 0m1.128s > > user 0m0.004s > > sys 0m0.004s > > > > > > 3. # time virsh iface-list --all > > .... > > real 0m0.712s > > user 0m0.022s > > sys 0m0.021s > > I guess you're asking if it's normal that "virsh iface-list" takes longer > than "virsh iface-list --all", and the answer is "yes, that is expected". > The reason is that when you run "virsh iface-list --all", a single call is > made to netcf's ncf_list_interfaces(), which returns all interfaces on the > host, but a call to plain "virsh iface-list" will first call > ncf_list_interfaces(), and then spend extra time filtering that complete > list to eliminate all interfaces that aren't active. > > The important thing to look for in determining if the bug has been fixed is > the relative time for "virsh iface-list --all" on the host with the older > netcf vs. the host with the newer netcf. (I did this just now on my RHEL7 > system with 514 bridge+vlan combinations, and this time went from 1m14.334s > down to 0m1.833s). Thanks laine's explanation. Reproduce the bug with libvirt-1.2.17-13.el7.x86_64 and netcf-0.2.8-1.el7.x86_64 cat br-X.xml <interface type="bridge" name="brX"> <start mode="onboot"/> <protocol family="ipv4"> </protocol> <bridge stp="off"> <interface type="vlan" name="eno1.X"> <vlan tag="X"> <interface name="eno1"/> </vlan> </interface> </bridge> </interface> # for i in {1..514};do sed -e s/X/$i/ br-X.xml > br-$i.xml;done # time for i in {1..514};do virsh iface-define br-$i.xml;done Interface br1 defined from br-1.xml Interface br2 defined from br-2.xml Interface br3 defined from br-3.xml Interface br4 defined from br-4.xml Interface br5 defined from br-5.xml .... Interface br513 defined from br-513.xml Interface br514 defined from br-514.xml real 6m30.884s user 0m3.366s sys 0m1.987s # time for i in {1..514};do virsh iface-start br$i;done Interface br1 started Interface br2 started ..... Interface br510 started Interface br511 started Interface br514 started real 7m32.715s user 0m3.348s sys 0m1.971s # time virsh iface-list --all Name State MAC Address --------------------------------------------------- br1 active 44:37:e6:95:03:e4 br10 active 44:37:e6:95:03:e4 br100 active 44:37:e6:95:03:e4 br101 active 44:37:e6:95:03:e4 .... br512 active 44:37:e6:95:03:e4 br513 active 44:37:e6:95:03:e4 br514 active 44:37:e6:95:03:e4 real 4m2.931s user 0m0.047s sys 0m0.055s # time virsh iface-list Name State MAC Address --------------------------------------------------- br1 active 44:37:e6:95:03:e4 br10 active 44:37:e6:95:03:e4 br100 active 44:37:e6:95:03:e4 br101 active 44:37:e6:95:03:e4 ... br511 active 44:37:e6:95:03:e4 br514 active 44:37:e6:95:03:e4 real 4m2.336s user 0m0.056s sys 0m0.045s And verify it with libvirt-2.0.0-3.el7.x86_64 and netcf-0.2.8-2.el7.x86_64 # time virsh iface-list --all Name State MAC Address --------------------------------------------------- br1 active 44:37:e6:95:03:e4 br10 active 44:37:e6:95:03:e4 br100 active 44:37:e6:95:03:e4 br101 active 44:37:e6:95:03:e4 .... br513 active 44:37:e6:95:03:e4 br514 active 44:37:e6:95:03:e4 real 0m0.425s user 0m0.027s sys 0m0.017s # # time virsh iface-list --all Name State MAC Address --------------------------------------------------- br1 active fa:36:0e:22:8c:fd br10 active c6:5e:4d:d5:e2:2d ..... br511 active 8a:de:fe:b8:09:ae br512 active 86:88:9c:8c:e7:9b br513 active aa:be:25:cc:1a:50 br514 active 16:61:94:ff:b1:3d real 0m0.439s user 0m0.024s sys 0m0.017s It has get a remarkable improvement, so change it to verified Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2524.html |
netcf calls aug_load() at the start of every public API that uses augeas, and libvirt's iface-list calls netcf-->augeas at least 3 times for each interface. In one test system with ~600 interfaces (300 bridges each attached to a vlan) this results in a runtime for "virsh iface-list --all" of ~22 seconds (it was 28 seconds prior to libvirt code being optimized to eliminate one of the three calls). This is in spite of the fact that augeas itself isn't actually re-reading the data unless a file has changed - it's merely checking the mtime of every ifcfg-* file, but even that takes a very long time when there are a lot of interfaces. There is an upstream patch that allows netcf to used cached config data from augeas during a public API call, refreshing the cache at most once per second (assuming a single persistent instance of netcf; if netcf is instead hutdown and re-inited, the entire config tree must be re-read). This lowers the time for virsh iface-list --all on the same test setup down to < 1 second. Here is the upstream netcf patch that solves the problem: commit 9b5f4eb57af28a604cd7ac8b2c1be9e49f0b517d Author: Laine Stump <laine> Date: Mon Sep 28 17:11:11 2015 -0400 call aug_load() at most once per second