Bug 1268382

Summary: virsh iface-list --all takes a very long time to complete on host with large number of interfaces
Product: Red Hat Enterprise Linux 7 Reporter: Laine Stump <laine>
Component: netcfAssignee: Laine Stump <laine>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.2CC: dyuan, jishao, jsuchane, laine, mzhan
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: netcf-0.2.8-2.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1268384 (view as bug list) Environment:
Last Closed: 2016-11-04 07:52:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1288337    

Description Laine Stump 2015-10-02 18:12:43 UTC
netcf calls aug_load() at the start of every public API that uses augeas, and libvirt's iface-list calls netcf-->augeas at least 3 times for each interface. In one test system with ~600 interfaces (300 bridges each attached to a vlan) this results in a runtime for "virsh iface-list --all" of ~22 seconds (it was 28 seconds prior to libvirt code being optimized to eliminate one of the three calls).

This is in spite of the fact that augeas itself isn't actually re-reading the data unless a file has changed - it's merely checking the mtime of every ifcfg-* file, but even that takes a very long time when there are a lot of interfaces.

There is an upstream patch that allows netcf to used cached config data from augeas during a public API call, refreshing the cache at most once per second (assuming a single persistent instance of netcf; if netcf is instead hutdown and re-inited, the entire config tree must be re-read). This lowers the time for virsh iface-list --all on the same test setup down to < 1 second.

Here is the upstream netcf patch that solves the problem:

commit 9b5f4eb57af28a604cd7ac8b2c1be9e49f0b517d
Author: Laine Stump <laine>
Date:   Mon Sep 28 17:11:11 2015 -0400

    call aug_load() at most once per second

Comment 1 Shanzhi Yu 2015-11-16 09:40:03 UTC
# rpm -q libvirt netcf
libvirt-1.2.17-13.el7.x86_64
netcf-0.2.8-1.el7.x86_64

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Model name:            Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz
Stepping:              3
CPU MHz:               3563.375
BogoMIPS:              6385.71
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-3

# free -g
              total        used        free      shared  buff/cache   available
Mem:              7           3           1           0           1           2
Swap:             0           0           0



0. prepare xml file for define iface
# cat iface-X.xml 
<interface type='ethernet' name='ethX'>
<start mode='none'/>
<protocol family='ipv4'>
<dhcp peerdns='yes'/>
</protocol>
<protocol family='ipv6'>
<autoconf/>
</protocol>
</interface>

# for i in {1..512};do sed -e s/X/$i/ iface-X.xml > iface-$i.xml;done 

1. Define 512 iface, check cpu usage of libvirtd process 

# time for i in {1..512};do virsh iface-define iface-$i.xml;done 

Interface eth1 defined from iface-1.xml
...
Interface eth512 defined from iface-512.xml

real	0m50.221s
user	0m2.868s
sys	0m1.623s

During define interface, the usage of libvirtd will go up to about 80%.  
# top -i

Tasks: 211 total,   2 running, 209 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.7 us,  0.4 sy,  0.0 ni, 96.6 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7734908 total,  1646404 free,  4355384 used,  1733120 buff/cache
KiB Swap:   511996 total,   511996 free,        0 used.  2758688 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                 
 9248 root      20   0 1153524 224748  13756 S  81.2  2.9  21:52.09 libvirtd 

2. Undefine the 512 iface, check cpu usage of libvirtd process 

# time for i in {1..512};do virsh iface-undefine eth$i;done 
....
Interface eth512 undefined

real	0m31.393s
user	0m2.828s
sys	0m1.551s

During undefine interface, the usage of libvirtd will go up to about 90%,

$ top 

top - 16:33:04 up  6:16, 10 users,  load average: 0.93, 0.59, 0.41
Tasks: 211 total,   1 running, 210 sleeping,   0 stopped,   0 zombie
%Cpu(s): 24.6 us,  1.8 sy,  0.0 ni, 73.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 64.3/7734908  [                                                                                                  ]
KiB Swap:  0.0/511996   [                                                                                                  ]

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                 
 9248 root      20   0 1153524 220356  13756 S  93.0  2.8  23:05.67 /usr/sbin/libvirtd  

3. Get iface list, check cpu usage of libvirtd process 

# time virsh iface-list 
 Name                 State      MAC Address
---------------------------------------------------
 eth0                 active     a0:48:1c:86:67:9c
 lo                   active     00:00:00:00:00:00


real	0m6.143s
user	0m0.006s
sys	0m0.003s

cpu usage of libvirtd process will go up to about 90%
$ top

top - 17:18:09 up  7:01,  9 users,  load average: 0.54, 0.46, 0.31
Tasks: 212 total,   1 running, 211 sleeping,   0 stopped,   0 zombie
%Cpu(s): 22.0 us,  5.0 sy,  0.0 ni, 72.8 id,  0.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7734908 total,  1987444 free,  4000940 used,  1746524 buff/cache
KiB Swap:   511996 total,   511996 free,        0 used.  3119156 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND  


# time virsh iface-list --all
 Name                 State      MAC Address
---------------------------------------------------
 eno3                 inactive   
 eth0                 active     a0:48:1c:86:67:9c
 eth1                 inactive   
...
 lo                   active     00:00:00:00:00:00


real	0m14.882s
user	0m0.040s
sys	0m0.032s
   
cpu usage of libvirtd process will go up to about 90%+

$ top

top - 17:18:55 up  7:01,  9 users,  load average: 0.47, 0.44, 0.32
Tasks: 212 total,   1 running, 211 sleeping,   0 stopped,   0 zombie
%Cpu(s): 22.8 us,  4.3 sy,  0.0 ni, 72.6 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7734908 total,  1971972 free,  4012564 used,  1750372 buff/cache
KiB Swap:   511996 total,   511996 free,        0 used.  3103720 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                 
 9248 root      20   0 1153524 220780  13924 S  92.4  2.9  27:20.41 libvirtd                                                
 

4. Get interface xml, check cpu usage of libvirtd process

# time for i in {1..512};do virsh iface-dumpxml  eth$i;done

...
<interface type='ethernet' name='eth512'>
  <start mode='none'/>
  <protocol family='ipv4'>
    <dhcp peerdns='yes'/>
  </protocol>
  <protocol family='ipv6'>
    <autoconf/>
  </protocol>
</interface>


real	1m25.222s
user	0m2.965s
sys	0m1.702s

cpu usage of libvirtd process will go up to more than 90%. 

$ top

top - 17:29:26 up  7:12,  9 users,  load average: 0.76, 0.43, 0.32
Tasks: 210 total,   1 running, 209 sleeping,   0 stopped,   0 zombie
%Cpu(s): 25.1 us,  1.7 sy,  0.0 ni, 73.0 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7734908 total,  1958280 free,  4021692 used,  1754936 buff/cache
KiB Swap:   511996 total,   511996 free,        0 used.  3090340 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                 
 9248 root      20   0 1153524 221568  13924 S  94.3  2.9  29:50.62 libvirtd

Comment 5 Jingjing Shao 2016-07-13 09:05:55 UTC
Hi laine, 
I try to verify the issue and get result as below

As we can see that in the step2  virsh iface-list  takes 1.128s > 1 second.
in step2  virsh iface-list --all just takes 0.712s which is much lower than 1.128s
Is it right for virsh iface-list ? 


# rpm -q libvirt
libvirt-2.0.0-2.el7.x86_64
# rpm -q netcf
netcf-0.2.8-2.el7.x86_64


1.# time for i in {1..512};do virsh iface-define iface-$i.xml;done 

real	0m52.020s
user	0m2.748s
sys	0m1.661s

2. # time virsh iface-list 
 Name                 State      MAC Address
---------------------------------------------------
 eth0                 active     00:23:24:80:99:c6
 lo                   active     00:00:00:00:00:00

real	0m1.128s
user	0m0.004s
sys	0m0.004s


3. # time virsh iface-list --all
....
real	0m0.712s
user	0m0.022s
sys	0m0.021s


4. # time for i in {1..512};do virsh iface-dumpxml  eth$i;done
...
real	0m6.209s
user	0m2.719s
sys	0m1.639s


5. # time for i in {1..512};do virsh iface-undefine eth$i;done 

real	0m26.912s
user	0m2.640s
sys	0m1.587s

Comment 6 Laine Stump 2016-07-26 21:31:26 UTC
(In reply to Jingjing Shao from comment #5)

> As we can see that in the step2  virsh iface-list  takes 1.128s > 1 second.
> in step2  virsh iface-list --all just takes 0.712s which is much lower than
> 1.128s
> Is it right for virsh iface-list ? 

> 
> 2. # time virsh iface-list 
>  Name                 State      MAC Address
> ---------------------------------------------------
>  eth0                 active     00:23:24:80:99:c6
>  lo                   active     00:00:00:00:00:00
> 
> real	0m1.128s
> user	0m0.004s
> sys	0m0.004s
> 
> 
> 3. # time virsh iface-list --all
> ....
> real	0m0.712s
> user	0m0.022s
> sys	0m0.021s

I guess you're asking if it's normal that "virsh iface-list" takes longer than "virsh iface-list --all", and the answer is "yes, that is expected". The reason is that when you run "virsh iface-list --all", a single call is made to netcf's ncf_list_interfaces(), which returns all interfaces on the host, but a call to plain "virsh iface-list" will first call ncf_list_interfaces(), and then spend extra time filtering that complete list to eliminate all interfaces that aren't active.

The important thing to look for in determining if the bug has been fixed is the relative time for "virsh iface-list --all" on the host with the older netcf vs. the host with the newer netcf. (I did this just now on my RHEL7 system with 514 bridge+vlan combinations, and this time went from 1m14.334s down to 0m1.833s).

Comment 7 Jingjing Shao 2016-07-31 10:24:45 UTC
(In reply to Laine Stump from comment #6)
> (In reply to Jingjing Shao from comment #5)
> 
> > As we can see that in the step2  virsh iface-list  takes 1.128s > 1 second.
> > in step2  virsh iface-list --all just takes 0.712s which is much lower than
> > 1.128s
> > Is it right for virsh iface-list ? 
> 
> > 
> > 2. # time virsh iface-list 
> >  Name                 State      MAC Address
> > ---------------------------------------------------
> >  eth0                 active     00:23:24:80:99:c6
> >  lo                   active     00:00:00:00:00:00
> > 
> > real	0m1.128s
> > user	0m0.004s
> > sys	0m0.004s
> > 
> > 
> > 3. # time virsh iface-list --all
> > ....
> > real	0m0.712s
> > user	0m0.022s
> > sys	0m0.021s
> 
> I guess you're asking if it's normal that "virsh iface-list" takes longer
> than "virsh iface-list --all", and the answer is "yes, that is expected".
> The reason is that when you run "virsh iface-list --all", a single call is
> made to netcf's ncf_list_interfaces(), which returns all interfaces on the
> host, but a call to plain "virsh iface-list" will first call
> ncf_list_interfaces(), and then spend extra time filtering that complete
> list to eliminate all interfaces that aren't active.
> 
> The important thing to look for in determining if the bug has been fixed is
> the relative time for "virsh iface-list --all" on the host with the older
> netcf vs. the host with the newer netcf. (I did this just now on my RHEL7
> system with 514 bridge+vlan combinations, and this time went from 1m14.334s
> down to 0m1.833s).

Thanks laine's explanation.
Reproduce the bug with libvirt-1.2.17-13.el7.x86_64 and netcf-0.2.8-1.el7.x86_64
 cat br-X.xml
<interface type="bridge" name="brX">
<start mode="onboot"/>
<protocol family="ipv4">
</protocol>
<bridge stp="off">
<interface type="vlan" name="eno1.X">
<vlan tag="X">
<interface name="eno1"/>
</vlan>
</interface>
</bridge>
</interface>

# for i in {1..514};do sed -e s/X/$i/ br-X.xml > br-$i.xml;done

# time for i in {1..514};do virsh iface-define br-$i.xml;done
Interface br1 defined from br-1.xml

Interface br2 defined from br-2.xml

Interface br3 defined from br-3.xml

Interface br4 defined from br-4.xml

Interface br5 defined from br-5.xml

....
Interface br513 defined from br-513.xml

Interface br514 defined from br-514.xml


real    6m30.884s
user    0m3.366s
sys     0m1.987s

# time for i in {1..514};do virsh iface-start br$i;done
Interface br1 started

Interface br2 started

.....

Interface br510 started

Interface br511 started

Interface br514 started

real    7m32.715s
user    0m3.348s
sys     0m1.971s


#  time virsh iface-list --all
 Name                 State      MAC Address
---------------------------------------------------
 br1                  active     44:37:e6:95:03:e4
 br10                 active     44:37:e6:95:03:e4
 br100                active     44:37:e6:95:03:e4
 br101                active     44:37:e6:95:03:e4
....
 br512                active     44:37:e6:95:03:e4
 br513                active     44:37:e6:95:03:e4
 br514                active     44:37:e6:95:03:e4

real    4m2.931s
user    0m0.047s
sys     0m0.055s




# time virsh iface-list
 Name                 State      MAC Address
---------------------------------------------------
 br1                  active     44:37:e6:95:03:e4
 br10                 active     44:37:e6:95:03:e4
 br100                active     44:37:e6:95:03:e4
 br101                active     44:37:e6:95:03:e4
...
 br511                active     44:37:e6:95:03:e4
 br514                active     44:37:e6:95:03:e4


real    4m2.336s
user    0m0.056s
sys     0m0.045s

And verify it with libvirt-2.0.0-3.el7.x86_64 and netcf-0.2.8-2.el7.x86_64

# time virsh iface-list --all
 Name                 State      MAC Address
---------------------------------------------------
 br1                  active     44:37:e6:95:03:e4
 br10                 active     44:37:e6:95:03:e4
 br100                active     44:37:e6:95:03:e4
 br101                active     44:37:e6:95:03:e4
....
 br513                active     44:37:e6:95:03:e4
 br514                active     44:37:e6:95:03:e4

real    0m0.425s
user    0m0.027s
sys     0m0.017s


# # time virsh iface-list --all
 Name                 State      MAC Address
---------------------------------------------------
 br1                  active     fa:36:0e:22:8c:fd
 br10                 active     c6:5e:4d:d5:e2:2d
.....
 br511                active     8a:de:fe:b8:09:ae
 br512                active     86:88:9c:8c:e7:9b
 br513                active     aa:be:25:cc:1a:50
 br514                active     16:61:94:ff:b1:3d

real    0m0.439s
user    0m0.024s
sys     0m0.017s

It has get a remarkable improvement, so change it to verified

Comment 9 errata-xmlrpc 2016-11-04 07:52:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2524.html