Bug 1269613

Summary: virsh iface-dumpxml is extremely slow on a host with a large number of interfaces
Product: Red Hat Enterprise Linux 7 Reporter: Laine Stump <laine>
Component: netcfAssignee: Laine Stump <laine>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.2CC: dyuan, jishao, laine, mzhan
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: netcf-0.2.8-2.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1271341 (view as bug list) Environment:
Last Closed: 2016-11-04 07:52:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1288337    

Description Laine Stump 2015-10-07 18:05:24 UTC
On a test system with 514 bridges each having an attach vlan interface, "virsh iface-dumpxml" of all toplevel interfaces takes 13min46sec. This is a bad thing because that's essentially what virt-manager does every time it is started.

Investigation and upstream discussion on the netcf-devel and augeas-devel mailing lists:

  https://www.redhat.com/archives/augeas-devel/2015-September/msg00001.html
  https://www.redhat.com/archives/augeas-devel/2015-October/msg00000.html

led to the following:

1) Several commits have recently been pushed to upstream augeas (post 1.4.0) which end up having a dramatic effect on the execution time of "virsh iface-dumpxml" for all interfaces on a system with 514 bridges + 514 vlans. Here are the augeas upstream commit IDs:

  commit a659f09afa1d6383d17a724da9c2ddbb77f844a7
  Author: David Lutterkort <lutter>
  Date:   Mon Sep 28 14:37:03 2015 -0700

    Speed up the duplicate check when constructing nodesets
    
  commit 41e989ca1f9abf7a13e0f333e90857f2ccafbf7d
  Author: David Lutterkort <lutter>
  Date:   Mon Sep 28 14:51:42 2015 -0700

    * src/pathx.c (ns_filter): speed up by removing nodes in batches
    
  commit 23d5e480cab312ed3f47f9f6e0777b152d6df9df
  Author: David Lutterkort <lutter>
  Date:   Mon Sep 28 15:40:31 2015 -0700

    * src/pathx.c (ns_from_locpath): special-case 'name[42]' path expressions

These reduce the time from 13m46s to 7m34s! (The netcf patch given in Bug 1268382 further reduces this time to 6m40s).

2) There is an aug_match() query in netcf that takes a *very* long time to complete (nearly 1 second for each query!). It is in drv_redhat.c:

    ndevs = aug_fmt_match(ncf, &devs,
              "%s[ DEVICE = '%s' or BRIDGE = '%s' or MASTER = '%s'"
              "    or MASTER = ../*[BRIDGE = '%s']/DEVICE" ]/DEVICE",
              ifcfg_path, nif->name, nif->name, nif->name, nif->name);

So the query expression is, e.g.:

     "/files/etc/sysconfig/network-scripts/*[ "
     "DEVICE = 'br1' or BRIDGE = 'br1' or MASTER = 'br1' or MASTER = "
     "../*[BRIDGE = 'br1']/DEVICE ]/DEVICE"

The time-consuming part is the final term:

     MASTER = ../*[BRIDGE='br1']DEVICE

This searches for all ifcfg files that have a MASTER that is equal to the DEVICE name of any other ifcfg file that has BRIDGE='br1'. David Lutterkort has offered some advice on how to rewrite that query to make it much more efficient:

  https://www.redhat.com/archives/augeas-devel/2015-October/msg00003.html

  "(/files/etc/sysconfig/network-scripts/*[(DEVICE|BRIDGE|MASTER) = '%s']"
  "|/files/etc/sysconfig/network-scripts/*[MASTER]"
  "[MASTER = ../*[BRIDGE = '%s']/DEVICE ])/DEVICE

I've implemented that in a netcf patch, and it reduces the time for virsh iface-dumpxml of all interfaces from 6m40s down to 17 seconds.

Comment 1 Laine Stump 2015-10-13 17:04:58 UTC
This is the upstream patch that changes the query and reduces time (in the specific test case) from 6m40s to 17s:

commit 396e4e0698d9fb542f2eb8b32790a069e1c0df61
Author: Laine Stump <laine>
Date:   Wed Oct 7 13:49:45 2015 -0400

    optimize aug_match() query for all ifcfg files related to an interface

Comment 4 Jingjing Shao 2016-07-13 10:01:16 UTC
Hi laine,

I try to verify this issue as below. I think the time of 'virsh iface-dumpxml' has a remarkable improvement. But the the time of 'iface-define' is 14m14.336s. Does it need to be get improvement? 

# rpm -q libvirt 
libvirt-2.0.0-2.el7.x86_64

# rpm -q netcf
netcf-0.2.8-2.el7.x86_64

1. prepare xml file  for define iface contains a vlan which is using an Ethernet	

# cat br-X.xml 
<interface type="bridge" name="brX">
<start mode="onboot"/>
<protocol family="ipv4">
</protocol>
<bridge stp="off">
<interface type="vlan" name="ethX.2">
<vlan tag="2">
<interface name="ethX"/>
</vlan>
</interface>
</bridge>
</interface>


2. virsh iface-list --all
 eth0                 active     00:23:24:80:99:c6
 eth1                 inactive   
 eth10                inactive   
.....
 

for i in {1..512};do sed -e s/X/$i/ iface-X.xml > br-$i.xml;done 

Define 512 iface and check the time

# time for i in {1..512};do virsh iface-define br-$i.xml;done 
....
Interface br512 defined from br-512.xml


real	14m14.336s
user	0m2.799s
sys	0m1.703s


3. check the dumpxml of interface and check the time
# time for i in {1..512};do virsh iface-dumpxml  br$i;done
.....
<interface type='bridge' name='br512'>
  <start mode='onboot'/>
  <bridge stp='off'>
    <interface type='vlan' name='eth512.2'>
      <vlan tag='2'>
        <interface name='eth512'/>
      </vlan>
    </interface>
  </bridge>
</interface>


real	0m10.150s
user	0m2.960s
sys	0m1.914s

3.check the iface-list time
# time virsh iface-list 
 Name                 State      MAC Address
---------------------------------------------------
 eth0                 active     00:23:24:80:99:c6
 lo                   active     00:00:00:00:00:00


real	0m2.831s
user	0m0.005s
sys	0m0.004s

# time virsh iface-list --all
....

real	0m2.219s
user	0m0.045s
sys	0m0.035s

Comment 5 Laine Stump 2016-07-26 21:14:59 UTC
(In reply to Jingjing Shao from comment #4)
> Hi laine,
> 
> I try to verify this issue as below. I think the time of 'virsh
> iface-dumpxml' has a remarkable improvement. But the the time of
> 'iface-define' is 14m14.336s. Does it need to be get improvement? 

It would always be nice for any operation to get faster, but that's not the topic of this bug. (Aside from that, defining a new interface happens very infrequently, while dumping the interface config is done several times a minute by virt-manager, so it's much more important to get that faster).

Also, I noticed that your example definition wouldn't actually work if you tried to ifup it. The problem is that your template uses eth$N as the physical ethernet name, and that device (e.g. eth512) almost surely doesn't exist. Instead, you should pick an existing physical device (e.g. eth0, or enp5s0 or whatever) and use that same physical device for every vlan device, altering the vlan id, e.g.:

  <interface type="vlan" name="eth0.X">
    <vlan tag="X">
    <interface name="eth0"/>
  </vlan>

Your test ends up being successful anyway because you never ifup the new interfaces, but it technically isn't correct.

Comment 6 Jingjing Shao 2016-07-31 10:21:42 UTC
(In reply to Laine Stump from comment #5)
> (In reply to Jingjing Shao from comment #4)
> > Hi laine,
> > 
> > I try to verify this issue as below. I think the time of 'virsh
> > iface-dumpxml' has a remarkable improvement. But the the time of
> > 'iface-define' is 14m14.336s. Does it need to be get improvement? 
> 
> It would always be nice for any operation to get faster, but that's not the
> topic of this bug. (Aside from that, defining a new interface happens very
> infrequently, while dumping the interface config is done several times a
> minute by virt-manager, so it's much more important to get that faster).
> 
> Also, I noticed that your example definition wouldn't actually work if you
> tried to ifup it. The problem is that your template uses eth$N as the
> physical ethernet name, and that device (e.g. eth512) almost surely doesn't
> exist. Instead, you should pick an existing physical device (e.g. eth0, or
> enp5s0 or whatever) and use that same physical device for every vlan device,
> altering the vlan id, e.g.:
> 
>   <interface type="vlan" name="eth0.X">
>     <vlan tag="X">
>     <interface name="eth0"/>
>   </vlan>
> 
> Your test ends up being successful anyway because you never ifup the new
> interfaces, but it technically isn't correct.

Thanks laine's explanation.
eproduce the bug with libvirt-1.2.17-13.el7.x86_64 and netcf-0.2.8-1.el7.x86_64

# cat br-X.xml
<interface type="bridge" name="brX">
<start mode="onboot"/>
<protocol family="ipv4">
</protocol>
<bridge stp="off">
<interface type="vlan" name="eno1.X">
<vlan tag="X">
<interface name="eno1"/>
</vlan>
</interface>
</bridge>
</interface>

# for i in {1..514};do sed -e s/X/$i/ br-X.xml > br-$i.xml;done

# time for i in {1..514};do virsh iface-define br-$i.xml;done
Interface br1 defined from br-1.xml

Interface br2 defined from br-2.xml

Interface br3 defined from br-3.xml

Interface br4 defined from br-4.xml

Interface br5 defined from br-5.xml

....
Interface br513 defined from br-513.xml

Interface br514 defined from br-514.xml


real    6m30.884s
user    0m3.366s
sys     0m1.987s

# time for i in {1..514};do virsh iface-start br$i;done
Interface br1 started

Interface br2 started

.....

Interface br510 started

Interface br511 started

Interface br514 started

real    7m32.715s
user    0m3.348s
sys     0m1.971s

# time for i in {1..514};do virsh iface-dumpxml  br$i;done
<interface type='bridge' name='br1'>
  <protocol family='ipv6'>
    <ip address='fe80::4637:e6ff:fe95:3e4' prefix='64'/>
  </protocol>
  <bridge>
    <interface type='vlan' name='eno1.1'>
      <link speed='1000' state='up'/>
      <vlan tag='1'>
        <interface name='eno1'/>
      </vlan>
    </interface>
  </bridge>
</interface>
.....
<interface type='bridge' name='br514'>
  <protocol family='ipv6'>
    <ip address='fe80::4637:e6ff:fe95:3e4' prefix='64'/>
  </protocol>
  <bridge>
    <interface type='vlan' name='eno1.514'>
      <link speed='1000' state='up'/>
      <vlan tag='514'>
        <interface name='eno1'/>
      </vlan>
    </interface>
  </bridge>
</interface>

real    3m25.147s
user    0m3.301s
sys     0m1.932s

And verify it with libvirt-2.0.0-3.el7.x86_64 and netcf-0.2.8-2.el7.x86_64

# time for i in {1..514};do virsh iface-dumpxml  br$i;done
<interface type='bridge' name='br1'>
  <protocol family='ipv6'>
    <ip address='fe80::4637:e6ff:fe95:3e4' prefix='64'/>
  </protocol>
  <bridge>
    <interface type='vlan' name='eno1.1'>
      <link speed='1000' state='up'/>
      <vlan tag='1'>
        <interface name='eno1'/>
      </vlan>
    </interface>
  </bridge>
</interface>
.....
<interface type='bridge' name='br514'>
  <protocol family='ipv6'>
    <ip address='fe80::4637:e6ff:fe95:3e4' prefix='64'/>
  </protocol>
  <bridge>
    <interface type='vlan' name='eno1.514'>
      <link speed='1000' state='up'/>
      <vlan tag='514'>
        <interface name='eno1'/>
      </vlan>
    </interface>
  </bridge>
</interface>
real    0m11.419s
user    0m3.158s
sys     0m1.876s


It has get a remarkable improvement, so change it to verified

Comment 8 errata-xmlrpc 2016-11-04 07:52:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2524.html