Bug 1275468 - configure many networks with ipv6; one will randomly not autostart. hits DAD timeout
Status: CLOSED CURRENTRELEASE
Product: Virtualization Tools
Classification: Community
Component: libvirt
Hardware: x86_64 Linux
Assigned To: Libvirt Maintainers
Keywords: Reopened
Reported: 2015-10-26 21:27 EDT by jean-christophe manciot
Modified: 2016-08-10 13:59 EDT (History)
Doc Type: Bug Fix
Last Closed: 2016-08-10 05:20:45 EDT
Type: Bug

Attachments
log_level=debug (1.78 MB, text/plain)
2016-04-11 09:21 EDT, jean-christophe manciot
Contains only lines with: libvirt, NetworkManager, dnsmasq, systemd, virtual-router and virbr2 (356.92 KB, text/plain)
2016-04-12 02:47 EDT, jean-christophe manciot
syslog with only lines containing virbr2 (7.88 KB, text/plain)
2016-04-12 02:48 EDT, jean-christophe manciot
Syslog with only lines containing virbr4 and virtual-bridge-4 after reboot & manual start of the vnet (11.41 KB, text/plain)
2016-04-12 03:29 EDT, jean-christophe manciot
Syslog with only libvirtd, virbr5 & IPv6 (16.42 KB, text/plain)
2016-04-13 01:36 EDT, jean-christophe manciot

Description jean-christophe manciot 2015-10-26 21:27:36 EDT
Description of problem:
----------------------
sudo virsh connect qemu:///system
sudo virsh net-start default
error: Failed to start network default
error: error creating bridge interface virbr0: File exists

sudo virsh net-start loopback
error: Failed to start network loopback
error: error creating bridge interface virbr1: File exists

sudo virsh net-start  virtual-router
error: Failed to start network virtual-router
error: error creating bridge interface virbr2: File exists

Version-Release number of selected component (if applicable):
------------------------------------------------------------
github sources e739d956e850093b05ea7d95f76ada49bd2ebfb5

Steps to Reproduce:
------------------
1. completely uninstall libvirt
2. compile last libvirt sources with:
./autogen.sh
./configure --without-wireshark-dissector \
            --with-openssl \
	    --without-apparmor \
	    --without-apparmor-mount \
	    --without-apparmor-profiles \
	    --without-secdriver-apparmor \
	    --prefix=/usr --sysconfdir=/etc --localstatedir=/var
make
3. install it with sudo make install
4. copy previous configuration files in /etc/libvirt
5. enable & start libvirtd

Actual results:
--------------
sudo brctl show
bridge name	bridge id		STP enabled	interfaces
virbr0		8000.000000000000	yes		
virbr1		8000.000000000000	yes		
virbr10		8000.525400a8ff01	yes		virbr10-nic
virbr11		8000.525400335153	yes		virbr11-nic
virbr2		8000.000000000000	yes		
virbr3		8000.5254002a7ee2	yes		virbr3-nic
virbr4		8000.52540097ec52	yes		virbr4-nic
virbr5		8000.5254004491cb	yes		virbr5-nic
virbr6		8000.5254005f9a92	yes		virbr6-nic
virbr7		8000.525400573542	yes		virbr7-nic
virbr8		8000.525400960209	yes		virbr8-nic
virbr9		8000.525400e89e8c	yes		virbr9-nic

sudo virsh net-list
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 virtual-bridge-1     active     yes           yes
 virtual-bridge-2     active     yes           yes
 virtual-bridge-3     active     yes           yes
 virtual-bridge-4     active     yes           yes
 virtual-bridge-5     active     yes           yes
 virtual-bridge-6     active     yes           yes
 virtual-bridge-7     active     yes           yes
 virtual-bridge-8     active     yes           yes
 virtual-bridge-9     active     yes           yes

sudo virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              inactive   yes           yes
 loopback             inactive   yes           yes
 virtual-bridge-1     active     yes           yes
 virtual-bridge-2     active     yes           yes
 virtual-bridge-3     active     yes           yes
 virtual-bridge-4     active     yes           yes
 virtual-bridge-5     active     yes           yes
 virtual-bridge-6     active     yes           yes
 virtual-bridge-7     active     yes           yes
 virtual-bridge-8     active     yes           yes
 virtual-bridge-9     active     yes           yes
 virtual-router       inactive   yes           yes

Additional info:
---------------
Ubuntu gnome server 15.10
Comment 1 jean-christophe manciot 2015-11-05 04:20:37 EST
Same issue with 1.2.21
Comment 2 Jiri Denemark 2015-11-05 05:22:43 EST
Uninstalling libvirt while there are some domains or networks running is not a very good idea because they will still be running but libvirt will lose track of them. Just shut down all domains and destroy all networks before uninstalling libvirt and everything should work. I don't see any bug here.
Comment 3 jean-christophe manciot 2015-11-05 10:31:27 EST
Got it, thanks.
Comment 4 jean-christophe manciot 2015-11-09 05:31:00 EST
Well, I tried to follow your guidelines:
- shutdown all domains
- destroy all networks
- uninstall libvirt
- install libvirt 1.2.21

The issue remains, and each time the machine boots I have to perform the following steps to try to get the first three networks active (the others are OK). Sometimes a new issue surfaces!

cd /etc/libvirt/qemu/networks
virsh connect qemu:///system
virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              inactive   yes           yes
 loopback             inactive   yes           yes
 virtual-bridge-1     active     yes           yes
 virtual-bridge-2     active     yes           yes
 virtual-bridge-3     inactive   yes           yes
 virtual-bridge-4     active     yes           yes
 virtual-bridge-5     active     yes           yes
 virtual-bridge-6     active     yes           yes
 virtual-bridge-7     active     yes           yes
 virtual-bridge-8     active     yes           yes
 virtual-bridge-9     active     yes           yes
 virtual-router       inactive   yes           yes

brctl show
bridge name	bridge id		STP enabled	interfaces
virbr0		8000.000000000000	yes		
virbr1		8000.000000000000	yes		
virbr10		8000.525400a8ff01	yes		virbr10-nic
virbr11		8000.525400335153	yes		virbr11-nic
virbr2		8000.000000000000	yes		
virbr3		8000.5254002a7ee2	yes		virbr3-nic
virbr4		8000.52540097ec52	yes		virbr4-nic
virbr6		8000.5254005f9a92	yes		virbr6-nic
virbr7		8000.525400573542	yes		virbr7-nic
virbr8		8000.525400960209	yes		virbr8-nic
virbr9		8000.525400e89e8c	yes		virbr9-nic

ifconfig virbr0 down
brctl delbr virbr0
virsh net-define default.xml
Network default defined from default.xml

virsh net-start default
Network default started

ifconfig virbr1 down
brctl delbr virbr1
virsh net-define loopback.xml
Network loopback defined from loopback.xml

virsh net-start loopback
Network default started

ifconfig virbr2 down
brctl delbr virbr2
virsh net-define virtual-router.xml
Network virtual-router defined from virtual-router.xml

virsh net-start virtual-router
Network virtual-router started

virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes
 loopback             active     yes           yes
 virtual-bridge-1     active     yes           yes
 virtual-bridge-2     active     yes           yes
 virtual-bridge-3     active     yes           yes
 virtual-bridge-4     active     yes           yes
 virtual-bridge-5     active     yes           yes
 virtual-bridge-6     active     yes           yes
 virtual-bridge-7     active     yes           yes
 virtual-bridge-8     active     yes           yes
 virtual-bridge-9     active     yes           yes
 virtual-router       active     yes           yes

brctl show
bridge name	bridge id		STP enabled	interfaces
virbr0		8000.5254009d4405	yes		virbr0-nic
virbr1		8000.525400ffebb0	yes		virbr1-nic
virbr10		8000.525400a8ff01	yes		virbr10-nic
virbr11		8000.525400335153	yes		virbr11-nic
virbr2		8000.5254000d35ae	yes		virbr2-nic
virbr3		8000.5254002a7ee2	yes		virbr3-nic
virbr4		8000.52540097ec52	yes		virbr4-nic
virbr5		8000.5254004491cb	yes		virbr5-nic
virbr6		8000.5254005f9a92	yes		virbr6-nic
virbr7		8000.525400573542	yes		virbr7-nic
virbr8		8000.525400960209	yes		virbr8-nic
virbr9		8000.525400e89e8c	yes		virbr9-nic
-------------------------------------------------------------------------------

Sometimes, I get the following errors:
error: Disconnected from qemu:///system due to I/O error
error: Failed to start network loopback
error: Cannot recv data: Connection reset by peer

error: One or more references were leaked after disconnect from the hypervisor
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
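
The leftover bridges behind the "File exists" errors show up in the `brctl show` listings above with an all-zero bridge id and no attached interface. A minimal sketch, assuming the tabular `brctl show` layout shown in this report, of how such bridges could be picked out of captured output. The function name is hypothetical, and an empty bridge is not necessarily stale (it may simply have no ports yet), so the result is only a list of candidates to inspect.

```python
def find_empty_bridges(brctl_output):
    """Return names of bridges with an all-zero bridge id and no ports.

    Assumes the column layout printed by `brctl show` in this report:
    name, bridge id, STP flag, then an optional first interface.
    """
    empty = []
    for line in brctl_output.splitlines():
        fields = line.split()
        # Skip the header row, blank lines, and continuation rows that
        # list only an extra interface name.
        if len(fields) < 3 or fields[0] == "bridge":
            continue
        name, bridge_id = fields[0], fields[1]
        has_port = len(fields) >= 4
        if bridge_id.endswith(".000000000000") and not has_port:
            empty.append(name)
    return empty
```

Each candidate could then be removed by hand with the `ifconfig <bridge> down` and `brctl delbr <bridge>` commands used above.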
Comment 5 jean-christophe manciot 2015-12-10 06:12:41 EST
Not solved with libvirt 1.3.0 with exactly the same symptoms.
Comment 6 jean-christophe manciot 2016-02-09 11:50:09 EST
If I stop & disable the GNOME NetworkManager systemd service, the symptoms mostly disappear, although from time to time a random net is not started after reboot despite being marked as Autostart:

virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes
 loopback             active     yes           yes
 ovs-net              active     yes           yes
 virl-data-flat       active     yes           yes
 virl-data-flat1      active     yes           yes
 virl-data-snat       active     yes           yes
 virl-openstack       active     yes           yes
 virtual-bridge-1     active     yes           yes
 virtual-bridge-2     active     yes           yes
 virtual-bridge-3     active     yes           yes
 virtual-bridge-4     active     yes           yes
 virtual-bridge-5     active     yes           yes
 virtual-bridge-6     active     yes           yes
 virtual-bridge-7     inactive   yes           yes
 virtual-bridge-8     active     yes           yes
 virtual-bridge-9     active     yes           yes
 virtual-router       active     yes           yes
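
Since the failing network changes from boot to boot, it can help to check the listing automatically after each reboot. A minimal sketch, assuming the four-column `virsh net-list --all` layout shown above. The function name is hypothetical, and the parsing is only as robust as that layout.

```python
def inactive_autostart_networks(net_list_output):
    """Return networks marked Autostart=yes that are still inactive."""
    missing = []
    for line in net_list_output.splitlines():
        fields = line.split()
        # Data rows have exactly four columns; skip the header row,
        # the dashed separator, and blank lines.
        if len(fields) != 4 or fields[0] == "Name":
            continue
        name, state, autostart, _persistent = fields
        if state == "inactive" and autostart == "yes":
            missing.append(name)
    return missing
```

Feeding it the listing above would flag virtual-bridge-7 only.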
Comment 7 Cole Robinson 2016-04-10 18:38:51 EDT
Strange, not sure how NetworkManager would be affecting things here.
What distro are you on?
Is this still reproducing with latest libvirt?
Comment 8 jean-christophe manciot 2016-04-11 00:09:44 EDT
The behavior has been improved with the latest releases of libvirt.
With 1.3.3, I continue to experience the latest symptom (one virtual net not active after reboot) with network-manager active.
I use Ubuntu server 15.10 4.2.0-35.
Comment 9 Cole Robinson 2016-04-11 07:29:14 EDT
Can you check syslog for any libvirt errors? It may explain why the network is not starting at bootup
Comment 10 jean-christophe manciot 2016-04-11 09:19:28 EDT
With log_level = 1 in /etc/libvirt/libvirtd.conf and
sed '/libvirt/!d' syslog > libvirtd.debug.log

The result is attached.
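
The `sed '/libvirt/!d'` filter keeps only lines that contain "libvirt". A minimal sketch of the same idea extended to several keywords at once, as in the later attachments (libvirt, NetworkManager, dnsmasq, and so on). The function name is hypothetical.

```python
def filter_lines(lines, keywords):
    """Keep only the lines that contain at least one of the keywords."""
    return [line for line in lines if any(k in line for k in keywords)]
```

For example, `filter_lines(open("syslog"), ["libvirt", "virbr2"])` would return the matching lines of a local `syslog` file. The file name is an assumption.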
Comment 11 jean-christophe manciot 2016-04-11 09:21 EDT
Created attachment 1145997 [details]
log_level=debug
Comment 12 Cole Robinson 2016-04-11 11:11:30 EDT
Global debug logging is extremely verbose... and I don't see messages about most of your networks in that log, just virtual-bridge-3 and virtual-bridge-4, so I'm not really sure what's going on. Default settings should have libvirtd send errors to syslog, which should catch the startup failure, I assume. dmesg may also have bits about network stuff going up or down.
Comment 13 jean-christophe manciot 2016-04-12 02:44:59 EDT
OK. This time, the vnet "virtual-router" has not started despite being set as autostart:
sudo virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes
 loopback             active     yes           yes
 ovs-net              active     yes           yes
 virl-data-flat       active     yes           yes
 virl-data-flat1      active     yes           yes
 virl-data-snat       active     yes           yes
 virl-openstack       active     yes           yes
 virtual-bridge-3     active     yes           yes
 virtual-bridge-4     active     yes           yes
 virtual-mgt-5        active     yes           yes
 virtual-router       inactive   yes           yes

It is linked to virbr2 bridge which is nowhere to be found:
sudo brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.024230f861fd	no		
lxcbr0		8000.000000000000	no		
virbr0		8000.525400e18980	yes		virbr0-nic
virbr1		8000.525400ffebb0	yes		virbr1-nic
virbr12		8000.5254008826e1	yes		virbr12-nic
virbr13		8000.5254001b43cc	yes		virbr13-nic
virbr14		8000.525400cb1e3a	yes		virbr14-nic
virbr15		8000.525400d30f13	yes		virbr15-nic
virbr3		8000.5254006b9df4	yes		virbr3-nic
virbr4		8000.52540078ef37	yes		virbr4-nic
virbr5		8000.5254009aa68c	yes		virbr5-nic


There are the following entries in the log:

NetworkManager[1643]: <info>  (virbr2): Activation: successful, device activated.
...
NetworkManager[1643]: <warn>  (virbr2): failed to detach bridge port virbr2-nic
NetworkManager[1643]: <info>  (virbr2): device state change: activated -> unmanaged (reason 'removed') [100 10 36]

It's the only bridge which experiences the previous fate.
syslog.libvirt.log & syslog.only-virbr2.log are attached
Comment 14 jean-christophe manciot 2016-04-12 02:47 EDT
Created attachment 1146206 [details]
Contains only lines with: libvirt, NetworkManager, dnsmasq, systemd, virtual-router and virbr2
Comment 15 jean-christophe manciot 2016-04-12 02:48 EDT
Created attachment 1146207 [details]
syslog with only lines containing virbr2
Comment 16 jean-christophe manciot 2016-04-12 02:56:43 EDT
I also do not know the consequences of:
libvirtd[1518]: Duplicate Address Detection not finished in 20 seconds
libvirtd[1518]: this function is not supported by the connection driver: virConnectGetCPUModelNames
Comment 17 jean-christophe manciot 2016-04-12 03:26:05 EDT
I rebooted & confirmed that this issue happens randomly to another virtual net/bridge, this time to virtual-bridge-4/virbr4.
I started virtual-bridge-4 with:
sudo virsh net-start virtual-bridge-4

We can note from the log "syslog.only_virbr4_virtual-bridge-4.log" that this time:
NetworkManager[1025]: <info>  (virbr4): bridge port virbr4-nic was detached

I have no idea why it succeeds when done from the CLI.
Comment 18 jean-christophe manciot 2016-04-12 03:29 EDT
Created attachment 1146225 [details]
Syslog with only lines containing virbr4 and virtual-bridge-4 after reboot & manual start of the vnet
Comment 19 Cole Robinson 2016-04-12 08:55:32 EDT
Do those failing network configs have IPv6 in them? When there's a failure, is there always an error in the logs from libvirt about duplicate address detection timeout?
Comment 20 jean-christophe manciot 2016-04-13 01:35:20 EDT
Yes, the virtual networks not defined with IPv6 never fail.
Yes, it seems that the "Duplicate Address Detection not finished in 20 seconds" triggers the removal of the virtual bridge.
Also, "Interface virbrn.IPv4 no longer relevant for mDNS" always appears right after "Duplicate Address Detection not finished in 20 seconds", although there should not be any link in theory.

libvirtd[1119]: libvirt version: 1.3.3
libvirtd[1119]: hostname: samsung-ubuntu.actionmystique.net
libvirtd[1119]: Duplicate Address Detection not finished in 20 seconds
avahi-daemon[1016]: Interface virbr5.IPv4 no longer relevant for mDNS.
avahi-daemon[1016]: Leaving mDNS multicast group on interface virbr5.IPv4 with address 172.21.100.1.
kernel: [   20.678465] virbr5: port 1(virbr5-nic) entered disabled state
avahi-daemon[1016]: Withdrawing address record for 172.21.100.1 on virbr5.
avahi-daemon[1016]: Joining mDNS multicast group on interface virbr5-nic.IPv6 with address fe80::5054:ff:fe9a:a68c.
avahi-daemon[1016]: New relevant interface virbr5-nic.IPv6 for mDNS.
avahi-daemon[1016]: Registering new address record for fe80::5054:ff:fe9a:a68c on virbr5-nic.*.
avahi-daemon[1016]: Interface virbr5-nic.IPv6 no longer relevant for mDNS.
avahi-daemon[1016]: Leaving mDNS multicast group on interface virbr5-nic.IPv6 with address fe80::5054:ff:fe9a:a68c.
avahi-daemon[1016]: Withdrawing address record for fe80::5054:ff:fe9a:a68c on virbr5-nic.
kernel: [   20.847127] device virbr5-nic left promiscuous mode
kernel: [   20.847130] virbr5: port 1(virbr5-nic) entered disabled state
avahi-daemon[1016]: Withdrawing workstation service for virbr5-nic.
NetworkManager[975]: <info>  devices removed (path: /sys/devices/virtual/net/virbr5-nic, iface: virbr5-nic)
NetworkManager[975]: <info>  (virbr5-nic): device state change: activated -> unmanaged (reason 'removed') [100 10 36]
NetworkManager[975]: <warn>  (virbr5): failed to detach bridge port virbr5-nic
NetworkManager[975]: <warn>  (virbr5-nic): failed to disable userspace IPv6LL address handling
avahi-daemon[1016]: Withdrawing workstation service for virbr5.
NetworkManager[975]: <info>  (virbr5): device state change: activated -> unmanaged (reason 'removed') [100 10 36]
NetworkManager[975]: <info>  devices removed (path: /sys/devices/virtual/net/virbr5, iface: virbr5)

Full log filtered with libvirtd, virbr5 & IPv6 is attached.
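
The pattern in the excerpt above (a DAD timeout from libvirtd followed by NetworkManager reporting the devices as removed) can be searched for mechanically. A minimal sketch, assuming the exact message wording shown in this excerpt. The function name is hypothetical, and the ordering check is only a heuristic that does not prove the timeout caused the removal.

```python
import re

DAD_TIMEOUT = "Duplicate Address Detection not finished"
# Matches the NetworkManager message format seen in the excerpt above.
REMOVED = re.compile(r"devices removed \(path: \S+, iface: (\S+)\)")


def removed_after_dad_timeout(log_lines):
    """Return interfaces reported removed after the first DAD timeout line."""
    seen_timeout = False
    removed = []
    for line in log_lines:
        if DAD_TIMEOUT in line:
            seen_timeout = True
        elif seen_timeout:
            match = REMOVED.search(line)
            if match:
                removed.append(match.group(1))
    return removed
```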
Comment 21 jean-christophe manciot 2016-04-13 01:36 EDT
Created attachment 1146699 [details]
Syslog with only libvirtd, virbr5 & IPv6
Comment 22 jean-christophe manciot 2016-04-13 02:00:29 EDT
Is there a way to change the DAD timeout? Nothing like that appears in libvirtd.conf.

According to this post (https://www.redhat.com/archives/libvir-list/2015-October/msg00851.html), it should be equal to the following sum:
net.ipv6.conf.default.router_solicitation_delay + net.ipv6.conf.default.dad_transmits * net.ipv6.neigh.default.retrans_time_ms

On my system, that means 3s:
net.ipv6.conf.all.router_solicitation_delay = 1
net.ipv6.conf.default.dad_transmits = 1
net.ipv6.neigh.default.retrans_time_ms = 1000
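
As a worked check of the sum above, here is a minimal sketch that takes the formula exactly as quoted and assumes router_solicitation_delay is in seconds and retrans_time_ms in milliseconds. The function name is hypothetical. With the three values listed, the formula as written evaluates to 2 s rather than 3 s, so either figure should be treated as approximate. Both are far below the 20 s timeout reported by libvirtd.

```python
def estimated_dad_seconds(router_solicitation_delay_s, dad_transmits, retrans_time_ms):
    """Evaluate the sum quoted above; the result is in seconds."""
    return router_solicitation_delay_s + dad_transmits * retrans_time_ms / 1000.0


# Values quoted in this comment.
estimate = estimated_dad_seconds(1, 1, 1000)
```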
Comment 23 Cole Robinson 2016-04-13 08:34:48 EDT
No, it's not configurable. But if you are building libvirt from source, you can manually adjust the code. If that works, maybe we should look at making it configurable

The line you need to edit is in src/util/virnetdev.c:

#define VIR_DAD_WAIT_TIMEOUT 20 /* seconds */

Try upping that to 120, recompile + install, and see what happens
Comment 24 jean-christophe manciot 2016-04-14 03:19:18 EDT
Changing VIR_DAD_WAIT_TIMEOUT to 60 makes the issue vanish.

However, I'm still puzzled: I have 5 virtual networks defined with IPv6; 
5x3x2=30 s (one DAD for each LLA and one for each static IPv6 address).
That would explain that 20 s are too short.

Why doesn't libvirtd test IPv6 DAD on all interfaces in parallel, instead of sequentially which seems to be the case?
Comment 25 Cole Robinson 2016-04-14 12:20:38 EDT
(In reply to jean-christophe manciot from comment #24)
> Changing VIR_DAD_WAIT_TIMEOUT to 60 makes the issue vanish.
> 
> However, I'm still puzzled: I have 5 virtual networks defined with IPv6; 
> 5x3x2=30 s (one DAD for each LLA and one for each static IPv6 address).
> That would explain that 20 s are too short.
> 
> Why doesn't libvirtd test IPv6 DAD on all interfaces in parallel, instead of
> sequentially which seems to be the case?

laine, thoughts?
Comment 26 Laine Stump 2016-04-14 13:36:15 EDT
It is true that libvirt does the setup for each network sequentially, not in parallel (it would be too chaotic to try to do them in parallel - think about adding the iptables rules), but the DAD timeout is per bridge, not the combined time for all bridges. I've found that setting a longer forwarding delay for a network (in the "bridge" element, usually set to 0) requires a longer timeout for DAD, but couldn't determine an equation to describe it, and since almost nobody uses a non-0 forwarding delay (iow, probably only me in a test setup), I didn't take the time to figure it out. My recollection is that DAD on a single bridge took around 7 seconds if the forwarding delay was 0, and didn't vary.

The stupid part of all this is that this DAD is happening when the bridge is first created, before there is anything at all attached to it, so there shouldn't be *any* addresses at all, much less duplicates. It would be better if we could disable DAD during the startup of the network. (or does DAD look for duplicate addresses on *all* interfaces of the machine? If so, maybe that's what's causing the increase in required time here).

I'd rather avoid adding a configuration knob for the timeout if at all possible - once something like that is in, we have to keep it in even if we later determine it wasn't necessary.


I'll try setting up a large number of networks with IPv6 and see if the time required for DAD changes as the network count increases.
Comment 27 jean-christophe manciot 2016-08-09 04:42:59 EDT
Any progress on that front as of 2.1.0?
Comment 28 Cole Robinson 2016-08-09 10:39:28 EDT
(In reply to jean-christophe manciot from comment #27)
> Any progress on that front as of 2.1.0?

None that I've seen
Comment 29 Laine Stump 2016-08-09 11:05:58 EDT
I found a while back that the one thing that changed the DAD timing was setting an STP forwarding delay - the amount of time required for DAD to complete is some function of that value (but it isn't linear - the time for DAD increases much more quickly than the forward delay). Do your networks have a non-0 delay set? If so, can you try setting those to 0? (Unless you have a very unusual network setup, a 0 forward delay should cause no problems.)
Comment 30 jean-christophe manciot 2016-08-10 03:35:44 EDT
STP & DAD are 2 different things, although it is true that DAD can depend on STP timers.
STP is used to prevent loops in a Layer 2 network. The default forwarding delay of 15 s (STP) or 2 s (RSTP) is used in listening and learning states before the ports can reach the forwarding state, when DAD (and other networking communication) is allowed to start.
Setting the forwarding delay to 0 simulates "portfast/edge" ports usually facing a server or VM/container in our case which is useful when there is no risk of loop to speed up the initial setup of the port, avoiding the listening and learning STP port states.
If you use the Linux bridge(s) as standalone element(s), not connected to upstream ToR switches or to one another, i.e. if there is no risk of loops, you can safely disable STP since it is not useful.
Otherwise, setting the forwarding delay to 0 should only be done on ports facing a VM/container, or you risk experiencing "broadcast storms" for instance.
Unfortunately, we don't have that level of control on Linux bridges (no portfast command for individual ports), so setting the forwarding delay to 0 means setting all ports as portfast, which is "dangerous" if the bridge is connected to other bridges in loops.
All my virtual networks have been created with virt-manager, which does not offer the possibility to enable/disable STP or change the timers.
So I left the default STP values which on Ubuntu are:
sudo showstp virbr0
virbr0
 bridge id		8000.000000000000
 designated root	8000.000000000000
 root port		   0			path cost		   0
 max age		  20.00			bridge max age		  20.00
 hello time		   2.00			bridge hello time	   2.00
 forward delay		   2.00			bridge forward delay	   2.00
 ageing time		 300.00
 hello timer		   1.31			tcn timer		   0.00
 topology change timer	   0.00			gc timer		 129.40
 
The DAD default value should take into account the STP default values since setting the forwarding delay to 0 is a special case.

Anyway, I'll try to manually remove STP on my Linux bridges, recompile libvirt (2.0.0 and not 2.1.0 which has an issue, cf. https://bugzilla.redhat.com/show_bug.cgi?id=1365607) with the default VIR_DAD_WAIT_TIMEOUT and see what happens.
Comment 31 jean-christophe manciot 2016-08-10 05:16:06 EDT
Trying to rebuild libvirt made me realize that something has changed:
- VIR_DAD_WAIT_TIMEOUT is not defined in src/util/virnetdev.c anymore, but in src/util/virnetdevip.c
- my build script was expecting it in the original C file to change its value (with sed) to 60 s, which means my first 2.0.0 build has been made with the default 20 s value without me being aware of this and with STP on all bridges.
- since I have not experienced the symptoms described in this thread, something ***has been changed in the code*** that solves this issue, although it does not clearly appear in the release notes.

As a conclusion, as far as I'm concerned, this issue is closed.
However, there is no way for me to close it with "FIXED" since this choice does not appear in the list below.
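
The silent failure described above (the build script patching a file that no longer contains the define) can be avoided by making the patch step fail loudly. A minimal sketch, assuming the define keeps the `#define VIR_DAD_WAIT_TIMEOUT <n>` form quoted in comment 23. The function name is hypothetical, and this is not the actual build script.

```python
import re

DEFINE_RE = re.compile(r"^(#define\s+VIR_DAD_WAIT_TIMEOUT\s+)\d+", re.MULTILINE)


def set_dad_timeout(source_text, seconds):
    """Return source_text with the timeout replaced, or raise if absent."""
    new_text, count = DEFINE_RE.subn(
        lambda m: m.group(1) + str(seconds), source_text
    )
    if count != 1:
        # Refuse to continue instead of silently building with the default.
        raise ValueError(
            "expected exactly one VIR_DAD_WAIT_TIMEOUT define, found %d" % count
        )
    return new_text
```

Run against the contents of src/util/virnetdev.c from a 2.0.0 tree, this would raise instead of leaving the 20 s default in place unnoticed.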
Comment 32 Laine Stump 2016-08-10 13:59:17 EDT
> All my virtual networks have been created with virt-manager, which
> does not offer the possibility to enable/disable STP or change the timers.

You can do that with "virsh net-edit $networkname" - just modify the settings in the <bridge> element, save the file, then destroy and restart the network (don't do it while any guests are connected to the network, as it destroys the guests' tap device connections to the bridge). The settings in the <bridge> element default to stp='on' delay='0', and the kernel does allow setting the forward delay to 0 *before STP has been turned on*. But when stp is enabled, forward delay is clamped to a minimum value of 2 seconds, so that's why you see a 2.
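
For scripted changes, the same attributes can be set on a saved copy of the network XML instead of using the interactive editor. A minimal sketch, assuming the XML has a top-level <network> element with a <bridge> child carrying the stp and delay attributes described above. The function name is hypothetical, and how the XML is exported and redefined (for example with `virsh net-dumpxml` and `virsh net-define`) is left to the caller.

```python
import xml.etree.ElementTree as ET


def set_bridge_stp(network_xml, stp, delay):
    """Return network XML with the <bridge> stp/delay attributes updated."""
    root = ET.fromstring(network_xml)
    bridge = root.find("bridge")
    if bridge is None:
        raise ValueError("network XML has no <bridge> element")
    bridge.set("stp", stp)           # 'on' or 'off'
    bridge.set("delay", str(delay))  # forward delay in seconds
    return ET.tostring(root, encoding="unicode")
```

As noted above, the network still has to be destroyed and restarted for the new settings to take effect.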

It's good that you brought this up, because although I had gone through all the investigation to figure this out a few years ago, I had forgotten, and this may explain why I couldn't see a linear relationship between STP setting and the time it takes for DAD to complete (on a simplistic level, you'd think that it would take [some base time] + the STP delay). Taking the minimum of 2 seconds forward delay into account may make it easier to compute a proper DAD wait timeout value. (this hasn't been urgent, since it's only for libvirt-created bridges, and almost nobody plays with the default STP settings - STP is essentially pointless when there is no L2 connection to any other network, and that's almost always true for libvirt virtual networks' bridges).
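
The clamping described above can be written down as a tiny model. A minimal sketch, assuming only what this comment states (a 2 second floor applies whenever STP is enabled). The function name is hypothetical, and this models the described behaviour rather than reproducing kernel or libvirt code.

```python
def effective_forward_delay(stp_enabled, configured_delay_s, minimum_with_stp_s=2.0):
    """Return the forward delay the bridge ends up with, in seconds."""
    if stp_enabled:
        # With STP on, values below the minimum are raised to it.
        return max(configured_delay_s, minimum_with_stp_s)
    return configured_delay_s
```

With the defaults stp='on' delay='0' this gives 2.0, which matches the "forward delay 2.00" in the showstp output from comment 30.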
