Description of problem:
Unable to start bridges made with libvirt in this new kernel
Version-Release number of selected component (if applicable):
libvirtd (libvirt) 1.2.20
Steps to Reproduce:
2. Open Virtual Networks in virt-manager
3. Attempt to start bridge virbr0 (the default as installed)
Error starting network 'default': Unable to set bridge virbr0 forward_delay: Numerical result out of range
bridge is started on boot
Above libvirt versions work in kernel 4.3.0-0.rc7.git0.1.fc24.x86_64
Created attachment 1091510 [details]
Python trace upon attempting to start bridge
Does not seem to affect other system bridges made with real interfaces
If memory serves me correctly, it has always been the case that the forward_delay can't be set to 0 after a bridge has had STP turned on. This is why libvirt sets the forward delay before enabling STP. There are two possibilities:
1) maybe the bridge module previously created bridges with STP off, and now creates them with STP on, so that when libvirt tries to set the delay to 0, it fails. (This seems unlikely, since the kernel people usually try to avoid changing defaults)
2) maybe the bridge module previously bypassed the range check for forward_delay when STP was off, and now it checks whether STP is off or on.
I just tried manually setting forward_delay on a bridge with STP on and off (on F22) and found that while it is possible to set it to 0 with STP off, as soon as STP is turned on, it is changed to "200" (that value is in "jiffies", which afaiu is msec/10, so this is equivalent to 2 seconds).
To get your network working again, you should be able to just edit it and change "delay='0'" to "delay='2'".
We do need to think about how to prevent this from failing without manual intervention though - simply changing libvirt to setting delay to a minumum of 2 seconds won't do any good (since there are so many existing networks that have it set to 0), so I guess we need to modify the function that sets forward_delay to silently increase the setting to 2 seconds. I dislike lying to users about what is happening, but then it seems we've already been lying to them for a very long time anyway :-/
It looks like 2) is the case:
Author: Ian Wilson <firstname.lastname@example.org>
AuthorDate: 2015-09-24 11:20:11 -0700
Commit: David S. Miller <email@example.com>
CommitDate: 2015-09-27 19:09:38 -0700
bridge: Allow forward delay to be cfgd when STP enabled
Allow bridge forward delay to be configured when Spanning Tree is enabled.
Signed-off-by: Ian Wilson <firstname.lastname@example.org>
Signed-off-by: Stephen Hemminger <email@example.com>
Signed-off-by: David S. Miller <firstname.lastname@example.org>
Looking at that patch, in my opinion the comment doesn't match the code. As far as I can see, it doesn't "allow the forward delay to be configured when STP is enabled" (which was already possible without this patch); the behavior change is instead to force a range check on the desired value when STP is *disabled*. So there is nothing about that comment that is correct.
And now that I've looked/thought some more, I don't think libvirt should have to include any "magic number" for limiting the delay. The bridge module previously allowed any value to be set for forward_delay as long as STP was disabled, and would then clamp it into its internally/privately defined allowable range when STP was turned on (from net/bridge/br_stp_if.c):
static void br_stp_start(struct net_bridge *br)
if (br->bridge_forward_delay < BR_MIN_FORWARD_DELAY)
else if (br->bridge_forward_delay > BR_MAX_FORWARD_DELAY)
Since BR_(MIN|MAX)_FORWARD_DELAY are defined privately by the kernel (in net/bridge/br_private_stp.h), the only way for libvirt or any other userspace program to assure that the value it wants to set is within limits was previously:
1) set the value with STP disabled, then enable STP and let the bridge module clamp the value to something acceptable
2) read the kernel source and copy/paste the min/max from there into the userspace code.
With the kernel change Jan pointed out in Comment 4, the only possibility left is (2), and I don't think that's acceptable. In my opinion, the kernel should revert that patch, since it is modifying the semantics of the setting of forward_delay, and will likely cause regressions in other places (and it doesn't even accomplish its advertised intent anyway.)
Vlad has posted a revert to the offending patch to email@example.com:
Hopefully it will be pushed into the upstream kernel and make it back down to rawhide quickly. In the meantime, rawhide users should use "virsh net-edit default" to change "delay='0'" to "delay='2'" (and do the same for other libvirt networks that aren't starting).
I'm moving this BZ to kernel, since that is where the problem is / fix will be.
Looks like the update to 4.4.0-0.rc1.git0.1 has fixed this
*** Bug 1283241 has been marked as a duplicate of this bug. ***