Bug 1279285
Summary: | Unable to start virbr0 "forward_delay: Numerical result out of range" in kernel 4.4.0-0.rc0.git4.2.fc24.x86_64 | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | York Possemiers <ypossem> | ||||
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> | ||||
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | rawhide | CC: | agedosier, atodorov, berrange, clalancette, gansalmon, itamar, jfilak, jforbes, jonathan, jtomko, kernel-maint, laine, libvirt-maint, madhu.chinakonda, mchehab, veillard, virt-maint, vyasevic | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2015-11-17 23:24:33 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
York Possemiers
2015-11-09 03:08:00 UTC
Created attachment 1091510 [details]
Python trace upon attempting to start bridge
Does not seem to affect other system bridges made with real interfaces If memory serves me correctly, it has always been the case that the forward_delay can't be set to 0 after a bridge has had STP turned on. This is why libvirt sets the forward delay before enabling STP. There are two possibilities: 1) maybe the bridge module previously created bridges with STP off, and now creates them with STP on, so that when libvirt tries to set the delay to 0, it fails. (This seems unlikely, since the kernel people usually try to avoid changing defaults) 2) maybe the bridge module previously bypassed the range check for forward_delay when STP was off, and now it checks whether STP is off or on. I just tried manually setting forward_delay on a bridge with STP on and off (on F22) and found that while it is possible to set it to 0 with STP off, as soon as STP is turned on, it is changed to "200" (that value is in "jiffies", which afaiu is msec/10, so this is equivalent to 2 seconds). To get your network working again, you should be able to just edit it and change "delay='0'" to "delay='2'". We do need to think about how to prevent this from failing without manual intervention though - simply changing libvirt to setting delay to a minumum of 2 seconds won't do any good (since there are so many existing networks that have it set to 0), so I guess we need to modify the function that sets forward_delay to silently increase the setting to 2 seconds. I dislike lying to users about what is happening, but then it seems we've already been lying to them for a very long time anyway :-/ It looks like 2) is the case: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34c2d9f commit 34c2d9fb0498c066afbe610b15e18995fd8be792 Author: Ian Wilson <iwilson> AuthorDate: 2015-09-24 11:20:11 -0700 Commit: David S. Miller <davem> CommitDate: 2015-09-27 19:09:38 -0700 bridge: Allow forward delay to be cfgd when STP enabled Allow bridge forward delay to be configured when Spanning Tree is enabled. Signed-off-by: Ian Wilson <iwilson> Signed-off-by: Stephen Hemminger <stephen> Signed-off-by: David S. Miller <davem> Looking at that patch, in my opinion the comment doesn't match the code. As far as I can see, it doesn't "allow the forward delay to be configured when STP is enabled" (which was already possible without this patch); the behavior change is instead to force a range check on the desired value when STP is *disabled*. So there is nothing about that comment that is correct. And now that I've looked/thought some more, I don't think libvirt should have to include any "magic number" for limiting the delay. The bridge module previously allowed any value to be set for forward_delay as long as STP was disabled, and would then clamp it into its internally/privately defined allowable range when STP was turned on (from net/bridge/br_stp_if.c): static void br_stp_start(struct net_bridge *br) ... if (br->bridge_forward_delay < BR_MIN_FORWARD_DELAY) __br_set_forward_delay(br, BR_MIN_FORWARD_DELAY); else if (br->bridge_forward_delay > BR_MAX_FORWARD_DELAY) __br_set_forward_delay(br, BR_MAX_FORWARD_DELAY); Since BR_(MIN|MAX)_FORWARD_DELAY are defined privately by the kernel (in net/bridge/br_private_stp.h), the only way for libvirt or any other userspace program to assure that the value it wants to set is within limits was previously: 1) set the value with STP disabled, then enable STP and let the bridge module clamp the value to something acceptable or 2) read the kernel source and copy/paste the min/max from there into the userspace code. With the kernel change Jan pointed out in Comment 4, the only possibility left is (2), and I don't think that's acceptable. In my opinion, the kernel should revert that patch, since it is modifying the semantics of the setting of forward_delay, and will likely cause regressions in other places (and it doesn't even accomplish its advertised intent anyway.) Vlad has posted a revert to the offending patch to netdev.org: http://lists.openwall.net/netdev/2015/11/10/20 Hopefully it will be pushed into the upstream kernel and make it back down to rawhide quickly. In the meantime, rawhide users should use "virsh net-edit default" to change "delay='0'" to "delay='2'" (and do the same for other libvirt networks that aren't starting). I'm moving this BZ to kernel, since that is where the problem is / fix will be. Looks like the update to 4.4.0-0.rc1.git0.1 has fixed this *** Bug 1283241 has been marked as a duplicate of this bug. *** |