Bug 1162822 - ifup of bridge with STP=on fails (even when DELAY=0)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: initscripts
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lukáš Nykrýn
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-11-11 18:58 UTC by Laine Stump
Modified: 2014-11-20 23:02 UTC (History)
CC List: 4 users

Fixed In Version: initscripts-9.51-3.fc20
Clone Of:
Environment:
Last Closed: 2014-11-12 13:23:03 UTC
Type: Bug
Embargoed:


Attachments
patch against current upstream git of initscripts (2.74 KB, text/plain)
2014-11-11 19:03 UTC, Laine Stump


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1105012 1 None None None 2021-01-20 06:05:38 UTC

Internal Links: 1105012

Description Laine Stump 2014-11-11 18:58:35 UTC
With NetworkManager (NM) disabled and the network service enabled, the following standard bridge configuration fails ifup every time:

[root@localhost 1105012]# cat /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
ONBOOT=no
HOTPLUG=yes
TYPE=Bridge
BOOTPROTO=dhcp
STP=on
DELAY=0
[root@localhost 1105012]# cat /etc/sysconfig/network-scripts/ifcfg-enp2s0 
DEVICE=enp2s0
HWADDR=00:11:22:33:44:55
ONBOOT=no
HOTPLUG=yes
BRIDGE=br0

# ifup enp2s0; ifup br0;

Determining IP information for br0... failed; no link present.  Check cable?


Since this was originally reported with respect to libvirt's "virsh
iface-start" command (which calls a function in the netcf library), I
at first thought there might be a problem with the order in which netcf
was ifup'ing the interfaces. In a discussion somewhere I'd seen
someone mention that they were ifup'ing the bridge first, then the
ethernets, which is the opposite of what netcf does.

But manual experimentation shows that netcf is doing it in the correct
order, and (as was suggested by someone triaging the original bug
report) adding a sufficiently large LINKDELAY to ifcfg-br0 does solve
the problem. However, we should not require every existing
installation with a bridge device and STP enabled to modify their
config. Instead, initscripts' ifup should properly account for this
needed delay when it notices that STP is enabled.
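
For reference, the LINKDELAY workaround mentioned above would look
roughly like this in ifcfg-br0. The value 10 is my assumption, picked
only because it is comfortably larger than the carrier times measured
with DELAY=0; it is not a recommended setting.

```sh
# /etc/sysconfig/network-scripts/ifcfg-br0 (workaround only)
DEVICE=br0
ONBOOT=no
HOTPLUG=yes
TYPE=Bridge
BOOTPROTO=dhcp
STP=on
DELAY=0
# Assumed value: anything above the observed time to carrier should do
LINKDELAY=10
```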

For the record, here is the sequence of events that leads to the problem:

1) "ifup $ether" calls /etc/sysconfig/network-scripts/ifup-eth; it
does this:

  1a) auto-create the $bridge *with an implicit 0 forward delay* but
      still "down".
  1b) "ip link set dev $ether up"
  1c) sleep for $LINKDELAY seconds (as set in the ifcfg-$ether, NOT
      the ifcfg-$bridge)
  1d) brctl addif -- $bridge $ether

(at this point if you look at "brctl showstp $bridge" you'll see that
the $ether port is in "disabled" state)

2) "ifup $bridge" - this again ends up in
/etc/sysconfig/network-scripts/ifup-eth, which:

  2a) doesn't create the bridge device, because it was already
      auto-created in step (1a).
  2b) sets a forward delay and other bridge options according to
      ifcfg-$bridge

  2c) *IF* the device has "BOOTPROTO=dhcp", it goes into a loop
      waiting for up to LINKDELAY seconds until
      /sys/class/net/$bridge/carrier contains "1" rather than "0".
      (NB: this will happen as soon as at least one device attached to
      the bridge is in "forwarding" state.)
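
The wait in step 2c can be sketched as a simple polling loop. This is
an illustration only, not the code from initscripts: the function name
wait_for_carrier and its two arguments are hypothetical, and the real
check_link_down() may poll at a different interval.

```sh
# Hypothetical helper, not the initscripts code: poll a carrier file until it
# reads "1" or the timeout (in whole seconds) runs out.
wait_for_carrier() {
    local carrier_file=$1   # e.g. /sys/class/net/br0/carrier
    local timeout=$2        # seconds to wait, playing the role of LINKDELAY
    local waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # Reading carrier can fail while the device is down, so ignore errors
        if [ "$(cat "$carrier_file" 2>/dev/null)" = "1" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # One last check after the final sleep
    [ "$(cat "$carrier_file" 2>/dev/null)" = "1" ]
}
```

Something like "wait_for_carrier /sys/class/net/br0/carrier 5" would
then fail whenever the first bridge port needs longer than 5 seconds to
reach "forwarding" state.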

Experimentation shows that when STP is enabled on the bridge, step 2c
takes *at least* $DELAY * 2 + 5 seconds, and sometimes as much as
$DELAY * 2 + 6.5 seconds. But when no LINKDELAY is set,
check_link_down() only waits for 5 seconds, so it will *always*
fail. (This happens regardless of how much time passes between the
first and second ifup invocations. Also note that doing the ifups in
the opposite order would always fail as well, since carrier would
*never* go up on the bridge device if it had nothing attached.)
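
To make the arithmetic concrete, here is a small sketch that compares a
timeout against the worst case I measured. The function names are made
up for illustration, DELAY is assumed to be a whole number of seconds,
and 6.5 is rounded up to 7 so the shell can stay in integer math.

```sh
# Worst observed time to carrier with STP on: DELAY * 2 + 6.5 seconds,
# rounded up to DELAY * 2 + 7 (assumption: DELAY is an integer).
stp_carrier_wait() {
    local delay=$1
    echo $(( delay * 2 + 7 ))
}

# Succeeds only if the given timeout covers the worst observed case.
timeout_covers_stp() {
    local timeout=$1 delay=$2
    [ "$timeout" -ge "$(stp_carrier_wait "$delay")" ]
}
```

With the default 5 second wait, "timeout_covers_stp 5 0" fails even for
DELAY=0, which matches the failure shown at the top of this report.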

Since I'm fairly certain that people have been configuring bridges
with a non-0 DELAY for many years and haven't previously encountered
this problem, I would class this as a regression in the behavior of
ifup that must be resolved.

Comment 1 Laine Stump 2014-11-11 19:03:15 UTC
Created attachment 956390
patch against current upstream git of initscripts

This patch causes ifup to wait at least as long as the time described above for carrier on a bridge device when STP is enabled. With it, every combination of STP, DELAY, and LINKDELAY that I've tried succeeds.
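
The idea can be sketched as picking an effective LINKDELAY before waiting for carrier. This is not the attached patch, which may differ in detail: the function name effective_linkdelay is hypothetical, the floor of DELAY * 2 + 7 is an assumed round-up of the times measured in the description, and only STP=on is handled because that is the only value used in this report.

```sh
# Hypothetical sketch; the attached patch may differ in detail.
# Usage: effective_linkdelay STP DELAY [LINKDELAY]
# Prints the LINKDELAY to use; 0 means "leave the default wait alone".
effective_linkdelay() {
    local stp=$1 delay=${2:-0} linkdelay=${3:-0}
    local floor=0
    if [ "$stp" = "on" ]; then
        # Assumed floor: DELAY * 2 + 6.5 from the description, rounded up
        floor=$(( delay * 2 + 7 ))
    fi
    # Never shorten a LINKDELAY that was set explicitly
    if [ "$linkdelay" -gt "$floor" ]; then
        echo "$linkdelay"
    else
        echo "$floor"
    fi
}
```

For example, "effective_linkdelay on 15" yields 37, while an explicit LINKDELAY of 20 with DELAY=0 is left at 20.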

Note that although I filed this BZ against rawhide, the problem exists at least as far back as F20, as well as in RHEL7 and CentOS7 (I haven't checked RHEL6, but think that it *isn't* a problem there), so the fix should be backported to all of those releases.

Comment 2 Fedora Update System 2014-11-12 13:05:03 UTC
initscripts-9.56.1-4.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/initscripts-9.56.1-4.fc21

Comment 3 Fedora Update System 2014-11-12 13:15:49 UTC
initscripts-9.51-3.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/initscripts-9.51-3.fc20

Comment 5 Fedora Update System 2014-11-16 14:45:33 UTC
initscripts-9.56.1-4.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 6 Fedora Update System 2014-11-20 23:02:21 UTC
initscripts-9.51-3.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

