Bug 612068

Summary: netcf: failure bringing up new bridge: 'ifup br0' failed with exit code 1
Product: Red Hat Enterprise Linux 6
Reporter: dyuan
Component: netcf
Assignee: Laine Stump <laine>
Status: CLOSED NOTABUG
QA Contact: qe-baseos-daemons
Severity: urgent
Priority: low
Version: 6.0
CC: berrange, herbert.xu, llim, twoerner, tyan, weizhan, xen-maint, yoyzhang
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-08-06 14:02:18 UTC
Attachments: virt-manager --debug (flags: none)

Description dyuan 2010-07-07 08:48:40 UTC
Description of problem:

Adding a bridge fails, and at the same time the physical network port that the
test machine is connected to goes down.



Version-Release number of selected component (if applicable):
virt-manager-0.8.4-6.el6


How reproducible:
2/2


Steps to Reproduce:
1. click 'Edit','Host Details','Network Interfaces'
2. click 'add','Bridge',Name->'br0', choose 'eth0'
3. click 'Finish'
  
Actual results:
Creation fails, and the physical network port that the test machine is connected to goes down.

** When I create the bridge by editing ifcfg-br0 and ifcfg-eth0 directly, it
works and is shown correctly in virt-manager.


Expected results:
The bridge is created successfully.


Additional info:

# tail -f /var/log/messages

Jul  7 15:52:11 dhcp-66-70-146 dhclient: No DHCPOFFERS received.
Jul  7 15:52:11 dhcp-66-70-146 dhclient: Trying recorded lease 10.66.70.146
Jul  7 15:52:11 dhcp-66-70-146 avahi-daemon[1564]: Joining mDNS multicast group
on interface br0.IPv4 with address 10.66.70.146.
Jul  7 15:52:11 dhcp-66-70-146 avahi-daemon[1564]: New relevant interface
br0.IPv4 for mDNS.
Jul  7 15:52:11 dhcp-66-70-146 avahi-daemon[1564]: Registering new address
record for 10.66.70.146 on br0.IPv4.


Jul  7 15:52:14 dhcp-66-70-146 avahi-daemon[1564]: Withdrawing address record
for 10.66.70.146 on br0.
Jul  7 15:52:14 dhcp-66-70-146 avahi-daemon[1564]: Leaving mDNS multicast group
on interface br0.IPv4 with address 10.66.70.146.
Jul  7 15:52:14 dhcp-66-70-146 avahi-daemon[1564]: Interface br0.IPv4 no longer
relevant for mDNS.
Jul  7 15:52:14 dhcp-66-70-146 avahi-daemon[1564]: Withdrawing address record
for fe80::6ef0:49ff:fe27:c19 on br0.
Jul  7 15:52:14 dhcp-66-70-146 libvirtd: 15:52:14.059: error :
interfaceCreate:473 : internal error failed to create (start) interface br0
(netcf: failed to execute external program - Running 'ifup br0' failed with
exit code 1)


# tail -f /root/.virt-manager/virt-manager.log
  <start mode='none'/>
  <protocol family='ipv4'>
    <dhcp/>
  </protocol>
  <bridge stp='on' delay='0'>
    <interface name='eth0' type='ethernet'/>
  </bridge>
</interface>

[Wed, 07 Jul 2010 15:50:51 virt-manager 3717] DEBUG (engine:423) Tick is slow,
not running at requested rate.
[Wed, 07 Jul 2010 15:52:14 virt-manager 3717] DEBUG (error:86) Uncaught Error:
Error creating interface: 'Could not create interface: internal error failed to
create (start) interface br0 (netcf: failed to execute external program -
Running 'ifup br0' failed with exit code 1)' : Error creating interface: '<type
'exceptions.RuntimeError'> Could not create interface: internal error failed to
create (start) interface br0 (netcf: failed to execute external program -
Running 'ifup br0' failed with exit code 1)
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/createinterface.py", line 1156, in
do_install
    self.interface.install(meter, create=activate)
  File "/usr/lib/python2.6/site-packages/virtinst/Interface.py", line 258, in
install
    raise RuntimeError(errmsg)
RuntimeError: Could not create interface: internal error failed to create
(start) interface br0 (netcf: failed to execute external program - Running
'ifup br0' failed with exit code 1)
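
For triage, the failing command and its exit code can be pulled out of the line-wrapped log text. This is only a minimal sketch and not part of virt-manager, libvirt, or netcf; the name failed_command and the regular expression are assumptions based solely on the message format shown above.

```python
import re

# Pattern inferred from the netcf message in the log above (an assumption,
# not a documented format).
NETCF_ERROR = re.compile(r"Running '([^']+)' failed with exit code (\d+)")


def failed_command(log_text):
    """Return (command, exit_code) from a netcf error message, or None."""
    flattened = " ".join(log_text.split())  # undo the line wrapping
    match = NETCF_ERROR.search(flattened)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


SAMPLE = """libvirtd: 15:52:14.059: error :
interfaceCreate:473 : internal error failed to create (start) interface br0
(netcf: failed to execute external program - Running 'ifup br0' failed with
exit code 1)"""

print(failed_command(SAMPLE))
```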

Comment 5 RHEL Program Management 2010-07-15 14:54:55 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 8 dyuan 2010-07-20 03:28:18 UTC
Created attachment 433053 [details]
virt-manager --debug

steps:
1. Creating the bridge device succeeds with no error when 'Activate now' is not selected.

2. Starting the bridge manually via virt-manager produces an error.

3. br0's state is 'Active', but it cannot get an IP address.
# cat ifcfg-eth0 
DEVICE=eth0
ONBOOT=yes
BRIDGE=br0
# cat ifcfg-br0 
DEVICE=br0
ONBOOT=yes
TYPE=Bridge
BOOTPROTO=dhcp
STP=on
DELAY=0

# service network restart
Shutting down interface br0:                               [  OK  ]
Shutting down interface eth0:                              [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:                                [  OK  ]
Bringing up interface br0:  
Determining IP information for br0... failed.
                                                           [FAILED]

4. Remove ifcfg-br0, edit ifcfg-eth0, then restart the network: it fails with a "no link present" error.

# more ifcfg-eth0 
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=dhcp

# service network restart
Shutting down interface eth0:                              [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:  
Determining IP information for eth0... failed; no link present.  Check cable?
                                                           [FAILED]
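
The ifcfg pair from step 3 can be checked for internal consistency with a short script. This is a rough sketch that assumes plain KEY=value files; parse_ifcfg and check_bridge_pair are hypothetical helper names, not functions from netcf or the initscripts, and the two checks are only the obvious ones (TYPE=Bridge on the bridge, BRIDGE= on the port naming the bridge's DEVICE).

```python
def parse_ifcfg(text):
    """Parse simple KEY=value lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def check_bridge_pair(port_cfg, bridge_cfg):
    """Return a list of problems found in a port/bridge ifcfg pair."""
    problems = []
    if bridge_cfg.get("TYPE") != "Bridge":
        problems.append("bridge file lacks TYPE=Bridge")
    if port_cfg.get("BRIDGE") != bridge_cfg.get("DEVICE"):
        problems.append("port BRIDGE= does not name the bridge DEVICE")
    return problems


# File contents copied from step 3 above.
ETH0 = "DEVICE=eth0\nONBOOT=yes\nBRIDGE=br0\n"
BR0 = "DEVICE=br0\nONBOOT=yes\nTYPE=Bridge\nBOOTPROTO=dhcp\nSTP=on\nDELAY=0\n"

problems = check_bridge_pair(parse_ifcfg(ETH0), parse_ifcfg(BR0))
print(problems or "ifcfg pair looks consistent")
```

A clean result here only says the files agree with each other; it says nothing about link state or about what the switch does with the port.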

Comment 10 Laine Stump 2010-08-05 21:46:10 UTC
1) It sounds like when the test with virt-manager was run, NetworkManager may have been active. With currently available NetworkManager, bringing up a bridge device is guaranteed to fail if NM is active. You must disable NM.

2) There are also currently issues with selinux when in enforcing mode. Check /var/log/audit/audit.log and /var/log/messages for AVC messages. If there are any, do "setenforce permissive" and try again.

Bug 619849 and bug 621499 are tracking selinux problems related to netcf. In particular, bug 619849 has to do with an inability to run brctl when selinux is enforcing.

3) In the end, however, you demonstrate that "service network start" fails to bring up the devices because link is down on the interface. At the same time, the ifcfg-eth0 and ifcfg-br0 are 100% correct. If the lower-level utilities are unable to bring up the interfaces, this indicates that virt-manager, libvirt, and netcf are all doing the right thing.

Can you please do the following:

1) first, get the link problem with your ethernet fixed, remove ifcfg-br0, and set ifcfg-eth0 back to

  DEVICE=eth0
  ONBOOT=yes
  BOOTPROTO=dhcp

and verify that you have connectivity on eth0.

2) service NetworkManager stop
   setenforce permissive

3) now attempt to create the bridge in virt-manager.

If it is successful, your problem was either NetworkManager running, selinux denying permission to run brctl, or that the link on your ethernet port was down.
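
The three suspects named above can be written down as a small checklist. This is only a sketch: suspects and read_flag are hypothetical helpers, the NetworkManager and SELinux states are passed in as plain booleans (for example from what getenforce reports), and reading /sys/class/net/eth0/carrier assumes a Linux host with sysfs mounted.

```python
def read_flag(path):
    """Read a '0'/'1' flag file; return None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip() == "1"
    except OSError:
        # Missing file, or the kernel refusing the read on a downed interface.
        return None


def suspects(nm_active, selinux_enforcing, link_up):
    """List which of the three known causes could explain the failure."""
    found = []
    if nm_active:
        found.append("NetworkManager is running")
    if selinux_enforcing:
        found.append("SELinux is enforcing")
    if link_up is False:  # None means "unknown", so it is not reported
        found.append("no link on the ethernet port")
    return found


link_up = read_flag("/sys/class/net/eth0/carrier")
print(suspects(nm_active=False, selinux_enforcing=False, link_up=link_up))
```

An empty list means none of the three applies, which points at something outside the host.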

Comment 11 dyuan 2010-08-06 03:29:23 UTC
(In reply to comment #10)
> 1) It sounds like when the test with virt-manager was run, NetworkManager may
> have been active. With currently available NetworkManager, bringing up a bridge
> device is guaranteed to fail if NM is active. You must disable NM.

====
In previous testing I brought up a bridge successfully more than once while NM was active.
But as you mentioned, with the currently available NM it must be disabled when bringing up a bridge. Is this working as designed? If it is, I think it should pop up a prompt asking the end user to check NM. If not, adding a note somewhere would be better :-)

> 
> 2) There are also currently issues with selinux when in enforcing mode. Check
> /var/log/audit/audit.log and /var/log/messages for AVC messages. If there are
> any, do "setenforce permissive" and try again.
> 
> Bug 619849 and bug 621499 are tracking selinux problems related to netcf. In
> particular, bug 619849 has to do with an inability to run brctl when selinux is
> enforcing.
> 
> 3) In the end, however, you demonstrate that "service network start" fails to
> bring up the devices because link is down on the interface. At the same time,
> the ifcfg-eth0 and ifcfg-br0 are 100% correct. If the lower-level utilities are
> unable to bring up the interfaces, this indicates that virt-manager, libvirt,
> and netcf are all doing the right thing.
> 
> Can you please do the following:
> 
> 1) first, get the link problem with your ethernet fixed, remove ifcfg-br0, and
> set ifcfg-eth0 back to
> 
>   DEVICE=eth0
>   ONBOOT=yes
>   BOOTPROTO=dhcp
> 
> and verify that you have connectivity on eth0.
> 

step 1: can reach google.com via eth0
# cat ifcfg-eth0
DEVICE="eth0"
BOOTPROTO="dhcp"
HWADDR="6C:F0:49:27:0C:06"
NM_CONTROLLED="yes"
ONBOOT="yes"

> 2) service NetworkManager stop
>    setenforce permissive
> 

step 2:
# service NetworkManager stop
# setenforce 0
# getenforce
Permissive

> 3) now attempt to create the bridge in virt-manager.
> 
step 3: create bridge in virt-manager
An error pops up:
Error starting interface 'br0': internal error failed to create
(start) interface br0 (netcf: failed to execute external program - Running
'ifup br0' failed with exit code 1)    

> If it is successful, your problem was either NetworkManager running, selinux
> denying permission to run brctl, or that the link on your ethernet port was
> down.    

step 4: set eth0 back as in step 1
# service network restart
Shutting down interface eth0:                              [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:  
Determining IP information for eth0... failed; no link present.  Check cable?
                                                           [FAILED]    

Ethernet port is down...

Additional info:
The Network Team told me that the issue is 'you send bpdu.'

Comment 13 Laine Stump 2010-08-06 06:01:32 UTC
(As for the incompatibility with NM, we are working on that. Hopefully you will soon be able to create bridges and vlans without disabling NM.)

Thanks for going through all that; it helps to eliminate a lot of possible failure points.

So, when the bridge is enabled it is sending out some sort of control packet that (as far as I can understand from what I just read) tells the rest of the network that this bridge wants to be the "root" bridge. Someone who understands the internals of the bridge would be better able to explain that than me; I'm Cc'ing Herbert Xu and Thomas Woerner, since they know how bridging works under the covers.

Herbert or Thomas: could this be the result of setting a forward delay of 0 and/or turning on STP? Is there any other config option we need to modify to prevent these "BPDU" packets from being transmitted?

(The delay is being set to 0, btw, because the bridge is only connected to a single physical interface and to the virtual guests, which begin trying to acquire an IP address within less than a second of being connected to the bridge. If we leave it at the default of 15 (meaning it's 30 seconds before any packets are forwarded), many DHCP clients just give up and fail (e.g., PXE boot doesn't work).)
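
The 15-versus-30 arithmetic comes from STP holding a port in the listening state and then the learning state, each for one forward-delay interval, before it forwards traffic. A small sketch of that calculation follows; seconds_until_forwarding is a made-up name for illustration, and real bridges have other timers that are ignored here.

```python
def seconds_until_forwarding(forward_delay):
    """Listening and learning each last one forward-delay interval."""
    listening = forward_delay
    learning = forward_delay
    return listening + learning


for delay in (15, 0):
    wait = seconds_until_forwarding(delay)
    print(f"delay={delay}: port forwards after about {wait}s")
```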

BTW, you can see the error output of "ifup" by first creating the bridge in virt-manager with "Activate Now" *not* checked (get the port re-enabled first! ;-), and then from a shell prompt, give the command:

   ncftool ifup br0

This will allow you to see more than just the exit code (although I'm not sure exactly what it will show in this case; possibly just that the link is down).

Comment 14 Herbert Xu 2010-08-06 09:04:23 UTC
It's a case of if it hurts don't do it.

Having a physical bridge that disables a port upon seeing a BPDU and then enabling STP on that port so that a BPDU is sent is ...

Comment 15 Laine Stump 2010-08-06 14:02:18 UTC
Okay, now I understand. Thanks, Herbert!

To summarize (now that I've read some STP documentation), BPDU packets are what bridges participating in STP use to determine the topology of the network. Any time STP is enabled, the bridge will send BPDU packets.

In this case, the admin of the switch that we're connecting to has decided that nothing on that particular port can participate in STP, and rather than enforcing that by just blocking/ignoring BPDU packets, has configured the switch to disable the port if it ever sees one.

So there are 2 possible solutions: 1) convince the switch admin to allow participation in STP (I don't really see any gain here, but it would solve the problem ;-), or 2) set "stp='off'" in br0's definition.
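
For option 2, the interface definition would look roughly like what the script below prints. This is a sketch only: it follows the element layout visible in the virt-manager log earlier in this bug, the enclosing <interface type='bridge' name='br0'> element is my assumption of the usual libvirt form (the log above is cut off before it), and bridge_xml is a hypothetical helper, not a virt-manager or virtinst function.

```python
import xml.etree.ElementTree as ET


def bridge_xml(name, port, stp="off", delay="0"):
    """Build a libvirt-style bridge interface definition as a string."""
    iface = ET.Element("interface", {"type": "bridge", "name": name})
    ET.SubElement(iface, "start", {"mode": "none"})
    proto = ET.SubElement(iface, "protocol", {"family": "ipv4"})
    ET.SubElement(proto, "dhcp")
    bridge = ET.SubElement(iface, "bridge", {"stp": stp, "delay": delay})
    ET.SubElement(bridge, "interface", {"name": port, "type": "ethernet"})
    return ET.tostring(iface, encoding="unicode")


print(bridge_xml("br0", "eth0"))
```

With STP off the bridge sends no BPDUs, so a port guarded this way should stay up; the trade-off is that nothing protects against loops any more.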

I wonder if we should make all the examples and defaults have stp='off'.

Comment 16 Daniel Berrangé 2010-08-06 14:10:59 UTC
Defaulting to stp=off would surely leave open the possibility of creating network loops, which could take out the whole network segment. IMHO that's worse than having just one single port disabled due to admin policy.

Comment 17 dyuan 2010-08-09 02:53:28 UTC
After disabling STP manually in virt-manager, the bridge is created successfully. :-)
The default settings in virt-manager will be tested in another topology without BPDU Guard.