Bug 905035 - NM seems to destroy bridge interface making my qemu/kvm guest unusable
Summary: NM seems to destroy bridge interface making my qemu/kvm guest unusable
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Dan Williams
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-01-28 12:31 UTC by Zdenek Kabelac
Modified: 2013-02-18 07:04 UTC
CC List: 8 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-02-18 07:04:25 UTC
Type: Bug
Embargoed:


Attachments
virsh net-dumpxml default (424 bytes, text/plain)
2013-01-28 13:12 UTC, Zdenek Kabelac
/etc/sysconfig/network-scripts/ifcfg-virbr0 (101 bytes, text/plain)
2013-01-28 13:13 UTC, Zdenek Kabelac
NM git20121130 syslog (3.02 KB, text/plain)
2013-01-31 19:31 UTC, Gene Czarcinski
NM git201211 syslog where IPv4 on virbrX is broken (6.33 KB, text/plain)
2013-01-31 19:32 UTC, Gene Czarcinski


Links
GNOME Bugzilla 692880 (last updated 2019-07-30 00:44:20 UTC)

Description Zdenek Kabelac 2013-01-28 12:31:44 UTC
Description of problem:

I'm not really sure how best to describe my problem, but essentially NM seems to be removing the IPv4 address from my virbr0 bridge interface, which makes my qemu guest hardly usable with a virtual network (the only usable path seems to be 'user' networking in qemu, with all its known limitations).

Here are some logged messages from /var/log/messages related to virbr0:

NetworkManager[10701]: <info> (wlan0): taking down device.
NetworkManager[10701]: <info> (virbr0): now unmanaged
kernel: [43592.709122] virbr0: port 2(vnet0) entered disabled state
NetworkManager[10701]: <info> (virbr0): device state change: disconnected -> unmanaged (reason 'removed') [30 10 36]
NetworkManager[10701]: <info> (virbr0): cleaning up...
NetworkManager[10701]: <info> (virbr0): taking down device.
NetworkManager[10701]: <info> (virtb0): now unmanaged
NetworkManager[10701]: <info> (virtb0): device state change: disconnected -> unmanaged (reason 'removed') [30 10 36]
NetworkManager[10701]: <info> (virtb0): cleaning up...
NetworkManager[10781]: <info> (virbr0): carrier is OFF
NetworkManager[10781]: <info> (virbr0): new Bridge device (driver: 'bridge' ifindex: 4)
NetworkManager[10781]: <info> (virbr0): exported as /org/freedesktop/NetworkManager/Devices/2
NetworkManager[10781]: <info> (virbr0): now managed
NetworkManager[10781]: <info> (virbr0): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
NetworkManager[10781]: <info> (virbr0): bringing up device.
NetworkManager[10781]: <info> (virbr0): carrier now ON (device state 20)
NetworkManager[10781]: <info> (virbr0): deactivating device (reason 'managed') [2]
NetworkManager[10781]: (nm-device.c:4958):nm_device_queue_state: runtime check failed: (priv->queued_state.id == 0)
NetworkManager[10781]: <warn> /sys/devices/virtual/net/virbr0-nic: couldn't determine device driver; ignoring...
NetworkManager[10781]: <warn> failed to allocate link cache: (-10) Operation not supported
NetworkManager[10781]: <info> (virtb0): carrier is OFF
NetworkManager[10781]: <info> (virtb0): new Bridge device (driver: 'bridge' ifindex: 9)
NetworkManager[10781]: <info> (virtb0): exported as /org/freedesktop/NetworkManager/Devices/3
NetworkManager[10781]: <info> (virtb0): now managed
NetworkManager[10781]: <info> (virtb0): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
NetworkManager[10781]: <info> (virtb0): bringing up device.
NetworkManager[10781]: <info> (virtb0): carrier now ON (device state 20)
NetworkManager[10781]: <info> (virtb0): deactivating device (reason 'managed') [2]
NetworkManager[10781]: (nm-device.c:4958):nm_device_queue_state: runtime check failed: (priv->queued_state.id == 0)
NetworkManager[10781]: <warn> /sys/devices/virtual/net/vnet0: couldn't determine device driver; ignoring...
NetworkManager[10781]: <warn> /sys/devices/virtual/net/lo: couldn't determine device driver; ignoring...
NetworkManager[10781]: <warn> /sys/devices/virtual/net/virbr0-nic: couldn't determine device driver; ignoring...
NetworkManager[10781]: <warn> /sys/devices/virtual/net/vnet0: couldn't determine device driver; ignoring...


And there seems to be no way (that I know of) to restart the bridge afterwards.
So basically, after boot my first qemu machine was OK, but after closing it
I could not start it again with a working network.

So as a workaround, I'm now living with

systemctl mask NetworkManager.service

and managing the network myself with 'ifup' commands whenever I want to use qemu.

As a side note, I tried adding an ifcfg-virbr0 config file to
/etc/sysconfig/network-scripts/

But I've found that

 NM_CONTROLLED=no 

has absolutely no effect, unless (possibly?) the HW MAC address is also specified,
which is obviously generated randomly for the virbr0 interface.


Version-Release number of selected component (if applicable):
NetworkManager-0.9.7.0-13.git20121211.fc19.x86_64

Comment 1 Zdenek Kabelac 2013-01-28 13:12:25 UTC
Created attachment 688951 [details]
virsh net-dumpxml default

Comment 2 Zdenek Kabelac 2013-01-28 13:13:04 UTC
Created attachment 688952 [details]
/etc/sysconfig/network-scripts/ifcfg-virbr0

Comment 3 Zdenek Kabelac 2013-01-28 13:15:11 UTC
After booting with libvirtd.service, I have this setup:


# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:1c:25:14:4a:e2 brd ff:ff:ff:ff:ff:ff
    inet xx.xx.xxx.xxx/23 brd xx.xx.xxx.xxx scope global eth0
3: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 00:1c:bf:03:02:87 brd ff:ff:ff:ff:ff:ff
4: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN 
    link/ether 52:54:00:42:dc:3a brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc mq master virbr0 state DOWN qlen 500
    link/ether 52:54:00:42:dc:3a brd ff:ff:ff:ff:ff:ff
[root@linux kabi]# ifconfig 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet xx.xx.xxx.xxx  netmask 255.255.254.0  broadcast xx.xx.xxx.xxx
        ether 00:1c:25:14:4a:e2  txqueuelen 1000  (Ethernet)
        RX packets 271  bytes 182050 (177.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 207  bytes 23241 (22.6 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device interrupt 20  memory 0xfe000000-fe020000  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 0  (Local Loopback)
        RX packets 20  bytes 1944 (1.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 20  bytes 1944 (1.8 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:42:dc:3a  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4  bytes 850 (850.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

wlan0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 00:1c:bf:03:02:87  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

Comment 4 David Jaša 2013-01-30 15:21:49 UTC
IMO the root cause is the use of the MAC address as the primary identifier of interfaces. The bridge's MAC changes when qemu plugs a tap device into the bridge, so when qemu exits and the tap device is removed, NM doesn't recognize that the bridge device itself should be kept up.
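A minimal Python sketch of the mechanism described in comment 4, under one stated assumption: when no MAC has been set explicitly on a Linux bridge, the bridge driver uses the numerically lowest MAC among its ports, so plugging or unplugging a port can change the bridge's MAC. The function names and the second port MAC below are hypothetical. The model deliberately ignores everything else, including what happens when the last port leaves.

```python
"""Simplified model of how a Linux bridge picks its MAC address.

Assumption: if no MAC was set explicitly on the bridge, the bridge uses the
numerically lowest MAC among its ports. Nothing else is modeled.
"""
from typing import Iterable, Optional


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer so MACs can be ordered."""
    return int(mac.replace(":", ""), 16)


def bridge_mac(port_macs: Iterable[str], explicit_mac: Optional[str] = None) -> Optional[str]:
    """Return the bridge MAC predicted by this simplified model.

    An explicitly assigned MAC always wins; otherwise the lowest port MAC is
    used. With no ports and no explicit MAC the model returns None, because
    it does not cover that case.
    """
    if explicit_mac is not None:
        return explicit_mac.lower()
    macs = [m.lower() for m in port_macs]
    if not macs:
        return None
    return min(macs, key=mac_to_int)


if __name__ == "__main__":
    nic = "52:54:00:42:dc:3a"  # virbr0-nic MAC as reported in comment 3
    port = "00:16:3e:00:00:01"  # hypothetical port MAC, lower than nic's
    print(bridge_mac([nic]))        # 52:54:00:42:dc:3a
    print(bridge_mac([nic, port]))  # 00:16:3e:00:00:01, the bridge MAC changed
    print(bridge_mac([nic]))        # 52:54:00:42:dc:3a again once the port leaves
```

Under this model, any configuration keyed on the bridge's HWADDR only matches while the same set of ports is attached.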

Comment 5 Antoni Segura Puimedon 2013-01-30 16:14:49 UTC
It seems we keep hitting the issue of "MAC address as a primary identifier of interfaces" from different sides. Hopefully it will be solved soon.

Comment 6 Antoni Segura Puimedon 2013-01-30 16:15:56 UTC
@Dan Williams I think there are other bugs with this same root cause; maybe some of them can be marked as duplicates?

Comment 7 Pavel Šimerda (pavlix) 2013-01-30 17:43:22 UTC
Btw, the root cause is not about MAC addresses but about bridging. Dan and I have already agreed that this would be solved by not tearing down connections on virtual interfaces that are not configured for NetworkManager. Dan sometimes calls it “not doing bridges by default”.

Comment 8 Antoni Segura Puimedon 2013-01-30 23:36:19 UTC
I stand corrected :-)

Comment 9 Gene Czarcinski 2013-01-31 19:31:19 UTC
Created attachment 691198 [details]
NM git20121130 syslog

I had been running an NM that I built from the git20121130 tarball, and everything worked fine. When I switched to the git20121211 package from rawhide (I am running Fedora 18), I stopped getting IPv4 addresses on my virbrX devices (IPv6 still worked) [I run libvirt-1.0.2].

Since I just gathered the data, I am attaching a syslog portion for git20121130, where things worked, and another syslog portion for git20121211, where they did not.

Comment 10 Gene Czarcinski 2013-01-31 19:32:29 UTC
Created attachment 691199 [details]
NM git201211 syslog where IPv4 on virbrX is broken

Comment 11 Dan Williams 2013-01-31 21:37:11 UTC
We'll be defaulting the bridge support to "off" until we can be more cooperative with existing bridge configurations.

Comment 12 Zdenek Kabelac 2013-01-31 21:59:58 UTC
I should just add a simple workaround I'm currently using, just before starting qemu:

ifconfig  virbr0 192.168.122.1

Comment 13 David Jaša 2013-02-01 10:46:24 UTC
(In reply to comment #11)
> We'll be defaulting the bridge support to "off" until we can be more
> cooperative with existing bridge configurations.

1) NM is taking down a bridge that should be managed only by initscripts (NM_CONTROLLED=no in ifcfg-virbr0)
2) Zdeněk didn't turn bridge support on

Comment 14 Gene Czarcinski 2013-02-01 13:52:28 UTC
FYI comment on what was done/not done.

With the git20121211 NM, IPv4 was impacted but not IPv6, and I run most virtual networks as dual-stack IPv4/IPv6 with DHCP for both.

Either the "right thing" is being done for IPv6 or you are ignoring it.

Comment 15 David Jaša 2013-02-01 14:22:11 UTC
> Created attachment 688952 [details]
> /etc/sysconfig/network-scripts/ifcfg-virbr0

DEVICE=virbr0
TYPE=Bridge
HWADDR=52:54:00:42:DC:3A
...

My guess at what happens:
1) NM sees the new bridge with a different MAC than the HWADDR configured in ifcfg-virbr0
2) because of 1), NM takes the virbr0 device for something else and thus applies its default configuration to it
3) when the link goes down on the device, NM deconfigures IP as well (default behaviour)

If I understand correctly, matching by MAC was the most reliable way to identify physical devices before 70-persistent-net.rules or the new consistent device names were available, because the MAC was persistent (barring a manual change) and the name was assigned afterwards.
Linux bridges behave the opposite way: the name is assigned when they are created, but the MAC address can, and often does, change with any slave plug/unplug event, so for bridges HWADDR is probably the least reliable way to determine device identity.

If the statements above are true, NM should probably ignore the HWADDR setting of devices with TYPE=Bridge and use just the name for initial identification and the interface number afterwards.

Note that all of this happens with bridge support defaulting to "off", so the switch seems to be redundant. Better device-to-configuration matching (which would really ensure that NM_CONTROLLED=no means what it says) would make it possible to get rid of the switch and still have bridge support available.
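The matching rule proposed in this comment could look roughly like the Python sketch below. It is hypothetical and is not how NetworkManager's ifcfg-rh plugin actually works: bridges are matched by DEVICE name only, other devices by HWADDR when present, and a matching NM_CONTROLLED=no keeps NM away from the device. The parser handles only simple KEY=VALUE lines, not full shell syntax.

```python
"""Sketch of the device-matching rule proposed in comment 15 (hypothetical)."""
from typing import Dict


def parse_ifcfg(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping comments and stripping quotes."""
    cfg: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip().strip("'\"")
    return cfg


def config_matches(cfg: Dict[str, str], ifname: str, mac: str) -> bool:
    """Decide whether an ifcfg config describes the device (ifname, mac)."""
    if cfg.get("TYPE", "").lower() == "bridge":
        # A bridge's MAC follows its ports, so only the name is meaningful.
        return cfg.get("DEVICE") == ifname
    if "HWADDR" in cfg:
        return cfg["HWADDR"].lower() == mac.lower()
    return cfg.get("DEVICE") == ifname


def nm_may_manage(cfg: Dict[str, str], ifname: str, mac: str) -> bool:
    """Return False if this config matches the device and says NM_CONTROLLED=no."""
    if config_matches(cfg, ifname, mac):
        return cfg.get("NM_CONTROLLED", "yes").lower() != "no"
    return True  # this config describes another device, so it does not object


if __name__ == "__main__":
    # Modeled on the ifcfg-virbr0 quoted above, plus NM_CONTROLLED=no.
    cfg = parse_ifcfg("DEVICE=virbr0\nTYPE=Bridge\nHWADDR=52:54:00:42:DC:3A\nNM_CONTROLLED=no\n")
    # The bridge MAC no longer equals HWADDR, but the name still matches.
    print(nm_may_manage(cfg, "virbr0", "00:16:3e:00:00:01"))  # False
```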

Comment 16 Laine Stump 2013-02-01 17:49:13 UTC
Before the talk about ifcfg-virbr0 and initscripts gets out of hand, I want to point out that:

1) libvirt creates virbr0 (and any other bridge it creates) itself at runtime using direct ioctl() calls, and does not create *any* ifcfg-* file.

2) the problem reported in this bug also occurs when the bridge in question has no associated ifcfg file.

3) the ifcfg-virbr0 file mentioned in this bug report was created manually by the reporter of the bug in response to a misguided suggestion for how to work around the problem.

I verified with him that the problem still existed after removing this erroneously added file and restarting everything.

NetworkManager is apparently noticing changes in *transient* bridges that are not mentioned anywhere in any persistent system network configuration.

What NM needs to do is simply not pay any attention to bridge devices that have no explicit persistent configuration. It may be okay/convenient to automatically notice physical devices that magically appear with no associated persistent config, but if a bridge, bond, or vlan device shows up, *somebody* created it as a conscious decision and already has a plan for it. It makes no sense to try to co-opt those devices and auto-create a persistent (or semi-persistent) config for them, because that config will almost certainly be wrong.
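A sketch of the policy suggested in this comment, in Python. The Device type, its fields, and should_manage() are hypothetical names used only for illustration; the sketch captures the rule itself, not any real NetworkManager code.

```python
"""Sketch of the policy suggested in comment 16 (names are hypothetical).

Virtual devices (bridge, bond, vlan) that appear with no persistent
configuration were created deliberately by someone else, so they are left
alone; physical devices may still be picked up automatically.
"""
from dataclasses import dataclass

VIRTUAL_KINDS = {"bridge", "bond", "vlan"}


@dataclass(frozen=True)
class Device:
    name: str
    kind: str  # e.g. "ethernet", "wifi", "bridge", "bond", "vlan"
    has_persistent_config: bool


def should_manage(dev: Device) -> bool:
    """Return True if the policy allows automatic management of dev."""
    if dev.kind in VIRTUAL_KINDS:
        # Only touch virtual devices that someone explicitly configured.
        return dev.has_persistent_config
    # Physical devices without config can still get default handling.
    return True


if __name__ == "__main__":
    devices = [
        Device("eth0", "ethernet", has_persistent_config=False),
        Device("virbr0", "bridge", has_persistent_config=False),  # created by libvirt
        Device("br0", "bridge", has_persistent_config=True),
    ]
    for d in devices:
        print(d.name, should_manage(d))
```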

Comment 17 Laine Stump 2013-02-01 17:54:56 UTC
Oh, and in case all the verbiage obscures this basic reality - this is a serious regression, as it renders networking on virtual machines that use libvirt's virtual networks completely non-functional.

Whatever is done, it would be very helpful if it was done quickly :-)

Comment 18 David Jaša 2013-02-04 11:52:34 UTC
> Before the talk about ifcfg-virbr0 and initscripts gets out of hand, I want to point out that:
> ...

It essentially proves the same point: NM as of now is not able to identify bridge interfaces in reliable manner.

In addition, NM shouldn't "turn off bridge support by default", as that is not effective; instead it should default to "I see the bridge, but I don't take any automatic action on it if it was created by someone else".

Comment 19 Dan Williams 2013-02-04 17:58:46 UTC
The solution we are going to land until we can more cooperatively handle bridges is this:

1) on startup, NM will ignore bridges that it did not create.  That should include virbr0.

2) if NM is told to touch a bridge that it did not create, even if there is configuration for that bridge, NM will refuse.

3) on shutdown, NM will save bridges it created to /run and re-read that configuration to better facilitate restarting the NM service; this file should be wiped on machine restart though, returning us to a clean state


So in summary, NM will only handle bridges that it's explicitly told to create via nmcli, ifup, or an applet.  NM will also auto-start bridges for which configuration exists, but the interface does not yet exist.
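The short-term plan described in this comment can be modeled with a small Python sketch. Everything in it is hypothetical: the BridgePolicy class, its methods, and the JSON state-file name are assumptions, and the example uses a temporary directory in place of /run. Actual bridge creation is not modeled; the sketch only tracks which bridges are "ours".

```python
"""Sketch of the short-term bridge policy described in comment 19.

Rules modeled: (1) ignore existing bridges NM did not create; (2) refuse to
touch such bridges even if configuration exists; (3) remember NM-created
bridges in a runtime file on shutdown; plus auto-starting configured bridges
whose interface does not exist yet. The state-file path is an assumption.
"""
import json
import os
import tempfile
from typing import Iterable, List, Set


class BridgePolicy:
    def __init__(self, state_file: str) -> None:
        self.state_file = state_file
        self.created: Set[str] = set()

    def load(self) -> None:
        """Re-read bridges created by a previous run (no file after a reboot)."""
        if os.path.exists(self.state_file):
            with open(self.state_file) as f:
                self.created = set(json.load(f))

    def save(self) -> None:
        """On shutdown, remember the bridges this daemon created."""
        with open(self.state_file, "w") as f:
            json.dump(sorted(self.created), f)

    def may_touch(self, bridge: str) -> bool:
        """Rules 1 and 2: only bridges we created are ours to manage."""
        return bridge in self.created

    def bridges_to_create(self, configured: Iterable[str], existing: Iterable[str]) -> List[str]:
        """Auto-start configured bridges whose interface does not exist yet."""
        existing_set = set(existing)
        return sorted(b for b in configured if b not in existing_set)

    def create(self, bridge: str) -> None:
        """Record that we created the bridge (the creation itself is not modeled)."""
        self.created.add(bridge)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as run_dir:
        path = os.path.join(run_dir, "nm-bridges.json")  # hypothetical file name
        policy = BridgePolicy(path)
        policy.load()
        # virbr0 already exists (created by libvirt), so only br0 is created.
        for br in policy.bridges_to_create(["br0", "virbr0"], existing=["virbr0"]):
            policy.create(br)
        print(policy.may_touch("br0"), policy.may_touch("virbr0"))  # True False
        policy.save()
        restarted = BridgePolicy(path)  # simulates restarting the NM service
        restarted.load()
        print(restarted.may_touch("br0"))  # True
```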

Comment 20 Pavel Šimerda (pavlix) 2013-02-04 21:38:56 UTC
(In reply to comment #19)
> 1) on startup, NM will ignore bridges that it did not create.  That should
> include virbr0.

Good.

> 2) if NM is told to touch a bridge that it did not create, even if there is
> configuration for that bridge, NM will refuse.

Is this just because we can't properly check whether the configuration is for NetworkManager? If not, is it really necessary? If yes, is it necessary to apply it to keyfile?

> 3) on shutdown, NM will save bridges it created to /run and re-read that
> configuration to better facilitate restarting the NM service; this file
> should be wiped on machine restart though, returning us to a clean state

Good. As it is runtime-only, it should be possible to change this behavior at any time.

> So in summary, NM will only handle bridges that it's explicitly told to
> create via nmcli, ifup, or an applet.  NM will also auto-start bridges for
> which configuration exists, but the interface does not yet exist.

Will NetworkManager (still talking about very-short-term plan) leave bridges when the respective connection is disconnected? Is /run storage intended to check the origin of pre-existing bridges?

Comment 21 Dan Williams 2013-02-05 18:00:36 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > 1) on startup, NM will ignore bridges that it did not create.  That should
> > include virbr0.
> 
> Good.
> 
> > 2) if NM is told to touch a bridge that it did not create, even if there is
> > configuration for that bridge, NM will refuse.
> 
> Is this just because we can't properly check whether the configuration is
> for NetworkManager? If not, is it really necessary? If yes, is it necessary
> to apply it to keyfile?

The principle here is not to touch any bridge that's managed by an external tool, until we can more cooperatively handle these interfaces.

If, for example, virbr0 is running when NM starts up, clearly that bridge was configured by something else. And if an ifcfg-virbr0 file then magically appears, the fact that it has ONBOOT=yes, or that somebody ran 'nmcli con up id virbr0', doesn't mean NM should start doing something to a bridge that is managed by an external tool.

This also ensures the bridge is in a known state when NM starts to control it, i.e., that it has no ports and its properties are correct.

Again, both of these rationales disappear when we can more cooperatively handle bridges.  Until then, I'd like to draw a clear line between what NM will and will not touch for bridges.

> > 3) on shutdown, NM will save bridges it created to /run and re-read that
> > configuration to better facilitate restarting the NM service; this file
> > should be wiped on machine restart though, returning us to a clean state
> 
> Good. As it is runtime-only, it should be possible to change this behavior
> at any time.

Yes, but in the future when we more cooperatively handle bridges obviously this all shouldn't be necessary.

> > So in summary, NM will only handle bridges that it's explicitly told to
> > create via nmcli, ifup, or an applet.  NM will also auto-start bridges for
> > which configuration exists, but the interface does not yet exist.
> 
> Will NetworkManager (still talking about very-short-term plan) leave bridges
> when the respective connection is disconnected? Is /run storage intended to
> check the origin of pre-existing bridges?

NM will still always leave the bridge interface created when it quits; NM will not (and I argue shouldn't ever) destroy the bridge interface, since it was explicitly created as a response to a user action (nmcli con up) or a user-requested action (ONBOOT=yes or autoconnect=true) and we have no good way to determine its life-cycle.  Later on we may add methods to the Device object to destroy the interface.

Comment 22 Fedora Update System 2013-02-09 06:42:46 UTC
network-manager-applet-0.9.7.997-1.fc18,NetworkManager-0.9.7.997-2.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/network-manager-applet-0.9.7.997-1.fc18,NetworkManager-0.9.7.997-2.fc18

Comment 23 Fedora Update System 2013-02-10 04:23:44 UTC
Package network-manager-applet-0.9.7.997-1.fc18, NetworkManager-0.9.7.997-2.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing network-manager-applet-0.9.7.997-1.fc18 NetworkManager-0.9.7.997-2.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-2218/network-manager-applet-0.9.7.997-1.fc18,NetworkManager-0.9.7.997-2.fc18
then log in and leave karma (feedback).

Comment 24 Fedora Update System 2013-02-18 07:04:28 UTC
network-manager-applet-0.9.7.997-1.fc18, NetworkManager-0.9.7.997-2.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.
