Bug 1001186 - With AIO installer and NetworkManager enabled, the ovirtmgmt bridge is not properly configured
Status: CLOSED WONTFIX
Product: ovirt-host-deploy
Classification: oVirt
Component: Plugins.VDSM
Version: 1.3.0
Hardware: Unspecified Linux
Priority: high Severity: high
Target Milestone: ovirt-3.6.1
Target Release: ---
Assigned To: Alon Bar-Lev
Whiteboard: integration
Keywords: Reopened, Triaged
Duplicates: 991087 1110875 1144084
Depends On:
Blocks:
 
Reported: 2013-08-26 12:13 EDT by Jason Brooks
Modified: 2016-01-04 00:39 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-02 09:27:32 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
sbonazzo: ovirt‑3.6.z?
ylavi: planning_ack?
ylavi: devel_ack?
ylavi: testing_ack?


Attachments
supervdsm log (23.14 KB, text/plain)
2015-08-18 11:23 EDT, Sagi Shnaidman
messages (166.88 KB, text/plain)
2015-08-18 11:24 EDT, Sagi Shnaidman

Description Jason Brooks 2013-08-26 12:13:31 EDT
Description of problem:

On a Fedora 19 machine with NetworkManager enabled (which is the default), the All in One installer fails to configure the ovirtmgmt network.

After stopping and disabling NetworkManager, and starting and enabling the network service, the AIO install performs as expected.
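
The workaround above can be sketched as a small script. This is only an illustration, not part of oVirt: the helper names run and switch_to_network_service are hypothetical, and the unit names NetworkManager and network are assumed to match a stock Fedora 19 host.

```shell
# Hypothetical helper (not shipped by oVirt): replace NetworkManager with the
# legacy network service before running the AIO installer.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

switch_to_network_service() {
    # Stop and disable NetworkManager, then enable and start "network".
    # Unit names are assumptions; verify them on the target host.
    run systemctl stop NetworkManager &&
    run systemctl disable NetworkManager &&
    run systemctl enable network &&
    run systemctl start network
}
```

Running `DRY_RUN=1 switch_to_network_service` first shows what would be executed. Doing this over a remote session is risky, since the connection may drop while the services are swapped.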

Version-Release number of selected component (if applicable):

oVirt 3.3RC, Fedora 19 minimal with all updates applied.

How reproducible:



Steps to Reproduce:
1. Run oVirt 3.3 AIO installer on stock F19 host.

Actual results:

ovirtmgmt network is created, and then rolled back, leaving the local vdsm host in a non-operational state.

Expected results:

Working local vdsm host.

Additional info:
Comment 1 Alon Bar-Lev 2013-08-26 12:40:02 EDT
Antoni,

Per our discussion, please ACK that we must disable network manager at engine machine as well... This is something I really tried to avoid.

Thanks!
Comment 2 Itamar Heim 2013-08-27 03:08:32 EDT
if we need to disable it, i think we need to give a warning to the user at beginning of installation.
i hope we can avoid this as well
Comment 3 Antoni Segura Puimedon 2013-08-27 03:29:29 EDT
Alon,

Do we really need to special-case the AIO? The normal deployment already does the mask and stop, and the AIO is just a machine with engine that deploys to itself. Shouldn't that be enough?

If that is not enough, then, of course, the AIO installation must prompt a warning and proceed with the disabling.
Comment 4 Alon Bar-Lev 2013-08-27 03:39:09 EDT
(In reply to Antoni Segura Puimedon from comment #3)
> Alon,
> 
> Do we really need to special case the AIO, the normal deployment already
> does the mask and stop, and the AIO is just a machine with engine that
> deploys to itself. Shouldn't that be enough?
> 
> If that is not enough, then, of course, the AIO installation must prompt a
> warning and proceed with the disabling.

The engine machine is a generic server; it is not used only for oVirt. The AIO is a grow-as-you-need setup. Changing the core configuration of the engine server is not good for supportability.

I would really like to see some workaround so in this AIO configuration we can live with network manager for the basic setup.
Comment 5 Yedidyah Bar David 2013-08-27 10:32:50 EDT
Please see https://bugzilla.redhat.com/show_bug.cgi?id=991087#c4 and further comments there.

These might be duplicates, not sure.
Comment 6 Yedidyah Bar David 2013-08-28 09:18:42 EDT
Hi all,

Alon told me in private that the problem is that NetworkManager sometimes takes time
between us changing ifcfg-* to have NM_CONTROLLED=no and it taking the interface down,
and when it does, if it's too late, it breaks our setup.

He also told me that the time it takes might be short or long, where long is 5-10 seconds.

If so: I suggest to change vdsm, in the code that changes ifcfg-*, to:
1. Verify that the interface is managed by NM
2. Change it to NM_CONTROLLED=no
3. Wait until NM says it's unmanaged. This can be done with something like
nmcli dev list iface eth0 | grep -q 'GENERAL.STATE: .*unmanaged'
or some api calls or whatever (I do not know much about NM), in a loop, with some delay,
maximum, alert if more than max delay, etc.
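
A minimal sketch of the bounded wait from step 3, for illustration only. The function names nm_device_state and wait_until_unmanaged are hypothetical, and the nmcli invocation is an assumption that depends on the NetworkManager version: the command quoted above uses the older "nmcli dev list iface" form, while newer releases use "nmcli device show".

```shell
# Hypothetical wrapper around nmcli; adjust it to the installed NM version.
# The older form quoted in this comment is: nmcli dev list iface <dev>
nm_device_state() {
    nmcli -t -f GENERAL.STATE device show "$1" 2>/dev/null
}

# Poll until NetworkManager reports the device as unmanaged.
# Usage: wait_until_unmanaged <dev> [max_tries] [delay_seconds]
# Returns 0 once unmanaged, 1 (with a message on stderr) on timeout.
wait_until_unmanaged() {
    dev=$1
    max_tries=${2:-20}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        case "$(nm_device_state "$dev")" in
            *unmanaged*) return 0 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timeout: $dev is still managed by NetworkManager" >&2
    return 1
}
```

For example, `wait_until_unmanaged eth0 20 1` would wait up to roughly 20 seconds after NM_CONTROLLED=no is written. The wait is still non-deterministic; the cap only bounds how long the caller blocks before alerting.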

To force a reproduction of this bug, based on the above, I did, at some point after
running engine-setup (but can also be done before starting it):

pid=$(pidof NetworkManager); kill -STOP $pid; inotifywait /etc/sysconfig/network-scripts/ifcfg-eth0 ; sleep 10; kill -CONT $pid

This caused NM to be suspended until 10 seconds after vdsm changed ifcfg-eth0.
Indeed, the network connection was dropped, and I had to manually fix it.

Comments are welcome.
Comment 7 Yedidyah Bar David 2013-08-28 09:19:14 EDT
*** Bug 991087 has been marked as a duplicate of this bug. ***
Comment 8 Alon Bar-Lev 2013-09-07 19:07:25 EDT
I would much prefer that vdsm wait for NetworkManager to finish with the interface before continuing, rather than disabling NetworkManager on the engine host. This behavior can be limited to non-production setups and enabled only by explicit configuration.
Comment 9 Yedidyah Bar David 2013-09-17 04:44:27 EDT
Any update on this one?
Comment 10 Dan Kenigsberg 2013-09-17 06:36:33 EDT
No one has started implementing Didi's suggestion. I feel a bit reluctant to add a non-deterministic wait on NetworkManager. I would prefer to have a synchronous "unmanage" NM API call. But since that's not going to happen any time soon, we should probably add the wait anyway.

The only question is when we should do it. Adding such an intrusive change to the ifup process seems too late for ovirt-3.3.0.

I suggest that we keep our request to turn off NetworkManager when configuring new networks (installation time included), and postpone this fix to the grand rebase of ovirt-3.3.1.
Comment 11 Itamar Heim 2013-10-08 02:21:28 EDT
so we are at the grand rebase, what's the plan now?
Comment 12 Dan Kenigsberg 2013-10-08 04:58:39 EDT
I did not manage to hack this on the master branch on time (the grand rebase took too much of my time).
Comment 13 Dan Kenigsberg 2013-10-14 09:53:17 EDT
I am told that the surprising NM behavior of taking the NIC down has been changed in upstream NM, by virtue of installing the NetworkManager-config-server package.
Comment 14 Antoni Segura Puimedon 2013-10-14 11:05:49 EDT
NetworkManager-config-server available on F20+.
Comment 15 Antoni Segura Puimedon 2013-11-08 05:30:48 EST
All the functional tests over dummies and bridged and bridgeless dhcp network configuring work with NM enabled and running on Fedora 20 (due to a newer version of Network Manager that doesn't take the devices down when unmanaging).
Comment 16 Alon Bar-Lev 2013-11-08 06:38:52 EST
(In reply to Antoni Segura Puimedon from comment #15)
> All the functional tests over dummies and bridged and bridgeless dhcp
> network configuring work with NM enabled and running on Fedora 20 (due to a
> newer version of Network Manager that doesn't take the devices down when
> unmanaging).

So should we stop disabling network manager on hosts?
Comment 17 Antoni Segura Puimedon 2013-11-08 07:20:39 EST
@Alon: I think that for F20 it'll be safe to do so.
Comment 18 Alon Bar-Lev 2013-11-08 07:22:47 EST
(In reply to Antoni Segura Puimedon from comment #17)
> @Alon: I think that for F20 it'll be safe to do so.

so we cannot close this bug as action is required.
Comment 19 Antoni Segura Puimedon 2013-11-08 07:50:36 EST
Changing to ovirt-host-deploy because there's action needed on that part to stop disabling network manager.
Comment 21 Alon Bar-Lev 2014-01-28 08:26:33 EST
Closed as NetworkManager seems to be fixed now.

Also we no longer create the bridge during deploy, so this issue is irrelevant.
Comment 22 Sandro Bonazzola 2014-07-25 10:25:01 EDT
Reopening, since now Fedora 20 is affected by the issue.
Comment 23 Sandro Bonazzola 2014-07-25 10:25:49 EDT
Sorry, EL7.
Comment 24 Sandro Bonazzola 2014-07-25 10:26:36 EDT
Sorry, EL7 too.
Comment 25 Simone Tiraboschi 2014-07-25 10:38:58 EDT
*** Bug 1110875 has been marked as a duplicate of this bug. ***
Comment 26 Dan Kenigsberg 2014-08-13 10:14:05 EDT
Alon, if NM is used on a host, we need NetworkManager-config-server to be installed. Can ovirt-host-deploy take care of this kind of soft requirement?
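
A minimal sketch of how such a soft requirement could be checked on an RPM-based host, for illustration only. The function name nm_config_server_missing is hypothetical, and this is not how ovirt-host-deploy or the vdsm package actually handles it.

```shell
# Hypothetical check: returns 0 when NetworkManager is running but the
# NetworkManager-config-server package is not installed, 1 otherwise.
nm_config_server_missing() {
    # Nothing to do when NetworkManager is not active.
    systemctl is-active --quiet NetworkManager || return 1
    # rpm -q exits non-zero when the package is absent.
    ! rpm -q NetworkManager-config-server >/dev/null 2>&1
}
```

A caller could run `if nm_config_server_missing; then yum install NetworkManager-config-server; fi`. Note that comment 29 reports the problem on a host where the package was already installed, so passing this check does not by itself guarantee a working setup.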
Comment 27 Alon Bar-Lev 2014-08-13 10:20:52 EDT
(In reply to Dan Kenigsberg from comment #26)
> Alon, if NM is used on a host, we need NetworkManager-config-server to be
> installed. Can ovirt-host-deploy take care of this kind of soft requirement?

how is it related to this bug?
why can't the vdsm package pull this?
Comment 28 Sandro Bonazzola 2015-02-11 02:07:03 EST
*** Bug 1110875 has been marked as a duplicate of this bug. ***
Comment 29 Sagi Shnaidman 2015-08-18 11:23:05 EDT
When using VDSM with NetworkManager there is a problem in their combination.

When VDSM runs ifdown on eth0, NetworkManager interferes and brings it back up, which causes other problems, and finally the host has no network.
eth0 has the same address as the ovirtmgmt bridge, and the default route is via eth0:


[root@hehost vdsm]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP qlen 1000
    link/ether 00:aa:aa:aa:aa:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.2/24 brd 192.168.10.255 scope global dynamic eth0
       valid_lft 3579sec preferred_lft 3579sec
3: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN 
    link/ether 36:b7:b3:0d:f8:d0 brd ff:ff:ff:ff:ff:ff
4: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN 
    link/ether 3e:c9:a1:8b:c8:22 brd ff:ff:ff:ff:ff:ff
5: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether 00:aa:aa:aa:aa:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.2/24 brd 192.168.10.255 scope global dynamic ovirtmgmt
       valid_lft 2066sec preferred_lft 2066sec
6: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UNKNOWN qlen 500
    link/ether fe:aa:aa:aa:aa:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcaa:aaff:feaa:aa02/64 scope link 
       valid_lft forever preferred_lft forever

[root@hehost vdsm]# ip ro
default via 192.168.10.1 dev eth0  proto static  metric 100 
169.254.0.0/16 dev ovirtmgmt  scope link  metric 1005 
192.168.10.0/24 dev ovirtmgmt  proto kernel  scope link  src 192.168.10.2 
192.168.10.0/24 dev eth0  proto kernel  scope link  src 192.168.10.2 

[root@hehost vdsm]# brctl show
bridge name	bridge id		STP enabled	interfaces
;vdsmdummy;		8000.000000000000	no		
ovirtmgmt		8000.00aaaaaaaa01	no		eth0
							vnet0

In the logs you can see this at the 16:41:17 timestamp.
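
The broken state shown above (a bridge port that still carries its own IPv4 address) can be detected with a short script. This is a minimal sketch, assuming the one-line output of iproute2's "ip -o"; the function name bridge_port_conflict is hypothetical.

```shell
# Hypothetical check: returns 0 when the device is enslaved to a bridge or
# bond (its link line contains "master") and also holds an IPv4 address,
# 1 when there is no such conflict, 2 when the device cannot be queried.
bridge_port_conflict() {
    dev=$1
    link=$(ip -o link show dev "$dev") || return 2
    case "$link" in
        *" master "*) ;;      # enslaved, keep checking
        *) return 1 ;;        # not a bridge port
    esac
    # Any output here means an IPv4 address is configured on the port itself.
    [ -n "$(ip -o -4 addr show dev "$dev")" ]
}
```

On the host above, `bridge_port_conflict eth0` would be expected to succeed, because eth0 is a port of ovirtmgmt and still holds 192.168.10.2/24.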

we are talking about 3.6 on rhel 7.1:
packages versions:

[root@hehost vdsm]# rpm -qa | grep vdsm
vdsm-python-4.17.2-0.el7.noarch
vdsm-jsonrpc-4.17.2-0.el7.noarch
vdsm-4.17.2-0.el7.noarch
vdsm-infra-4.17.2-0.el7.noarch
vdsm-xmlrpc-4.17.2-0.el7.noarch
vdsm-yajsonrpc-4.17.2-0.el7.noarch
vdsm-cli-4.17.2-0.el7.noarch

[root@hehost vdsm]# rpm -qa | grep Network
NetworkManager-1.0.0-14.git20150121.b4ea599c.el7.x86_64
NetworkManager-tui-1.0.0-14.git20150121.b4ea599c.el7.x86_64
NetworkManager-config-server-1.0.0-14.git20150121.b4ea599c.el7.x86_64
NetworkManager-team-1.0.0-14.git20150121.b4ea599c.el7.x86_64
NetworkManager-libnm-1.0.0-14.git20150121.b4ea599c.el7.x86_64
Comment 30 Sagi Shnaidman 2015-08-18 11:23:39 EDT
Created attachment 1064364
supervdsm log
Comment 31 Sagi Shnaidman 2015-08-18 11:24:31 EDT
Created attachment 1064365
messages

Please note that NetworkManager-config-server was also installed.
Comment 32 Sandro Bonazzola 2015-09-04 05:00:34 EDT
This is an automated message.
This Bugzilla report has been opened on a version which is not maintained anymore.
Please check if this bug is still relevant in oVirt 3.5.4.
If it's not relevant anymore, please close it (you may use EOL or CURRENT RELEASE resolution)
If it's an RFE please update the version to 4.0 if still relevant.
Comment 33 Barak 2015-09-07 08:58:30 EDT
Don't we disable NetworkManager on host deploy ?
Comment 34 Alon Bar-Lev 2015-09-07 13:27:23 EDT
(In reply to Barak from comment #33)
> Don't we disable NetworkManager on host deploy ?

AIO should behave as a standard system, since the engine and other applications run on it; in this case NetworkManager is not disabled, to avoid unexpected configuration changes.
Comment 35 Barak 2015-09-07 15:50:50 EDT
Do we disable NetworkManager on  standard installation (regular host deploy) ?

AFAIR AIO uses standard host deploy at the end of its execution.
Comment 36 Alon Bar-Lev 2015-09-07 15:53:43 EDT
(In reply to Barak from comment #35)
> Do we disable NetworkManager on  standard installation (regular host deploy)
> ?

yes.

> AFAIR AIO uses standard host deploy at the end of it's execution.

host-deploy is capable of customization you know...
Comment 37 Dan Kenigsberg 2015-09-08 05:47:14 EDT
*** Bug 1144084 has been marked as a duplicate of this bug. ***
Comment 38 Alon Bar-Lev 2015-10-20 13:13:43 EDT
Not sure I understand:
"""
 Barak 2015-10-20 13:07:56 EDT
Assignee: danken@redhat.com → alonbl@redhat.com
Whiteboard: network → integration
"""

If NetworkManager is to be disabled permanently, it should be done via the vdsm service; a manual or Puppet installation of vdsm should also result in a valid setup.

Also, as I wrote, in engine we cannot disable network manager, this will make engine machine unsupportable.

Finally, the AIO setup is deprecated in favour of the hosted engine, as far as I understand.
Comment 39 Sandro Bonazzola 2015-10-26 08:18:33 EDT
Won't be in 3.6.0. Re-targeting to 3.6.1.
Comment 40 Yaniv Lavi (Dary) 2015-11-02 09:27:32 EST
Can you add a prerequisite to the wiki on disabling network manager?
Comment 41 Sandro Bonazzola 2015-11-16 10:42:35 EST
(In reply to Yaniv Dary from comment #40)
> Can you add a prerequisite to the wiki on disabling network manager?

Not sure. On FC24 the network service will cease to exist because sysvinit script support is being removed, and only NetworkManager will remain. So we can ignore or work around this issue for only 6 months.
Comment 42 Yaniv Lavi (Dary) 2015-11-17 04:49:33 EST
(In reply to Sandro Bonazzola from comment #41)
> (In reply to Yaniv Dary from comment #40)
> > Can you add a prerequisite to the wiki on disabling network manager?
> 
> Not sure, on FC24 network service will stop to exist due to sysvinit scripts
> support removal and only NetworkManager will exist. So we can ignore /
> workaround this issue only for 6 months.

Then AIO will not be supported on FC24.
