Description of problem:
On a Fedora 19 machine with NetworkManager enabled (which is the default), the All-in-One installer fails to configure the ovirtmgmt network. After stopping and disabling NetworkManager, and starting and enabling the network service, the AIO install performs as expected.

Version-Release number of selected component (if applicable):
oVirt 3.3RC, Fedora 19 minimal with all updates applied.

How reproducible:

Steps to Reproduce:
1. Run oVirt 3.3 AIO installer on stock F19 host.
2.
3.

Actual results:
ovirtmgmt network is created, and then rolled back, leaving the local vdsm host in a non-operational state.

Expected results:
Working local vdsm host.

Additional info:
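For reference, the manual workaround described above amounts to four systemd service operations. A minimal Python sketch of those steps (the `run` callable is an assumption, injected so the commands can be inspected without touching a live system):

```python
def apply_nm_workaround(run):
    """Apply the workaround from the description: stop and disable
    NetworkManager, then enable and start the legacy network service.

    `run` executes one command given as an argument list, e.g. a thin
    wrapper around subprocess.check_call.
    """
    for cmd in (["systemctl", "stop", "NetworkManager.service"],
                ["systemctl", "disable", "NetworkManager.service"],
                ["systemctl", "enable", "network.service"],
                ["systemctl", "start", "network.service"]):
        run(cmd)
```

This is only a sketch of the manual steps, not anything the installer does itself.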
Antoni, Per our discussion, please ACK that we must disable network manager at engine machine as well... This is something I really tried to avoid. Thanks!
If we need to disable it, I think we need to give a warning to the user at the beginning of installation. I hope we can avoid this as well.
Alon, Do we really need to special case the AIO, the normal deployment already does the mask and stop, and the AIO is just a machine with engine that deploys to itself. Shouldn't that be enough? If that is not enough, then, of course, the AIO installation must prompt a warning and proceed with the disabling.
(In reply to Antoni Segura Puimedon from comment #3)
> Alon,
>
> Do we really need to special case the AIO, the normal deployment already
> does the mask and stop, and the AIO is just a machine with engine that
> deploys to itself. Shouldn't that be enough?
>
> If that is not enough, then, of course, the AIO installation must prompt a
> warning and proceed with the disabling.

The engine machine is a generic server; it is not used only for oVirt. The AIO is a grow-as-you-need setup. Changing the core configuration of the engine server is not good for supportability. I would really like to see some workaround so that in this AIO configuration we can live with NetworkManager for the basic setup.
Please see https://bugzilla.redhat.com/show_bug.cgi?id=991087#c4 and further comments there. These might be duplicates, not sure.
Hi all,

Alon told me in private that the problem is that NetworkManager sometimes takes time between us changing ifcfg-* to have NM_CONTROLLED=no and it taking the interface down, and when it does, if it's too late, it breaks our setup. He also told me that the time it takes might be short or long, where long is 5-10 seconds.

If so, I suggest changing vdsm, in the code that changes ifcfg-*, to:
1. Verify that the interface is managed by NM.
2. Change it to NM_CONTROLLED=no.
3. Wait until NM says it's unmanaged. This can be done with something like

nmcli dev list iface eth0 | grep -q 'GENERAL.STATE: .*unmanaged'

or some API calls or whatever (I do not know much about NM), in a loop, with some delay, a maximum, an alert if more than the maximum delay, etc.

To force a reproduction of this bug, based on the above, I did, at some point after running engine-setup (but it can also be done before starting it):

pid=$(pidof NetworkManager); kill -STOP $pid; inotifywait /etc/sysconfig/network-scripts/ifcfg-eth0; sleep 10; kill -CONT $pid

This kept NM suspended until 10 seconds after vdsm changed ifcfg-eth0. Indeed, the network connection was dropped, and I had to manually fix it.

Comments are welcome.
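The suggested wait could look roughly like the following Python sketch (the function and helper names are hypothetical, not actual vdsm code). It polls until NM reports the device unmanaged, with a bounded timeout; the `check` callable is injectable so the loop can be exercised without a live NetworkManager:

```python
import subprocess
import time

def wait_until_unmanaged(iface, check=None, timeout=10.0, interval=0.5):
    """Poll until NetworkManager reports `iface` as unmanaged.

    By default this shells out to nmcli, mirroring the
    `nmcli dev list iface eth0 | grep unmanaged` idea above.
    Returns True if the device became unmanaged within `timeout` seconds.
    """
    if check is None:
        def check(dev):
            out = subprocess.check_output(
                ["nmcli", "dev", "list", "iface", dev]).decode()
            return "unmanaged" in out
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check(iface):
            return True
        time.sleep(interval)
    return False
```

Whether grepping nmcli output is stable across NM versions is an open question; querying the device state over D-Bus would be the cleaner check.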
*** Bug 991087 has been marked as a duplicate of this bug. ***
I would really prefer that vdsm wait for NetworkManager to let go of the interface before continuing, over disabling NetworkManager on the engine host. This behavior can be specific to non-production setups and enabled per explicit configuration.
Any update on this one?
No one has started implementing Didi's suggestion. I feel a bit reluctant to add a non-deterministic wait on NetworkManager. I would prefer to have a synchronous "unmanage" NM API call. But since that's not going to happen any time soon, we should probably add the wait anyway. The only question is when we should do it. Adding such an intrusive change to the ifup process seems too late for ovirt-3.3.0. I suggest that we keep our request to turn off NetworkManager when configuring new networks (installation time included), and postpone this fix to the grand rebase of ovirt-3.3.1.
so we are at the grand rebase, what's the plan now?
I did not manage to hack this on the master branch on time (the grand rebase took too much of my time).
I am told that the surprising NM behavior of taking the nic down has been changed in upstream NM, by virtue of installing the NetworkManager-config-server package.
NetworkManager-config-server available on F20+.
All the functional tests over dummies and bridged and bridgeless dhcp network configuring work with NM enabled and running on Fedora 20 (due to a newer version of Network Manager that doesn't take the devices down when unmanaging).
(In reply to Antoni Segura Puimedon from comment #15) > All the functional tests over dummies and bridged and bridgeless dhcp > network configuring work with NM enabled and running on Fedora 20 (due to a > newer version of Network Manager that doesn't take the devices down when > unmanaging). So should we stop disabling network manager on hosts?
@Alon: I think that for F20 it'll be safe to do so.
(In reply to Antoni Segura Puimedon from comment #17)
> @Alon: I think that for F20 it'll be safe to do so.

So we cannot close this bug yet, as action is still required.
Changing to ovirt-host-deploy because there's action needed on that part to stop disabling network manager.
Closed as NetworkManager seems to be fixed now. Also we no longer create the bridge during deploy, so this issue is irrelevant.
Reopening, since now Fedora 20 is affected by the issue.
Sorry, EL7.
Sorry, EL7 too.
*** Bug 1110875 has been marked as a duplicate of this bug. ***
Alon, if NM is used on a host, we need NetworkManager-config-server to be installed. Can ovirt-host-deploy take care of this kind of soft requirement?
(In reply to Dan Kenigsberg from comment #26) > Alon, if NM is used on a host, we need NetworkManager-config-server to be > installed. Can ovirt-host-deploy take care of this kind of soft requirement? how is it related to this bug? why can't the vdsm package pull this?
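The soft requirement Dan mentions boils down to "is NetworkManager-config-server installed?". A hypothetical check (not actual ovirt-host-deploy or vdsm code; the `rpm_qa_output` parameter is injected so it can be tested without rpm):

```python
import subprocess

REQUIRED = "NetworkManager-config-server"

def nm_config_server_installed(rpm_qa_output=None):
    """Return True if NetworkManager-config-server appears in `rpm -qa`.

    By default this shells out to rpm; pass `rpm_qa_output` to check a
    captured package list instead.
    """
    if rpm_qa_output is None:
        rpm_qa_output = subprocess.check_output(["rpm", "-qa"]).decode()
    return any(line.startswith(REQUIRED + "-")
               for line in rpm_qa_output.splitlines())
```

Whichever component owns the check (host-deploy plugin vs. a vdsm package dependency) is exactly the question being discussed above.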
When using VDSM with NetworkManager there is a problem in their combination. When VDSM does ifdown on eth0, NetworkManager interferes and brings it up again; this causes other problems, and finally the host has no network. eth0 has an address, as does the ovirtmgmt bridge, and the default route is via eth0:

[root@hehost vdsm]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP qlen 1000
    link/ether 00:aa:aa:aa:aa:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.2/24 brd 192.168.10.255 scope global dynamic eth0
       valid_lft 3579sec preferred_lft 3579sec
3: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
    link/ether 36:b7:b3:0d:f8:d0 brd ff:ff:ff:ff:ff:ff
4: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 3e:c9:a1:8b:c8:22 brd ff:ff:ff:ff:ff:ff
5: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:aa:aa:aa:aa:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.2/24 brd 192.168.10.255 scope global dynamic ovirtmgmt
       valid_lft 2066sec preferred_lft 2066sec
6: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UNKNOWN qlen 500
    link/ether fe:aa:aa:aa:aa:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcaa:aaff:feaa:aa02/64 scope link
       valid_lft forever preferred_lft forever

[root@hehost vdsm]# ip ro
default via 192.168.10.1 dev eth0 proto static metric 100
169.254.0.0/16 dev ovirtmgmt scope link metric 1005
192.168.10.0/24 dev ovirtmgmt proto kernel scope link src 192.168.10.2
192.168.10.0/24 dev eth0 proto kernel scope link src 192.168.10.2

[root@hehost vdsm]# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;     8000.000000000000       no
ovirtmgmt       8000.00aaaaaaaa01       no              eth0
                                                        vnet0

In the logs you can see this at the 16:41:17 timestamp. We are talking about 3.6 on RHEL 7.1.

Package versions:
[root@hehost vdsm]# rpm -qa | grep vdsm
vdsm-python-4.17.2-0.el7.noarch
vdsm-jsonrpc-4.17.2-0.el7.noarch
vdsm-4.17.2-0.el7.noarch
vdsm-infra-4.17.2-0.el7.noarch
vdsm-xmlrpc-4.17.2-0.el7.noarch
vdsm-yajsonrpc-4.17.2-0.el7.noarch
vdsm-cli-4.17.2-0.el7.noarch
[root@hehost vdsm]# rpm -qa | grep Network
NetworkManager-1.0.0-14.git20150121.b4ea599c.el7.x86_64
NetworkManager-tui-1.0.0-14.git20150121.b4ea599c.el7.x86_64
NetworkManager-config-server-1.0.0-14.git20150121.b4ea599c.el7.x86_64
NetworkManager-team-1.0.0-14.git20150121.b4ea599c.el7.x86_64
NetworkManager-libnm-1.0.0-14.git20150121.b4ea599c.el7.x86_64
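The broken state above has a recognizable signature: the same IPv4 address is configured on both the slave nic (eth0) and the bridge (ovirtmgmt). A small illustrative helper (hypothetical, not vdsm code) that flags that signature from (interface, address) pairs such as those parsed from `ip -o -4 addr` output:

```python
from collections import defaultdict

def duplicated_addresses(addr_pairs):
    """Given (interface, cidr) pairs, return a dict mapping each CIDR
    that appears on more than one interface to the list of interfaces
    holding it.
    """
    by_addr = defaultdict(list)
    for iface, cidr in addr_pairs:
        by_addr[cidr].append(iface)
    return {cidr: ifaces for cidr, ifaces in by_addr.items()
            if len(ifaces) > 1}
```

Fed the addresses from the `ip a` output above, it would report 192.168.10.2/24 held by both eth0 and ovirtmgmt, which is exactly the state left behind when NM re-ups the interface that VDSM took down.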
Created attachment 1064364 [details] supervdsm log
Created attachment 1064365 [details] messages

Please note that NetworkManager-config-server was also installed.
This is an automated message. This Bugzilla report has been opened on a version which is not maintained anymore. Please check if this bug is still relevant in oVirt 3.5.4. If it's not relevant anymore, please close it (you may use EOL or CURRENT RELEASE resolution) If it's an RFE please update the version to 4.0 if still relevant.
Don't we disable NetworkManager on host deploy ?
(In reply to Barak from comment #33)
> Don't we disable NetworkManager on host deploy ?

The AIO should behave as a standard system, since the engine and other applications run on it; in this case NetworkManager is not disabled, to avoid unexpected configuration changes.
Do we disable NetworkManager on standard installation (regular host deploy)? AFAIR AIO uses standard host deploy at the end of its execution.
(In reply to Barak from comment #35)
> Do we disable NetworkManager on standard installation (regular host deploy)
> ?

Yes.

> AFAIR AIO uses standard host deploy at the end of its execution.

host-deploy is capable of customization, you know...
*** Bug 1144084 has been marked as a duplicate of this bug. ***
Not sure I understand:
"""
Barak 2015-10-20 13:07:56 EDT
Assignee: danken → alonbl
Whiteboard: network → integration
"""
If NetworkManager is to be disabled permanently, it should be done via the vdsm service; manual/puppet installation of vdsm should also result in a valid setup.

Also, as I wrote, we cannot disable NetworkManager on the engine machine; this would make the engine machine unsupportable.

Finally, the AIO setup is deprecated in favour of the hosted engine, as far as I understand.
Won't be in 3.6.0. Re-targeting to 3.6.1.
Can you add a prerequisite to the wiki on disabling network manager?
(In reply to Yaniv Dary from comment #40)
> Can you add a prerequisite to the wiki on disabling network manager?

Not sure. On FC24 the network service will cease to exist, due to the removal of sysvinit script support, and only NetworkManager will remain. So we can ignore / work around this issue only for 6 months.
(In reply to Sandro Bonazzola from comment #41) > (In reply to Yaniv Dary from comment #40) > > Can you add a prerequisite to the wiki on disabling network manager? > > Not sure, on FC24 network service will stop to exist due to sysvinit scripts > support removal and only NetworkManager will exist. So we can ignore / > workaround this issue only for 6 months. Then AIO will not be supported on FC24.