Description of problem:
On a fully virtualized test setup (two F19 VMs on KVM with nested virtualization, one for the engine and the second as a nested hypervisor host, with one network interface on each host), network connectivity is lost on the hypervisor host when adding it to a cluster.
On the oVirt engine console everything seems to go well until "Starting vdsm"; then, after a long timeout, it adds "Processing stopped due to timeout" and "SSH session timeout host 'root@f19td5'".
If I then check the hypervisor host (f19td5 in my test setup), it is no longer reachable over the network.
Checking the status of that host from the SPICE console of the outer KVM host, I found that the ifconfig command reports only the loopback interface.
The Ethernet interface seems to be missing. The same happens after a reboot.
'/bin/systemctl status network' reports:
network.service - LSB: Bring up/down networking
Loaded: loaded (/etc/rc.d/init.d/network)
Active: failed (Result: exit-code) since Wed 2014-07-02 11:57:03 CEST; 28min ago
Jul 02 11:57:03 f19td5.localdomain systemd: Starting LSB: Bring up/down networking...
Jul 02 11:57:03 f19td5.localdomain network: Bringing up loopback interface: [ OK ]
Jul 02 11:57:03 f19td5.localdomain network: Bringing up interface eth0: ERROR : [/etc/sysconfig...ing.
Jul 02 11:57:03 f19td5.localdomain network: [FAILED]
Jul 02 11:57:03 f19td5.localdomain systemd: network.service: control process exited, code=exited status=1
Jul 02 11:57:03 f19td5.localdomain systemd: Failed to start LSB: Bring up/down networking.
Jul 02 11:57:03 f19td5.localdomain systemd: Unit network.service entered failed state.
Version-Release number of selected component (if applicable):
On the engine host:
ovirt-engine.noarch 3.5.0-0.0.master.20140629172304.git0b16ed7.fc19 @ovirt-3.5-pre
On the hypervisor host:
vdsm.x86_64 188.8.131.52-0.fc19 @updates
How reproducible:
I tried more than once, always with the same result: 100%, at least from my perspective.
Steps to Reproduce:
1. Install ovirt engine on a fresh system
2. Try to add a hypervisor host
Actual results:
Network connectivity is lost and the host is not added to the cluster
Expected results:
Host becomes part of the cluster
Have you disabled NetworkManager? In f19 (and f20) it still tries to take over any network device.
Please try again after having run
/usr/bin/systemctl stop NetworkManager.service
/usr/bin/systemctl mask NetworkManager.service
on the nodes to be added.
If this is not the case, please attach the output of
bash -xv /etc/sysconfig/network-scripts/ifup-eth eth0
to understand why this fails.
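
To make the requested trace easy to attach, it could be captured to a file. This is only a sketch: capture_ifup_trace is a hypothetical helper, and the default log path under /tmp is an assumption, not something oVirt or vdsm provides.

```shell
#!/bin/bash
# Hypothetical helper: run the ifup script under "bash -xv" and save the trace.
# Arguments (all optional): interface, script path, log path.
capture_ifup_trace() {
    local iface="${1:-eth0}"
    local script="${2:-/etc/sysconfig/network-scripts/ifup-eth}"
    local log="${3:-/tmp/ifup-${iface}.trace.log}"   # assumed location
    # -x prints each command as it runs, -v echoes the script source as read.
    bash -xv "$script" "$iface" > "$log" 2>&1
    local rc=$?
    # Record the exit status so the attachment is self-explanatory.
    echo "exit status: $rc" >> "$log"
    # Print the log path for the caller and propagate the script's status.
    echo "$log"
    return "$rc"
}

# Example, run as root on the affected node:
# capture_ifup_trace eth0
```

The function prints the path of the log file, which can then be attached to the bug.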
Yes, I think it was indeed enabled:
[stirabos@f19t2 ~]$ /usr/bin/systemctl status NetworkManager.service
NetworkManager.service - Network Manager
Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled)
Active: inactive (dead)
I'll try again with a fresh VM, disabling it first.
Please reopen the bug if it's not all about NetworkManager (which should be harmless to Vdsm beginning with Fedora 21).
On Fedora 20, adding the host works correctly if NetworkManager is disabled beforehand.
But what should we do in the meantime? Fedora 20 uses NetworkManager by default.
I think that this is a regression: previously, having NetworkManager running didn't cause any issue. And even if it will be harmless in F21, we're not supporting F21; we're supporting F19 and F20, and there it's an issue.
If NetworkManager must be stopped, vdsm should ensure it's stopped, or if not vdsm, then at least host-deploy.
I don't think this can be covered by a release note alone.
This is not a regression. We could never install vdsm (or set up networking in other circumstances) while NetworkManager was running. Unless configured otherwise (an option that should be available in F20, not only F21), NM auto-manages any new device and takes it down.
I think that in such a case host-deploy, at least, should detect NetworkManager and abort, alerting the user to stop it before trying again.
Right now it doesn't give the user any hint, and the result is a non-working network configuration. If the host is remote, that is always a mess.
By the way, I have no problem closing it on the VDSM side, but we should at least solve it on the host-deploy side.
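
As a rough illustration of the kind of check meant here (not actual host-deploy code; nm_state and preflight_check_nm are hypothetical names), a pre-flight test could look like this:

```shell
#!/bin/bash
# Hypothetical pre-flight check, NOT part of ovirt-host-deploy.

# Print the current state of NetworkManager ("active", "inactive", ...).
# Kept as a separate function so it can be replaced on non-systemd systems.
nm_state() {
    systemctl is-active NetworkManager.service 2>/dev/null
}

# Return 1 and print a hint if NetworkManager is running, 0 otherwise.
preflight_check_nm() {
    local state
    # "systemctl is-active" exits non-zero when the unit is not active;
    # "|| true" keeps that from aborting callers that use "set -e".
    state="$(nm_state)" || true
    if [ "$state" = "active" ]; then
        echo "NetworkManager is running on this host." >&2
        echo "Stop and mask NetworkManager.service, then try again." >&2
        return 1
    fi
    return 0
}

# Example: abort before touching the network configuration.
# preflight_check_nm || exit 1
```

Aborting early this way would leave the existing network configuration untouched, so a remote host stays reachable.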
Simone, you can re-open and change the component, but I am not sure that we'd have the resources to fix an f19-only bug.
Unfortunately, we got the same behavior on RHEL 7, CentOS 7 and f20.
Do you have NetworkManager-config-server installed on these hosts?
No, I don't: I only just discovered this package.
This morning sbonazzo told me that he got it working on CentOS 7 with NetworkManager, simply by enforcing 'NM_CONTROLLED=no' in the network script of the physical interface before starting engine-setup.
I didn't try with that.
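
For reference, the NM_CONTROLLED=no workaround could be applied with something like the sketch below. set_nm_controlled_no is a hypothetical helper, and the ifcfg-eth0 path in the example assumes the interface is eth0, as in the original report.

```shell
#!/bin/bash
# Hypothetical helper: force NM_CONTROLLED=no in an ifcfg file.
set_nm_controlled_no() {
    local cfg="$1"
    if [ ! -f "$cfg" ]; then
        echo "no such file: $cfg" >&2
        return 1
    fi
    if grep -q '^NM_CONTROLLED=' "$cfg"; then
        # Replace an existing setting in place (GNU sed).
        sed -i 's/^NM_CONTROLLED=.*/NM_CONTROLLED=no/' "$cfg"
    else
        # Make sure the file ends with a newline before appending.
        if [ -n "$(tail -c1 "$cfg")" ]; then
            echo >> "$cfg"
        fi
        echo 'NM_CONTROLLED=no' >> "$cfg"
    fi
}

# Example, run as root before adding the host (interface name assumed):
# set_nm_controlled_no /etc/sysconfig/network-scripts/ifcfg-eth0
```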