Bug 1115420 - Network connectivity is lost on the hypervisor host when adding it to a cluster if NetworkManager is running

Status:	CLOSED WONTFIX
Alias:	None
Product:	oVirt
Classification:	Retired
Component:	vdsm
Version:	3.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	3.5.0
Assignee:	Antoni Segura Puimedon
QA Contact:	Gil Klein
Whiteboard:	network

Reported:	2014-07-02 10:29 UTC by Simone Tiraboschi
Modified:	2016-02-10 19:36 UTC
CC List:	11 users
Last Closed:	2014-07-08 13:01:47 UTC
oVirt Team:	Network

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1001186	0	high	CLOSED	With AIO installer and NetworkManager enabled, the ovirtmgmt bridge is not properly configured	2021-11-12 10:13:23 UTC
Red Hat Bugzilla	1124876	0	unspecified	CLOSED	el7 with NetworkManager: vdsm fails to remove dummy device	2021-02-22 00:41:40 UTC

Description Simone Tiraboschi 2014-07-02 10:29:57 UTC

Description of problem:
On a fully virtualized test setup (two f19 VMs on KVM with nested virtualization, one for the engine and the second as a nested hypervisor host, with only one network interface on each host), network connectivity is lost on the hypervisor host when adding it to a cluster.

On the oVirt engine console everything seems to go well until "Starting vdsm"; then, after a long timeout, it adds "Processing stopped due to timeout" and "SSH session timeout host 'root@f19td5'".

After that, the hypervisor host (f19td5 in my test setup) is no longer reachable over the network.
Checking the status of that host from the SPICE console of the outer KVM host, I found that the ifconfig command reports only the loopback interface.

The Ethernet interface seems to be missing. The same holds after a reboot.

'/bin/systemctl status network' reports:
network.service - LSB: Bring up/down networking
Loaded: loaded (/etc/rc.d/init.d/network)
Active: failed (Result: exit-code) since Wed 2014-07-02 11:57:03 CEST; 28min ago

Jul 02 11:57:03 f19td5.localdomain systemd[1]: Starting LSB: Bring up/down networking...
Jul 02 11:57:03 f19td5.localdomain network[2202]: Bringing up loopback interface: [ OK ]
Jul 02 11:57:03 f19td5.localdomain network[2202]: Bringing up interface eth0: ERROR : [/etc/sysconfig...ing.
Jul 02 11:57:03 f19td5.localdomain network[2202]: [FAILED]
Jul 02 11:57:03 f19td5.localdomain systemd[1]: network.service: control process exited, code=exited status=1
Jul 02 11:57:03 f19td5.localdomain systemd[1]: Failed to start LSB: Bring up/down networking.
Jul 02 11:57:03 f19td5.localdomain systemd[1]: Unit network.service entered failed state.

Version-Release number of selected component (if applicable):

On the engine host:
ovirt-engine.noarch 3.5.0-0.0.master.20140629172304.git0b16ed7.fc19 @ovirt-3.5-pre

on the hypervisor host:
vdsm.x86_64 4.14.8.1-0.fc19 @updates

How reproducible:
I tried more than once, always with the same result: 100%, at least from my perspective.

Steps to Reproduce:
1. Install oVirt engine on a fresh system
2. Try to add a hypervisor host

Actual results:
Network connectivity is lost and the host is not added to the cluster

Expected results:
Host becomes part of the cluster

Additional info:

Comment 1 Dan Kenigsberg 2014-07-07 23:29:22 UTC

Have you disabled NetworkManager? In f19 (and f20) it still tries to take over any network device.

Please try again after having run

    /usr/bin/systemctl stop NetworkManager.service
    /usr/bin/systemctl mask NetworkManager.service

on the nodes to be added.

If this is not the case, please attach the output of

    bash -xv /etc/sysconfig/network-scripts/ifup-eth eth0

to understand why this fails.

Comment 2 Simone Tiraboschi 2014-07-08 07:55:57 UTC

Yes, I think it was indeed enabled:

[stirabos@f19t2 ~]$ /usr/bin/systemctl status NetworkManager.service
NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled)
   Active: inactive (dead)


I'll try again with a fresh VM, with NetworkManager disabled.

Comment 3 Dan Kenigsberg 2014-07-08 09:43:51 UTC

Please reopen the bug if it's not all about NetworkManager (which should be harmless to Vdsm beginning with Fedora 21).

Comment 4 Simone Tiraboschi 2014-07-08 11:56:35 UTC

On Fedora 20, adding the host works correctly if NetworkManager is disabled beforehand.

But what should we do in the meantime? Fedora 20 uses NetworkManager by default.

Comment 5 Sandro Bonazzola 2014-07-08 12:03:34 UTC

I think that this is a regression: previously, having NetworkManager running didn't cause any issue. And even if it will be harmless in F21, we're not supporting F21; we're supporting F19 and F20, and there it's an issue.

If NetworkManager must be stopped, vdsm should ensure it's stopped, or if not vdsm, then at least host-deploy.

I don't think this can be covered only by a release note.

Comment 6 Dan Kenigsberg 2014-07-08 13:01:47 UTC

This is not a regression. We could never install vdsm (or set up networking in other circumstances) while NetworkManager was running. Unless configured otherwise (which should be possible in F20, not only F21), NM auto-manages any new device and takes it down.

https://bugzilla.redhat.com/show_bug.cgi?id=1001186#c14

Comment 7 Simone Tiraboschi 2014-07-09 07:27:08 UTC

I think that in this case host-deploy should at least detect NetworkManager and abort, alerting the user to stop it before trying again.
Currently it gives the user no hint and results in a non-working network configuration. If the host is remote, this is always a mess.
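
The kind of check described here could look roughly like the sketch below. It is only an illustration, not code from host-deploy or vdsm: the function name nm_preflight_check and the SYSTEMCTL override variable are hypothetical, and the only real interface it relies on is `systemctl is-active --quiet`, which exits 0 when the unit is active.

```shell
# Hypothetical pre-flight check (not part of host-deploy or vdsm).
# SYSTEMCTL may be overridden, e.g. with a stub for testing;
# by default the real systemctl binary is used.
nm_preflight_check() {
    local systemctl_cmd="${SYSTEMCTL:-systemctl}"

    # `is-active --quiet` prints nothing and exits 0 only if the unit is active.
    if "$systemctl_cmd" is-active --quiet NetworkManager.service; then
        echo "NetworkManager is running; stop and mask it before adding this host." >&2
        return 1
    fi
    return 0
}
```

A deployment script could call `nm_preflight_check || exit 1` before touching the network configuration, so that the user gets a message instead of a host that silently drops off the network.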

Comment 8 Simone Tiraboschi 2014-07-09 07:32:58 UTC

By the way, I have no problem with closing it on the VDSM front, but we should at least solve it on the host-deploy side.

Comment 9 Dan Kenigsberg 2014-07-25 14:57:22 UTC

Simone, you can re-open and change the component, but I am not sure that we'd have the resources to fix an f19-only bug.

Comment 10 Simone Tiraboschi 2014-07-25 15:00:24 UTC

Unfortunately, we got the same behavior on RHEL7, CentOS7 and f20.

Comment 11 Dan Kenigsberg 2014-07-30 08:51:13 UTC

Do you have NetworkManager-config-server installed on these hosts?

Comment 12 Simone Tiraboschi 2014-07-30 15:02:15 UTC

No, I don't: I only just discovered this package.

This morning sbonazzo told me that he got it working on CentOS7 with NetworkManager simply by enforcing 'NM_CONTROLLED=no' in the network script of the physical interface before starting engine-setup.
I haven't tried that.
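
For reference, enforcing that setting could be sketched as below. This is an untested illustration of the workaround as reported, not a verified fix: the helper name set_nm_controlled_no is hypothetical, and the ifcfg path in the usage note assumes the physical interface is eth0.

```shell
# Hypothetical helper: make sure an ifcfg file carries exactly one
# NM_CONTROLLED line, set to "no".
# $1 is the path to the network script of the physical interface.
set_nm_controlled_no() {
    local ifcfg="$1"
    local tmp

    [ -f "$ifcfg" ] || { echo "no such file: $ifcfg" >&2; return 1; }
    tmp="$(mktemp)" || return 1

    # Drop any existing NM_CONTROLLED line, then append the desired value.
    sed '/^NM_CONTROLLED=/d' "$ifcfg" > "$tmp"
    echo 'NM_CONTROLLED=no' >> "$tmp"

    # Write back through the original path so ownership and mode are kept.
    cat "$tmp" > "$ifcfg"
    rm -f "$tmp"
}
```

On the host it would be run as root before engine-setup, for example `set_nm_controlled_no /etc/sysconfig/network-scripts/ifcfg-eth0`.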
