Bug 1158418 - oVirt it's not able to directly add an host to a cluster if we have more than a required network on it
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-hosted-engine-ha
Version: 3.5
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Doron Fediuck
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks: 1086032
Reported: 2014-10-29 10:42 UTC by Simone Tiraboschi
Modified: 2014-11-05 09:27 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-11-05 09:27:59 UTC
oVirt Team: ---
Embargoed:


Description Simone Tiraboschi 2014-10-29 10:42:46 UTC
Description of problem:
When a second logical network is added to a datacenter, oVirt flags it by default as required for all the participating clusters.
Then, if we try to add another host to that cluster, the automatic procedure fails because it is not able to automatically bind the second logical network.

In engine.log we can find:

 2014-10-28 16:42:25,809 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (DefaultQuartzScheduler_Worker-13) [7181825b] Host hosted_engine_2 is set to Non-Operational, it is missing the following networks: net2
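
As a rough illustration, the names of the missing networks could be extracted from such a line with a regular expression. This is only a sketch: the pattern is inferred from the single log line quoted above, the comma-separated handling of several networks is an assumption, and parse_missing_networks is a hypothetical helper, not part of oVirt.

```python
import re

# Pattern inferred from the one engine.log line quoted in this report.
# It is an assumption, not a documented log format.
MISSING_NETWORKS_RE = re.compile(
    r"Host (?P<host>\S+) is set to Non-Operational, "
    r"it is missing the following networks: (?P<networks>.+)$"
)


def parse_missing_networks(line):
    """Return (host, [networks]) for a matching log line, else None."""
    match = MISSING_NETWORKS_RE.search(line.strip())
    if match is None:
        return None
    # Assumption: several missing networks would be comma-separated.
    networks = [n.strip() for n in match.group("networks").split(",") if n.strip()]
    return match.group("host"), networks


line = (
    "2014-10-28 16:42:25,809 ERROR "
    "[org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] "
    "(DefaultQuartzScheduler_Worker-13) [7181825b] Host hosted_engine_2 is set "
    "to Non-Operational, it is missing the following networks: net2"
)
print(parse_missing_networks(line))  # ('hosted_engine_2', ['net2'])
```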

At that point the user has to switch the host to maintenance mode, manually bind the second network, and then try to re-install.
Only then is the deploy procedure able to create and bind the second bridge and complete the process.

We are facing the same problem from hosted-engine as well. There we currently cannot let the user manually review the network bindings, so we have neither a workaround nor a recovery procedure.


Version-Release number of selected component (if applicable):
oVirt 3.5.0


How reproducible:
100%


Steps to Reproduce:
1. configure a datacenter, configure a cluster, add the first host
2. add a second logical network, bind it to the first host
3. try to add a second host


Actual results:
Adding the host fails: the host is set to non-operational mode; the user has to manually set it to maintenance mode, manually bind the second network, and then reinstall the host.
In my opinion, this procedure is not user friendly, because it is not easy to understand why it is failing and what the user should do to fix it.

Expected results:
Knowing in advance that we have more than one required network, I see some alternatives:
a. let the user review the network binding schema from the GUI before adding a host
b. add the host directly in maintenance mode, showing an additional alert with a hint for the user about what to do to complete the process
c. if the host has enough network adapters/VLANs, automatically choose the best matching schema to bind all the logical networks to the available adapters

  
Additional info:
We are facing the same issue on hosted-engine as well, where I still haven't found any way to complete the setup if we have more than one logical network.

Comment 1 Sven Kieske 2014-10-29 11:50:09 UTC
As argued before, it does not make sense at all to give new networks the "required" flag as a default, because this makes assumptions about the network
topology which ovirt just can't decide. The user/admin must decide if a network
should be required; you can't take that decision away from the user/admin.

I know this would be a backward incompatible change, so you must fill release
notes and maybe wait for 4.0 release with it, but it's still worth it imho.

Comment 2 Simone Tiraboschi 2014-10-30 10:55:18 UTC
Not flagging a new network as "required" by default would reduce the impact of the issue, because it would not affect users who don't change that network attribute. But we would still face the same problem whenever a user marks an additional network as required, so we definitely need to find a way to make it easier to add a host when we have more than one required network.

Comment 3 Lior Vernia 2014-10-30 14:28:32 UTC
I disagree that a network should be marked as non-required by default. Most networks in a virtual DC are added so that VMs are able to properly run. Having non-required networks (i.e. that are "optional" for a host to properly function) is the exception rather than the rule, to the best of my knowledge.

The description of the bug is inaccurate as well:
1. The host moving to non-operational state does not mean that the install failed, and the host need not be re-installed.
2. There's no need to move the host to maintenance in order to attach an additional network. Setup network operations may be performed when the host is non-operational, and if following the operation the host has all the required networks - it'll move to active state automatically.

There might be a scenario specific to hosted engine where this isn't convenient, but I do not understand it from the above description - Simone, please elaborate. Otherwise, I don't see this as a bug.

Comment 4 Simone Tiraboschi 2014-10-31 16:05:11 UTC
Thanks Lior,
I tried again and indeed we can simply complete the network configuration on a host in non-operational state; it then takes a little while for this to be noticed before the host becomes operational.

I didn't notice it at all, so I still think it is not very intuitive: if we know that the user needs to configure a required network, in my opinion it's better to have them do it before trying to install a host which is certain to become non-operational and so red-flagged. At least we should add a more evident alert about how to solve it.

We have the same problem on hosted engine when we have more than one required network for the second host.
Currently we are simply waiting till a long timeout when it fails.
What is the best way to have the user configuring the network binding in order to add it from hosted-engine-setup?

Comment 5 Lior Vernia 2014-11-02 08:42:42 UTC
The basic problem with what you're suggesting is that we can't know which interface a required network should be attached to. In oVirt we've been very conservative about trying to "guess" such things. This might change with time, but at the moment I'd be reluctant to do so.

I still don't understand what you mean by "waiting till a long timeout when it fails". A host moving to non-operational state is not considered a failure - there's no action that fails when this happens. It just means that no VM can be run on that host.

I also don't know what you mean by "add a more evident alert about how to solve it" - in the event log there appears a very detailed message about the host moving to non-operational state because network X is required and missing.

Lastly, what do you mean by "configuring the network binding in order to add it from hosted-engine-setup"? hosted-engine-setup is only used to set up the first host, after that everything can be performed via the GUI/REST. I wouldn't give hosted-engine-setup any responsibility to add more networks to the first host (at that point there aren't any networks other than the management one so there are no other required networks), nor for the setup of additional hosts. It's just a tool to get your deployment up and running, I don't see how it can replace the act of actually managing your virtual DC.

Comment 6 Simone Tiraboschi 2014-11-03 08:44:29 UTC
(In reply to Lior Vernia from comment #5)
> I also don't know what you mean by "add a more evident alert about how to
> solve it" - in the event log there appears a very detailed message about the
> host moving to non-operational state because network X is required and
> missing.

Yes, but it's just in the log: if possible I'd prefer to configure it from the GUI before trying to add the host, or to have an interactive alert which directly lets me configure the network bindings, instead of having to look into the logs to check what was wrong.

> Lastly, what do you mean by "configuring the network binding in order to add
> it from hosted-engine-setup"? hosted-engine-setup is only used to set up the
> first host, after that everything can be performed via the GUI/REST. I
> wouldn't give hosted-engine-setup any responsibility to add more networks to
> the first host (at that point there aren't any networks other than the
> management one so there are no other required networks), nor for the setup
> of additional hosts. It's just a tool to get your deployment up and running,
> I don't see how it can replace the act of actually managing your virtual DC.

It's not that easy: you actually need to run hosted-engine --deploy on all the hosts that participate in the cluster.
The engine VM is stored on a dedicated storage domain that is not visible from the engine. On both hosts we need to set up and configure an HA agent that monitors the engine VM and is able to restart it on the other host if the first fails, and so on.

So we definitely need to run hosted-engine --deploy on the second host as well.
The issue is that, when adding the second host, hosted-engine currently waits for the VDSM host to become operational, which never happens because of the missing information about the additional required network. As a result hosted-engine --deploy doesn't complete correctly on the second host, and so there is no HA.
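
To make the waiting problem concrete, here is a minimal sketch of a polling loop that stops as soon as the host reports a non-operational state instead of running into the long timeout. It is not the actual hosted-engine code: wait_for_host_up, the get_status callable and the status strings "up", "installing" and "non_operational" are assumptions used only for illustration.

```python
import time


def wait_for_host_up(get_status, timeout=600.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the host settles or the timeout expires.

    Returns "up" or "non_operational" as soon as either is reported, so the
    caller can react to a non-operational host immediately. Raises
    TimeoutError if neither state is reached in time.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("up", "non_operational"):
            return status
        if clock() >= deadline:
            raise TimeoutError("host still %r after %s seconds" % (status, timeout))
        sleep(interval)


# Usage with a fake status source standing in for the engine (hypothetical).
statuses = iter(["installing", "installing", "non_operational"])
result = wait_for_host_up(lambda: next(statuses), timeout=10, interval=0,
                          sleep=lambda seconds: None)
print(result)  # non_operational
```

With a result of "non_operational" the setup could tell the user which networks still have to be configured, rather than waiting and then failing.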

So what do you suggest in this scenario?

Comment 7 Lior Vernia 2014-11-03 11:57:42 UTC
It seems a proper solution would be as follows:

1. When deploying the second host for HA, query the engine API for networks that are marked as required (excluding the management network).

This can be performed either by listing all networks in the relevant cluster and iterating over them, or better yet using a custom query (http://www.ovirt.org/Python-sdk#Querying_a_collection_using_the_oVirt_search_engine_query_and_custom_constraints:).

2. If such networks exist, do either of the following:
2.1. Fail and explain that required networks must not exist as part of a hosted engine HA deployment.
2.2. Ask the user for permission to turn the required networks to non-required, and proceed with deployment.
2.3. Ask the user to supply, for each required network, on which host interface it should be configured.

Out of these alternatives, I personally like (2.3) the most. In that case, after the user has input the relevant interfaces, you could add them as parameters to a setup networks command for the new host (see for example https://motiasayag.wordpress.com/2013/02/13/creating-networks-on-top-of-a-bond/).
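
The flow of steps 1 and 2.3 can be sketched on plain data, without the oVirt SDK. Everything here is an assumption made for illustration: the networks are modelled as dictionaries with "name" and "required" keys, the management network is assumed to be called "ovirtmgmt", and required_networks and validate_bindings are hypothetical helpers rather than engine or SDK calls.

```python
def required_networks(cluster_networks, management="ovirtmgmt"):
    """Names of the required networks, excluding the management network."""
    return [
        net["name"]
        for net in cluster_networks
        if net.get("required") and net["name"] != management
    ]


def validate_bindings(required, bindings, host_nics):
    """Check that each required network is mapped to an existing host NIC.

    Returns a list of human-readable problems; an empty list means the
    user-supplied mapping covers every required network.
    """
    problems = []
    for name in required:
        nic = bindings.get(name)
        if nic is None:
            problems.append("no interface given for network %s" % name)
        elif nic not in host_nics:
            problems.append(
                "interface %s for network %s not found on host" % (nic, name)
            )
    return problems


# Hypothetical cluster with one extra required network.
cluster_networks = [
    {"name": "ovirtmgmt", "required": True},
    {"name": "net2", "required": True},
    {"name": "backup", "required": False},
]
needed = required_networks(cluster_networks)
print(needed)
print(validate_bindings(needed, {"net2": "eth1"}, {"eth0", "eth1"}))
print(validate_bindings(needed, {}, {"eth0", "eth1"}))
```

Only a mapping that validates cleanly would then be passed on to the setup networks command mentioned above.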

Comment 8 Sven Kieske 2014-11-05 09:04:32 UTC
I do not see how this behaviour is special for ovirt-hosted-engine-ha

Imho this bug should be assigned to ovirt-engine-core ?

Comment 9 Lior Vernia 2014-11-05 09:15:59 UTC
I've explained several times in the above comments that the behavior of a host being non-operational until all required networks are configured on it is not a bug.

There is no way to "guess" which interface each network should be attached to on a host (or at least making such guesses goes against the current paradigm of how things are done in oVirt).

Comment 10 Simone Tiraboschi 2014-11-05 09:27:59 UTC
After reviewing it, keeping the host non-operational until all the required networks are configured really does make sense: think for instance of what could happen if the new host has only the management network and the VM networks still have to be configured.
As soon as it becomes operational the optimizer, if present, could start to move some VMs onto it to rebalance the load, but at that point some VMs could become unreachable because of the different network topology of the new host.

We already have the corresponding bug on the hosted-engine side ( https://bugzilla.redhat.com/show_bug.cgi?id=1086032 ); solving it there for hosted engine setup.

Maybe we can open an RFE to make this more user friendly by adding a new section to configure the networks directly in the first dialog before adding the new host.

