Bug 988453 - can't add host to engine when all hosts in cluster are in Install Failed state
Summary: can't add host to engine when all hosts in cluster are in Install Failed state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: ---
Target Release: 3.3
Assignee: Aravinda VK
QA Contact:
URL:
Whiteboard: gluster
Depends On:
Blocks:
 
Reported: 2013-07-25 15:53 UTC by Mike Burns
Modified: 2013-09-23 07:34 UTC
CC: 8 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-09-23 07:34:19 UTC
oVirt Team: ---
Embargoed:


Attachments
engine log (4.34 MB, text/plain), 2013-07-25 16:09 UTC, Mike Burns
engine log (4.35 MB, text/plain), 2013-07-25 16:13 UTC, Mike Burns


Links
oVirt gerrit 17552

Description Mike Burns 2013-07-25 15:53:04 UTC
probably not otopi, but it's a best guess

Description of problem:
When attempting to add a new host to the engine while all existing hosts in the cluster are in the Install Failed state, the host cannot be added:

"There is no available server in the cluster to probe the new server"

Version-Release number of selected component (if applicable):
ovirt-host-deploy-1.1.0-0.2.master.20130723.gita991545.fc19.noarch
otopi-1.1.0-0.2.master.20130723.git8b451a2.fc19.noarch
ovirt-image-uploader-3.3.0-0.1.beta1.fc19.noarch
ovirt-engine-backend-3.3.0-0.3.beta1.fc19.noarch
ovirt-host-deploy-java-1.1.0-0.2.master.20130723.gita991545.fc19.noarch
ovirt-engine-dbscripts-3.3.0-0.3.beta1.fc19.noarch
ovirt-engine-sdk-python-3.3.0.3-1.fc19.noarch
otopi-java-1.1.0-0.2.master.20130723.git8b451a2.fc19.noarch
ovirt-engine-userportal-3.3.0-0.3.beta1.fc19.noarch
ovirt-engine-lib-3.3.0-0.3.beta1.fc19.noarch
ovirt-log-collector-3.3.0-0.1.beta1.fc19.noarch
ovirt-engine-restapi-3.3.0-0.3.beta1.fc19.noarch
ovirt-release-fedora-7-1.noarch
ovirt-iso-uploader-3.3.0-0.1.beta1.fc19.noarch
ovirt-engine-setup-3.3.0-0.3.beta1.fc19.noarch
ovirt-engine-cli-3.3.0.3-1.fc19.noarch
ovirt-engine-webadmin-portal-3.3.0-0.3.beta1.fc19.noarch
ovirt-engine-tools-3.3.0-0.3.beta1.fc19.noarch
ovirt-engine-3.3.0-0.3.beta1.fc19.noarch


How reproducible:
Always

Steps to Reproduce:
1. Add a host and have it fail for any reason (no virt extensions works)
2. Add a different host in the same cluster

Actual results:
"There is no available server in the cluster to probe the new server"

Expected results:
I'm guessing that the engine is delegating the work to the hosts in the same cluster if there are any. It should have a way to fall back to the engine if there are no active hosts.

Additional info:

I suspect that this might also happen when all hosts are in maintenance mode.
This does happen when hosts are non-operational as well.

Comment 1 Alon Bar-Lev 2013-07-25 15:55:44 UTC
Yair?

Comment 2 Alon Bar-Lev 2013-07-25 15:56:50 UTC
mburns, engine log would be great.

Comment 3 Mike Burns 2013-07-25 16:09:24 UTC
Created attachment 778355 [details]
engine log

Comment 4 Mike Burns 2013-07-25 16:11:16 UTC
disregard that log...

Comment 5 Mike Burns 2013-07-25 16:13:38 UTC
Created attachment 778370 [details]
engine log

last error in this log should be the relevant error.

Situation:

1 host in Default cluster in non-operational state
add second host to the Default cluster

UI error:  Error while executing action: Cannot add Host. There is no available server in the cluster to probe the new server.

Comment 6 Itamar Heim 2013-07-25 20:28:01 UTC
this is a gluster error: ACTION_TYPE_FAILED_NO_GLUSTER_HOST_TO_PEER_PROBE
mike - is it a gluster cluster?

Comment 7 Mike Burns 2013-07-25 20:36:35 UTC
(In reply to Itamar Heim from comment #6)
> this is a gluster error: ACTION_TYPE_FAILED_NO_GLUSTER_HOST_TO_PEER_PROBE
> mike - is it a gluster cluster?

It's the default cluster, which happens to have both gluster and virt checked by default, so yes.

Comment 8 Itamar Heim 2013-07-25 20:39:03 UTC
sahina - should the default cluster have gluster enabled by default?
(in any case, sounds like the check for whether a host is the first in the cluster is tricky if there are hosts in the cluster but they are not working properly...)

Comment 9 Sahina Bose 2013-07-26 12:01:04 UTC
It's enabled by default when Application mode is Both. Probably not required to be enabled by default.

When we add a host to a cluster that already contains hosts, we need to add the new host as a peer in the underlying storage cluster, so any one UP server is required to do this.
If all hosts are in a failed state and we allow the host to be added to the cluster,
it means the newly added host is not really part of the underlying storage cluster.

We could possibly try to add it as a gluster peer whenever the other hosts in the cluster come up.
But then, there are other validations:
- If the host that comes up is part of another cluster, we do not support merging of clusters.
- This becomes a problem when there are multiple hosts that are not working properly in the cluster and we try to add a new host.
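The validation described above can be sketched roughly as follows. This is a hedged illustration, not the actual ovirt-engine code: the class, method, and enum names here are assumptions; only the error key ACTION_TYPE_FAILED_NO_GLUSTER_HOST_TO_PEER_PROBE comes from the log in comment 6.

```java
import java.util.List;

// Hypothetical names for illustration; not the real ovirt-engine classes.
enum HostStatus { UP, INSTALL_FAILED, NON_OPERATIONAL, MAINTENANCE }

class GlusterPeerProbeCheck {
    /** Returns null when the add-host validation passes, else the error key. */
    static String validate(List<HostStatus> existingHostsInCluster) {
        if (existingHostsInCluster.isEmpty()) {
            // First host in the cluster: it bootstraps the storage cluster,
            // so no peer probe is needed.
            return null;
        }
        // An existing UP host must run the gluster peer probe for the new host.
        boolean anyUp = existingHostsInCluster.stream()
                                              .anyMatch(s -> s == HostStatus.UP);
        // No UP peer means nobody can run the probe, so the add is rejected,
        // which is the failure reported here.
        return anyUp ? null : "ACTION_TYPE_FAILED_NO_GLUSTER_HOST_TO_PEER_PROBE";
    }
}
```

With a single INSTALL_FAILED host in the cluster, this sketch rejects the new host, matching the UI error in comment 5.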

Comment 10 Itamar Heim 2013-07-26 13:43:02 UTC
I agree caution may be prudent when you know it is a gluster cluster (and if there are old, non-operational hosts in the cluster, you may want to ask the user if this is about initializing a new gluster cluster or joining the existing one...)

but i think the default cluster should be virt only for application mode which isn't gluster only?

Comment 11 Sahina Bose 2013-07-31 04:46:09 UTC
Yes. We can change the default cluster to virt only unless application mode is gluster only.
Aravinda, can you change the installer to do this?
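The change proposed in comments 10 and 11 can be sketched roughly like this. The enum and class names are hypothetical, not the actual ovirt-engine-setup code: the default cluster would enable the gluster service only when the application mode is gluster-only, and the virt service in every other mode.

```java
// Hypothetical sketch of the proposed default-cluster services;
// names are assumptions, not the real ovirt-engine-setup code.
enum ApplicationMode { VIRT_ONLY, GLUSTER_ONLY, BOTH }

class DefaultClusterServices {
    // Gluster is on by default only when the deployment is gluster-only,
    // instead of also being on in mode BOTH as it was when this bug was hit.
    static boolean glusterEnabled(ApplicationMode mode) {
        return mode == ApplicationMode.GLUSTER_ONLY;
    }

    // Virt is enabled by default in every mode except gluster-only.
    static boolean virtEnabled(ApplicationMode mode) {
        return mode != ApplicationMode.GLUSTER_ONLY;
    }
}
```

Under this sketch, a fresh setup in mode BOTH would create a virt-only Default cluster, so a plain virt host addition would never hit the gluster peer-probe validation.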

Comment 12 Itamar Heim 2013-08-21 16:41:24 UTC
as RC is built, moving to ON_QA (hopefully did not catch incorrect bugs when doing this)

Comment 13 Itamar Heim 2013-09-23 07:34:19 UTC
closing as this should be in 3.3 (doing so in bulk, so may be incorrect)

