Bug 1222417 - Host stay in 'Unassigned' state for ever, unless restarting vdsm+engine
Summary: Host stay in 'Unassigned' state for ever, unless restarting vdsm+engine
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Frontend.WebAdmin
Version: ---
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ovirt-3.6.2
Target Release: 3.6.2
Assignee: Moti Asayag
QA Contact: Petr Kubica
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-05-18 07:59 UTC by Michael Burman
Modified: 2016-02-18 11:18 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-18 11:18:00 UTC
oVirt Team: Infra
Embargoed:
rule-engine: ovirt-3.6.z+
ylavi: planning_ack+
rule-engine: devel_ack+
pnovotny: testing_ack+


Attachments
engine log 3.6 (1.56 MB, text/plain)
2015-05-18 07:59 UTC, Michael Burman


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 41529 0 'None' MERGED engine: Network errors should be handled specifically 2021-02-05 14:37:52 UTC
oVirt gerrit 49403 0 'None' MERGED core: Ignore network exceptions during maintenance 2021-02-05 14:37:52 UTC
oVirt gerrit 49405 0 'None' MERGED core: Ignore network exceptions during maintenance 2021-02-05 14:37:52 UTC

Description Michael Burman 2015-05-18 07:59:53 UTC
Created attachment 1026619
engine log 3.6

Description of problem:
A host stays in the 'Unassigned' state forever unless vdsm and the engine are restarted.
The 'Unassigned' state is not clear at all, and the server stays in it until the engine and the vdsm on the server are restarted.
This happens when a server is running and Up in one 3.6 engine, and someone then installs the same server in another 3.6 engine.
In the first engine, the server stays in the 'Unassigned' state forever.
In the second engine, the host will most probably be installed successfully, but once it is moved to maintenance it stays in the 'Unassigned' state as well, or moves to non-operational after some time.
In the original engine, however, the server remains in the 'Unassigned' state forever.

engine.log

2015-05-18 10:51:29,733 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-74) [] Failed to refresh VDS, network error, continuing, vds='red-vds4.qa.lab.tlv.redhat.com'(17f89824-c4c8-4829-9424-e22dfe28f562): VDSGenericException: VDSNetworkException: General SSLEngine problem
2015-05-18 10:51:33,256 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to red-vds4.qa.lab.tlv.redhat.com/10.35.128.10
2015-05-18 10:51:33,277 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages
2015-05-18 10:51:33,279 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] (DefaultQuartzScheduler_Worker-77) [] Command 'ListVDSCommand(HostName = red-vds4.qa.lab.tlv.redhat.com, HostId = 17f89824-c4c8-4829-9424-e22dfe28f562, vds=Host[red-vds4.qa.lab.tlv.redhat.com,17f89824-c4c8-4829-9424-e22dfe28f562])' execution failed: VDSGenericException: VDSNetworkException: General SSLEngine problem
2015-05-18 10:51:33,285 ERROR [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl] (DefaultQuartzScheduler_Worker-77) [] Failed to invoke scheduled method vmsMonitoring: null
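
To see how often the engine hits this error for a given host, the log can be scanned for VDSNetworkException lines. The following is only a rough sketch, not part of the engine: the function name count_network_errors and the regular expression are assumptions made for illustration, based solely on the two line shapes visible in the excerpt above (vds='<host>' and HostName = <host>). The sample lines are abbreviated copies of that excerpt.

```python
import re
from collections import Counter

# Host name as it appears in the excerpt above, in either of two forms:
#   vds='<host>'(...)     (VdsManager warning)
#   HostName = <host>,    (ListVDSCommand error)
# This pattern is an assumption derived from those two lines only.
HOST_RE = re.compile(r"vds='([^']+)'|HostName = ([^,\s]+)")


def count_network_errors(lines):
    """Count lines that mention VDSNetworkException, keyed by host name."""
    counts = Counter()
    for line in lines:
        if "VDSNetworkException" not in line:
            continue
        match = HOST_RE.search(line)
        if match:
            host = match.group(1) or match.group(2)
            counts[host] += 1
    return counts


# Abbreviated sample lines taken from the excerpt above.
SAMPLE = [
    "2015-05-18 10:51:29,733 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] "
    "Failed to refresh VDS, network error, continuing, "
    "vds='red-vds4.qa.lab.tlv.redhat.com'(17f89824-c4c8-4829-9424-e22dfe28f562): "
    "VDSGenericException: VDSNetworkException: General SSLEngine problem",
    "2015-05-18 10:51:33,277 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] "
    "Unable to process messages",
    "2015-05-18 10:51:33,279 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] "
    "Command 'ListVDSCommand(HostName = red-vds4.qa.lab.tlv.redhat.com, "
    "HostId = 17f89824-c4c8-4829-9424-e22dfe28f562)' execution failed: "
    "VDSGenericException: VDSNetworkException: General SSLEngine problem",
]

result = count_network_errors(SAMPLE)
for host, n in result.items():
    print(host, n)
```

The function accepts any iterable of lines, so an open file handle for a real engine.log could be passed in place of SAMPLE.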

Version-Release number of selected component (if applicable):
3.6.0-0.0.master.20150412172306.git55ba764.el6


Steps to Reproduce:
1. Deploy and run a server on a 3.6 engine.
2. Install the same server in another 3.6 engine, without first moving it to 'maintenance' in the original engine.

Actual results:
First engine - The server stays in the 'Unassigned' state forever, unless the engine and vdsm are restarted.
Second engine - Sometimes the host is installed successfully, but switching it to maintenance moves the server to the 'Unassigned' state or to non-operational.
Sometimes the server is not installed successfully and stays in the 'Unassigned' state as well.

Expected results:
- Maybe an alert message letting the user know that this server is already in use and installed in another engine
- Block such an operation?
- Make the 'Unassigned' state clearer and more recoverable

Comment 1 Max Kovgan 2015-06-28 14:13:00 UTC
ovirt-3.6.0-3 release

Comment 2 Petr Kubica 2015-10-19 07:46:47 UTC
I have two clean engines. I installed one host on one of them. Everything seemed okay, but after I installed the same host on the second engine, the first engine repeatedly shows the error message "VDSM host command failed: General SSLEngine problem" (roughly every 15 seconds). In the log I see many exceptions.

I have the latest version, 3.6.0-16.

My steps are:
1. Have a clean (new) installation of two engines.
2. Install a host on the first engine.
3. After that installation succeeds, install the same host on the second engine.
4. See what happens in the first engine; in the second engine the host is up.

Comment 10 Moti Asayag 2015-11-08 09:49:20 UTC
According to the engine.log of the first engine, the installation of the host didn't end successfully:

2015-10-16 12:00:43,706 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (org.ovirt.thread.pool-7-thread-3) [651e96ac] Host installation failed for host '1450b74f-84af-4966-9a9e-6bb69a32980f', 'srv-02': Command returned failure code 1 during SSH session 'root.63.205'

And the engine moves that host right to "InstallFailed":
2015-10-16 12:00:43,710 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-7-thread-3) [651e96ac] START, SetVdsStatusVDSCommand(HostName = srv-02, SetVdsStatusVDSCommandParameters:{runAsync='true', hostId='1450b74f-84af-4966-9a9e-6bb69a32980f', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 7ba83821

So in that scenario the brokers weren't started for the engine. The host therefore cannot be accessed, and the only operation that should be performed on it is a reinstall, which is allowed when the host is in "Install Failed" status.

Comment 11 Petr Kubica 2015-11-20 14:18:10 UTC
You checked the wrong log. On the third attempt (last log, 20151016144142) I successfully added the host to the first engine (the attempt you mentioned failed due to a missing rhev-h channel repo). The host was up until I installed it on the second engine. After that I cannot do anything with that host in the first engine (I can't remove the host from the first engine).

Comment 12 Moti Asayag 2015-11-30 12:21:46 UTC
The ability to move a host to maintenance from non-responsive will be supported as part of fixing bug 1279625

Comment 13 Red Hat Bugzilla Rules Engine 2015-11-30 12:21:50 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 14 Sandro Bonazzola 2015-12-23 15:07:46 UTC
This bug has target milestone 3.6.2 and is on modified without a target release.
This may be perfectly correct, but please check if the patch fixing this bug is included in ovirt-engine-3.6.2. If it's included, please set target-release to 3.6.2 and move to ON_QA. Thanks.

Comment 15 Petr Kubica 2016-01-14 14:33:32 UTC
Verified in 3.6.2.5-0.1.el6

