Bug 1327155 - engine failed to communicate with hosts after restart or reinstall
Summary: engine failed to communicate with hosts after restart or reinstall
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 3.6.5.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Moti Asayag
QA Contact: Pavel Stehlik
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-14 11:20 UTC by Eldad Marciano
Modified: 2019-04-28 08:36 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-18 12:07:00 UTC
oVirt Team: Infra
gklein: ovirt-3.6.z?
gklein: blocker?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?



Description Eldad Marciano 2016-04-14 11:20:13 UTC
Description of problem:
The scale team is seeing some strange behavior:
once the engine is installed and populated, restarting or reinstalling the engine leaves our hosts non-responsive.

Reinstalling the hosts does not work.


Version-Release number of selected component (if applicable):
3.6.5.1

How reproducible:
Not clear

Steps to Reproduce:
1. Establish a setup with some hosts
2. Restart the engine or reinstall it
3. Hosts become non-responsive

Actual results:
Hosts are non-responsive after running host reinstall

Expected results:
Host reinstall should pass with no issues


Additional info:

Comment 1 Oved Ourfali 2016-04-15 08:01:24 UTC
What do you mean by engine reinstall? You mean upgrade? 

Please attach complete logs.

Comment 2 guy chen 2016-04-15 10:32:30 UTC
Created attachment 1147600 [details]
server log at debug level

Comment 3 guy chen 2016-04-15 10:34:10 UTC
Created attachment 1147601 [details]
engine log at debug level

Comment 5 Oved Ourfali 2016-04-15 17:39:55 UTC
Do you have the host deploy logs?

Comment 11 Oved Ourfali 2016-04-18 03:16:56 UTC
This might be an environment issue, also related to fake hosts, as the environment is mixed. 

As no one else had that, I don't think it should be marked as blocker. 

Gil?

Comment 12 Gil Klein 2016-04-18 05:33:22 UTC
Eldad, is this happening on fake hosts only or on bare metal hosts?

Comment 13 Gil Klein 2016-04-18 05:38:37 UTC
(In reply to Oved Ourfali from comment #11)
> This might be an environment issue, also related to fake hosts, as the
> environment is mixed. 
> 
> As no one else had that, I don't think it should be marked as blocker. 
> 
> Gil?
Based on comment #4, this was reproduced with a bare metal host.

Oved, can we get some attention on this to confirm whether this is an environment issue or a real regression?

Comment 14 Oved Ourfali 2016-04-18 05:42:40 UTC
Can you try to reproduce on a clean environment, with only real hosts?
We're examining the logs, but want a cleaner reproduction, if any.

Comment 15 Piotr Kliczewski 2016-04-18 07:11:07 UTC
Besides having steps to reproduce, it would be great to have SSL debug logs. You can enable them by passing the parameter -Djavax.net.debug=all to the engine JVM.
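
One way to confirm that the parameter actually reached a running JVM is to inspect the process command line. The snippet below is only a minimal sketch, assuming a Linux host where /proc/<pid>/cmdline is readable; the helper names are hypothetical and are not part of ovirt-engine. How the parameter gets added to the engine JVM is not covered here.

```python
from pathlib import Path
from typing import List

# The parameter quoted in the comment above.
SSL_DEBUG_FLAG = "-Djavax.net.debug=all"


def split_cmdline(raw: bytes) -> List[str]:
    """Split the NUL-separated contents of /proc/<pid>/cmdline into arguments."""
    return [part.decode("utf-8", "replace") for part in raw.split(b"\0") if part]


def has_jvm_flag(raw: bytes, flag: str = SSL_DEBUG_FLAG) -> bool:
    """Return True if the exact flag appears as one of the arguments."""
    return flag in split_cmdline(raw)


def process_has_flag(pid: int, flag: str = SSL_DEBUG_FLAG) -> bool:
    """Read the command line of a live process (Linux only) and check the flag."""
    raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    return has_jvm_flag(raw, flag)
```

Calling process_has_flag(pid) with the PID of the engine JVM should return True once the parameter is in effect.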

Comment 16 Eldad Marciano 2016-04-18 11:58:34 UTC
It seems the issue disappeared after we found two engine processes running.
Now hosts can be reinstalled with no issues.

Comment 17 Eldad Marciano 2016-04-18 12:02:26 UTC
It happened to me again: I found two engine processes once I ran 'service ovirt-engine restart'.

Comment 18 Piotr Kliczewski 2016-04-18 12:07:00 UTC
Based on comment #16, this looks like an environment issue.

Comment 19 Oved Ourfali 2016-04-18 12:08:14 UTC
There shouldn't be two engine processes running.
If there are, perhaps it is because the service was killed but not the engine itself, so a restart of the service will restart it.

So, might be related to Bug 1320903.

Changing to NOTABUG.
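
For anyone hitting the same symptom, a quick check for a leftover engine process before and after a service restart could look like the sketch below. It is an illustration only: the marker string "ovirt-engine" is an assumption about what appears in the engine command line on a given setup, and the helper names are hypothetical.

```python
import subprocess
from typing import List, Tuple


def find_processes(ps_output: str, marker: str) -> List[Tuple[int, str]]:
    """Return (pid, command) pairs whose command line contains the marker.

    Expects one process per line in the form "<pid> <command line>".
    """
    matches = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        pid, command = int(parts[0]), parts[1].strip()
        if marker in command:
            matches.append((pid, command))
    return matches


def list_processes() -> str:
    """Run ps and return "<pid> <args>" lines without headers."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def has_duplicates(ps_output: str, marker: str = "ovirt-engine") -> bool:
    """True if more than one process matches the (assumed) marker."""
    return len(find_processes(ps_output, marker)) > 1
```

Running has_duplicates(list_processes()) before 'service ovirt-engine restart' and again afterwards would show whether an older engine process survived the restart. A single engine may legitimately show up as more than one matching process (for example a wrapper plus the JVM), so the marker should be adjusted to match only the process of interest.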

