Bug 1327155

Summary: engine failed to communicate with hosts after restart or reinstall
Product: [oVirt] ovirt-engine Reporter: Eldad Marciano <emarcian>
Component: Backend.Core Assignee: Moti Asayag <masayag>
Status: CLOSED NOTABUG QA Contact: Pavel Stehlik <pstehlik>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.6.5.1 CC: bugs, emarcian, guchen, oourfali, pkliczew
Target Milestone: --- Keywords: Regression
Target Release: --- Flags: gklein: ovirt-3.6.z?
gklein: blocker?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-18 12:07:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Eldad Marciano 2016-04-14 11:20:13 UTC
Description of problem:
The scale team is seeing strange behavior: once the engine is installed and populated, restarting or reinstalling the engine leaves our hosts non-responsive.

Reinstalling the hosts does not work.


Version-Release number of selected component (if applicable):
3.6.5.1

How reproducible:
Not clear

Steps to Reproduce:
1. Establish a setup with some hosts
2. Restart the engine or reinstall it
3. Hosts become non-responsive

Actual results:
Hosts are non-responsive after running host reinstall

Expected results:
Reinstalling hosts should pass with no issues


Additional info:

Comment 1 Oved Ourfali 2016-04-15 08:01:24 UTC
What do you mean by engine reinstall? You mean upgrade? 

Please attach complete logs.

Comment 2 guy chen 2016-04-15 10:32:30 UTC
Created attachment 1147600 [details]
server log at debug level

Comment 3 guy chen 2016-04-15 10:34:10 UTC
Created attachment 1147601 [details]
engine log at debug level

Comment 5 Oved Ourfali 2016-04-15 17:39:55 UTC
Do you have the host deploy logs?

Comment 11 Oved Ourfali 2016-04-18 03:16:56 UTC
This might be an environment issue, also related to fake hosts, as the environment is mixed. 

As no one else had that, I don't think it should be marked as blocker. 

Gil?

Comment 12 Gil Klein 2016-04-18 05:33:22 UTC
Eldad, is this happening on fake hosts only or on bare metal hosts?

Comment 13 Gil Klein 2016-04-18 05:38:37 UTC
(In reply to Oved Ourfali from comment #11)
> This might be an environment issue, also related to fake hosts, as the
> environment is mixed. 
> 
> As no one else had that, I don't think it should be marked as blocker. 
> 
> Gil?
Based on comment #4, this was reproduced with a bare metal host.

Oved, can we give this some attention to confirm whether this is an environment issue or a real regression?

Comment 14 Oved Ourfali 2016-04-18 05:42:40 UTC
Can you try to reproduce on a clean environment, with only real hosts?
We're examining the logs, but want a cleaner reproduction, if any.

Comment 15 Piotr Kliczewski 2016-04-18 07:11:07 UTC
Besides having steps to reproduce, it would be great to have SSL debug logs. You can enable them by passing the parameter -Djavax.net.debug=all to the engine JVM.
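
Once the option has been added, it can help to confirm that the running JVM actually picked it up before collecting logs. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/cmdline is readable; the helper names are hypothetical and the PID of the engine JVM has to be found separately.

```python
from pathlib import Path

# Option quoted in the comment above.
SSL_DEBUG_OPTION = "-Djavax.net.debug=all"


def cmdline_has_option(raw_cmdline: bytes, option: str = SSL_DEBUG_OPTION) -> bool:
    """Return True if a NUL-separated command line contains the exact option."""
    args = [
        arg.decode("utf-8", errors="replace")
        for arg in raw_cmdline.split(b"\0")
        if arg
    ]
    return option in args


def process_has_option(pid: int, option: str = SSL_DEBUG_OPTION) -> bool:
    """Read /proc/<pid>/cmdline (Linux only) and look for the option."""
    raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    return cmdline_has_option(raw, option)
```

Calling process_has_option(pid) with the PID of the engine JVM returns True only when the option appears as a separate argument on its command line.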

Comment 16 Eldad Marciano 2016-04-18 11:58:34 UTC
It seems the issue disappeared after we found two engine processes running.
Now hosts can be reinstalled with no issues.

Comment 17 Eldad Marciano 2016-04-18 12:02:26 UTC
It happened to me again: I found two engine processes after I ran 'service ovirt-engine restart'.

Comment 18 Piotr Kliczewski 2016-04-18 12:07:00 UTC
Based on comment #16, this looks like an environment issue.

Comment 19 Oved Ourfali 2016-04-18 12:08:14 UTC
There shouldn't be two engine processes running.
If there are, perhaps it is because the service was killed but not the engine itself, so a restart of the service will start it again.

So, might be related to Bug 1320903.

Changing to NOTABUG.
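
For anyone hitting the same symptom, a quick check for a leftover engine process after a service restart can be sketched as below. This is only an illustration, not part of the engine tooling: the marker string is an assumption and must be replaced with something that uniquely identifies the engine JVM on your system. A service wrapper process may also match a loose marker, so more than one match is a hint to investigate rather than proof of duplication.

```python
import subprocess
from typing import List


def find_processes(ps_output: str, marker: str) -> List[int]:
    """Return PIDs from `ps -eo pid=,args=` output whose command line contains marker."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # Each line is expected to look like "<pid> <command line>".
        if len(parts) == 2 and parts[0].isdigit() and marker in parts[1]:
            pids.append(int(parts[0]))
    return pids


def running_pids(marker: str) -> List[int]:
    """List PIDs of running processes matching marker (uses the system `ps`)."""
    result = subprocess.run(
        ["ps", "-eo", "pid=,args="],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_processes(result.stdout, marker)


if __name__ == "__main__":
    # Hypothetical marker; adjust to match the engine JVM command line.
    matches = running_pids("ovirt-engine")
    if len(matches) > 1:
        print("More than one matching process:", matches)
    else:
        print("Matching processes:", matches)
```

If more than one PID is reported and they turn out to be separate engine JVMs, stopping the stale one before restarting the service matches what resolved the issue in comment 16.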