Bug 1350687

Summary: Maintenance and hosted engine issue with Ovirt 4.0
Product: [oVirt] ovirt-hosted-engine-ha Reporter: Bertrand Caplet <bertrand.caplet>
Component: General Assignee: Martin Sivák <msivak>
Status: CLOSED DUPLICATE QA Contact: Ilanit Stein <istein>
Severity: high Docs Contact:
Priority: unspecified    
Version: 2.0.0 CC: bertrand.caplet, bugs, dfediuck, rgolan, ylavi
Target Milestone: --- Flags: rule-engine: planning_ack?, rule-engine: devel_ack?, rule-engine: testing_ack?
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-07-03 08:18:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments (Description, Flags):
ovirt-hosted-engine-ha Agent log, none
ovirt-hosted-engine-ha Broker log, none
ovirt-hosted-engine-ha Agent log 20160629, none
ovirt-hosted-engine-ha Broker log 20160629, none

Description Bertrand Caplet 2016-06-28 06:46:27 UTC
Created attachment 1173278 [details]
ovirt-hosted-engine-ha Agent log

Description of problem:
After upgrading from oVirt 3.6 to 4.0, when I put a node into maintenance, it prepares for maintenance and gives up the SPM role if it held it, then migrates all the VMs to the other node. All of these steps succeed (the VMs are migrated), but the VMs stay locked and in migration mode.
I also noticed that every 10 minutes or so (fairly randomly), ovirt-hosted-engine HA sends mails in this order:

    1) EngineUnexpectedlyDown
    2) (StartState-ReinitializeFSM sometimes)
    3) EngineDown-EngineStart
    4) EngineStart-EngineStarting
    5) EngineStarting-EngineUnexpectedlyDown
    6) Repeat
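
The notification order above can be checked mechanically once the names have been collected from the mails. The following is only a minimal sketch in Python: the function name `count_restart_loops` and the idea of passing the names as a plain list are assumptions made for illustration, not part of ovirt-hosted-engine-ha.

```python
# Notification names in the order listed above.
LOOP = [
    "EngineUnexpectedlyDown",
    "EngineDown-EngineStart",
    "EngineStart-EngineStarting",
    "EngineStarting-EngineUnexpectedlyDown",
]
# This notification only appears sometimes, so it is ignored when matching.
OPTIONAL = "StartState-ReinitializeFSM"


def count_restart_loops(notifications):
    """Count non-overlapping occurrences of LOOP in a list of names.

    Hypothetical helper: `notifications` is assumed to be a list of
    notification names already extracted from the mails, oldest first.
    """
    names = [n for n in notifications if n != OPTIONAL]
    loops = 0
    i = 0
    while i + len(LOOP) <= len(names):
        if names[i:i + len(LOOP)] == LOOP:
            loops += 1
            i += len(LOOP)  # skip past the matched loop
        else:
            i += 1
    return loops
```

A non-zero result for a given time window would suggest the agent is cycling through the restart sequence rather than reporting a single failure.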

Version-Release number of selected component (if applicable):
oVirt 4.0 (I'm not sure about the ovirt-hosted-engine-ha version.)


How reproducible:
I managed to reproduce it 3 times.

Steps to Reproduce:
1. Migrate hosted-engine manually
2. Put a node in maintenance

Actual results:
VMs are migrated but stay locked and in migration mode.
HA stops and restarts.

Expected results:
VMs migrated.
HA OK.

Additional info:
Manual migration is OK. Rebooting the hosted engine unlocks the VMs; I don't know yet about the HA stopping and restarting.

Comment 1 Bertrand Caplet 2016-06-28 06:48:34 UTC
Created attachment 1173280 [details]
ovirt-hosted-engine-ha Broker log

Comment 2 Roy Golan 2016-06-29 07:00:29 UTC
Please specify the timestamp at which it happens so I can trace it in the log, and also supply the engine.log.

When you say the VMs migrated but stayed locked, I guess you mean their engine status remained 'MigratingFrom' because the engine was down.

Comment 3 Roy Golan 2016-06-29 08:36:11 UTC
See comment 2.

Comment 4 Bertrand Caplet 2016-06-29 15:55:32 UTC
So I dug into the logs a little more. I had an issue with the ovirt-ha-agent service: on one node it couldn't create or open new files because of limits. I set the number of open files for the vdsm user to 2048 in /etc/security/limits.conf on both nodes, like this:
vdsm    -       nofile  2048

Then I restarted ovirt-ha-broker and ovirt-ha-agent on both nodes.
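
To see whether a process is close to its open-file limit, the limit and the current descriptor count can be read directly. This is a minimal sketch in Python for Linux only; the helper names `nofile_limits` and `open_fd_count` are made up for illustration. It reports the limits of the process that runs the script, so the effective limit of the agent or broker itself may differ.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open file descriptors of a process via /proc (Linux only).

    Reading another user's process usually requires root privileges.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard} open={open_fd_count()}")
```

Passing the PID of the agent to `open_fd_count` (as root) and calling it repeatedly would show whether the number of open descriptors keeps growing over time.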

I retried putting a node in maintenance, and the VMs are still stuck in locked and migration mode. They remain in 'Migrating To' status on the engine.

However, I no longer receive emails from ovirt-hosted-engine HA.

I will attach more limited logs right now. I put the node in maintenance at 17:35.

Comment 5 Bertrand Caplet 2016-06-29 15:56:50 UTC
Created attachment 1174010 [details]
ovirt-hosted-engine-ha Agent log 20160629

Comment 6 Bertrand Caplet 2016-06-29 15:57:37 UTC
Created attachment 1174023 [details]
ovirt-hosted-engine-ha Broker log 20160629

Comment 7 Roy Golan 2016-07-03 08:17:28 UTC
You probably hit bug 1343005, "too many open files".
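
One way to confirm this from the attached logs is to search them for that error text. Below is a minimal sketch in Python; the helper name `find_emfile_lines` is hypothetical, and it assumes the error appears in the log with the usual "Too many open files" wording.

```python
# Error text to look for; matching is case-insensitive.
NEEDLE = "too many open files"


def find_emfile_lines(lines):
    """Return (line_number, text) pairs for lines mentioning the error.

    `lines` can be any iterable of strings, for example an open log file.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if NEEDLE in line.lower()
    ]
```

For example, `with open(path) as f: hits = find_emfile_lines(f)`, where `path` points to a saved copy of the agent or broker log.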

Comment 8 Doron Fediuck 2016-07-03 08:18:27 UTC
Please try again once there's a fix for bug 1343005 and re-open if needed.

*** This bug has been marked as a duplicate of bug 1343005 ***