Bug 1350687 - Maintenance and hosted engine issue with Ovirt 4.0
Summary: Maintenance and hosted engine issue with Ovirt 4.0
Keywords:
Status: CLOSED DUPLICATE of bug 1343005
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: General
Version: 2.0.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Martin Sivák
QA Contact: Ilanit Stein
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-06-28 06:46 UTC by Bertrand Caplet
Modified: 2017-10-30 12:14 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-07-03 08:18:27 UTC
oVirt Team: SLA
Embargoed:
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
ovirt-hosted-engine-ha Agent log (17.94 MB, text/plain)
2016-06-28 06:46 UTC, Bertrand Caplet
ovirt-hosted-engine-ha Broker log (3.92 MB, text/plain)
2016-06-28 06:48 UTC, Bertrand Caplet
ovirt-hosted-engine-ha Agent log 20160629 (211.00 KB, text/plain)
2016-06-29 15:56 UTC, Bertrand Caplet
ovirt-hosted-engine-ha Broker log 20160629 (384.25 KB, text/plain)
2016-06-29 15:57 UTC, Bertrand Caplet

Description Bertrand Caplet 2016-06-28 06:46:27 UTC
Created attachment 1173278
ovirt-hosted-engine-ha Agent log

Description of problem:
After upgrading from oVirt 3.6 to 4.0, when I put a node into maintenance, it prepares for maintenance and gives up the SPM role if it held it, then migrates all of its VMs to another node. All of these steps succeed (the VMs are migrated), but the VMs stay locked and in migration mode.
I also noticed that roughly every 10 minutes (the interval is fairly random), ovirt-hosted-engine-ha sends emails in this order:

    1) EngineUnexpectedlyDown
    2) (StartState-ReinitializeFSM sometimes)
    3) EngineDown-EngineStart
    4) EngineStart-EngineStarting
    5) EngineStarting-EngineUnexpectedlyDown
    6) Repeat
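
For anyone triaging a mailbox full of these notifications, the loop above can be counted with a minimal Python sketch. It assumes the notification names have already been extracted as plain strings of the form `OldState-NewState` (or a bare state name, as in item 1); the helper names `parse_transition` and `count_unexpected_down` are hypothetical and not part of ovirt-hosted-engine-ha.

```python
def parse_transition(name):
    """Split 'EngineDown-EngineStart' into ('EngineDown', 'EngineStart').

    A bare state name such as 'EngineUnexpectedlyDown' is returned as
    (None, 'EngineUnexpectedlyDown').
    """
    old, sep, new = name.strip().partition("-")
    return (old, new) if sep else (None, old)


def count_unexpected_down(names):
    """Count notifications whose target state is EngineUnexpectedlyDown."""
    return sum(
        1 for name in names
        if parse_transition(name)[1] == "EngineUnexpectedlyDown"
    )


# The sequence reported above, in order (one pass through the loop).
one_pass = [
    "EngineUnexpectedlyDown",
    "StartState-ReinitializeFSM",
    "EngineDown-EngineStart",
    "EngineStart-EngineStarting",
    "EngineStarting-EngineUnexpectedlyDown",
]
print(count_unexpected_down(one_pass))
```

A count that keeps growing over time would suggest the agent is still cycling through the start/fail loop rather than settling.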

Version-Release number of selected component (if applicable):
oVirt 4.0 (I'm not sure about the ovirt-hosted-engine-ha version.)


How reproducible:
I managed to reproduce it 3 times.

Steps to Reproduce:
1. Migrate hosted-engine manually
2. Put a node in maintenance

Actual results:
The VMs are migrated but stay locked and in migration mode.
HA stops and restarts.

Expected results:
VMs migrated.
HA OK.

Additional info:
Manual migration is OK. Rebooting the hosted engine unlocks the VMs. I don't know yet about the HA stopping and restarting.

Comment 1 Bertrand Caplet 2016-06-28 06:48:34 UTC
Created attachment 1173280
ovirt-hosted-engine-ha Broker log

Comment 2 Roy Golan 2016-06-29 07:00:29 UTC
Please specify the timestamp at which it happens so I can trace it in the log, and also supply the engine.log.

When you say the VMs migrated but stayed locked, I guess you mean their engine status remained 'MigratingFrom' because the engine was down.

Comment 3 Roy Golan 2016-06-29 08:36:11 UTC
see comment 2

Comment 4 Bertrand Caplet 2016-06-29 15:55:32 UTC
So I dug into the logs a little more. I had an issue with the ovirt-ha-agent service: on one node it couldn't create or open new files because of limits. I set the number of open files for the vdsm user to 2048 in /etc/security/limits.conf on the 2 nodes, like this:
vdsm    -       nofile  2048

Then I restarted ovirt-ha-broker and ovirt-ha-agent on both nodes.
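
To confirm that a restarted service actually picked up the new limit, one option is to compare its soft "Max open files" limit with the number of descriptors it currently holds. The following is a minimal Python sketch, assuming a Linux host where `/proc/<pid>/limits` and `/proc/<pid>/fd` are readable (usually this means running as root or as the same user as the process). All function names here are hypothetical helpers, not part of oVirt or vdsm.

```python
import os


def parse_nofile_soft(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the limit is 'unlimited' or the line is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # fields: Max, open, files, soft, hard, units
            return None if soft == "unlimited" else int(soft)
    return None


def open_fd_count(pid="self"):
    """Count the file descriptors currently open in the given process."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom(soft_limit, open_fds):
    """Number of descriptors left before the soft limit is reached."""
    return soft_limit - open_fds


def check_pid(pid="self"):
    """Return (soft_limit, open_fds) for a process, read from /proc."""
    with open(f"/proc/{pid}/limits") as f:
        soft = parse_nofile_soft(f.read())
    return soft, open_fd_count(pid)
```

Calling `check_pid(<pid of ovirt-ha-agent>)` on each node would show whether the soft limit is 2048 and how close the process is to it. A descriptor count that climbs steadily toward the limit points to a leak rather than a limit that is simply too low.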

I tried again to put a node into maintenance, and the VMs are still stuck in locked and migration mode. They remain in 'Migrating To' status on the engine.

However, I no longer receive emails from ovirt-hosted-engine HA.

I will attach more limited logs right now. I put the node into maintenance at 17:35.

Comment 5 Bertrand Caplet 2016-06-29 15:56:50 UTC
Created attachment 1174010
ovirt-hosted-engine-ha Agent log 20160629

Comment 6 Bertrand Caplet 2016-06-29 15:57:37 UTC
Created attachment 1174023
ovirt-hosted-engine-ha Broker log 20160629

Comment 7 Roy Golan 2016-07-03 08:17:28 UTC
You probably hit bug 1343005, "too many open files".
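
A quick way to check whether a given agent or broker log shows this symptom is to search it for the operating system's EMFILE error text. The sketch below is a minimal Python example; it assumes the log records the standard OS error message (as Python exceptions normally do) and makes no assumption about the rest of the log format. The function name `count_emfile_lines` is hypothetical.

```python
import errno
import os

# Standard OS message for EMFILE; "Too many open files" on Linux.
EMFILE_TEXT = os.strerror(errno.EMFILE)


def count_emfile_lines(lines):
    """Count lines that contain the EMFILE error message (case-insensitive)."""
    needle = EMFILE_TEXT.lower()
    return sum(1 for line in lines if needle in line.lower())
```

Because a file object iterates line by line, a large log such as the 17.94 MB agent log can be passed in directly, for example `with open(path) as f: count_emfile_lines(f)`, where `path` is wherever the log was saved.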

Comment 8 Doron Fediuck 2016-07-03 08:18:27 UTC
Please try again once there's a fix for bug 1343005, and re-open if needed.

*** This bug has been marked as a duplicate of bug 1343005 ***

