Created attachment 1173278 [details]
ovirt-hosted-engine-ha Agent log

Description of problem:
After upgrading from oVirt 3.6 to 4.0, putting a node into maintenance does the following: the node prepares for maintenance, unsets SPM if it held that role, then migrates all of its VMs to the other node. All of these steps succeed (the VMs are migrated), but the VMs stay locked and in migration mode.

I also noticed that roughly every 10 minutes (the interval is fairly random), ovirt-hosted-engine HA sends mails in this order:
1) EngineUnexpectedlyDown
2) (StartState-ReinitializeFSM sometimes)
3) EngineDown-EngineStart
4) EngineStart-EngineStarting
5) EngineStarting-EngineUnexpectedlyDown
6) Repeat

Version-Release number of selected component (if applicable):
oVirt 4.0 (I'm not sure about the ovirt-hosted-engine-ha version.)

How reproducible:
I managed to reproduce it 3 times.

Steps to Reproduce:
1. Migrate hosted-engine manually
2. Put a node in maintenance

Actual results:
VMs are migrated but stay locked and in migration mode. HA stops and restarts.

Expected results:
VMs are migrated. HA is OK.

Additional info:
Manual migration works. Rebooting the hosted-engine unlocks the VMs. I don't know yet what causes the HA stopping and restarting.
Created attachment 1173280 [details]
ovirt-hosted-engine-ha Broker log
Please specify the timestamp at which it happens so I can trace it in the log, and also supply the engine.log.

When you say the VMs migrated but stayed locked, I guess you mean their engine status remained 'MigratingFrom' because the engine was down.
see comment 2
I dug into the logs a little more. There was an issue with the ovirt-ha-agent service: on one node it couldn't create or open new files because of the open-files limit.

I set the number of open files for the vdsm user to 2048 in /etc/security/limits.conf on both nodes, like this:

    vdsm - nofile 2048

Then I restarted ovirt-ha-broker and ovirt-ha-agent on both nodes.

I retried putting a node in maintenance, and the VMs are still stuck in locked and migration mode. They remain in 'Migrating To' status on the engine. However, I no longer receive emails from ovirt-hosted-engine HA.

I will attach more limited logs right now. I put the node in maintenance at 17:35.
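For anyone checking the same thing: a limit set in limits.conf is applied through PAM sessions, so it may be worth confirming what a running process actually sees. Below is a minimal Python sketch, not part of ovirt-hosted-engine-ha, that reads the open-files limit of the current process and counts its open descriptors. The function names and the 0.9 threshold are my own assumptions for illustration, and the /proc lookup is Linux-only.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open file descriptors of a process via /proc (Linux only).

    Reading another user's process usually requires root.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def near_limit(used, soft, threshold=0.9):
    """Return True when `used` reaches `threshold` of a finite soft limit.

    The 0.9 default is an arbitrary illustrative value.
    """
    if soft == resource.RLIM_INFINITY:
        return False
    return used >= threshold * soft


if __name__ == "__main__":
    soft, hard = nofile_limits()
    used = open_fd_count()
    print(f"open files: {used}, soft limit: {soft}, hard limit: {hard}")
    if near_limit(used, soft):
        print("warning: close to the open-files limit")
```

To look at a running daemon instead of the current process, pass its PID to `open_fd_count` (as root) and compare the result with the "Max open files" row in /proc/PID/limits.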
Created attachment 1174010 [details]
ovirt-hosted-engine-ha Agent log 20160629
Created attachment 1174023 [details]
ovirt-hosted-engine-ha Broker log 20160629
You probably hit bug 1343005, "too many open files".
Please try again once there is a fix for bug 1343005, and re-open if needed.

*** This bug has been marked as a duplicate of bug 1343005 ***