Created attachment 950890 [details]
engine.log

Description of problem:
On a hosted-engine environment, I deployed a second hosted-engine host by running 'hosted-engine --deploy' on it; it was added to the existing cluster as hosted_engine_2.
I tried to migrate the HostedEngine VM from the first host to the second, and the operation was blocked on CDA (CanDoAction):

2014-10-26 22:51:47,488 WARN [org.ovirt.engine.core.bll.MigrateVmCommand] (ajp-/127.0.0.1:8702-5) [41d31606] CanDoAction of action MigrateVm failed. Reasons:VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName hosted_engine_2,$filterName HA,VAR__DETAIL__NOT_HE_HOST,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL

Then I tried to put the first host (the one that runs the HostedEngine VM) into maintenance mode. The operation was allowed and wasn't blocked.

Version-Release number of selected component (if applicable):
rhev 3.5 vt7
rhel6.6 host
ovirt-hosted-engine-setup-1.2.1-1.el6ev.noarch
rhevm-3.5.0-0.17.beta.el6ev.noarch
vdsm-4.16.7.1-1.el6ev.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted-engine (I used iSCSI storage).
2. When the engine is up and running, add a second host to the hosted-engine cluster with 'hosted-engine --deploy'.
3. Try to migrate the HostedEngine VM to the new host. The operation is blocked on CDA.
4. Try to put the first host (the one that runs the HostedEngine VM) into maintenance mode.

Actual results:
Putting the host that runs the HostedEngine VM into maintenance mode is allowed, although migrating the VM manually is blocked. I set the host to maintenance and it got stuck in 'Preparing for maintenance' forever.

Expected results:
The behavior should be consistent: putting the HostedEngine host into maintenance should be blocked in CDA, just as migrating the VM manually is.

Additional info:
engine.log
This one is tricky. The engine is not responsible for migrating the HE VM, so it moves the host to "preparing for maintenance" and waits for the agent to do the migration. Blocking in CDA is not done even for non-HE VMs: the engine tries to migrate them, and if that fails we fail the maintenance command.

I can see 2 possible solutions here:
1. Ask the scheduler whether the HE VM can be migrated (block it the same way as we do when trying to migrate it manually, as suggested by the reporter).
2. Teach ha-agent to notify the engine if it can't migrate the HE VM, so the engine can abort the maintenance command.
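For illustration only, option 1 could look roughly like the sketch below. HeHost, MaintenancePrecheck and the score field are hypothetical stand-ins made up for this example; they are not the real ovirt-engine classes or the scheduler API.

```java
import java.util.List;

// Hypothetical stand-in for a cluster host; not the real ovirt-engine class.
final class HeHost {
    final String name;
    final boolean hostedEngineCapable; // deployed as a hosted-engine host
    final int haScore;                 // score reported by the HA agent (assumed field)

    HeHost(String name, boolean hostedEngineCapable, int haScore) {
        this.name = name;
        this.hostedEngineCapable = hostedEngineCapable;
        this.haScore = haScore;
    }
}

final class MaintenancePrecheck {
    // True if some host other than the source could run the hosted-engine VM.
    static boolean canMigrateHostedEngine(HeHost source, List<HeHost> cluster) {
        for (HeHost h : cluster) {
            if (h != source && h.hostedEngineCapable && h.haScore > 0) {
                return true;
            }
        }
        return false;
    }

    // Option 1: reject the maintenance request up front when the host runs
    // the hosted-engine VM and no other host can take it.
    static boolean canDoMaintenance(HeHost source, boolean runsHostedEngineVm,
                                    List<HeHost> cluster) {
        return !runsHostedEngineVm || canMigrateHostedEngine(source, cluster);
    }
}
```

With the reporter's setup (a second host that is filtered out as NOT_HE_HOST), canDoMaintenance would return false for the host running the HE VM, which matches the result of the manual migration attempt.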
Proposing not-a-bug.

Maintenance is an action that might or might not trigger multiple underlying migrations. We don't propagate their CanDoAction results to the parent Maintenance command.

So you can't ask Migrate and Maintenance to be consistent, as they are different things.

Jiri, Doron - any additions?
(In reply to Roy Golan from comment #2)
> Proposing not-a-bug.
>
> Maintenance is an action that might or might not trigger multiple underlying
> migrations. We don't propagate their CanDoAction results to the parent
> Maintenance command.
>
> So you can't ask Migrate and Maintenance to be consistent, as they are
> different things.
>
> Jiri, Doron - any additions?

I would at least add some timeout (what's reasonable here?), so that the host comes back to active from preparing for maintenance. Or should we leave this entirely to the admin?
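A rough sketch of what such a timeout could mean, assuming a periodic check that decides the host's next state. HostStatus and MaintenanceTimeout are hypothetical names for this example, not engine classes, and the timeout value is left as a parameter because the thread does not settle on one.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical, simplified host states for this sketch.
enum HostStatus { UP, PREPARING_FOR_MAINTENANCE, MAINTENANCE }

final class MaintenanceTimeout {
    // Decide the next status of a host during a periodic check.
    static HostStatus resolve(HostStatus current, int runningVms,
                              Instant enteredAt, Instant now, Duration timeout) {
        if (current != HostStatus.PREPARING_FOR_MAINTENANCE) {
            return current; // nothing to decide for other states
        }
        if (runningVms == 0) {
            return HostStatus.MAINTENANCE; // all VMs are gone, finish the move
        }
        // VMs are still running: give up and reactivate once the timeout expires.
        boolean expired = Duration.between(enteredAt, now).compareTo(timeout) >= 0;
        return expired ? HostStatus.UP : HostStatus.PREPARING_FOR_MAINTENANCE;
    }
}
```

Passing the timestamps in explicitly keeps the decision free of clock access, so the rule can be checked without waiting for real time to pass.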
All referenced patches have been merged. Any reason for keeping this on POST?
Checked on rhevm-3.5.0-0.23.beta.el6ev.noarch

I have 3 hosts in the engine, two of which are also in the HE environment, and one VM (the HE VM). With all 3 hosts up, when I put one of the HE hosts into maintenance, the VM migrates to the second HE host and the first host moves to maintenance.
But if I then try to put the second HE host into maintenance, that host gets stuck in 'Preparing for maintenance', while the VM migrates to the host that is already in maintenance (and has a score of zero). This is not the desired behavior. Also, the engine shows that both HE hosts have 0 VMs.
Attaching vdsm and agent logs from both HE hosts, and also a screenshot from the engine.
Created attachment 966322 [details]
logs and screenshots
Your engine was hitting bug 1171491. Please reproduce once it's resolved.
Checked on rhevm-3.5.0-0.27.el6ev.noarch

I have 3 hosts in the engine, two of which are also in the HE environment, and one VM (the HE VM). With all 3 hosts up, when I put one of the HE hosts into maintenance, the VM migrates to the second HE host and the first host moves to maintenance.
But if I then try to put the second HE host into maintenance, that host gets stuck in 'Preparing for maintenance', while the VM migrates to the host that is already in maintenance (and has a score of zero). This is not the desired behavior. Also, the engine shows that both HE hosts have 0 VMs.
Attaching vdsm and agent logs from both HE hosts.
Created attachment 972751 [details]
newer logs
The ERROR in the engine log appears as long as I do not activate the first hosted-engine host (the one that was first put into local maintenance, and to which the HE VM was migrated after I put the second HE host into maintenance, even though that host has a score of zero).
Not sure, but because we moved migration of the HE VM to the agents, we simply do not apply the HA filter (which must filter out all hosts), and this is the reason why the host stays in the "Preparing for Maintenance" state. The correct behavior, if there is no other HE host in rhevm with a positive score, is to just block maintenance with an error message.
IMHO we should revert the patches and leave the original behavior as is (as already suggested in comment #2).
(In reply to Artyom from comment #11)
> Not sure, but because we moved migration of the HE VM to the agents, we
> simply do not apply the HA filter (which must filter out all hosts), and this
> is the reason why the host stays in the "Preparing for Maintenance" state.
> The correct behavior, if there is no other HE host in rhevm with a positive
> score, is to just block maintenance with an error message.

Actually that's exactly what the patch does:

    for (VM vm : vms) {
        if (vm.isHostedEngine()) {
            // check if there is host which can be used for HE
            if (!canScheduleVm(vm)) {
                succeeded = false;
                appendCustomValue("failedVms", vm.getName(), ",");
                log.error("ResourceManager::vdsMaintenance - There is not host capable of running the hosted engine VM");
            }
            // The Hosted Engine vm is migrated by the HA agent
            continue;
        }

It seems like canScheduleVm() returns true even if there is no HE-capable host, but according to your logs the code worked fine:

2014-12-24 13:35:41,676 ERROR [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (org.ovirt.thread.pool-7-thread-24) [b28f467] ResourceManager::vdsMaintenance - There is not host capable of running the hosted engine VM

Btw, I can still see NPEs, probably caused by bug 1171491, in the attachment called "newer logs", but those are supposed to be fixed in the -27 build. Are you sure you were testing the updated version?
No, I checked it again; it has version rhevm-3.5.0-0.27.el6ev.noarch.
If you have time, you can take a look at the environment: alukiano-he-1.qa.lab.tlv.redhat.com
I just tested it on Artyom's setup and it works as expected. Please note that the patch is not supposed to block the maintenance command, it just adds a warning that the HE VM can't be migrated and user intervention is required.
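To make the "warn, don't block" behavior concrete, here is a minimal sketch under stated assumptions. WarnOnlyMaintenance and its methods are hypothetical, and the scheduler check is passed in as a plain predicate; the real patch uses the engine's own command and logging classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class WarnOnlyMaintenance {
    // Collect hosted-engine VMs that no host can currently run. The caller
    // still lets the host go to "preparing for maintenance"; the result is
    // only used to warn the admin.
    static List<String> unschedulable(List<String> hostedEngineVms,
                                      Predicate<String> canScheduleVm) {
        List<String> failed = new ArrayList<>();
        for (String vm : hostedEngineVms) {
            if (!canScheduleVm.test(vm)) {
                failed.add(vm);
            }
        }
        return failed;
    }

    // Build the warning text; an empty string means there is nothing to report.
    static String warning(List<String> failedVms) {
        if (failedVms.isEmpty()) {
            return "";
        }
        return "No host is capable of running: " + String.join(",", failedVms)
                + "; user intervention is required";
    }
}
```

The design point is that the check produces information for the admin rather than a veto, so the HA agent remains the only component that moves the HE VM.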
Verified on rhevm-3.5.0-0.27.el6ev.noarch: the engine log has an error message saying that it cannot migrate the HE VM ("2015-01-06 12:05:39,313 ERROR [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (org.ovirt.thread.pool-7-thread-10) [20c3de6c] ResourceManager::vdsMaintenance - There is not host capable of running the hosted engine VM").
The NullPointerExceptions are not related to this bug.
rhev 3.5.0 was released. closing.