Bug 1157378 - [engine-backend] hosted-engine: moving the hosted-engine host to maintenance is allowed although migrating the hosted-engine VM is blocked on CDA
Summary: [engine-backend] hosted-engine: moving the hosted-engine host to maintenance ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Doron Fediuck
QA Contact: Artyom
URL:
Whiteboard: sla
Depends On: 1171491
Blocks:
Reported: 2014-10-27 06:48 UTC by Elad
Modified: 2016-02-10 20:20 UTC
CC List: 16 users

Fixed In Version: org.ovirt.engine-root-3.5.0-23
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-17 17:14:00 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Attachments
engine.log (51.36 KB, application/x-gzip)
2014-10-27 06:48 UTC, Elad
logs and screenshots (2.81 MB, application/zip)
2014-12-09 15:34 UTC, Artyom
newer logs (2.13 MB, application/zip)
2014-12-24 13:50 UTC, Artyom


Links
oVirt gerrit 35401 (master): MERGED, "core: check if the HE guest can be migrated before maintenace"
oVirt gerrit 35428 (ovirt-engine-3.5): MERGED, "core: check if the HE guest can be migrated before maintenace"

Description Elad 2014-10-27 06:48:19 UTC
Created attachment 950890
engine.log

Description of problem:
On a hosted-engine environment, I deployed a second hosted-engine host by running hosted-engine --deploy on it; it was added to the existing cluster as hosted_engine_2.
I tried to migrate the HostedEngine VM from the first host to the second, and the migration was blocked in CanDoAction (CDA):

2014-10-26 22:51:47,488 WARN  [org.ovirt.engine.core.bll.MigrateVmCommand] (ajp-/127.0.0.1:8702-5) [41d31606] CanDoAction of action MigrateVm failed. Reasons:VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName hosted_engine_2,$filterName HA,VAR__DETAIL__NOT_HE_HOST,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
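The Reasons field in the warning above is a comma-separated list that mixes message keys (such as SCHEDULING_ALL_HOSTS_FILTERED_OUT) with "$name value" substitutions (such as "$hostName hosted_engine_2"). The following minimal Java sketch, which is not engine code, splits such a string so it is easy to see which host and which scheduling filter rejected the migration. It assumes that substitution values never contain commas; the class name CdaReasons is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: splits a CanDoAction "Reasons" string from engine.log.
class CdaReasons {
    // Plain message keys, in log order (e.g. SCHEDULING_ALL_HOSTS_FILTERED_OUT).
    final List<String> messageKeys = new ArrayList<>();
    // "$name value" substitutions, in log order (e.g. hostName -> hosted_engine_2).
    final Map<String, String> variables = new LinkedHashMap<>();

    // Assumption: tokens are separated by commas and values contain no commas.
    static CdaReasons parse(String reasons) {
        CdaReasons result = new CdaReasons();
        for (String token : reasons.split(",")) {
            String t = token.trim();
            if (t.isEmpty()) {
                continue;
            }
            if (t.startsWith("$")) {
                // Substitution: "$name value"; a missing value becomes "".
                int space = t.indexOf(' ');
                String name = space < 0 ? t.substring(1) : t.substring(1, space);
                String value = space < 0 ? "" : t.substring(space + 1).trim();
                result.variables.put(name, value);
            } else {
                result.messageKeys.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String reasons = "VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,"
                + "VAR__FILTERTYPE__INTERNAL,$hostName hosted_engine_2,$filterName HA,"
                + "VAR__DETAIL__NOT_HE_HOST,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL";
        CdaReasons parsed = parse(reasons);
        System.out.println(parsed.variables); // {hostName=hosted_engine_2, filterName=HA}
        System.out.println(parsed.messageKeys.contains("VAR__DETAIL__NOT_HE_HOST")); // true
    }
}
```

Read this way, the log says that the HA filter rejected hosted_engine_2 with the detail VAR__DETAIL__NOT_HE_HOST.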


Then I tried to put the first host (the one that runs the HostedEngine VM) into maintenance mode. The operation was allowed and wasn't blocked.

Version-Release number of selected component (if applicable):
rhev 3.5 vt7
rhel6.6 host
ovirt-hosted-engine-setup-1.2.1-1.el6ev.noarch
rhevm-3.5.0-0.17.beta.el6ev.noarch
vdsm-4.16.7.1-1.el6ev.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted-engine (I used iSCSI storage)
2. When the engine is up and running, add a second host to the hosted-engine cluster by 'hosted-engine --deploy'.
3. Try to migrate the HostedEngine VM to the new host. The operation is blocked on CDA.
4. Try to put the first host (the one that runs the HostedEngine VM) into maintenance mode.

Actual results:
Putting the host that runs the HostedEngine VM into maintenance mode is allowed, although migrating the VM manually is blocked. I set the host to maintenance and it got stuck in 'Preparing for Maintenance' indefinitely.

Expected results:
The behavior should be consistent: putting the HostedEngine host into maintenance should be blocked in CDA, just as migrating the VM manually is.

Additional info: engine.log

Comment 1 Jiri Moskovcak 2014-11-07 14:26:51 UTC
This one is tricky. The engine is not responsible for migrating the HE VM, so it moves the host to 'Preparing for Maintenance' and waits for the HA agent to do the migration. Blocking in CDA is not done even for non-HE VMs: the engine tries to migrate them, and if that fails, the maintenance command fails. I can see two possible solutions here:

1. Ask the scheduler whether the HE VM can be migrated (block it the same way as when trying to migrate it manually, as the reporter suggested).

2. Teach the HA agent to notify the engine if it can't migrate the HE VM, so the engine can abort the maintenance command.

Comment 2 Roy Golan 2014-11-10 12:00:57 UTC
Proposing not-a-bug.

Maintenance is an action that may or may not trigger multiple underlying migrations. We don't propagate their CanDoAction results to the parent Maintenance command.

So you can't expect Migrate and Maintenance to be consistent, as they are different things.

Jiri, Doron - any additions?

Comment 3 Jiri Moskovcak 2014-11-10 12:18:22 UTC
(In reply to Roy Golan from comment #2)
> Proposing not-a-bug.
> 
> Maintenance is an action that may or may not trigger multiple underlying
> migrations. We don't propagate their CanDoAction results to the parent
> Maintenance command.
> 
> So you can't expect Migrate and Maintenance to be consistent, as they are
> different things.
> Jiri, Doron - any additions?

I would at least add a timeout (what value is reasonable here?) so that the host returns to Active from 'Preparing for Maintenance'. Or should we leave this entirely to the admin?
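The timeout idea could look roughly like the following Java sketch. It is not engine code: the class, its HostStatus enum, and the revertIfStuck method are hypothetical, and the ten-minute timeout used in main is an arbitrary example, since the comment leaves the right value open. A periodic monitor would call revertIfStuck and move the host back to Up if the HA agent never completed the migration.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of one host's maintenance state, for illustration only.
class MaintenanceTimeout {
    enum HostStatus { UP, PREPARING_FOR_MAINTENANCE, MAINTENANCE }

    HostStatus status = HostStatus.UP;
    Instant preparingSince; // when the host entered PREPARING_FOR_MAINTENANCE

    void startMaintenance(Instant now) {
        status = HostStatus.PREPARING_FOR_MAINTENANCE;
        preparingSince = now;
    }

    // Called when all VMs (including the HE VM) have left the host.
    void migrationsFinished() {
        status = HostStatus.MAINTENANCE;
        preparingSince = null;
    }

    // Called periodically; returns true if the host was reverted to UP
    // because it stayed in PREPARING_FOR_MAINTENANCE longer than the timeout.
    boolean revertIfStuck(Instant now, Duration timeout) {
        if (status == HostStatus.PREPARING_FOR_MAINTENANCE
                && Duration.between(preparingSince, now).compareTo(timeout) > 0) {
            status = HostStatus.UP;
            preparingSince = null;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        MaintenanceTimeout host = new MaintenanceTimeout();
        Instant t0 = Instant.parse("2014-12-24T13:35:00Z");
        Duration timeout = Duration.ofMinutes(10); // example value only
        host.startMaintenance(t0);
        System.out.println(host.revertIfStuck(t0.plus(Duration.ofMinutes(5)), timeout));  // false
        System.out.println(host.revertIfStuck(t0.plus(Duration.ofMinutes(11)), timeout)); // true
        System.out.println(host.status); // UP
    }
}
```

Reverting to Up rather than failing silently keeps the HE host usable; the alternative raised in the comment is to leave the stuck host for the admin to handle.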

Comment 4 Sandro Bonazzola 2014-12-01 14:02:24 UTC
All referenced patches have been merged, any reason for keeping this on POST?

Comment 5 Artyom 2014-12-09 15:34:18 UTC
Checked on rhevm-3.5.0-0.23.beta.el6ev.noarch
I have 3 hosts in the engine, two of which are also in the HE environment, and one VM (the HE VM). With all 3 hosts up, I put one of the HE hosts into maintenance: the VM migrated to the second HE host and the first host moved to maintenance.
But if I now try to put the second HE host into maintenance, the host gets stuck in 'Preparing for Maintenance', yet the VM migrates to the host that is already in maintenance (and has a score of zero). This is not the desired behavior, and the engine also shows that both HE hosts have 0 VMs.
Attaching vdsm and agent logs from both HE hosts, plus a screenshot from the engine.

Comment 6 Artyom 2014-12-09 15:34:59 UTC
Created attachment 966322
logs and screenshots

Comment 7 Doron Fediuck 2014-12-10 14:26:58 UTC
Your engine was hitting bug 1171491.
Please reproduce once it's resolved.

Comment 8 Artyom 2014-12-24 13:49:45 UTC
Checked on rhevm-3.5.0-0.27.el6ev.noarch
I have 3 hosts in the engine, two of which are also in the HE environment, and one VM (the HE VM). With all 3 hosts up, I put one of the HE hosts into maintenance: the VM migrated to the second HE host and the first host moved to maintenance.
But if I now try to put the second HE host into maintenance, the host gets stuck in 'Preparing for Maintenance', yet the VM migrates to the host that is already in maintenance (and has a score of zero). This is not the desired behavior, and the engine also shows that both HE hosts have 0 VMs.
Attaching vdsm and agent logs from both HE hosts.

Comment 9 Artyom 2014-12-24 13:50:29 UTC
Created attachment 972751
newer logs

Comment 10 Artyom 2014-12-24 13:59:34 UTC
The ERROR keeps appearing in the engine log as long as I don't activate the first hosted-engine host (the one that was put into local maintenance first, and to which the HE VM was migrated after I put the second HE host into maintenance, even though that host has a score of 0).

Comment 11 Artyom 2014-12-24 14:30:59 UTC
Not sure, but because we moved migration of the HE VM to the agents, we simply don't apply the HA filter (which should filter out all hosts), and this is why the host stays in the 'Preparing for Maintenance' state. The correct behavior, if there is no other HE host in RHEV-M with a positive score, is to simply block maintenance with an error message.
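The check described above can be sketched as a filter over the hosted-engine hosts followed by a validation step. This is illustrative Java, not the engine's HA filter: the HeHost fields, the host names, the example score, and the error text are all assumptions. It keeps only other HE hosts that are not in maintenance and have a positive score, and refuses maintenance when none remain, which matches the situation in comment 8 (the first HE host in maintenance with score 0, the second running the HE VM).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of "block maintenance if no other HE host can take the HE VM".
class HeMaintenanceCheck {
    static final class HeHost {
        final String name;
        final boolean heConfigured;  // host was deployed with hosted-engine --deploy
        final boolean inMaintenance; // host is already in (local) maintenance
        final int score;             // HA agent score; 0 means unusable for the HE VM

        HeHost(String name, boolean heConfigured, boolean inMaintenance, int score) {
            this.name = name;
            this.heConfigured = heConfigured;
            this.inMaintenance = inMaintenance;
            this.score = score;
        }
    }

    // Other HE hosts that could receive the HE VM from sourceHost.
    static List<String> candidateTargets(String sourceHost, List<HeHost> hosts) {
        return hosts.stream()
                .filter(h -> !h.name.equals(sourceHost))
                .filter(h -> h.heConfigured && !h.inMaintenance && h.score > 0)
                .map(h -> h.name)
                .collect(Collectors.toList());
    }

    // Returns null if maintenance may proceed, otherwise an error message.
    static String validateMaintenance(String sourceHost, List<HeHost> hosts) {
        if (candidateTargets(sourceHost, hosts).isEmpty()) {
            return "Cannot switch " + sourceHost
                    + " to maintenance: no other hosted-engine host with a positive score";
        }
        return null;
    }

    public static void main(String[] args) {
        List<HeHost> hosts = List.of(
                new HeHost("hosted_engine_1", true, true, 0),     // already in maintenance
                new HeHost("hosted_engine_2", true, false, 2400), // runs the HE VM
                new HeHost("host_3", false, false, 0));           // not an HE host
        System.out.println(validateMaintenance("hosted_engine_2", hosts));
        System.out.println(candidateTargets("hosted_engine_1", hosts)); // [hosted_engine_2]
    }
}
```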

Comment 12 Michal Skrivanek 2014-12-30 10:40:39 UTC
IMHO we should revert the patches and keep the original behavior (as already suggested in comment #2).

Comment 13 Jiri Moskovcak 2015-01-02 10:12:53 UTC
(In reply to Artyom from comment #11)
> Not sure, but because we moved migration of the HE VM to the agents, we
> simply don't apply the HA filter (which should filter out all hosts), and
> this is why the host stays in the 'Preparing for Maintenance' state. The
> correct behavior, if there is no other HE host in RHEV-M with a positive
> score, is to simply block maintenance with an error message.

Actually that's exactly what the patch does: 


        for (VM vm : vms) {
            if (vm.isHostedEngine()) {
                // check if there is host which can be used for HE
                if (!canScheduleVm(vm)) {
                    succeeded = false;
                    appendCustomValue("failedVms", vm.getName(), ",");
                    log.error("ResourceManager::vdsMaintenance - There is not host capable of running the hosted engine VM");
                }
                // The Hosted Engine vm is migrated by the HA agent
                continue;
            }
            // ... (rest of the loop, which handles the non-HE VMs, is not quoted)
        }

From your report it seems as if canScheduleVm() returns true even when there is no HE-capable host, but according to your logs the code worked fine:

2014-12-24 13:35:41,676 ERROR [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (org.ovirt.thread.pool-7-thread-24) [b28f467] ResourceManager::vdsMaintenance - There is not host capable of running the hosted engine VM

By the way, I can still see NPEs, probably caused by bug 1171491, in the attachment called "newer logs", but those are supposed to be fixed in the -27 build. Are you sure you were testing the updated version?
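To make the logic of the quoted loop easy to try out, here is a self-contained Java rendition of it. VM, canScheduleVm, and appendCustomValue are replaced by hypothetical stand-ins (a small Vm class, a Predicate, and a list that is joined with commas), and the non-HE branch, which the excerpt does not show, is reduced to recording the VM name. As comment 15 explains, the patch flags the HE VM and logs an error instead of blocking the command; this sketch reproduces only that flagging, and what the engine does with the succeeded flag afterwards is outside the quoted code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the loop quoted from the maintenance patch.
class MaintenanceLoopSketch {
    static final class Vm {
        final String name;
        final boolean hostedEngine;

        Vm(String name, boolean hostedEngine) {
            this.name = name;
            this.hostedEngine = hostedEngine;
        }
    }

    boolean succeeded = true;
    final List<String> failedVms = new ArrayList<>();        // stands in for appendCustomValue("failedVms", ...)
    final List<String> engineMigrations = new ArrayList<>(); // simplified non-HE branch

    void process(List<Vm> vms, Predicate<Vm> canScheduleVm) {
        for (Vm vm : vms) {
            if (vm.hostedEngine) {
                // Is there any host that can run the HE VM?
                if (!canScheduleVm.test(vm)) {
                    succeeded = false;
                    failedVms.add(vm.name);
                    System.err.println("ResourceManager::vdsMaintenance - There is not host capable of running the hosted engine VM");
                }
                // The HE VM is migrated by the HA agent, never by the engine.
                continue;
            }
            // Non-HE VMs: the engine would issue the migration itself (omitted here).
            engineMigrations.add(vm.name);
        }
    }

    String failedVmsValue() {
        return String.join(",", failedVms);
    }

    public static void main(String[] args) {
        MaintenanceLoopSketch sketch = new MaintenanceLoopSketch();
        List<Vm> vms = List.of(new Vm("HostedEngine", true), new Vm("web01", false));
        sketch.process(vms, vm -> false); // no HE-capable target host
        System.out.println(sketch.succeeded);        // false
        System.out.println(sketch.failedVmsValue()); // HostedEngine
    }
}
```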

Comment 14 Artyom 2015-01-05 14:33:44 UTC
No, I checked again; it has version rhevm-3.5.0-0.27.el6ev.noarch.
If you have time, you can take a look at the environment: alukiano-he-1.qa.lab.tlv.redhat.com

Comment 15 Jiri Moskovcak 2015-01-06 12:59:08 UTC
I just tested it on Artyom's setup and it works as expected. Please note that the patch is not supposed to block the maintenance command; it just adds a warning that the HE VM can't be migrated and that user intervention is required.

Comment 16 Artyom 2015-01-06 13:14:48 UTC
Verified on rhevm-3.5.0-0.27.el6ev.noarch: the engine logs an error message saying that it cannot migrate the HE VM ("2015-01-06 12:05:39,313 ERROR [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (org.ovirt.thread.pool-7-thread-10) [20c3de6c] ResourceManager::vdsMaintenance - There is not host capable of running the hosted engine VM").

The NullPointerExceptions are not connected to this bug.

Comment 18 Eyal Edri 2015-02-17 17:14:00 UTC
RHEV 3.5.0 was released. Closing.

