Bug 1243811 - VM with dedicated host fails to run on another host if the dedicated host is in maintenance
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-3.6.0-ga
Target Release: 3.6.0
Assignee: Dudi Maroshi
QA Contact: Artyom
URL:
Whiteboard: sla
Depends On:
Blocks:
 
Reported: 2015-07-16 11:46 UTC by Omer Frenkel
Modified: 2016-02-10 19:19 UTC (History)
CC: 16 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-04 11:37:09 UTC
oVirt Team: SLA
Embargoed:
rule-engine: ovirt-3.6.0+
rule-engine: blocker+
mgoldboi: Triaged+
mgoldboi: planning_ack+
rule-engine: devel_ack+
mavital: testing_ack+




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1266041 0 unspecified CLOSED Confusing VM pinning to Host behaviour 2021-02-22 00:41:40 UTC
oVirt gerrit 44243 0 master MERGED backend: Fix NPE in decreasePendingVm() 2020-10-02 15:15:09 UTC
oVirt gerrit 44796 0 master MERGED core: fix scheduler, ignoring white list if there are dedicates hosts 2020-10-02 15:15:02 UTC
oVirt gerrit 45244 0 ovirt-engine-3.6 MERGED core: fix scheduler, ignoring white list if there are dedicates hosts 2020-10-02 15:15:03 UTC
oVirt gerrit 45816 0 master MERGED scheduling: Add PreferredHosts policy unit 2020-10-02 15:15:03 UTC
oVirt gerrit 47144 0 ovirt-engine-3.6.0 MERGED core: fix scheduler, ignoring white list if there are dedicates hosts 2020-10-02 15:15:02 UTC

Internal Links: 1266041

Description Omer Frenkel 2015-07-16 11:46:42 UTC
Description of problem:
A VM with a dedicated host and the migration policy "Allow automatic and manual migration" fails to run if the dedicated host is in maintenance, although another host in the cluster can run it.

Version-Release number of selected component (if applicable):
3.6 master

How reproducible:
always

Steps to Reproduce:
1. Have two hosts in Up state in a cluster (A & B)
2. Create a VM and set it to start running on specific host A
3. Move host A to maintenance
4. Start the VM

Actual results:
The VM fails to run; exception in the log:

2015-07-16 14:40:04,401 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (default task-58) [3df9f7f] Lock Acquired to object 'EngineLock:{exclusiveLocks='[e01c6100-3cc6-4f44-9308-38f562c1012c=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2015-07-16 14:40:04,471 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-58) [3df9f7f] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{runAsync='true', vmId='e01c6100-3cc6-4f44-9308-38f562c1012c'}), log id: 6c417e32
2015-07-16 14:40:04,471 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-58) [3df9f7f] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 6c417e32
2015-07-16 14:40:04,544 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-8-thread-22) [3df9f7f] Running command: RunVmCommand internal: false. Entities affected :  ID: e01c6100-3cc6-4f44-9308-38f562c1012c Type: VMAction group RUN_VM with role type USER
2015-07-16 14:40:04,589 ERROR [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-8-thread-22) [3df9f7f] Can't find VDS to run the VM 'e01c6100-3cc6-4f44-9308-38f562c1012c' on, so this VM will not be run.
2015-07-16 14:40:04,590 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-8-thread-22) [3df9f7f] Lock freed to object 'EngineLock:{exclusiveLocks='[e01c6100-3cc6-4f44-9308-38f562c1012c=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2015-07-16 14:40:04,590 ERROR [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-8-thread-22) [3df9f7f] Command 'org.ovirt.engine.core.bll.RunVmCommand' failed: null
2015-07-16 14:40:04,590 ERROR [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-8-thread-22) [3df9f7f] Exception: java.lang.NullPointerException
	at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) [rt.jar:1.7.0_79]
	at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) [rt.jar:1.7.0_79]
	at org.ovirt.engine.core.vdsbroker.ResourceManager.GetVdsManager(ResourceManager.java:305) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.ResourceManager.GetVdsManager(ResourceManager.java:301) [vdsbroker.jar:]
	at org.ovirt.engine.core.bll.RunVmCommandBase.getMonitor(RunVmCommandBase.java:336) [bll.jar:]
	at org.ovirt.engine.core.bll.RunVmCommandBase.getBlockingQueue(RunVmCommandBase.java:326) [bll.jar:]
	at org.ovirt.engine.core.bll.RunVmCommandBase.decreasePendingVm(RunVmCommandBase.java:297) [bll.jar:]
	at org.ovirt.engine.core.bll.RunVmCommandBase.decreasePendingVm(RunVmCommandBase.java:291) [bll.jar:]
	at org.ovirt.engine.core.bll.RunVmCommandBase.runningFailed(RunVmCommandBase.java:137) [bll.jar:]
	at org.ovirt.engine.core.bll.RunVmCommand.runningFailed(RunVmCommand.java:1137) [bll.jar:]
	at org.ovirt.engine.core.bll.RunVmCommand.runVm(RunVmCommand.java:288) [bll.jar:]
	at org.ovirt.engine.core.bll.RunVmCommand.perform(RunVmCommand.java:411) [bll.jar:]
	at org.ovirt.engine.core.bll.RunVmCommand.executeVmCommand(RunVmCommand.java:335) [bll.jar:]
	at org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:104) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1211) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1355) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1979) [bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:]
	at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1392) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:374) [bll.jar:]
	at org.ovirt.engine.core.bll.MultipleActionsRunner.executeValidatedCommand(MultipleActionsRunner.java:202) [bll.jar:]
	at org.ovirt.engine.core.bll.MultipleActionsRunner.runCommands(MultipleActionsRunner.java:170) [bll.jar:]
	at org.ovirt.engine.core.bll.SortedMultipleActionsRunnerBase.runCommands(SortedMultipleActionsRunnerBase.java:20) [bll.jar:]
	at org.ovirt.engine.core.bll.MultipleActionsRunner$2.run(MultipleActionsRunner.java:179) [bll.jar:]
	at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:92) [utils.jar:]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_79]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_79]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_79]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_79]
	at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_79]

2015-07-16 14:40:04,602 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-22) [3df9f7f] Correlation ID: 3df9f7f, Job ID: 3efd481e-c00b-4680-8c25-31c681d12e43, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM f21-new (User: admin@internal).



Expected results:
The VM should run on host B.

Additional info:
The above is all the relevant log; there is no other info in the log.
I verified that the VM can run on host B manually (Run Once and select the host).

Comment 1 Artyom 2015-07-23 10:29:08 UTC
This bug does not appear under version ovirt-engine-3.6.0-0.0.master.20150627185750.git6f063c1.el6.noarch.

Comment 2 Omer Frenkel 2015-08-03 12:26:50 UTC
Please note that the exception was already fixed by
https://gerrit.ovirt.org/#/c/44243/

but there is still an issue with the scheduler: getVdsToRunOn() does not return a host although there is an available host to run on.
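
A minimal illustrative sketch of the kind of guard such a fix could add (hypothetical names; the map and method below only stand in for ResourceManager's VdsManager lookup and decreasePendingVm(), they are not the engine code). The stack trace above shows ConcurrentHashMap.get() being handed a null host id because no host was ever selected, and a null check avoids the NPE:

import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: stand-ins for the engine classes involved in the NPE, not real code.
public class DecreasePendingVmSketch {

    // Stand-in for ResourceManager's map of VdsManager instances keyed by host id.
    private static final ConcurrentHashMap<UUID, Object> VDS_MANAGERS = new ConcurrentHashMap<>();

    static void decreasePendingVm(UUID vdsId) {
        if (vdsId == null) {
            // Hypothetical guard: the scheduler never picked a host, so no pending
            // resources were reserved; log and return instead of crashing.
            System.out.println("No host was selected for the VM, nothing to release");
            return;
        }
        // Safe now: vdsId is non-null, so ConcurrentHashMap.get() cannot throw NPE here.
        Object vdsManager = VDS_MANAGERS.get(vdsId);
        System.out.println("Releasing pending resources on host " + vdsId + " (" + vdsManager + ")");
    }

    public static void main(String[] args) {
        decreasePendingVm(null);              // the failing path from the stack trace above
        decreasePendingVm(UUID.randomUUID()); // normal path
    }
}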

Comment 3 Dudi Maroshi 2015-08-12 09:59:16 UTC
Problem reconstruction failed on a master build.

Found a different problem; here is the reconstruction scenario.
1. Have 2 hosts in the cluster, A and B. Both hosts are idle with no running VMs.
2. Assign the VM dedicated hosts A and B.
3. Start the VM (the VM starts on A).
4. Put host A into maintenance.
5. The VM starts migrating to host B.
6. Migration failed! Host A is pending maintenance.
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
7. Stop the VM; host A reaches maintenance status.
8. Start the VM; the VM starts on host B, as expected.

Yet migration fails, manual or automatic; migration is not working.
Reported as bug 1252820.

Comment 4 Omer Frenkel 2015-08-12 14:22:20 UTC
Why did you move this to MODIFIED without a fix?
I verified on latest master that this still happens.

Comment 5 Dudi Maroshi 2015-08-12 15:24:03 UTC
The bug is not reproducible for Dudi Maroshi, running an oVirt master build,
nor for Artyom; see comment 1.

Comment 6 Omer Frenkel 2015-08-12 16:12:01 UTC
Please describe in detail how you tested and what the result was.

Comment 7 Dudi Maroshi 2015-08-13 07:09:49 UTC
In reply to comment 6.

Prepare correct build:
----------------------
cd ~/git/ovirt-engine
git checkout master
git fetch
git rebase origin/master
make clean install-dev PREFIX="$HOME/ovirt-engine"

Prepare and run correct engine
------------------------------
$HOME/ovirt-engine/bin/engine-setup --jboss-home="${JBOSS_HOME}"
echo "ENGINE_JAVA_MODULEPATH=\"/usr/share/ovirt-engine-wildfly-overlay/modules:\${ENGINE_JAVA_MODULEPATH}\"" \
  ~/ovirt-engine/etc/ovirt-engine/engine.conf.d/10-setup-jboss-overlay.conf
cd $HOME/ovirt-engine
./share/ovirt-engine/services/ovirt-engine/ovirt-engine.py start

Reconstruct bug
---------------
1. Have two active hosts A and B, with no running VMs.
2. Create a new VM.
3. On the newly created VM, set dedicated hosts A and B, and apply the migration policy "Allow manual and automatic migration".
4. Run the newly created VM. Assuming it runs on host A, wait for the Up state.
5. Stop the newly created VM.
6. Put host A into maintenance.
7. Run the newly created VM.
8. Check that the VM runs on host B.

Comment 8 Omer Frenkel 2015-08-13 07:15:28 UTC
> 3. On the newly created VM, set dedicated hosts A and B, and apply the migration policy "Allow manual and automatic migration".

This is the problem ^^
If you check the bug description, you should set ONLY host A,
then put it into maintenance; the VM should still run on host B, but it doesn't.

I verified again on latest master that this happens; re-opening the bug.

Comment 9 Dudi Maroshi 2015-08-13 09:06:56 UTC
Had a private bug demo with ofrenkel.
The bug was reconstructed, understood, and fixed.

Comment 10 Martin Sivák 2015-09-03 09:06:34 UTC
Omer, what leads you to the assumption that the VM should start on host B? If "Start running on" contains only host A, then the VM can't be started unless host A is available.

There is nothing in the documentation that would imply otherwise; the RHEV-M guide actually says:

10.5.5 table 10.8

"The virtual machine will start running on a particular host in the cluster. However, the Manager or an administrator can migrate the virtual machine to a different host in the cluster depending on the migration and high-availability settings of the virtual machine."

Comment 11 Omer Frenkel 2015-09-06 13:36:26 UTC
(In reply to Martin Sivák from comment #10)
> Omer, what leads you to the assumption that the VM should start on host B?
> If Start running on contains only host A, then the VM can't be started
> unless host A is available.

It's not an assumption; this is how it always worked
(be my guest to check any previous version).
Since I chose "Allow manual and automatic migration", the system can choose to migrate the VM, and it can also choose to start it on another host; the list of hosts is just a preference. Think of this use case:
I prefer my VM to start on host A first, if possible, and otherwise on any other host.
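
A minimal illustrative sketch of that fallback, with made-up host and method names (not the actual scheduler code): the dedicated hosts only narrow down the set of hosts that can currently run the VM, and when none of them is available the scheduler falls back to the full candidate list instead of failing the run.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: not the actual ovirt-engine scheduler code.
public class PreferredHostFallbackSketch {

    // Keep only the preferred (dedicated) hosts that can currently run the VM;
    // if none of them can, fall back to the whole candidate list instead of failing.
    static List<String> candidateHosts(List<String> canRunVm, List<String> preferredHosts) {
        List<String> preferredAndAvailable = new ArrayList<>(preferredHosts);
        preferredAndAvailable.retainAll(canRunVm);
        return preferredAndAvailable.isEmpty() ? canRunVm : preferredAndAvailable;
    }

    public static void main(String[] args) {
        // Host A (the dedicated host) is in maintenance, so only host B can run the VM.
        System.out.println(candidateHosts(Arrays.asList("hostB"), Arrays.asList("hostA")));
        // Prints [hostB]: the VM starts on host B instead of failing to run.
    }
}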

> 
> There is nothing in the documentation that would imply otherwise, the RHEV-M
> guide actually says:
> 
> 10.5.5 table 10.8
> 
> "The virtual machine will start running on a particular host in the cluster.
> However, the Manager or an administrator can migrate the virtual machine to
> a different host in the cluster depending on the migration and
> high-availability settings of the virtual machine."

The docs need to be updated.

Comment 12 Martin Sivák 2015-09-07 10:29:15 UTC
Omer, the behaviour you describe has not been present since at least oVirt 3.3 (when the new scheduler was introduced), and the documentation actually describes the current code accurately.

Is there any use case associated with the preferred host? It looks like it can be used as a manual scheduling override... but it is not useful or predictable at all:

- The engine might start the VM on your preferred host and migrate it immediately because of balancing (making the preference setting useless in this case)
- or it can start your VM anywhere else in the case you describe (and the preference setting is again useless).

I do not see any other scenario where having preference together with the ability to choose any other host would make sense.


Dudi's patch broke the RunVmOnce flow (internally, it worked due to another internal bug :) and the MigrateTo flow, for example, so we have to decide what the behaviour should be.

Comment 13 Omer Frenkel 2015-09-07 14:26:45 UTC
I just verified that this flow works as I described on rhevm-3.5.4.2-1.3.el6ev:
I created a VM, set the specific host to 'X' and "Allow manual and automatic migration",
moved host 'X' to maintenance and started the VM.
The VM started on another host.

I'm not sure if there is a use case for this flow, but it's a regression.
I use this flow for my own needs:
during development I want VMs to start on a specific host if it is available, but sometimes the host is used by another developer, and I still want the VM to start.

Comment 14 Martin Sivák 2015-09-07 15:43:48 UTC
Hmm, interesting (because the code wasn't exactly written like that). Would you expect Run Once and Migrate To to behave the same, or should those fail when the host is not available?

There was quite a lot of confusion in how the scheduling API (whitelist vs. destHostId) was used, and this needs to be cleared up.

Comment 15 Omer Frenkel 2015-09-13 09:25:32 UTC
Migrate - yes: if I allow automatic migration, then first try to migrate to my selected (preferred) host(s); if they are not available, try anything else.

Run Once - I guess it is missing the migration option, but as it is right now, if the user selects Run Once and specifies host X, it should not try anything else if X is not available or fails to run the VM.

Adding a needinfo on Moran so he can share his opinion, if it differs from mine.
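
A minimal illustrative sketch of how the run/migrate preference could be expressed as a weight rather than a filter, roughly along the lines of the "PreferredHosts policy unit" patch in the links above (made-up code, not the actual policy unit); a Run Once destination would instead remain a hard filter.

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: a made-up weighting, not the real PreferredHosts policy unit.
public class PreferredHostsWeightSketch {

    // In this sketch a lower weight is better: preferred hosts get weight 0,
    // everything else 100, so preferred hosts win when available but the
    // other candidates are never excluded.
    static Map<String, Integer> weigh(List<String> candidates, Set<String> preferred) {
        Map<String, Integer> weights = new HashMap<>();
        for (String host : candidates) {
            weights.put(host, preferred.contains(host) ? 0 : 100);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Integer> weights =
                weigh(Arrays.asList("hostA", "hostB"), Collections.singleton("hostA"));
        String best = Collections.min(weights.entrySet(), Map.Entry.comparingByValue()).getKey();
        // Prints hostA; if host A were in maintenance it would simply not be a
        // candidate, and hostB would win because it was weighted, not filtered out.
        System.out.println(best);
    }
}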

Comment 16 Michal Skrivanek 2015-09-18 09:31:45 UTC
(In reply to Omer Frenkel from comment #15)
> migrate - yes, if i allow automatic migration, then first try to migrate to
> my selected (preferred) host(s), if not available, try anything else

Not sure. We then probably need to differentiate between a hard-constraint list of allowed host(s) and a mere preference. When you only want to express a preference, Affinity would be the right feature to kick in, no?


> run-once - i guess that its missing the migration option, but as it is right
> now - if user select run-once and specify host X, it should not try anything
> else if X is not available/fails to run the vm.

I believe that's correct

Comment 17 Red Hat Bugzilla Rules Engine 2015-09-22 07:43:45 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 18 Roy Golan 2015-09-24 07:30:47 UTC
This issue should be split in two. We first fix this and open a different BZ for the evolving VM-to-host pinning and affinity work.

Comment 19 Roy Golan 2015-09-24 11:00:24 UTC
Martin, please adapt your patch to fix the behaviour according to this bug, preserving the behaviour as it is.

Comment 20 Roy Golan 2015-10-08 10:13:37 UTC
https://gerrit.ovirt.org/#/c/45244 is enough to fix it. Dudi, please revive the patch.

Comment 21 Artyom 2015-10-25 12:43:51 UTC
Verified on rhevm-3.6.0.2-0.1.el6.noarch:
1) Create a VM with "Start running on" set to host_1
2) Put host_1 into maintenance
3) Start the VM
4) The VM starts on host_2

Comment 23 Sandro Bonazzola 2015-11-04 11:37:09 UTC
oVirt 3.6.0 was released on November 4th, 2015, and should fix this issue.
If problems still persist, please open a new BZ and reference this one.

