Bug 1444611 - Start VM failed with the exception, if the score module does not respond before the timeout
Summary: Start VM failed with the exception, if the score module does not respond befo...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-scheduler-proxy
Classification: oVirt
Component: Core
Version: ---
Hardware: x86_64
OS: Linux
low
high
Target Milestone: ovirt-4.1.3
: ---
Assignee: Andrej Krejcir
QA Contact: Artyom
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-04-23 08:50 UTC by Artyom
Modified: 2017-07-06 13:31 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Timeout handling for external scheduler was incorrect. When external function timed out, it returned empty result, which caused an exception when scheduling a VM. Consequence: The VM could not be scheduled and started. Fix: Timeout handling is correct now. Result: Scheduling does not fail and VM can start.
Clone Of:
Environment:
Last Closed: 2017-07-06 13:31:21 UTC
oVirt Team: SLA
Embargoed:
rule-engine: ovirt-4.1+
mgoldboi: planning_ack+
dfediuck: devel_ack+
mavital: testing_ack+


Attachments (Terms of Use)
engine log (8.17 MB, text/plain)
2017-04-23 08:50 UTC, Artyom
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1459911 0 unspecified CLOSED Errata rebase bug 2021-02-22 00:41:40 UTC
oVirt gerrit 76465 0 master MERGED scheduler: Fix empty scores returned by external scheduler 2017-05-05 12:42:41 UTC
oVirt gerrit 76500 0 master MERGED Fix timeout and forced stop of scheduler methods 2017-05-09 13:34:01 UTC
oVirt gerrit 76511 0 ovirt-engine-4.1 MERGED scheduler: Fix empty scores returned by external scheduler 2017-05-15 09:19:15 UTC

Internal Links: 1459911

Description Artyom 2017-04-23 08:50:48 UTC
Created attachment 1273445 [details]
engine log

Description of problem:
Start VM failed with the exception, if the score module does not respond before the timeout

2017-04-23 04:44:41,346-04 ERROR [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-6-thread-41) [cd2602e9-8af0-48c0-9916-be8efbbc1290] Command 'org.ovirt.engine.core.bll.RunVmCommand' failed: null
2017-04-23 04:44:41,348-04 ERROR [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-6-thread-41) [cd2602e9-8af0-48c0-9916-be8efbbc1290] Exception: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.scheduling.SchedulingManager.sumScoreResults(SchedulingManager.java:816) [bll.jar:]
        at org.ovirt.engine.core.bll.scheduling.SchedulingManager.runExternalFunctions(SchedulingManager.java:810) [bll.jar:]
        at org.ovirt.engine.core.bll.scheduling.SchedulingManager.runFunctions(SchedulingManager.java:766) [bll.jar:]
        at org.ovirt.engine.core.bll.scheduling.SchedulingManager.selectBestHost(SchedulingManager.java:456) [bll.jar:]
        at org.ovirt.engine.core.bll.scheduling.SchedulingManager.schedule(SchedulingManager.java:330) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommand.getVdsToRunOn(RunVmCommand.java:826) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommand.runVm(RunVmCommand.java:227) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommand.perform(RunVmCommand.java:401) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommand.executeVmCommand(RunVmCommand.java:326) [bll.jar:]
        at org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:120) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164) [utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103) [utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:397) [bll.jar:]
        at org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.executeValidatedCommand(PrevalidatingMultipleActionsRunner.java:204) [bll.jar:]
        at org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.runCommands(PrevalidatingMultipleActionsRunner.java:176) [bll.jar:]
        at org.ovirt.engine.core.bll.SortedMultipleActionsRunnerBase.runCommands(SortedMultipleActionsRunnerBase.java:20) [bll.jar:]
        at org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.lambda$invokeCommands$3(PrevalidatingMultipleActionsRunner.java:182) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:92) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_121]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121]

2017-04-23 04:44:41,359-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-41) [cd2602e9-8af0-48c0-9916-be8efbbc1290] EVENT_ID: USER_FAILED_RUN_VM(54), Correlation ID: cd2602e9-8af0-48c0-9916-be8efbbc1290, Job ID: ffd1b3a8-87cb-4e31-a929-05d9d6a84c58, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM golden_env_mixed_virtio_0 (User: admin@internal-authz).

Version-Release number of selected component (if applicable):
ovirt-scheduler-proxy-0.1.3-7.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install and enable via the engine ovirt-scheduler-proxy
2. Create simple python script under /usr/share/ovirt-scheduler-proxy/plugins/test_plugin.py
#!/bin/env python
from time import sleep
class ExternalSchedulerTimeoutPlugin(object):
    def do_score(self, hosts, vm, args):
        sleep(125)
        print [("some_host_uuid", 50)]
# chmod +x /usr/share/ovirt-scheduler-proxy/plugins/test_plugin.py
3. Restart ovirt-scheduler-proxy and the ovirt-engine services
4. Create new scheduling policy with the weight ExternalSchedulerTimeoutPlugin module
5. Attach scheduling policy to the cluster
6. Start the VM

Actual results:
Start the VM fails, with the traceback above

Expected results:
Start the VM succeeds with the warning that the engine skipped external scheduling score modules(the same behavior that we have for the filter modules)

Additional info:

Comment 1 Sandro Bonazzola 2017-05-30 14:31:28 UTC
This bug is on modified, with all referenced patches merged on ovirt-engine while the bug is on ovirt-scheduler-proxy.
Please check if the product is the right one, I suppose it should be ovirt-engine.
Please also check modified status is correct, looks like one of the referenced patches has not been backported to 4.1 branch.

Comment 2 Andrej Krejcir 2017-05-30 14:48:11 UTC
One of the patches is for ovirt-scheduler-proxy [1]. It has not been backported because scheduler-proxy does not have 4.1 branch.

[1] - https://gerrit.ovirt.org/#/c/76500/

Comment 3 Sandro Bonazzola 2017-06-07 15:36:41 UTC
(In reply to Andrej Krejcir from comment #2)
> One of the patches is for ovirt-scheduler-proxy [1]. It has not been
> backported because scheduler-proxy does not have 4.1 branch.
> 
> [1] - https://gerrit.ovirt.org/#/c/76500/

Can you please build it for being released in 4.1.3 and add it to 4.1.3_rc1 release configuration as in https://gerrit.ovirt.org/77960 ?

Comment 4 Artyom 2017-06-18 11:33:09 UTC
Verified on rhevm-4.1.3.2-0.1.el7.noarch
2017-06-18 07:30:33,454-04 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-30) [77df6c5e-e853-4925-9883-058535975d3a] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 1540741b
2017-06-18 07:30:33,543-04 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-6-thread-32) [77df6c5e-e853-4925-9883-058535975d3a] Running command: RunVmCommand internal: false. Entities affected :  ID: f3ee19f2-8f89-43a9-ba10-445c10e861f2 Type: VMAction group RUN_VM with role type USER
2017-06-18 07:31:03,602-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-32) [77df6c5e-e853-4925-9883-058535975d3a] EVENT_ID: EXTERNAL_SCHEDULER_ERROR(10,501), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Running the external scheduler failed: 'Waiting on score functions timed out'


Note You need to log in before you can comment on or make changes to this bug.