Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1444611

Summary: Start VM failed with the exception, if the score module does not respond before the timeout
Product: [oVirt] ovirt-scheduler-proxy Reporter: Artyom <alukiano>
Component: CoreAssignee: Andrej Krejcir <akrejcir>
Status: CLOSED CURRENTRELEASE QA Contact: Artyom <alukiano>
Severity: high Docs Contact:
Priority: low    
Version: ---CC: akrejcir, bugs, dfediuck, mavital, mgoldboi, stirabos
Target Milestone: ovirt-4.1.3Keywords: Triaged
Target Release: ---Flags: rule-engine: ovirt-4.1+
mgoldboi: planning_ack+
dfediuck: devel_ack+
mavital: testing_ack+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Timeout handling for external scheduler was incorrect. When external function timed out, it returned empty result, which caused an exception when scheduling a VM. Consequence: The VM could not be scheduled and started. Fix: Timeout handling is correct now. Result: Scheduling does not fail and VM can start.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-07-06 13:31:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine log none

Description Artyom 2017-04-23 08:50:48 UTC
Created attachment 1273445 [details]
engine log

Description of problem:
Start VM failed with the exception, if the score module does not respond before the timeout

2017-04-23 04:44:41,346-04 ERROR [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-6-thread-41) [cd2602e9-8af0-48c0-9916-be8efbbc1290] Command 'org.ovirt.engine.core.bll.RunVmCommand' failed: null
2017-04-23 04:44:41,348-04 ERROR [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-6-thread-41) [cd2602e9-8af0-48c0-9916-be8efbbc1290] Exception: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.scheduling.SchedulingManager.sumScoreResults(SchedulingManager.java:816) [bll.jar:]
        at org.ovirt.engine.core.bll.scheduling.SchedulingManager.runExternalFunctions(SchedulingManager.java:810) [bll.jar:]
        at org.ovirt.engine.core.bll.scheduling.SchedulingManager.runFunctions(SchedulingManager.java:766) [bll.jar:]
        at org.ovirt.engine.core.bll.scheduling.SchedulingManager.selectBestHost(SchedulingManager.java:456) [bll.jar:]
        at org.ovirt.engine.core.bll.scheduling.SchedulingManager.schedule(SchedulingManager.java:330) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommand.getVdsToRunOn(RunVmCommand.java:826) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommand.runVm(RunVmCommand.java:227) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommand.perform(RunVmCommand.java:401) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommand.executeVmCommand(RunVmCommand.java:326) [bll.jar:]
        at org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:120) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164) [utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103) [utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:397) [bll.jar:]
        at org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.executeValidatedCommand(PrevalidatingMultipleActionsRunner.java:204) [bll.jar:]
        at org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.runCommands(PrevalidatingMultipleActionsRunner.java:176) [bll.jar:]
        at org.ovirt.engine.core.bll.SortedMultipleActionsRunnerBase.runCommands(SortedMultipleActionsRunnerBase.java:20) [bll.jar:]
        at org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.lambda$invokeCommands$3(PrevalidatingMultipleActionsRunner.java:182) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:92) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_121]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121]

2017-04-23 04:44:41,359-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-41) [cd2602e9-8af0-48c0-9916-be8efbbc1290] EVENT_ID: USER_FAILED_RUN_VM(54), Correlation ID: cd2602e9-8af0-48c0-9916-be8efbbc1290, Job ID: ffd1b3a8-87cb-4e31-a929-05d9d6a84c58, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM golden_env_mixed_virtio_0 (User: admin@internal-authz).

Version-Release number of selected component (if applicable):
ovirt-scheduler-proxy-0.1.3-7.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install and enable via the engine ovirt-scheduler-proxy
2. Create simple python script under /usr/share/ovirt-scheduler-proxy/plugins/test_plugin.py
#!/bin/env python
from time import sleep
class ExternalSchedulerTimeoutPlugin(object):
    def do_score(self, hosts, vm, args):
        sleep(125)
        print [("some_host_uuid", 50)]
# chmod +x /usr/share/ovirt-scheduler-proxy/plugins/test_plugin.py
3. Restart ovirt-scheduler-proxy and the ovirt-engine services
4. Create new scheduling policy with the weight ExternalSchedulerTimeoutPlugin module
5. Attach scheduling policy to the cluster
6. Start the VM

Actual results:
Start the VM fails, with the traceback above

Expected results:
Start the VM succeeds with the warning that the engine skipped external scheduling score modules(the same behavior that we have for the filter modules)

Additional info:

Comment 1 Sandro Bonazzola 2017-05-30 14:31:28 UTC
This bug is on modified, with all referenced patches merged on ovirt-engine while the bug is on ovirt-scheduler-proxy.
Please check if the product is the right one, I suppose it should be ovirt-engine.
Please also check modified status is correct, looks like one of the referenced patches has not been backported to 4.1 branch.

Comment 2 Andrej Krejcir 2017-05-30 14:48:11 UTC
One of the patches is for ovirt-scheduler-proxy [1]. It has not been backported because scheduler-proxy does not have 4.1 branch.

[1] - https://gerrit.ovirt.org/#/c/76500/

Comment 3 Sandro Bonazzola 2017-06-07 15:36:41 UTC
(In reply to Andrej Krejcir from comment #2)
> One of the patches is for ovirt-scheduler-proxy [1]. It has not been
> backported because scheduler-proxy does not have 4.1 branch.
> 
> [1] - https://gerrit.ovirt.org/#/c/76500/

Can you please build it for being released in 4.1.3 and add it to 4.1.3_rc1 release configuration as in https://gerrit.ovirt.org/77960 ?

Comment 4 Artyom 2017-06-18 11:33:09 UTC
Verified on rhevm-4.1.3.2-0.1.el7.noarch
2017-06-18 07:30:33,454-04 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-30) [77df6c5e-e853-4925-9883-058535975d3a] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 1540741b
2017-06-18 07:30:33,543-04 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-6-thread-32) [77df6c5e-e853-4925-9883-058535975d3a] Running command: RunVmCommand internal: false. Entities affected :  ID: f3ee19f2-8f89-43a9-ba10-445c10e861f2 Type: VMAction group RUN_VM with role type USER
2017-06-18 07:31:03,602-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-32) [77df6c5e-e853-4925-9883-058535975d3a] EVENT_ID: EXTERNAL_SCHEDULER_ERROR(10,501), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Running the external scheduler failed: 'Waiting on score functions timed out'