Bug 1444611
| Summary: | Start VM failed with the exception, if the score module does not respond before the timeout | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [oVirt] ovirt-scheduler-proxy | Reporter: | Artyom <alukiano> | ||||
| Component: | Core | Assignee: | Andrej Krejcir <akrejcir> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Artyom <alukiano> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | low | ||||||
| Version: | --- | CC: | akrejcir, bugs, dfediuck, mavital, mgoldboi, stirabos | ||||
| Target Milestone: | ovirt-4.1.3 | Keywords: | Triaged | ||||
| Target Release: | --- | Flags: | rule-engine:
ovirt-4.1+
mgoldboi: planning_ack+ dfediuck: devel_ack+ mavital: testing_ack+ |
||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: |
Cause:
Timeout handling for external scheduler was incorrect. When external function timed out, it returned empty result, which caused an exception when scheduling a VM.
Consequence:
The VM could not be scheduled and started.
Fix:
Timeout handling is correct now.
Result:
Scheduling does not fail and VM can start.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2017-07-06 13:31:21 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
This bug is on modified, with all referenced patches merged on ovirt-engine while the bug is on ovirt-scheduler-proxy. Please check if the product is the right one, I suppose it should be ovirt-engine. Please also check modified status is correct, looks like one of the referenced patches has not been backported to 4.1 branch. One of the patches is for ovirt-scheduler-proxy [1]. It has not been backported because scheduler-proxy does not have 4.1 branch. [1] - https://gerrit.ovirt.org/#/c/76500/ (In reply to Andrej Krejcir from comment #2) > One of the patches is for ovirt-scheduler-proxy [1]. It has not been > backported because scheduler-proxy does not have 4.1 branch. > > [1] - https://gerrit.ovirt.org/#/c/76500/ Can you please build it for being released in 4.1.3 and add it to 4.1.3_rc1 release configuration as in https://gerrit.ovirt.org/77960 ? Verified on rhevm-4.1.3.2-0.1.el7.noarch 2017-06-18 07:30:33,454-04 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-30) [77df6c5e-e853-4925-9883-058535975d3a] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 1540741b 2017-06-18 07:30:33,543-04 INFO [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-6-thread-32) [77df6c5e-e853-4925-9883-058535975d3a] Running command: RunVmCommand internal: false. Entities affected : ID: f3ee19f2-8f89-43a9-ba10-445c10e861f2 Type: VMAction group RUN_VM with role type USER 2017-06-18 07:31:03,602-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-32) [77df6c5e-e853-4925-9883-058535975d3a] EVENT_ID: EXTERNAL_SCHEDULER_ERROR(10,501), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Running the external scheduler failed: 'Waiting on score functions timed out' |
Created attachment 1273445 [details] engine log Description of problem: Start VM failed with the exception, if the score module does not respond before the timeout 2017-04-23 04:44:41,346-04 ERROR [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-6-thread-41) [cd2602e9-8af0-48c0-9916-be8efbbc1290] Command 'org.ovirt.engine.core.bll.RunVmCommand' failed: null 2017-04-23 04:44:41,348-04 ERROR [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-6-thread-41) [cd2602e9-8af0-48c0-9916-be8efbbc1290] Exception: java.lang.NullPointerException at org.ovirt.engine.core.bll.scheduling.SchedulingManager.sumScoreResults(SchedulingManager.java:816) [bll.jar:] at org.ovirt.engine.core.bll.scheduling.SchedulingManager.runExternalFunctions(SchedulingManager.java:810) [bll.jar:] at org.ovirt.engine.core.bll.scheduling.SchedulingManager.runFunctions(SchedulingManager.java:766) [bll.jar:] at org.ovirt.engine.core.bll.scheduling.SchedulingManager.selectBestHost(SchedulingManager.java:456) [bll.jar:] at org.ovirt.engine.core.bll.scheduling.SchedulingManager.schedule(SchedulingManager.java:330) [bll.jar:] at org.ovirt.engine.core.bll.RunVmCommand.getVdsToRunOn(RunVmCommand.java:826) [bll.jar:] at org.ovirt.engine.core.bll.RunVmCommand.runVm(RunVmCommand.java:227) [bll.jar:] at org.ovirt.engine.core.bll.RunVmCommand.perform(RunVmCommand.java:401) [bll.jar:] at org.ovirt.engine.core.bll.RunVmCommand.executeVmCommand(RunVmCommand.java:326) [bll.jar:] at org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:120) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164) [utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103) [utils.jar:] at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:397) [bll.jar:] at org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.executeValidatedCommand(PrevalidatingMultipleActionsRunner.java:204) [bll.jar:] at org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.runCommands(PrevalidatingMultipleActionsRunner.java:176) [bll.jar:] at org.ovirt.engine.core.bll.SortedMultipleActionsRunnerBase.runCommands(SortedMultipleActionsRunnerBase.java:20) [bll.jar:] at org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.lambda$invokeCommands$3(PrevalidatingMultipleActionsRunner.java:182) [bll.jar:] at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:92) [utils.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_121] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121] at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121] 2017-04-23 04:44:41,359-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-41) [cd2602e9-8af0-48c0-9916-be8efbbc1290] EVENT_ID: USER_FAILED_RUN_VM(54), Correlation ID: cd2602e9-8af0-48c0-9916-be8efbbc1290, Job ID: ffd1b3a8-87cb-4e31-a929-05d9d6a84c58, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM golden_env_mixed_virtio_0 (User: admin@internal-authz). Version-Release number of selected component (if applicable): ovirt-scheduler-proxy-0.1.3-7.el7ev.noarch How reproducible: Always Steps to Reproduce: 1. Install and enable via the engine ovirt-scheduler-proxy 2. Create simple python script under /usr/share/ovirt-scheduler-proxy/plugins/test_plugin.py #!/bin/env python from time import sleep class ExternalSchedulerTimeoutPlugin(object): def do_score(self, hosts, vm, args): sleep(125) print [("some_host_uuid", 50)] # chmod +x /usr/share/ovirt-scheduler-proxy/plugins/test_plugin.py 3. Restart ovirt-scheduler-proxy and the ovirt-engine services 4. Create new scheduling policy with the weight ExternalSchedulerTimeoutPlugin module 5. Attach scheduling policy to the cluster 6. Start the VM Actual results: Start the VM fails, with the traceback above Expected results: Start the VM succeeds with the warning that the engine skipped external scheduling score modules(the same behavior that we have for the filter modules) Additional info: