Bug 1099068
Summary: | [scale] monitoring: separate VDS and VM monitoring | |||
---|---|---|---|---|
Product: | [oVirt] ovirt-engine | Reporter: | Roy Golan <rgolan> | |
Component: | General | Assignee: | Roy Golan <rgolan> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Eldad Marciano <emarcian> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | --- | CC: | bugs, gklein, istein, laravot, michal.skrivanek, mkalinin, rbalakri, rgolan, yeylon, yobshans | |
Target Milestone: | ovirt-3.6.0-rc | Flags: | rule-engine: ovirt-3.6.0+ ylavi: planning_ack+ rule-engine: devel_ack+ gklein: testing_ack+ |
Target Release: | 3.6.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | ovirt-engine-3.6.0-0.0.master.20150412172306.git55ba764 | Doc Type: | Enhancement | |
Doc Text: |
Separating VM and host monitoring increases the robustness and performance of large-scale deployments. Several issues that occurred when hosts became non-responsive were fixed; such hosts no longer affect the rest of the system.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1099081 (view as bug list) | Environment: | ||
Last Closed: | 2016-03-11 07:18:46 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1099081 |
Description
Roy Golan
2014-05-19 12:21:17 UTC
How to reproduce and verify this bug?

Exactly as stated in the description: after this fix, a slow VM shutdown shouldn't stall the domain monitoring thread, so the host won't go non-operational, etc. You can either load the system with VM shutdown calls, or hack a host to reply to shutdown with a delay, and see that the system behaves.

Related to this thread: I added a 120 sec delay to the shutdown and destroy VM methods in vdsm. While the shutdown was running, refresh VDS capabilities calls were executed from the engine. No issues were found. Is this scenario fair enough?

(In reply to Eldad Marciano from comment #3)
> related to this thread.
> i added 120 sec delay in shutdown and destroy VM methods in vdsm.
> while the shutdown is running refresh vds capabilities were executed from
> the engine.
> no issues were found.
> this scenario is fair enough ?

Yes, that should do the job. Now you need to verify that the update of the pool domains is performed by the Host Monitoring cycle independently of the 120 sec stall. Since I don't see a specific log for it, @Liron, please give a direction on how to verify that VdsManager.ontimer - IrsBrokerCommand.updateVdsDomainsData(cachedVds, storagePoolId, domainsList); is actually called.

You can create a problematic domain report (by blocking the connection to some domain, for example); you'll get an engine report that the domain is in problem. Then you can perform the operation you added the delay to and unblock the domain. When the updateVdsDomainsData() method is called, it should log that the domain has recovered from the problem.

Once storage was blocked, a VM shutdown was issued (with a latency of 120 sec). Storage was then unblocked, and recovery messages were logged in the engine log: "recovered from problem. vds: 'host20-*". Once the delay for the VM passed, the VM shut down correctly.

Moving to verified on top of 3.6.3.
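
For readers who want a feel for the property being verified above, here is a minimal, self-contained sketch of it. It is not ovirt-engine or vdsm code: `run_monitors`, `vm_monitoring`, and `host_monitoring` are hypothetical names, and the delays are shortened stand-ins for the 120 sec stall. It only illustrates that when VM operations and host monitoring run on separate threads, a slow VM call does not stop the host-monitoring cycles.

```python
import threading
import time


def run_monitors(shutdown_delay=0.5, host_interval=0.05):
    """Run a slow VM operation and a host-monitoring loop on separate threads.

    Returns the number of host-monitoring cycles that ran while the
    VM operation was still blocked. All names here are hypothetical.
    """
    vm_done = threading.Event()
    host_cycles = 0

    def vm_monitoring():
        # Stand-in for a VM shutdown call that the host answers slowly.
        time.sleep(shutdown_delay)
        vm_done.set()

    def host_monitoring():
        nonlocal host_cycles
        # Stand-in for the periodic host refresh / domain data update.
        while not vm_done.is_set():
            host_cycles += 1
            vm_done.wait(host_interval)

    threads = [
        threading.Thread(target=vm_monitoring),
        threading.Thread(target=host_monitoring),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return host_cycles


if __name__ == "__main__":
    print("host cycles during slow VM shutdown:", run_monitors())
```

If both stand-ins shared a single thread, the host loop could not run until the sleep returned, which is the stall this bug describes.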