Bug 1048790
Summary: | Not possible to power off VM that failed migration.
---|---
Product: | Red Hat Enterprise Virtualization Manager
Reporter: | Ilanit Stein <istein>
Component: | ovirt-engine
Assignee: | Arik <ahadas>
Status: | CLOSED CURRENTRELEASE
QA Contact: | Ilanit Stein <istein>
Severity: | medium
Docs Contact: |
Priority: | high
Version: | 3.3.0
CC: | acathrow, iheim, lpeer, mavital, michal.skrivanek, ofrenkel, Rhev-m-bugs, sherold, yeylon
Target Milestone: | ---
Keywords: | ZStream
Target Release: | 3.4.0
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: | virt
Fixed In Version: | ovirt-3.4.0-beta2
Doc Type: | Bug Fix
Doc Text: |
Story Points: | ---
Clone Of: |
: | 1058764
Environment: |
Last Closed: | 2014-06-12 14:04:30 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 1058764, 1078909, 1142926
Attachments: |
Description
Ilanit Stein
2014-01-06 10:53:06 UTC
Created attachment 846019
engine log
Hosts were put into maintenance at 10:58. A VDSNetworkException to the SPM host occurred at 11:02, while the migration was running. An attempt to power off the VM at 11:13 failed.
Created attachment 846085
host_1 logs
Host time is 2 hours behind rhevm time.
Created attachment 846086
host_2 logs
Host time is 2 hours behind rhevm time.
Created attachment 846110
host_a_put_in_maint_logs
Host time is 2 hours behind engine time.
Created attachment 846111
host_b_put_in_maint_logs
Host time is 2 hours behind engine time.
From our investigation, it seems that the 'rerun' procedure clears the destinationVdsId for the command, but the command is still in the 'asyncRunningCommands' cache. On the next async command (stop/migrate) we try to use it (reportCompleted), and it fails because destinationVdsId was cleared.
We probably need to remove the command from the cache, or make sure the code can handle a missing destinationVdsId (I favor option 1 if it can work OK).

(In reply to Omer Frenkel from comment #7)
> we probably need to remove the command from the cache, or make sure code can
> handle missing destinationVdsId (i favor option 1 if it can work ok)
I agree that it would be good to remove the command from the cache. IMO the reportCompleted method is not the place to decrease the pending memory; that should be moved to other async callback methods.
However, as we understand it, this bug happens now because the call to the decrease-pending-memory method was added to reportCompleted, and we have had no other issues with the command being in the cache. I therefore suggest that we just add a null check at this point, to keep the fix safe and easy to backport. I'll refactor that code upstream later on.

Merged to master, pending 3.4 backport.

Verified on ovirt-engine-3.4.0-0.7.beta2.el6.noarch.
Setup: 3 RHEL hosts.
- Run the VM on host1.
- Make sure the available memory on host3 is not sufficient to hold this VM.
- Migrate the VM (to any host).
- The VM starts to migrate to host2 (since host3's memory is full).
- While the migration is running, kill the qemu process on host2 with:
  ps aux | grep qemu | grep -v grep | grep -v supervdsmServer | awk '{print $2}' | xargs -I^ kill -9 ^
- As a result:
  a. The migration fails.
  b. The engine tries another host (host3), which fails as well.
  The VM stays running on host1.
- Powering off the VM succeeded.

Closing as part of 3.4.0.
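
The failure mode described in comment #7, and the null check proposed in the reply, can be illustrated with a minimal Java sketch. This is not the ovirt-engine code: the class `MigrationRerunSketch`, the nested `MigrateCommand`, the `pendingMemMb` map, and every method signature below are hypothetical stand-ins chosen for illustration. Only the names destinationVdsId, rerun, and reportCompleted are taken from the comments above.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; it does not reproduce the real ovirt-engine classes.
public class MigrationRerunSketch {

    /** Stand-in for a migrate command that is still held in an async command cache. */
    static class MigrateCommand {
        private final int vmMemMb;
        private UUID destinationVdsId; // set to null by rerun()

        MigrateCommand(UUID destinationVdsId, int vmMemMb) {
            this.destinationVdsId = destinationVdsId;
            this.vmMemMb = vmMemMb;
        }

        /** Models the rerun procedure clearing the destination host id. */
        void rerun() {
            destinationVdsId = null;
        }

        /** Unguarded variant: fails once the destination id has been cleared. */
        void reportCompletedUnguarded(Map<UUID, Integer> pendingMemMb) {
            // ConcurrentHashMap.merge throws NullPointerException for a null key.
            pendingMemMb.merge(destinationVdsId, -vmMemMb, Integer::sum);
        }

        /** Guarded variant: the null check suggested in the reply above. */
        void reportCompleted(Map<UUID, Integer> pendingMemMb) {
            if (destinationVdsId == null) {
                return; // no destination recorded, so there is nothing to decrease
            }
            pendingMemMb.merge(destinationVdsId, -vmMemMb, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Map<UUID, Integer> pendingMemMb = new ConcurrentHashMap<>();
        UUID host2 = UUID.randomUUID();
        pendingMemMb.put(host2, 1024);

        MigrateCommand cmd = new MigrateCommand(host2, 1024);
        cmd.rerun(); // migration to host2 failed; destination id is cleared

        try {
            cmd.reportCompletedUnguarded(pendingMemMb);
        } catch (NullPointerException e) {
            System.out.println("unguarded reportCompleted failed");
        }

        cmd.reportCompleted(pendingMemMb); // guarded call returns quietly
        System.out.println("guarded reportCompleted returned");
    }
}
```

The sketch deliberately leaves the stale command in place, mirroring the choice of a small, backportable null check over removing the command from the cache. It does not model how pending memory on the failed destination is released.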