Bug 867347 - engine: we re-migrate a vm after cancel migration by user and vm completes the migration
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: x86_64 Linux
Severity: urgent
Assigned To: Omer Frenkel
QA Contact: Dafna Ron
Whiteboard: virt
Keywords: TestBlocker
Depends On: 873792
Blocks:
Reported: 2012-10-17 07:06 EDT by Dafna Ron
Modified: 2012-12-04 15:04 EST
CC List: 11 users

Fixed In Version: si23
Doc Type: Bug Fix
Last Closed: 2012-12-04 15:04:45 EST
Type: Bug


Attachments:
logs (376.66 KB, application/x-gzip), 2012-10-17 07:06 EDT, Dafna Ron
logs (716.32 KB, application/x-gzip), 2012-11-05 10:16 EST, Dafna Ron

Description Dafna Ron 2012-10-17 07:06:54 EDT
Created attachment 628677 [details]
logs

Description of problem:

The engine automatically re-migrates VMs whose migration was cancelled by the user, which means that even though the user cancelled the migration, it eventually succeeds.

Version-Release number of selected component (if applicable):

si20

How reproducible:

100%

Steps to Reproduce:
1. Start a VM in a two-host cluster
2. Start migrating the VM
3. Cancel the migration
  
Actual results:

The cancel-migration command succeeds, but the engine re-migrates the VM and the migration completes.

Expected results:

The engine should not re-migrate the VM if the user cancelled the migration.

Additional info: logs
Comment 3 Omer Frenkel 2012-10-21 04:03:44 EDT
http://gerrit.ovirt.org/#/c/8658/
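
For illustration only, here is a minimal Python sketch of the behaviour this change is expected to introduce on the engine side: skip the automatic rerun when the migration failure was caused by a user-initiated cancel. The real ovirt-engine code is Java; every name below (MigrationTracker, on_migration_failure, etc.) is hypothetical and only meant to show the decision point.

class MigrationTracker:
    """Tracks which VMs had their migration cancelled by the user."""

    def __init__(self):
        self.cancelled_by_user = set()

    def on_cancel_migration(self, vm_id):
        # Called once the cancel-migration command has been accepted.
        self.cancelled_by_user.add(vm_id)

    def on_migration_failure(self, vm_id, rerun):
        # Called when the engine sees the VM running in the DB but not on the
        # source host (the "add to rerun treatment" path in the engine log).
        if vm_id in self.cancelled_by_user:
            self.cancelled_by_user.discard(vm_id)
            return "user cancelled, VM stays on source host (no rerun)"
        rerun(vm_id)
        return "added to rerun treatment"


if __name__ == "__main__":
    tracker = MigrationTracker()
    vm = "299b1397-3b99-4477-8106-9b57ecf36b16"
    tracker.on_cancel_migration(vm)
    print(tracker.on_migration_failure(vm, rerun=lambda v: print("re-migrating", v)))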
Comment 6 Dafna Ron 2012-11-05 10:16:09 EST
tested in si24

There is still an issue if I cancel the migration right after it starts (full logs will be attached).

2012-11-05 16:59:43,613 INFO  [org.ovirt.engine.core.bll.CancelMigrateVmCommand] (pool-4-thread-46) [f19fb51] Running command: CancelMigrateVmCommand internal: false. Entities affected :  ID: 299b1397-3b99-4477-8106-9b57ecf36b16 Type: VM
2012-11-05 16:59:43,615 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CancelMigrateVDSCommand] (pool-4-thread-46) [f19fb51] START, CancelMigrateVDSCommand(HostName = gold-vdsd, HostId = ce3fab0c-2734-11e2-83bc-001a4a169741, vmId=299b1397-3b99-4477-8106-9b57ecf36b16), log id: 34d3c74c
2012-11-05 16:59:43,700 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CancelMigrateVDSCommand] (pool-4-thread-46) [f19fb51] FINISH, CancelMigrateVDSCommand, log id: 34d3c74c
2012-11-05 16:59:44,061 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-61) vds::refreshVmList vm id 299b1397-3b99-4477-8106-9b57ecf36b16 is migrating to vds gold-vdsc ignoring it in the refresh until migration is done
2012-11-05 16:59:46,114 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-64) vds::refreshVmList vm id 299b1397-3b99-4477-8106-9b57ecf36b16 is migrating to vds gold-vdsc ignoring it in the refresh until migration is done
2012-11-05 16:59:48,157 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-54) vds::refreshVmList vm id 299b1397-3b99-4477-8106-9b57ecf36b16 is migrating to vds gold-vdsc ignoring it in the refresh until migration is done
2012-11-05 16:59:50,201 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-69) vds::refreshVmList vm id 299b1397-3b99-4477-8106-9b57ecf36b16 is migrating to vds gold-vdsc ignoring it in the refresh until migration is done
2012-11-05 16:59:52,264 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-60) vds::refreshVmList vm id 299b1397-3b99-4477-8106-9b57ecf36b16 is migrating to vds gold-vdsc ignoring it in the refresh until migration is done
2012-11-05 16:59:54,597 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (QuartzScheduler_Worker-77) START, DestroyVDSCommand(HostName = gold-vdsd, HostId = ce3fab0c-2734-11e2-83bc-001a4a169741, vmId=299b1397-3b99-4477-8106-9b57ecf36b16, force=false, secondsToWait=0, gracefully=false), log id: 45f73d5d
2012-11-05 16:59:54,668 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (QuartzScheduler_Worker-77) FINISH, DestroyVDSCommand, log id: 45f73d5d
2012-11-05 16:59:54,668 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-77) Setting VM migrate 299b1397-3b99-4477-8106-9b57ecf36b16 to status unknown
2012-11-05 16:59:54,669 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-77) vm migrate running in db and not running in vds - add to rerun treatment. vds gold-vdsd
Comment 7 Dafna Ron 2012-11-05 10:16:38 EST
Created attachment 638698 [details]
logs
Comment 8 Omer Frenkel 2012-11-06 09:43:17 EST
It looks like the migration-cancel call returned successfully to the engine, but despite that the migration completed successfully (the cancel didn't actually take effect).

relevant vdsm log (vm-id is 299b1397-3b99-4477-8106-9b57ecf36b16):

Thread-1800::DEBUG::2012-11-05 16:59:40,548::BindingXMLRPC::894::vds::(wrapper) client [10.35.97.65]::call vmMigrate with ({'src': 'gold-vdsd.qa.lab.tlv.redhat.com', 'dst': 'gold-vdsc.qa.lab.tlv.redhat.com:54321', 'vmId': '299b1397-3b99-4477-8106-9b57ecf36b16', 'method': 'online'},) {} flowID [28cae918]
Thread-1800::DEBUG::2012-11-05 16:59:40,548::API::421::vds::(migrate) {'src': 'gold-vdsd.qa.lab.tlv.redhat.com', 'dst': 'gold-vdsc.qa.lab.tlv.redhat.com:54321', 'vmId': '299b1397-3b99-4477-8106-9b57ecf36b16', 'method': 'online'}
Thread-1801::DEBUG::2012-11-05 16:59:40,549::vm::125::vm.Vm::(_setupVdsConnection) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Destination server is: gold-vdsc.qa.lab.tlv.redhat.com:54321
Thread-1800::DEBUG::2012-11-05 16:59:40,550::BindingXMLRPC::900::vds::(wrapper) return vmMigrate with {'status': {'message': 'Migration process starting', 'code': 0}}
Thread-1801::DEBUG::2012-11-05 16:59:40,550::vm::127::vm.Vm::(_setupVdsConnection) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Initiating connection with destination
Thread-1801::DEBUG::2012-11-05 16:59:40,612::libvirtvm::214::vm.Vm::(_getCpuStats) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::CPU stats not available
Thread-1801::DEBUG::2012-11-05 16:59:40,613::libvirtvm::267::vm.Vm::(_getDiskStats) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hdc stats not available
Thread-1801::DEBUG::2012-11-05 16:59:40,613::libvirtvm::267::vm.Vm::(_getDiskStats) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hda stats not available
Thread-1801::DEBUG::2012-11-05 16:59:40,614::libvirtvm::296::vm.Vm::(_getDiskLatency) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hdc latency not available
Thread-1801::DEBUG::2012-11-05 16:59:40,614::libvirtvm::296::vm.Vm::(_getDiskLatency) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hda latency not available
Thread-1801::DEBUG::2012-11-05 16:59:40,616::vm::173::vm.Vm::(_prepareGuest) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::migration Process begins
Thread-1801::DEBUG::2012-11-05 16:59:40,632::vm::237::vm.Vm::(run) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::migration semaphore acquired
Thread-1801::DEBUG::2012-11-05 16:59:41,090::libvirtvm::454::vm.Vm::(_startUnderlyingMigration) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::starting migration to qemu+tls://gold-vdsc.qa.lab.tlv.redhat.com/system
Thread-1803::DEBUG::2012-11-05 16:59:41,091::libvirtvm::352::vm.Vm::(run) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::migration downtime thread started
Thread-1804::DEBUG::2012-11-05 16:59:41,092::libvirtvm::380::vm.Vm::(run) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::starting migration monitor thread
Thread-1805::DEBUG::2012-11-05 16:59:42,210::BindingXMLRPC::894::vds::(wrapper) client [10.35.97.65]::call vmMigrationCancel with ('299b1397-3b99-4477-8106-9b57ecf36b16',) {} flowID [f19fb51]
Thread-1805::DEBUG::2012-11-05 16:59:42,255::BindingXMLRPC::900::vds::(wrapper) return vmMigrationCancel with {'status': {'message': 'Migration process stopped', 'code': 0}}
Thread-1808::DEBUG::2012-11-05 16:59:46,893::task::588::TaskManager.Task::(_updateState) Task=`39deac96-cab6-40d9-8e43-b4298135e714`::moving from state init -> state preparing
Thread-1808::INFO::2012-11-05 16:59:46,893::logUtils::37::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1808::INFO::2012-11-05 16:59:46,894::logUtils::39::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'af5bb9ad-ebe7-4ade-8d0d-a6882d11af44': {'delay': '0.0111539363861', 'lastCheck': '8.9', 'code': 0, 'valid': True}, u'5140b72b-0eac-4d4c-b475-5724bc9c27b7': {'delay': '0.00832009315491', 'lastCheck': '7.8', 'code': 0, 'valid': True}, u'2d52459b-ccd1-4a99-bff3-1264ca77a2c4': {'delay': '0.00824379920959', 'lastCheck': '7.5', 'code': 0, 'valid': True}, u'940726d2-b911-49ae-aed5-cda5791bb4b6': {'delay': '0.00918412208557', 'lastCheck': '8.0', 'code': 0, 'valid': True}, u'72ec1321-a114-451f-bee1-6790cbca1bc6': {'delay': '0.00177216529846', 'lastCheck': '2.2', 'code': 0, 'valid': True}}
Thread-1808::DEBUG::2012-11-05 16:59:46,894::task::1172::TaskManager.Task::(prepare) Task=`39deac96-cab6-40d9-8e43-b4298135e714`::finished: {u'af5bb9ad-ebe7-4ade-8d0d-a6882d11af44': {'delay': '0.0111539363861', 'lastCheck': '8.9', 'code': 0, 'valid': True}, u'5140b72b-0eac-4d4c-b475-5724bc9c27b7': {'delay': '0.00832009315491', 'lastCheck': '7.8', 'code': 0, 'valid': True}, u'2d52459b-ccd1-4a99-bff3-1264ca77a2c4': {'delay': '0.00824379920959', 'lastCheck': '7.5', 'code': 0, 'valid': True}, u'940726d2-b911-49ae-aed5-cda5791bb4b6': {'delay': '0.00918412208557', 'lastCheck': '8.0', 'code': 0, 'valid': True}, u'72ec1321-a114-451f-bee1-6790cbca1bc6': {'delay': '0.00177216529846', 'lastCheck': '2.2', 'code': 0, 'valid': True}}
Thread-1808::DEBUG::2012-11-05 16:59:46,895::task::588::TaskManager.Task::(_updateState) Task=`39deac96-cab6-40d9-8e43-b4298135e714`::moving from state preparing -> state finished
Thread-1808::DEBUG::2012-11-05 16:59:46,895::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1808::DEBUG::2012-11-05 16:59:46,896::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1808::DEBUG::2012-11-05 16:59:46,896::task::978::TaskManager.Task::(_decref) Task=`39deac96-cab6-40d9-8e43-b4298135e714`::ref 0 aborting False
Thread-1809::DEBUG::2012-11-05 16:59:46,934::libvirtvm::267::vm.Vm::(_getDiskStats) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hdc stats not available
Thread-1809::DEBUG::2012-11-05 16:59:46,934::libvirtvm::267::vm.Vm::(_getDiskStats) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hda stats not available
Thread-1809::DEBUG::2012-11-05 16:59:46,934::libvirtvm::296::vm.Vm::(_getDiskLatency) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hdc latency not available
Thread-1809::DEBUG::2012-11-05 16:59:46,935::libvirtvm::296::vm.Vm::(_getDiskLatency) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hda latency not available
Thread-27::DEBUG::2012-11-05 16:59:48,027::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd iflag=direct if=/dev/af5bb9ad-ebe7-4ade-8d0d-a6882d11af44/metadata bs=4096 count=1' (cwd None)
Thread-27::DEBUG::2012-11-05 16:59:48,037::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000685711 s, 6.0 MB/s\n'; <rc> = 0
Thread-26::DEBUG::2012-11-05 16:59:48,870::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd iflag=direct if=/dev/940726d2-b911-49ae-aed5-cda5791bb4b6/metadata bs=4096 count=1' (cwd None)
Thread-26::DEBUG::2012-11-05 16:59:48,879::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000657354 s, 6.2 MB/s\n'; <rc> = 0
Thread-23::DEBUG::2012-11-05 16:59:49,089::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd iflag=direct if=/dev/5140b72b-0eac-4d4c-b475-5724bc9c27b7/metadata bs=4096 count=1' (cwd None)
Thread-23::DEBUG::2012-11-05 16:59:49,098::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000583391 s, 7.0 MB/s\n'; <rc> = 0
Thread-24::DEBUG::2012-11-05 16:59:49,406::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd iflag=direct if=/dev/2d52459b-ccd1-4a99-bff3-1264ca77a2c4/metadata bs=4096 count=1' (cwd None)
Thread-24::DEBUG::2012-11-05 16:59:49,419::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00533608 s, 768 kB/s\n'; <rc> = 0
Thread-1804::INFO::2012-11-05 16:59:51,095::libvirtvm::412::vm.Vm::(run) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Migration Progress: 9 seconds elapsed, 52% of data processed, 52% of mem processed
libvirtEventLoop::DEBUG::2012-11-05 16:59:51,529::libvirtvm::2679::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::event Suspended detail 0 opaque None
libvirtEventLoop::DEBUG::2012-11-05 16:59:52,257::libvirtvm::2679::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::event Stopped detail 3 opaque None
Thread-1801::DEBUG::2012-11-05 16:59:52,260::libvirtvm::367::vm.Vm::(cancel) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::canceling migration downtime thread
Thread-1801::DEBUG::2012-11-05 16:59:52,261::libvirtvm::417::vm.Vm::(stop) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::stopping migration monitor thread
Thread-1803::DEBUG::2012-11-05 16:59:52,261::libvirtvm::364::vm.Vm::(run) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::migration downtime thread exiting
Thread-1801::DEBUG::2012-11-05 16:59:52,275::vm::969::vm.Vm::(setDownStatus) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Changed state to Down: Migration succeeded
Thread-1801::DEBUG::2012-11-05 16:59:52,276::vmChannels::152::vds::(unregister) Delete fileno 28 from listener.
Thread-1801::DEBUG::2012-11-05 16:59:52,276::utils::335::vm.Vm::(stop) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Stop statistics collection
VM Channels Listener::DEBUG::2012-11-05 16:59:52,277::vmChannels::88::vds::(_do_del_channels) fileno 28 was removed from listener.
Thread-1793::DEBUG::2012-11-05 16:59:52,277::utils::366::vm.Vm::(run) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Stats thread finished
Thread-1813::DEBUG::2012-11-05 16:59:53,171::BindingXMLRPC::894::vds::(wrapper) client [10.35.97.65]::call vmGetStats with ('299b1397-3b99-4477-8106-9b57ecf36b16',) {}
Thread-1813::DEBUG::2012-11-05 16:59:53,172::BindingXMLRPC::900::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'hash': '-7368088289532812409', 'exitMessage': 'Migration succeeded', 'vmId': '299b1397-3b99-4477-8106-9b57ecf36b16', 'timeOffset': '-43200', 'exitCode': 0}]}
Thread-1814::DEBUG::2012-11-05 16:59:53,192::BindingXMLRPC::894::vds::(wrapper) client [10.35.97.65]::call vmDestroy with
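
For reference, a small Python helper that flags this pattern in a plain-text vdsm.log (an assumption about the attached logs): a vmMigrationCancel call for a VM followed by that same VM reaching 'Migration succeeded'. The function name and CLI wrapper are hypothetical.

import sys


def migration_cancel_ignored(log_path, vm_id):
    """Return True if a migration cancel was requested for vm_id but the
    migration still completed with 'Migration succeeded'."""
    cancel_seen = False
    with open(log_path) as log:
        for line in log:
            if "vmMigrationCancel" in line and vm_id in line:
                cancel_seen = True
            elif cancel_seen and vm_id in line and "Migration succeeded" in line:
                return True
    return False


if __name__ == "__main__":
    path, vm = sys.argv[1], sys.argv[2]
    print("cancel ignored:", migration_cancel_ignored(path, vm))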
Comment 9 Dafna Ron 2012-11-06 12:40:16 EST
Michal from the libvirt team looked at it and it's a libvirt bug: vdsm issued 'migrate_cancel' before libvirt got to 'migrate'; that is, libvirt had set up a job but was still doing some preparation before instructing qemu to migrate.

bug opened for libvirt: 

https://bugzilla.redhat.com/show_bug.cgi?id=873792
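
To make the race easier to see, here is a self-contained Python sketch (a toy model, not libvirt or vdsm code) in which an abort request that arrives before the migration job has actually started is silently dropped, yet still acknowledged with code 0, so the migration later finishes as 'Migration succeeded':

import threading
import time


class FakeMigrationJob:
    """Toy model: an abort that arrives before the job starts is lost."""

    def __init__(self):
        self.started = False
        self.abort_requested = False

    def run(self):
        time.sleep(0.2)                # preparation before qemu is told to migrate
        self.started = True
        for _ in range(5):             # the actual copy phase
            if self.abort_requested:
                return "Migration cancelled"
            time.sleep(0.1)
        return "Migration succeeded"

    def abort(self):
        # Only effective once the job is really running; an early abort is
        # dropped, yet the caller still gets a success status back.
        if self.started:
            self.abort_requested = True
        return {"status": {"message": "Migration process stopped", "code": 0}}


if __name__ == "__main__":
    job = FakeMigrationJob()
    worker = threading.Thread(target=lambda: print(job.run()))
    worker.start()
    print(job.abort())   # acknowledged immediately, like vmMigrationCancel
    worker.join()        # ...but the job still prints "Migration succeeded"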
Comment 10 Barak 2012-11-07 07:31:18 EST
Per discussion with Kaul, I'm moving this BZ back to ON_QA for verification that the engine is not re-migrating. 

The other libvirt issue needs to be handled ASAP as it is an RHEV-M GA blocker.
Comment 11 Dafna Ron 2012-11-21 03:16:22 EST
verified on si24.4
