Created attachment 628677 [details]
logs

Description of problem:
The engine automatically re-migrates a VM whose migration was cancelled by the user, which means that even though the user cancelled the migration it eventually succeeds.

Version-Release number of selected component (if applicable):
si20

How reproducible:
100%

Steps to Reproduce:
1. Start a VM on a two-host cluster
2. Start a migration
3. Cancel the migration

Actual results:
Cancelling the migration succeeds, but the engine re-migrates the VM and the migration completes.

Expected results:
The engine should not re-migrate the VM if the user cancelled the migration.

Additional info:
logs
http://gerrit.ovirt.org/#/c/8658/
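For context, the core of the fix is to make the engine's automatic migration rerun logic aware of a user-initiated cancel. The following is a minimal Python sketch of that intended behavior only; the actual change is in the engine's Java command/monitoring code (see the gerrit link above), and all names below are hypothetical:

# Illustrative sketch only -- not the actual oVirt engine implementation.
# The engine code is Java (CancelMigrateVmCommand, VdsUpdateRunTimeInfo);
# the class, field and function names here are hypothetical.

class MigrationContext:
    def __init__(self):
        self.cancelled_by_user = False  # set when the cancel command succeeds

def on_cancel_migrate(ctx):
    # Corresponds to CancelMigrateVmCommand succeeding against the source host.
    ctx.cancelled_by_user = True

def on_vm_missing_from_source(vm, ctx, scheduler):
    # Corresponds to the "add to rerun treatment" path in the monitoring log:
    # the VM is marked as migrating in the DB but no longer runs on the source.
    if ctx.cancelled_by_user:
        # The user asked to stop the migration: do not schedule another
        # migration attempt.
        return
    scheduler.rerun_migration(vm)  # original behavior: pick a host and retry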
Tested on si24. There is still an issue if I cancel the migration right after the migration starts (full logs will be attached).

2012-11-05 16:59:43,613 INFO [org.ovirt.engine.core.bll.CancelMigrateVmCommand] (pool-4-thread-46) [f19fb51] Running command: CancelMigrateVmCommand internal: false. Entities affected : ID: 299b1397-3b99-4477-8106-9b57ecf36b16 Type: VM
2012-11-05 16:59:43,615 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CancelMigrateVDSCommand] (pool-4-thread-46) [f19fb51] START, CancelMigrateVDSCommand(HostName = gold-vdsd, HostId = ce3fab0c-2734-11e2-83bc-001a4a169741, vmId=299b1397-3b99-4477-8106-9b57ecf36b16), log id: 34d3c74c
2012-11-05 16:59:43,700 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CancelMigrateVDSCommand] (pool-4-thread-46) [f19fb51] FINISH, CancelMigrateVDSCommand, log id: 34d3c74c
2012-11-05 16:59:44,061 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-61) vds::refreshVmList vm id 299b1397-3b99-4477-8106-9b57ecf36b16 is migrating to vds gold-vdsc ignoring it in the refresh until migration is done
2012-11-05 16:59:46,114 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-64) vds::refreshVmList vm id 299b1397-3b99-4477-8106-9b57ecf36b16 is migrating to vds gold-vdsc ignoring it in the refresh until migration is done
2012-11-05 16:59:48,157 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-54) vds::refreshVmList vm id 299b1397-3b99-4477-8106-9b57ecf36b16 is migrating to vds gold-vdsc ignoring it in the refresh until migration is done
2012-11-05 16:59:50,201 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-69) vds::refreshVmList vm id 299b1397-3b99-4477-8106-9b57ecf36b16 is migrating to vds gold-vdsc ignoring it in the refresh until migration is done
2012-11-05 16:59:52,264 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-60) vds::refreshVmList vm id 299b1397-3b99-4477-8106-9b57ecf36b16 is migrating to vds gold-vdsc ignoring it in the refresh until migration is done
2012-11-05 16:59:54,597 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (QuartzScheduler_Worker-77) START, DestroyVDSCommand(HostName = gold-vdsd, HostId = ce3fab0c-2734-11e2-83bc-001a4a169741, vmId=299b1397-3b99-4477-8106-9b57ecf36b16, force=false, secondsToWait=0, gracefully=false), log id: 45f73d5d
2012-11-05 16:59:54,668 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (QuartzScheduler_Worker-77) FINISH, DestroyVDSCommand, log id: 45f73d5d
2012-11-05 16:59:54,668 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-77) Setting VM migrate 299b1397-3b99-4477-8106-9b57ecf36b16 to status unknown
2012-11-05 16:59:54,669 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-77) vm migrate running in db and not running in vds - add to rerun treatment. vds gold-vdsd
Created attachment 638698 [details]
logs
It looks like the migration cancel returned successfully to the engine, but despite that the migration completed successfully (the cancel didn't work). Relevant vdsm log (vm-id is 299b1397-3b99-4477-8106-9b57ecf36b16); a simplified sketch of the two vdsm code paths involved follows the log excerpt:

Thread-1800::DEBUG::2012-11-05 16:59:40,548::BindingXMLRPC::894::vds::(wrapper) client [10.35.97.65]::call vmMigrate with ({'src': 'gold-vdsd.qa.lab.tlv.redhat.com', 'dst': 'gold-vdsc.qa.lab.tlv.redhat.com:54321', 'vmId': '299b1397-3b99-4477-8106-9b57ecf36b16', 'method': 'online'},) {} flowID [28cae918]
Thread-1800::DEBUG::2012-11-05 16:59:40,548::API::421::vds::(migrate) {'src': 'gold-vdsd.qa.lab.tlv.redhat.com', 'dst': 'gold-vdsc.qa.lab.tlv.redhat.com:54321', 'vmId': '299b1397-3b99-4477-8106-9b57ecf36b16', 'method': 'online'}
Thread-1801::DEBUG::2012-11-05 16:59:40,549::vm::125::vm.Vm::(_setupVdsConnection) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Destination server is: gold-vdsc.qa.lab.tlv.redhat.com:54321
Thread-1800::DEBUG::2012-11-05 16:59:40,550::BindingXMLRPC::900::vds::(wrapper) return vmMigrate with {'status': {'message': 'Migration process starting', 'code': 0}}
Thread-1801::DEBUG::2012-11-05 16:59:40,550::vm::127::vm.Vm::(_setupVdsConnection) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Initiating connection with destination
Thread-1801::DEBUG::2012-11-05 16:59:40,612::libvirtvm::214::vm.Vm::(_getCpuStats) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::CPU stats not available
Thread-1801::DEBUG::2012-11-05 16:59:40,613::libvirtvm::267::vm.Vm::(_getDiskStats) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hdc stats not available
Thread-1801::DEBUG::2012-11-05 16:59:40,613::libvirtvm::267::vm.Vm::(_getDiskStats) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hda stats not available
Thread-1801::DEBUG::2012-11-05 16:59:40,614::libvirtvm::296::vm.Vm::(_getDiskLatency) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hdc latency not available
Thread-1801::DEBUG::2012-11-05 16:59:40,614::libvirtvm::296::vm.Vm::(_getDiskLatency) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hda latency not available
Thread-1801::DEBUG::2012-11-05 16:59:40,616::vm::173::vm.Vm::(_prepareGuest) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::migration Process begins
Thread-1801::DEBUG::2012-11-05 16:59:40,632::vm::237::vm.Vm::(run) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::migration semaphore acquired
Thread-1801::DEBUG::2012-11-05 16:59:41,090::libvirtvm::454::vm.Vm::(_startUnderlyingMigration) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::starting migration to qemu+tls://gold-vdsc.qa.lab.tlv.redhat.com/system
Thread-1803::DEBUG::2012-11-05 16:59:41,091::libvirtvm::352::vm.Vm::(run) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::migration downtime thread started
Thread-1804::DEBUG::2012-11-05 16:59:41,092::libvirtvm::380::vm.Vm::(run) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::starting migration monitor thread
Thread-1805::DEBUG::2012-11-05 16:59:42,210::BindingXMLRPC::894::vds::(wrapper) client [10.35.97.65]::call vmMigrationCancel with ('299b1397-3b99-4477-8106-9b57ecf36b16',) {} flowID [f19fb51]
Thread-1805::DEBUG::2012-11-05 16:59:42,255::BindingXMLRPC::900::vds::(wrapper) return vmMigrationCancel with {'status': {'message': 'Migration process stopped', 'code': 0}}
Thread-1808::DEBUG::2012-11-05 16:59:46,893::task::588::TaskManager.Task::(_updateState) Task=`39deac96-cab6-40d9-8e43-b4298135e714`::moving from state init -> state preparing
Thread-1808::INFO::2012-11-05 16:59:46,893::logUtils::37::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1808::INFO::2012-11-05 16:59:46,894::logUtils::39::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'af5bb9ad-ebe7-4ade-8d0d-a6882d11af44': {'delay': '0.0111539363861', 'lastCheck': '8.9', 'code': 0, 'valid': True}, u'5140b72b-0eac-4d4c-b475-5724bc9c27b7': {'delay': '0.00832009315491', 'lastCheck': '7.8', 'code': 0, 'valid': True}, u'2d52459b-ccd1-4a99-bff3-1264ca77a2c4': {'delay': '0.00824379920959', 'lastCheck': '7.5', 'code': 0, 'valid': True}, u'940726d2-b911-49ae-aed5-cda5791bb4b6': {'delay': '0.00918412208557', 'lastCheck': '8.0', 'code': 0, 'valid': True}, u'72ec1321-a114-451f-bee1-6790cbca1bc6': {'delay': '0.00177216529846', 'lastCheck': '2.2', 'code': 0, 'valid': True}}
Thread-1808::DEBUG::2012-11-05 16:59:46,894::task::1172::TaskManager.Task::(prepare) Task=`39deac96-cab6-40d9-8e43-b4298135e714`::finished: {u'af5bb9ad-ebe7-4ade-8d0d-a6882d11af44': {'delay': '0.0111539363861', 'lastCheck': '8.9', 'code': 0, 'valid': True}, u'5140b72b-0eac-4d4c-b475-5724bc9c27b7': {'delay': '0.00832009315491', 'lastCheck': '7.8', 'code': 0, 'valid': True}, u'2d52459b-ccd1-4a99-bff3-1264ca77a2c4': {'delay': '0.00824379920959', 'lastCheck': '7.5', 'code': 0, 'valid': True}, u'940726d2-b911-49ae-aed5-cda5791bb4b6': {'delay': '0.00918412208557', 'lastCheck': '8.0', 'code': 0, 'valid': True}, u'72ec1321-a114-451f-bee1-6790cbca1bc6': {'delay': '0.00177216529846', 'lastCheck': '2.2', 'code': 0, 'valid': True}}
Thread-1808::DEBUG::2012-11-05 16:59:46,895::task::588::TaskManager.Task::(_updateState) Task=`39deac96-cab6-40d9-8e43-b4298135e714`::moving from state preparing -> state finished
Thread-1808::DEBUG::2012-11-05 16:59:46,895::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1808::DEBUG::2012-11-05 16:59:46,896::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1808::DEBUG::2012-11-05 16:59:46,896::task::978::TaskManager.Task::(_decref) Task=`39deac96-cab6-40d9-8e43-b4298135e714`::ref 0 aborting False
Thread-1809::DEBUG::2012-11-05 16:59:46,934::libvirtvm::267::vm.Vm::(_getDiskStats) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hdc stats not available
Thread-1809::DEBUG::2012-11-05 16:59:46,934::libvirtvm::267::vm.Vm::(_getDiskStats) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hda stats not available
Thread-1809::DEBUG::2012-11-05 16:59:46,934::libvirtvm::296::vm.Vm::(_getDiskLatency) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hdc latency not available
Thread-1809::DEBUG::2012-11-05 16:59:46,935::libvirtvm::296::vm.Vm::(_getDiskLatency) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Disk hda latency not available
Thread-27::DEBUG::2012-11-05 16:59:48,027::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd iflag=direct if=/dev/af5bb9ad-ebe7-4ade-8d0d-a6882d11af44/metadata bs=4096 count=1' (cwd None)
Thread-27::DEBUG::2012-11-05 16:59:48,037::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000685711 s, 6.0 MB/s\n'; <rc> = 0
Thread-26::DEBUG::2012-11-05 16:59:48,870::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd iflag=direct if=/dev/940726d2-b911-49ae-aed5-cda5791bb4b6/metadata bs=4096 count=1' (cwd None)
Thread-26::DEBUG::2012-11-05 16:59:48,879::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000657354 s, 6.2 MB/s\n'; <rc> = 0
Thread-23::DEBUG::2012-11-05 16:59:49,089::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd iflag=direct if=/dev/5140b72b-0eac-4d4c-b475-5724bc9c27b7/metadata bs=4096 count=1' (cwd None)
Thread-23::DEBUG::2012-11-05 16:59:49,098::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000583391 s, 7.0 MB/s\n'; <rc> = 0
Thread-24::DEBUG::2012-11-05 16:59:49,406::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd iflag=direct if=/dev/2d52459b-ccd1-4a99-bff3-1264ca77a2c4/metadata bs=4096 count=1' (cwd None)
Thread-24::DEBUG::2012-11-05 16:59:49,419::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00533608 s, 768 kB/s\n'; <rc> = 0
Thread-1804::INFO::2012-11-05 16:59:51,095::libvirtvm::412::vm.Vm::(run) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Migration Progress: 9 seconds elapsed, 52% of data processed, 52% of mem processed
libvirtEventLoop::DEBUG::2012-11-05 16:59:51,529::libvirtvm::2679::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::event Suspended detail 0 opaque None
libvirtEventLoop::DEBUG::2012-11-05 16:59:52,257::libvirtvm::2679::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::event Stopped detail 3 opaque None
Thread-1801::DEBUG::2012-11-05 16:59:52,260::libvirtvm::367::vm.Vm::(cancel) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::canceling migration downtime thread
Thread-1801::DEBUG::2012-11-05 16:59:52,261::libvirtvm::417::vm.Vm::(stop) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::stopping migration monitor thread
Thread-1803::DEBUG::2012-11-05 16:59:52,261::libvirtvm::364::vm.Vm::(run) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::migration downtime thread exiting
Thread-1801::DEBUG::2012-11-05 16:59:52,275::vm::969::vm.Vm::(setDownStatus) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Changed state to Down: Migration succeeded
Thread-1801::DEBUG::2012-11-05 16:59:52,276::vmChannels::152::vds::(unregister) Delete fileno 28 from listener.
Thread-1801::DEBUG::2012-11-05 16:59:52,276::utils::335::vm.Vm::(stop) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Stop statistics collection
VM Channels Listener::DEBUG::2012-11-05 16:59:52,277::vmChannels::88::vds::(_do_del_channels) fileno 28 was removed from listener.
Thread-1793::DEBUG::2012-11-05 16:59:52,277::utils::366::vm.Vm::(run) vmId=`299b1397-3b99-4477-8106-9b57ecf36b16`::Stats thread finished
Thread-1813::DEBUG::2012-11-05 16:59:53,171::BindingXMLRPC::894::vds::(wrapper) client [10.35.97.65]::call vmGetStats with ('299b1397-3b99-4477-8106-9b57ecf36b16',) {}
Thread-1813::DEBUG::2012-11-05 16:59:53,172::BindingXMLRPC::900::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'hash': '-7368088289532812409', 'exitMessage': 'Migration succeeded', 'vmId': '299b1397-3b99-4477-8106-9b57ecf36b16', 'timeOffset': '-43200', 'exitCode': 0}]}
Thread-1814::DEBUG::2012-11-05 16:59:53,192::BindingXMLRPC::894::vds::(wrapper) client [10.35.97.65]::call vmDestroy with
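In the log above, vmMigrationCancel returns code 0 at 16:59:42, roughly a second after the migration thread started the underlying libvirt migration, yet the VM still reaches "Changed state to Down: Migration succeeded" at 16:59:52. The following is a simplified Python sketch of the two vdsm code paths involved, using the libvirt Python bindings; it is not the actual vdsm code (the real logic lives in vdsm's vm.py and libvirtvm.py):

import libvirt

def migration_thread(dom, dest_uri):
    # Corresponds to Thread-1801 above ("starting migration to qemu+tls://...").
    # libvirt sets up a migration job and eventually tells qemu to migrate.
    flags = libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PEER2PEER
    dom.migrateToURI2(dest_uri, None, None, flags, None, 0)

def migration_cancel(dom):
    # Corresponds to Thread-1805 above (the vmMigrationCancel handler).
    # abortJob() returns success as soon as libvirt accepts the request;
    # it does not guarantee that the migration will actually be aborted.
    dom.abortJob()
    return {'status': {'message': 'Migration process stopped', 'code': 0}}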
Michal from libvirt looked at it and it's a libvirt bug: vdsm issued 'migrate_cancel' before libvirt got to 'migrate'; that is, libvirt had already set up a job, but was still doing some preparation before instructing qemu to migrate. Bug opened for libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=873792
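In other words, an abort request that lands during libvirt's pre-migration preparation can be silently lost even though the call returned success. Until the libvirt bug is fixed, a caller could in principle defend itself by re-checking the job state and retrying; the sketch below is only a hedged illustration of that idea, not something vdsm actually does:

import time
import libvirt

def cancel_until_aborted(dom, timeout=10.0, poll=0.5):
    """Retry the abort while libvirt still reports an active job.

    Illustration only: a single abortJob() issued while libvirt is still
    preparing the migration can be lost, so a defensive caller re-checks
    the job state and retries until the job is gone or the domain has
    already migrated away.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not dom.isActive():
            return False                # migration already completed; too late
        if dom.jobInfo()[0] == libvirt.VIR_DOMAIN_JOB_NONE:
            return True                 # no active job left: the abort took effect
        try:
            dom.abortJob()
        except libvirt.libvirtError:
            pass                        # the job may have just ended; re-check
        time.sleep(poll)
    return dom.jobInfo()[0] == libvirt.VIR_DOMAIN_JOB_NONE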
Per discussion with Kaul, I'm moving this BZ back to ON_QA for verification that the engine is not re-migrating. The other libvirt issue needs to be handled ASAP as it is a rhevm GA blocker.
verified on si24.4