Description of problem:
Migrating 20 VMs from VMware to RHV.
Environment: CFME with 2 conversion hosts, added via the Rails console and configured with max concurrent tasks = 10, using VDDK. The provider's max concurrent migrations was set to 20 via the REST API (custom attributes).
In the UI, the migration settings (though I am not sure these have an actual effect):
Provider max concurrent migrations = 20
Host max concurrent migrations = 10
All 20 VMs were directed to a single conversion host, though there are 2 valid conversion hosts in the RHV cluster.

Version-Release number of selected component (if applicable):
CFME-5.10.5.1
RHV-4.3.4

Additional info:
The migration succeeded for only 10 of the 20 VMs and failed for the other 10. I will open a separate bug for those 10 failed migrations.
Created attachment 1576506 [details] evm.log
Created attachment 1576507 [details] automation.log
Created attachment 1576509 [details] v2v import log
Comment on attachment 1576509 [details] v2v import log: v2v import log for one of the 10 VMs that failed migration (VM "v2v_migration_vm_1").
Created attachment 1576510 [details] v2v import wrapper log: v2v import wrapper log for one of the 10 VMs that failed migration (VM "v2v_migration_vm_1").
Forgot to mention that the migrated VMs have a 100GB disk each, and that the migration is from VMware iSCSI to RHV iSCSI.
On the conversion host:
[root@lynx18 import]# rpm -qa | grep v2v
v2v-conversion-host-wrapper-1.13.1-1.el7ev.noarch
virt-v2v-1.38.2-12.29.lp.el7ev.x86_64
v2v-conversion-host-ansible-1.13.1-1.el7ev.noarch
Here's a read of the ConversionHost table in the Rails console. It shows that the conversion hosts are configured with max concurrent migrations = 10 (max_concurrent_tasks: 10):

irb(main):004:0> ConversionHost.all.each { |ch| puts "[#{ch.id}] #{ch.name}" }
[1] host_mixed_2
[2] host_mixed_1
=> [#<ConversionHost id: 1, name: "host_mixed_2", address: nil, type: nil, resource_type: "Host", resource_id: 3, version: nil, max_concurrent_tasks: 10, vddk_transport_supported: true, ssh_transport_supported: nil, created_at: "2019-06-02 14:57:58", updated_at: "2019-06-02 14:58:30", concurrent_transformation_limit: nil, cpu_limit: nil, memory_limit: nil, network_limit: nil, blockio_limit: nil>,
    #<ConversionHost id: 2, name: "host_mixed_1", address: nil, type: nil, resource_type: "Host", resource_id: 2, version: nil, max_concurrent_tasks: 10, vddk_transport_supported: true, ssh_transport_supported: nil, created_at: "2019-06-02 18:23:25", updated_at: "2019-06-02 18:24:01", concurrent_transformation_limit: nil, cpu_limit: nil, memory_limit: nil, network_limit: nil, blockio_limit: nil>]

And here's a read of the provider max concurrent VM migrations ("Max Transformation Runners") value in the Rails console, showing it is configured to 20 (connection-pool debug lines trimmed):

[root@acanan-rhevm vmdb]# rails c
Loading production environment (Rails 5.0.7.2)
irb(main):001:0> $evm = MiqAeMethodService::MiqAeService.new(MiqAeEngine::MiqAeWorkspaceRuntime.new)
=> #<MiqAeMethodService::MiqAeService:0x0000000002a27b80 ...>
irb(main):003:0> $evm.vmdb(:ext_management_system).find_by(:name => "RHV").custom_get("Max Transformation Runners")
=> "20"
Here's another test, in which the conversion host max concurrent tasks value is not honored: a VMware->RHV migration of 10 VMs, 100GB disks (66% usage), with 2 conversion hosts, VDDK. The migration took 3 hours. Each conversion host is set to max_concurrent_tasks = 5. However:
7 VMs were migrated via conversion host #1
3 VMs were migrated via conversion host #2
While each conversion host should migrate at most 5 VMs, conversion host #1 had 7 VMs migrating in parallel.
Adding versions to comment #11: CFME-5.10.5.1/RHV-4.3.4
Adding the regression keyword, since a migration of 20 VMs used to divide well between the 2 available conversion hosts, and now all 20 VMs were directed to only one conversion host, even though there are 2.
Based on the results of testing 20- and 10-VM migrations on CFME-5.10.5.1/5.10.6.0, we see:
1. Bad balancing of the number of migrated VMs between the conversion hosts.
2. max_concurrent_tasks per conversion host is not honored. For example, with 10 VMs migrated and 2 conversion hosts set to max_concurrent_tasks=5, the migration result was 7 VMs migrated via one conversion host and 3 VMs via the second.
In progress: https://github.com/ManageIQ/manageiq/pull/18860
*** Bug 1719700 has been marked as a duplicate of this bug. ***
Avital, I reviewed the 1.2 docs. Comments:
1. Regarding the concurrent migrations - I think it would be better to mention this part at the beginning of CHAPTER 3, MIGRATING THE VIRTUAL MACHINES, because this is usually set (if desired; it is optional, of course) before the migration plan is created & started (though of course you can also change it on the fly, as mentioned in the doc).
2. In the known issues section, the cancel-migration bug appears twice: BZ#1666799 is the correct bug; BZ#666799 is redundant & an incorrect bug id.
Tested on these versions:
CFME-5.11.0.18.20190806180636_1dd6378
RHV-4.3.5.3-0.1.el7
RHV hosts (2, serving as conversion hosts):
* Special packages of libguestfs, libguestfs-tools-c, virt-v2v, python-libguestfs: 1.40.2-5.el7.1.bz1680361.v3.1.x86_64
* OS Version: RHEL - 7.7 - 9.el7
* OS Description: Red Hat Enterprise Linux Server 7.7 Beta (Maipo)
* Kernel Version: 3.10.0 - 957.21.3.el7.x86_64
* KVM Version: 2.12.0 - 33.el7
* LIBVIRT Version: libvirt-4.5.0-23.el7
* VDSM Version: vdsm-4.30.19-1.el7ev

Did 2 runs of 20 VMs, once with 100GB disks and once with 20GB disks. Conversion host max concurrent tasks = 10. Provider concurrent tasks = 20.
In both runs, all 20 VMs failed to migrate, hitting this new ovirt-engine bug: Bug 1740021 - [v2v][Scale][RHV] 20 VMs migration fail on "timed out waiting for disk to become unlocked".
Regarding the VM distribution: in the first run the distribution was 8 and 12 VMs; in the second run it was 12 and 8 VMs.
The CFME log is set to 'debug' mode. Attached evm.log of the 2 runs.
Fabien/Dan, can you please advise why the distribution is not even (10:10), as expected and as we saw in past versions?
Created attachment 1602934 [details] evm.log1.tgz
Created attachment 1602935 [details] evm.log2.tgz
I checked it on CFME-5.11.0.19/RHV-4.3.5: a VDDK migration run of 20 VMs with 20GB disks, on the new RHV I got, and the migration passed for all 20 VMs. However, though each of the conversion hosts is set to max_concurrent_tasks=10, one host got 15 VMs and the second only 5.
@Fabien, maybe I am missing something about how the conversion hosts are evaluated (maybe other considerations are taken into account here that I am not aware of). My understanding is that the VMs should be distributed evenly. In ALL my runs on the RDU lab RHV systems (using CFME-5.11.0.18, CFME-5.11.0.19), the distribution is not even, unlike what was seen in previous versions of CFME.
@ilanit, can we get access to the appliance and run the migration plans on our own? From the logs, we see that the number of running tasks is not updated, so the same host keeps being selected as least utilized until the value gets updated. @dan, can you look into this please?
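The stale-counter effect described above can be sketched in plain Ruby. All names here (the ConversionHost struct, pick_least_utilized, place) are illustrative stand-ins, not the actual ManageIQ code:

```ruby
# Sketch of "least utilized host" scheduling with fresh vs. stale task counts.
ConversionHost = Struct.new(:name, :max_concurrent_tasks, :active_tasks)

def pick_least_utilized(hosts)
  hosts.select { |h| h.active_tasks < h.max_concurrent_tasks }
       .min_by(&:active_tasks)
end

def place(vm_count, hosts, refresh_counts:)
  placements = Hash.new(0)
  vm_count.times do
    host = pick_least_utilized(hosts)
    break unless host
    # Simulate whether the running-task count is refreshed between placements.
    host.active_tasks += 1 if refresh_counts
    placements[host.name] += 1
  end
  placements
end

make_hosts = -> { [ConversionHost.new("host_mixed_1", 10, 0),
                   ConversionHost.new("host_mixed_2", 10, 0)] }

# Counts refreshed per placement: the 20 VMs split evenly.
place(20, make_hosts.call, refresh_counts: true)   # => {"host_mixed_1"=>10, "host_mixed_2"=>10}

# Stale counts: every VM sees the same "least utilized" host.
place(20, make_hosts.call, refresh_counts: false)  # => {"host_mixed_1"=>20}
```

Note that a stale count also defeats the max_concurrent_tasks cap, which would match the 7-on-a-5-max-host observation from comment #11.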
https://github.com/ManageIQ/manageiq/pull/19213
New commit detected on ManageIQ/manageiq/ivanchuk:
https://github.com/ManageIQ/manageiq/commit/a4010f7a3817ebb25f0d770e1d60700650a4120c

commit a4010f7a3817ebb25f0d770e1d60700650a4120c
Author:     Adam Grare <agrare>
AuthorDate: Fri Aug 30 08:29:27 2019 -0400
Commit:     Adam Grare <agrare>
CommitDate: Fri Aug 30 08:29:27 2019 -0400

    Merge pull request #19213 from djberg96/conversion_host_pending_state

    [V2V] Add pending state as a valid active task.

    (cherry picked from commit c5e268341c65f8c33899a411e46b74d71dc86ffc)
    https://bugzilla.redhat.com/show_bug.cgi?id=1716283

 app/models/conversion_host.rb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
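For context, the commit above changes one line in app/models/conversion_host.rb; per the PR title, the gist is that tasks in the "pending" state must count as active when computing a host's utilization. A minimal sketch of the idea, with illustrative state names and method (not the exact ManageIQ code):

```ruby
# Per the PR title, before the fix a host's queued ("pending") migrations were
# not counted as active, so a host with many migrations queued still looked
# idle and kept being picked as the least utilized host.
ACTIVE_TASK_STATES = %w[migrate pending].freeze  # "pending" counted after the fix

def active_task_count(tasks)
  tasks.count { |t| ACTIVE_TASK_STATES.include?(t[:state]) }
end

tasks = [
  { id: 1, state: "pending" },   # queued, not yet running
  { id: 2, state: "migrate" },   # running
  { id: 3, state: "finished" },  # done
]

active_task_count(tasks)  # => 2 once "pending" is counted
```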
Tested cfme-5.11.0.22 + a fix for this bug, rhv-4.3.5.4-0.1.el7 (small scale).
The v2v migration ended successfully for all the following tests, and the VMs were distributed evenly between the 2 conversion hosts, as expected:
test1: 20 VMs, 16GB disk, 2 conversion hosts, VDDK, provider max concurrent migrations=20, conversion host max concurrent tasks=10
test2: 20 VMs, 16GB disk, 2 conversion hosts, VDDK, provider max concurrent migrations=20, conversion host max concurrent tasks=5
test3: 20 VMs, 100GB disk, 2 conversion hosts, VDDK, provider max concurrent migrations=20, conversion host max concurrent tasks=10
Logs can be found here: https://drive.google.com/drive/u/0/folders/1hO3pvxLMP4SKznVDOTJrWA70_lCudxSw
evm_log1.log - the 10:10 and 5:5 16GB 20-VM migrations
evm_log2.log - the 10:10 100GB 20-VM migration
Ilanit, can this be marked as verified based on comment 34?
In reply to comment #35: No, because I still need to check that it is working on CFME-5.11.0.23.
Verified on CFME-5.11.0.24/RHV-4.3.5.4-0.1.el7.
Tested with 20 VMs, 20GB disk each, 2 conversion hosts.
Conversion host max concurrent tasks = 10
Provider concurrent migrations = 20
VMs were distributed evenly between the 2 conversion hosts, 10:10.