Description of problem:

Apparently the underlying migration never started: there was no qemu process on the destination (just the wrapper), but vdsm was still monitoring the migration on the source.

SOURCE:

2018-04-19 23:09:40,372-0500 INFO (jsonrpc/5) [vdsm.api] START migrate(params={u'incomingLimit': 2, u'src': u'rs-rhev-h-2', u'dstqemu': u'172.28.43.240', u'autoConverge': u'true', u'tunneled': u'false', u'enableGuestEvents': True, u'dst': u'rs-rhev-h-5:54321', u'convergenceSchedule': {u'init': [{u'params': [u'100'], u'name': u'setDowntime'}], u'stalling': [{u'action': {u'params': [u'150'], u'name': u'setDowntime'}, u'limit': 1}, {u'action': {u'params': [u'200'], u'name': u'setDowntime'}, u'limit': 2}, {u'action': {u'params': [u'300'], u'name': u'setDowntime'}, u'limit': 3}, {u'action': {u'params': [u'400'], u'name': u'setDowntime'}, u'limit': 4}, {u'action': {u'params': [u'500'], u'name': u'setDowntime'}, u'limit': 6}, {u'action': {u'params': [], u'name': u'abort'}, u'limit': -1}]}, u'vmId': u'0bda3dbe-fdaa-4213-ac86-939388c328eb', u'abortOnError': u'true', u'outgoingLimit': 2, u'compressed': u'false', u'maxBandwidth': 10000, u'method': u'online', 'mode': 'remote'}) from=::ffff:172.28.43.235,49396, flow_id=d9c63f11-5cc9-4723-859f-d37875799bb7 (api:46)
2018-04-19 23:09:43,917-0500 INFO (migsrc/0bda3dbe) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') Creation of destination VM took: 3 seconds (migration:455)
2018-04-19 23:09:43,917-0500 INFO (migsrc/0bda3dbe) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') starting migration to qemu+tls://rs-rhev-h-5/system with miguri tcp://172.28.43.240 (migration:480)
2018-04-19 23:09:53,921-0500 INFO (migmon/0bda3dbe) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') Migration Progress: 10 seconds elapsed, 0% of data processed, total data: 0MB, processed data: 0MB, remaining data: 0MB, transfer speed 0MBps, zero pages: 0MB, compressed: 0MB, dirty rate: 0, memory iteration: 0 (migration:815)

..... until ....
2018-04-23 12:45:00,759-0500 INFO (migmon/0bda3dbe) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') Migration Progress: 308116 seconds elapsed, 0% of data processed, total data: 0MB, processed data: 0MB, remaining data: 0MB, transfer speed 0MBps, zero pages: 0MB, compressed: 0MB, dirty rate: 0, memory iteration: 0 (migration:815)

DESTINATION:

2018-04-19 23:09:40,709-0500 INFO (vm/0bda3dbe) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') VM wrapper has started (vm:1968)
2018-04-19 23:09:40,734-0500 INFO (vm/0bda3dbe) [vds] prepared volume path: (clientIF:380)
2018-04-19 23:09:41,391-0500 INFO (vm/0bda3dbe) [vds] prepared volume path: /rhev/data-center/00000001-0001-0001-0001-000000000311/f7749f65-a6b3-44f1-9840-ef21fa11a98b/images/9b2ab9b3-93e7-4ba8-bfdd-7d11f3946b24/bfb17f01-24e7-4964-8dfa-07b2b72471c2 (clientIF:380)
2018-04-19 23:09:41,790-0500 INFO (vm/0bda3dbe) [vds] prepared volume path: /rhev/data-center/00000001-0001-0001-0001-000000000311/f7749f65-a6b3-44f1-9840-ef21fa11a98b/images/80f95b1f-0fbe-4e4c-ae6f-38a830c94f48/bc1452c2-c7bc-42e9-b708-723dd710dbee (clientIF:380)
2018-04-19 23:09:42,285-0500 INFO (vm/0bda3dbe) [vds] prepared volume path: /rhev/data-center/00000001-0001-0001-0001-000000000311/f7749f65-a6b3-44f1-9840-ef21fa11a98b/images/628520c2-f7b8-4c49-b588-f1523c8e1b35/70a359c2-52d9-42fe-9d5e-e0c063f34527 (clientIF:380)
2018-04-19 23:09:42,581-0500 INFO (vm/0bda3dbe) [vds] prepared volume path: /rhev/data-center/00000001-0001-0001-0001-000000000311/f7749f65-a6b3-44f1-9840-ef21fa11a98b/images/e3588e99-481e-47e3-b9d0-13b50b45a1e1/290f010a-47ca-4e6b-ad5f-e64fadab6dcd (clientIF:380)
2018-04-19 23:09:43,311-0500 INFO (vm/0bda3dbe) [vds] prepared volume path: /rhev/data-center/00000001-0001-0001-0001-000000000311/f7749f65-a6b3-44f1-9840-ef21fa11a98b/images/46252ec0-f4b5-46ec-a2a4-433830121463/cb38b7c6-1484-4d43-97aa-0bbbc5c6abe1 (clientIF:380)
2018-04-19 23:09:43,773-0500 INFO (vm/0bda3dbe) [vds] prepared volume path: /rhev/data-center/00000001-0001-0001-0001-000000000311/f7749f65-a6b3-44f1-9840-ef21fa11a98b/images/739aae87-730a-4cae-9079-dd41d9ff315c/1188c234-e09e-4290-bc28-a8e12f4067e0 (clientIF:380)
2018-04-19 23:09:43,885-0500 INFO (jsonrpc/7) [root] (hooks:108)
2018-04-19 23:09:43,912-0500 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call VM.migrationCreate succeeded in 3.23 seconds (__init__:539)
2018-04-19 23:18:15,484-0500 INFO (jsonrpc/5) [throttled] Current getAllVmStats: {u'0bda3dbe-fdaa-4213-ac86-939388c328eb': 'Migration Destination', u'e45c70fe-9e35-46ae-91a9-f1b128e4820e': 'Migration Destination'} (throttledlog:105)

.... until ....
2018-04-23 13:50:12,541-0500 INFO (jsonrpc/2) [throttled] Current getAllVmStats: {u'0bda3dbe-fdaa-4213-ac86-939388c328eb': 'Migration Destination', u'e45c70fe-9e35-46ae-91a9-f1b128e4820e': 'Migration Destination'} (throttledlog:105)
2018-04-23 13:50:47,326-0500 INFO (jsonrpc/4) [vdsm.api] START destroy(gracefulAttempts=1) from=::ffff:172.28.43.235,39460, flow_id=64fc1d97 (api:46)
2018-04-23 13:50:47,327-0500 INFO (jsonrpc/4) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') Release VM resources (vm:4298)
2018-04-23 13:50:47,330-0500 INFO (jsonrpc/4) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') Stopping connection (guestagent:430)
2018-04-23 13:50:47,338-0500 INFO (jsonrpc/4) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') Stopping connection (guestagent:430)
2018-04-23 13:50:47,338-0500 WARN (jsonrpc/4) [root] File: /var/lib/libvirt/qemu/channels/0bda3dbe-fdaa-4213-ac86-939388c328eb.com.redhat.rhevm.vdsm already removed (utils:120)
2018-04-23 13:50:47,339-0500 WARN (jsonrpc/4) [root] File: /var/lib/libvirt/qemu/channels/0bda3dbe-fdaa-4213-ac86-939388c328eb.org.qemu.guest_agent.0 already removed (utils:120)
2018-04-23 13:50:47,340-0500 INFO (jsonrpc/4) [vdsm.api] START inappropriateDevices(thiefId=u'0bda3dbe-fdaa-4213-ac86-939388c328eb') from=::ffff:172.28.43.235,39460, flow_id=64fc1d97, task_id=86eda50c-3031-4f6d-8cc1-ca63b26bf078 (api:46)
2018-04-23 13:50:47,341-0500 INFO (jsonrpc/4) [vdsm.api] FINISH inappropriateDevices return=None from=::ffff:172.28.43.235,39460, flow_id=64fc1d97, task_id=86eda50c-3031-4f6d-8cc1-ca63b26bf078 (api:52)

From the logs on the destination (systemd/machined/messages) we can infer that only the wrapper existed; the actual VM (qemu) was never created and there was never an underlying migration. But I can't find any error or message to confirm this.

Version-Release number of selected component (if applicable):
rhevm-4.1.10.3-0.1.el7.noarch
vdsm-4.19.50-1.el7ev.x86_64
libvirt-daemon-3.2.0-14.el7_4.9.x86_64

How reproducible:
Unknown
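For the next occurrence, a quick way to confirm the suspicion independently of vdsm: below is a minimal diagnostic sketch (my own, not vdsm code) that checks on the destination whether libvirt actually has a domain for the VM and whether a matching qemu process exists. It assumes the libvirt python bindings are installed on the host; the UUID is the one from the logs above.

# Minimal diagnostic sketch (not vdsm code): run on the destination host to check
# whether libvirt knows about the incoming domain and whether a qemu process for
# it exists. Assumes the libvirt python bindings are installed.
import subprocess
import libvirt

VM_ID = '0bda3dbe-fdaa-4213-ac86-939388c328eb'  # UUID from the logs above

conn = libvirt.openReadOnly('qemu:///system')
try:
    try:
        dom = conn.lookupByUUIDString(VM_ID)
        # state() returns [state, reason]; a paused domain would be expected
        # for an incoming migration that is really in progress.
        print('libvirt domain state: %s' % (dom.state(),))
    except libvirt.libvirtError:
        print('libvirt has no domain with UUID %s' % VM_ID)

    # A real destination qemu process carries the UUID on its command line.
    try:
        out = subprocess.check_output(['pgrep', '-af', VM_ID])
        print('qemu process: %s' % out.strip())
    except subprocess.CalledProcessError:
        print('no process with this UUID on its command line')
finally:
    conn.close()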
Seen the same thing a second time, now on an el6 to el7 migration under a 3.5 cluster level.

Again, it looks like there never was a qemu process on the destination, but vdsm keeps monitoring a non-existent migration and there are no errors in the logs.

I'll try to get more data (requested libvirt debug logs) if the customer agrees to continue the investigation.

vdsm-4.17.43-1.el7ev.noarch
libvirt-daemon-3.2.0-14.el7_4.7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64
(In reply to Germano Veit Michel from comment #2)
> Seen the same thing a second time, now on an el6 to el7 migration under a
> 3.5 cluster level.
>
> Again, it looks like there never was a qemu process on the destination, but
> vdsm keeps monitoring a non-existent migration and there are no errors in
> the logs.
>
> I'll try to get more data (requested libvirt debug logs) if the customer
> agrees to continue the investigation.

Thanks, setting needinfo. This is very strange...

> vdsm-4.17.43-1.el7ev.noarch
> libvirt-daemon-3.2.0-14.el7_4.7.x86_64
> qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64
Rebooted the host and the problem went away. It feels like qemu doesn't start and vdsm doesn't pick it up; libvirt is probably involved as it sits between the two. So there is no more data about this.

As this is quite weird, I don't think we have the data to solve it now. If you want to close as insufficient data, that's fine. Since I've already seen this twice, it will probably happen again at some point, hopefully with better logs. Then we can continue.
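Just a thought for the vdsm side, purely as an illustrative sketch using the public libvirt job API with an arbitrary timeout (this is not vdsm's actual migration monitor or a proposed patch): since the source kept reporting 0% progress for over 300000 seconds, a stall guard along these lines would abort the job instead of monitoring it forever.

# Illustrative sketch only (not vdsm's actual monitor): abort an outgoing
# migration that has shown zero progress for too long, via the libvirt job API.
import time
import libvirt

STALL_LIMIT = 600   # seconds with no data processed before giving up (arbitrary)
POLL_INTERVAL = 10  # seconds between checks

def watch_migration(dom):
    stalled_since = None
    while True:
        # jobInfo() returns [type, timeElapsed, timeRemaining, dataTotal,
        # dataProcessed, dataRemaining, memTotal, memProcessed, memRemaining,
        # fileTotal, fileProcessed, fileRemaining]
        info = dom.jobInfo()
        if info[0] == libvirt.VIR_DOMAIN_JOB_NONE:
            return                      # no job anymore: finished, failed or never started
        if info[4] > 0:                 # some data was processed, migration is moving
            stalled_since = None
        elif stalled_since is None:
            stalled_since = time.time()
        elif time.time() - stalled_since > STALL_LIMIT:
            dom.abortJob()              # give up instead of monitoring forever
            return
        time.sleep(POLL_INTERVAL)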