Bug 1572451 - Migration was stuck, apparently no qemu process on the destination.
Summary: Migration was stuck, apparently no qemu process on the destination.
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.1.10
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
: ---
Assignee: Dan Kenigsberg
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-04-27 03:37 UTC by Germano Veit Michel
Modified: 2020-08-03 15:39 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-18 09:33:07 UTC
oVirt Team: Virt
Target Upstream Version:
lsvaty: testing_plan_complete-


Attachments (Terms of Use)

Description Germano Veit Michel 2018-04-27 03:37:15 UTC
Description of problem:

Apparently the underlying migration never started, there was no qemu process on the destination (just wrapper), but vdsm was still monitoring the migration on the source.

SOURCE:
2018-04-19 23:09:40,372-0500 INFO  (jsonrpc/5) [vdsm.api] START migrate(params={u'incomingLimit': 2, u'src': u'rs-rhev-h-2', u'dstqemu': u'172.28.43.240', u'autoConverge': u'true', u'tunneled': u'false', u'enableGuestEvents': True, u'dst': u'rs-rhev-h-5:54321', u'convergenceSchedule': {u'init': [{u'params': [u'100'], u'name': u'setDowntime'}], u'stalling': [{u'action': {u'params': [u'150'], u'name': u'setDowntime'}, u'limit': 1}, {u'action': {u'params': [u'200'], u'name': u'setDowntime'}, u'limit': 2}, {u'action': {u'params': [u'300'], u'name': u'setDowntime'}, u'limit': 3}, {u'action': {u'params': [u'400'], u'name': u'setDowntime'}, u'limit': 4}, {u'action': {u'params': [u'500'], u'name': u'setDowntime'}, u'limit': 6}, {u'action': {u'params': [], u'name': u'abort'}, u'limit': -1}]}, u'vmId': u'0bda3dbe-fdaa-4213-ac86-939388c328eb', u'abortOnError': u'true', u'outgoingLimit': 2, u'compressed': u'false', u'maxBandwidth': 10000, u'method': u'online', 'mode': 'remote'}) from=::ffff:172.28.43.235,49396, flow_id=d9c63f11-5cc9-4723-859f-d37875799bb7 (api:46)
2018-04-19 23:09:43,917-0500 INFO  (migsrc/0bda3dbe) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') Creation of destination VM took: 3 seconds (migration:455)                        
2018-04-19 23:09:43,917-0500 INFO  (migsrc/0bda3dbe) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') starting migration to qemu+tls://rs-rhev-h-5/system with miguri tcp://172.28.43.240 (migration:480)
2018-04-19 23:09:53,921-0500 INFO  (migmon/0bda3dbe) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') Migration Progress: 10 seconds elapsed, 0% of data processed, total data: 0MB, processed data: 0MB, remaining data: 0MB, transfer speed 0MBps, zero pages: 0MB, compressed: 0MB, dirty rate: 0, memory iteration: 0 (migration:815)

..... until ....

2018-04-23 12:45:00,759-0500 INFO  (migmon/0bda3dbe) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') Migration Progress: 308116 seconds elapsed, 0% of data processed, total data: 0MB, processed data: 0MB, remaining data: 0MB, transfer speed 0MBps, zero pages: 0MB, compressed: 0MB, dirty rate: 0, memory iteration: 0 (migration:815) 


DESTINATION:
2018-04-19 23:09:40,709-0500 INFO  (vm/0bda3dbe) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') VM wrapper has started (vm:1968)                                                      

2018-04-19 23:09:40,734-0500 INFO  (vm/0bda3dbe) [vds] prepared volume path:  (clientIF:380)
2018-04-19 23:09:41,391-0500 INFO  (vm/0bda3dbe) [vds] prepared volume path: /rhev/data-center/00000001-0001-0001-0001-000000000311/f7749f65-a6b3-44f1-9840-ef21fa11a98b/images/9b2ab9b3-93e7-4ba8-bfdd-7d11f3946b24/bfb17f01-24e7-4964-8dfa-07b2b72471c2 (clientIF:380)
2018-04-19 23:09:41,790-0500 INFO  (vm/0bda3dbe) [vds] prepared volume path: /rhev/data-center/00000001-0001-0001-0001-000000000311/f7749f65-a6b3-44f1-9840-ef21fa11a98b/images/80f95b1f-0fbe-4e4c-ae6f-38a830c94f48/bc1452c2-c7bc-42e9-b708-723dd710dbee (clientIF:380)
2018-04-19 23:09:42,285-0500 INFO  (vm/0bda3dbe) [vds] prepared volume path: /rhev/data-center/00000001-0001-0001-0001-000000000311/f7749f65-a6b3-44f1-9840-ef21fa11a98b/images/628520c2-f7b8-4c49-b588-f1523c8e1b35/70a359c2-52d9-42fe-9d5e-e0c063f34527 (clientIF:380)
2018-04-19 23:09:42,581-0500 INFO  (vm/0bda3dbe) [vds] prepared volume path: /rhev/data-center/00000001-0001-0001-0001-000000000311/f7749f65-a6b3-44f1-9840-ef21fa11a98b/images/e3588e99-481e-47e3-b9d0-13b50b45a1e1/290f010a-47ca-4e6b-ad5f-e64fadab6dcd (clientIF:380)
2018-04-19 23:09:43,311-0500 INFO  (vm/0bda3dbe) [vds] prepared volume path: /rhev/data-center/00000001-0001-0001-0001-000000000311/f7749f65-a6b3-44f1-9840-ef21fa11a98b/images/46252ec0-f4b5-46ec-a2a4-433830121463/cb38b7c6-1484-4d43-97aa-0bbbc5c6abe1 (clientIF:380)
2018-04-19 23:09:43,773-0500 INFO  (vm/0bda3dbe) [vds] prepared volume path: /rhev/data-center/00000001-0001-0001-0001-000000000311/f7749f65-a6b3-44f1-9840-ef21fa11a98b/images/739aae87-730a-4cae-9079-dd41d9ff315c/1188c234-e09e-4290-bc28-a8e12f4067e0 (clientIF:380)
                                                                                                 
2018-04-19 23:09:43,885-0500 INFO  (jsonrpc/7) [root]  (hooks:108)                                                                                                                             

2018-04-19 23:09:43,912-0500 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call VM.migrationCreate succeeded in 3.23 seconds (__init__:539)  

2018-04-19 23:18:15,484-0500 INFO  (jsonrpc/5) [throttled] Current getAllVmStats: {u'0bda3dbe-fdaa-4213-ac86-939388c328eb': 'Migration Destination', u'e45c70fe-9e35-46ae-91a9-f1b128e4820e': 'Migration Destination'} (throttledlog:105) 

.... until ....

2018-04-23 13:50:12,541-0500 INFO  (jsonrpc/2) [throttled] Current getAllVmStats: {u'0bda3dbe-fdaa-4213-ac86-939388c328eb': 'Migration Destination', u'e45c70fe-9e35-46ae-91a9-f1b128e4820e': 'Migration Destination'} (throttledlog:105) 

2018-04-23 13:50:47,326-0500 INFO  (jsonrpc/4) [vdsm.api] START destroy(gracefulAttempts=1) from=::ffff:172.28.43.235,39460, flow_id=64fc1d97 (api:46)                                         
2018-04-23 13:50:47,327-0500 INFO  (jsonrpc/4) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') Release VM resources (vm:4298)                                                          
2018-04-23 13:50:47,330-0500 INFO  (jsonrpc/4) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') Stopping connection (guestagent:430)      
2018-04-23 13:50:47,338-0500 INFO  (jsonrpc/4) [virt.vm] (vmId='0bda3dbe-fdaa-4213-ac86-939388c328eb') Stopping connection (guestagent:430)                                                    
2018-04-23 13:50:47,338-0500 WARN  (jsonrpc/4) [root] File: /var/lib/libvirt/qemu/channels/0bda3dbe-fdaa-4213-ac86-939388c328eb.com.redhat.rhevm.vdsm already removed (utils:120)              
2018-04-23 13:50:47,339-0500 WARN  (jsonrpc/4) [root] File: /var/lib/libvirt/qemu/channels/0bda3dbe-fdaa-4213-ac86-939388c328eb.org.qemu.guest_agent.0 already removed (utils:120)             
2018-04-23 13:50:47,340-0500 INFO  (jsonrpc/4) [vdsm.api] START inappropriateDevices(thiefId=u'0bda3dbe-fdaa-4213-ac86-939388c328eb') from=::ffff:172.28.43.235,39460, flow_id=64fc1d97, task_id=86eda50c-3031-4f6d-8cc1-ca63b26bf078 (api:46)                                                                                                 
2018-04-23 13:50:47,341-0500 INFO  (jsonrpc/4) [vdsm.api] FINISH inappropriateDevices return=None from=::ffff:172.28.43.235,39460, flow_id=64fc1d97, task_id=86eda50c-3031-4f6d-8cc1-ca63b26bf078 (api:52) 

From the logs on the destination (systemd/machined/messages) we can infer that only the wrapper existed, the actual VM (qemu) was never created and there was never an underlying migration. But I can't find any error or message to confirm this.

Version-Release number of selected component (if applicable):
rhevm-4.1.10.3-0.1.el7.noarch
vdsm-4.19.50-1.el7ev.x86_64
libvirt-daemon-3.2.0-14.el7_4.9.x86_64

How reproducible:
Unknown

Comment 2 Germano Veit Michel 2018-05-11 00:48:29 UTC
Seen the same thing a second time, now on a el6 to el7 migration under 3.5 CL.

Same thing, it looks like there never was a destination qemu process on the destination, but vdsm keeps monitoring an non-existent migration and there are no errors in the logs.

I'll try to get more data (requested libvirt debug) if the customer agrees to continue investigation.

vdsm-4.17.43-1.el7ev.noarch
libvirt-daemon-3.2.0-14.el7_4.7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64

Comment 3 Yaniv Kaul 2018-05-17 13:48:38 UTC
(In reply to Germano Veit Michel from comment #2)
> Seen the same thing a second time, now on a el6 to el7 migration under 3.5
> CL.
> 
> Same thing, it looks like there never was a destination qemu process on the
> destination, but vdsm keeps monitoring an non-existent migration and there
> are no errors in the logs.
> 
> I'll try to get more data (requested libvirt debug) if the customer agrees
> to continue investigation.

Thanks, setting needinfo.
This is very strange...

> 
> vdsm-4.17.43-1.el7ev.noarch
> libvirt-daemon-3.2.0-14.el7_4.7.x86_64
> qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64

Comment 4 Germano Veit Michel 2018-05-17 23:53:03 UTC
Rebooted the host and the problem went away. Feels like qemu doesn't start and vdsm doesn't pick it. Libvirt problably involved as it is between both.

So no more data about this. As this is quite weird, I don't think we have data to  solve this now.

If you want to close as insufficient data, that's fine. Since I already saw this twice, it will probably happen again at some point, hopefully with better logs. Then we can continue.

Comment 5 Franta Kust 2019-05-16 13:09:09 UTC
BZ<2>Jira Resync


Note You need to log in before you can comment on or make changes to this bug.