Bug 1566594

Summary: migration failure error when migration succeeds
Product: [oVirt] ovirt-engine
Component: Backend.Core
Version: 4.2.0
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: unspecified
Reporter: Dafna Ron <dron>
Assignee: Nobody <nobody>
QA Contact: Pavel Stehlik <pstehlik>
Docs Contact:
CC: bugs, dron, michal.skrivanek
Target Milestone: ---
Target Release: ---
Target Upstream Version:
Flags: dron: planning_ack?
       dron: devel_ack?
       dron: testing_ack?
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-20 12:28:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Embargoed:

Attachments: logs (no flags)

Description Dafna Ron 2018-04-12 15:09:20 UTC
Created attachment 1420889 [details]
logs

We are seeing a race condition in OST which causes a failure in the test 006_migrations.migrate_vm.

From the logs, the issue seems to be that while a VM is being migrated, the engine at some point fails metrics collection and assumes the migration failed.
It then re-issues the migration and gets an "already exists" error, which fails the test.
I am not sure whether this only affects the test or whether users will also see issues in the UI (I assume they will), but in any case I think we should check what changed in stats collection that can fail this test.

Here is a link to one of the failed jobs: http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1677/artifact/exported-artifacts/basic-suit-4.2/test_logs/basic-suite-4.2/post-006_migrations.py/
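
For illustration, here is a minimal, self-contained Python sketch of the suspected sequence (all names are hypothetical; this is not engine or vdsm code): a monitoring gap is misread as a failed migration, the migration is re-issued, and the destination host rejects the retry because the VM is already running there.

    # Hypothetical simulation of the suspected race; not engine/vdsm code.
    class VMExists(Exception):
        """Destination host already runs the VM (vdsm error code 4)."""

    class FakeDestinationHost(object):
        def __init__(self):
            self.vms = set()

        def migration_create(self, vm_id):
            # the real migrationCreate likewise rejects a VM that is
            # already running on the destination
            if vm_id in self.vms:
                raise VMExists('Virtual machine already exists')
            self.vms.add(vm_id)

    def migrate(dst, vm_id, stats_ok):
        dst.migration_create(vm_id)   # first attempt actually succeeds
        if not stats_ok:
            # a failed stats poll is misread as a failed migration,
            # so the migration is re-issued -- and hits VMExists
            dst.migration_create(vm_id)

    dst = FakeDestinationHost()
    try:
        migrate(dst, '5acb0866-f66c-4071-a737-f766e720b948', stats_ok=False)
    except VMExists as exc:
        print('retry rejected: %s' % exc)  # mirrors the code-4 error in the logs below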

Comment 1 Dafna Ron 2018-04-12 15:15:19 UTC
2018-04-12 08:05:04,744-0400 INFO  (jsonrpc/0) [api.host] FINISH getAllVmIoTunePolicies return={'status': {'message': 'Done', 'code': 0}, 'io_tune_policies_dict': {'5acb0866-f66c-4071-a737-f766e720b948': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/11d01bac-fe22-42c2-b33a-ed0a63af8033/images/86834940-1070-4b17-8cf8-3afa03c21566/df490fff-ffb1-420b-9cce-71cbbc292136', 'name': 'vda'}, {'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': u'/rhev/data-center/mnt/192.168.202.2:_exports_nfs_share1/f9f8e933-9154-41a5-ad82-2ecc81e51805/images/f5329064-bf9d-475e-9468-79e65907b670/64aa2961-2379-46b7-8a37-fe040a990a83', 'name': 'vdb'}]}}} from=::1,54678 (api:52)
2018-04-12 08:05:04,744-0400 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmIoTunePolicies succeeded in 0.00 seconds (__init__:573)
2018-04-12 08:05:11,357-0400 INFO  (monitor/b5e6b24) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:733)
2018-04-12 08:05:12,994-0400 ERROR (periodic/1) [virt.vmstats] VM metrics collection failed (vmstats:264)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 197, in send_metrics
    data[prefix + '.cpu.usage'] = stat['cpuUsage']
KeyError: 'cpuUsage'
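
The traceback above shows send_metrics indexing stat['cpuUsage'] directly, so a single VM whose stats dict is still incomplete (e.g. one that is mid-migration) aborts the whole collection pass with a KeyError. A tolerant variant might look like the following sketch (my assumption only; the helper name and signature are hypothetical, and this is not necessarily how it was fixed):

    # Hedged sketch of a tolerant counterpart to the failing line in
    # vdsm/virt/vmstats.py; helper name and signature are hypothetical.
    def put_cpu_usage(data, prefix, stat):
        cpu_usage = stat.get('cpuUsage')
        if cpu_usage is None:
            # this VM's stats are partial (e.g. it is migrating); skip
            # the sample instead of failing the whole metrics batch
            return
        data[prefix + '.cpu.usage'] = cpu_usage

    data = {}
    put_cpu_usage(data, 'hosts.vm0', {})                        # partial stats: skipped
    put_cpu_usage(data, 'hosts.vm1', {'cpuUsage': 1230000000})
    print(data)                                                 # {'hosts.vm1.cpu.usage': 1230000000}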


2018-04-12 08:06:26,148-0400 ERROR (periodic/0) [virt.vm] (vmId='5acb0866-f66c-4071-a737-f766e720b948') Failed to set block threshold on 'vdb' (/rhev/data-center/mnt/192.168.202.2:_exports_nfs_share1/f9f8e933-9154-41a5-ad82-2ecc81e51805/images/f5329064-bf9d-475e-9468-79e65907b670/6d468235-6cd4-403e-8253-5899cadff482): Operation not supported: threshold currently can't be set for block device 'vdb' (drivemonitor:120)

2018-04-12 08:11:48,336-0400 ERROR (jsonrpc/6) [api] FINISH create error=Virtual machine already exists (api:129)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 191, in create
    raise exception.VMExists()
VMExists: Virtual machine already exists
2018-04-12 08:11:48,338-0400 INFO  (jsonrpc/6) [api.virt] FINISH create return={'status': {'message': 'Virtual machine already exists', 'code': 4}} from=::ffff:192.168.200.3,57506, vmId=5acb0866-f66c-4071-a737-f766e720b948 (api:52)
2018-04-12 08:11:48,339-0400 INFO  (jsonrpc/6) [api.virt] FINISH migrationCreate return={'status': {'message': 'Virtual machine already exists', 'code': 4}} from=::ffff:192.168.200.3,57506, vmId=5acb0866-f66c-4071-a737-f766e720b948 (api:52)
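
Since the first create actually succeeded, the "already exists" response is arguably benign from the caller's point of view. As a hedged sketch (hypothetical client-side handling, not what the engine actually does), the status codes seen above could be used to tell this case apart from a real failure:

    # Hypothetical classification of the migrationCreate response above;
    # the codes and messages come from the log, the policy is an assumption.
    def classify_migration_create(response):
        status = response['status']
        if status['code'] == 0:
            return 'created'
        if status['code'] == 4:  # 'Virtual machine already exists'
            # the earlier attempt already succeeded on this host
            return 'already-running'
        raise RuntimeError(status['message'])

    resp = {'status': {'message': 'Virtual machine already exists', 'code': 4}}
    print(classify_migration_create(resp))  # -> already-running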

Comment 2 Michal Skrivanek 2018-04-16 11:57:21 UTC
IIUC this should be solved by https://gerrit.ovirt.org/#/c/90166/; please confirm.

Comment 3 Dafna Ron 2018-04-16 12:30:17 UTC
Please leave this open until the end of the week.
Since the failures were random, I would like it to run for a few days to make sure that the issue is resolved.

Comment 4 Michal Skrivanek 2018-04-20 11:32:31 UTC
(In reply to Dafna Ron from comment #3)
> Please leave this open until the end of the week.
> Since the failures were random, I would like it to run for a few days to
> make sure that the issue is resolved.

sure

Comment 5 Dafna Ron 2018-04-20 12:28:39 UTC
Martin, I cannot be sure, since we have been having some issues with the CQ alerts, but I have not seen migration-related issues since the merge.

Let's close this as not a bug, since Gal's fix is not related to your component.