Bug 1566594 - migration failure error when migration succeeds
Summary: migration failure error when migration succeeds
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Nobody
QA Contact: Pavel Stehlik
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-12 15:09 UTC by Dafna Ron
Modified: 2018-04-20 12:28 UTC
CC: 3 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-04-20 12:28:39 UTC
oVirt Team: Virt
Embargoed:
dron: planning_ack?
dron: devel_ack?
dron: testing_ack?


Attachments
logs (18.44 MB, application/x-gzip), 2018-04-12 15:09 UTC, Dafna Ron

Description Dafna Ron 2018-04-12 15:09:20 UTC
Created attachment 1420889 [details]
logs

We are seeing a race condition in OST which causes a failure in the test 006_migrations.migrate_vm.

From the logs, the issue seems to be that while a VM is being migrated, the engine at some point fails metrics collection and assumes the migration has failed.
It then re-issues the migration and gets an "already exists" error, which fails the test.
I am not sure whether this only affects the test or whether users will also see issues in the UI (I assume they will), but in any case, I think we should check what has changed in stats collection that can fail this test.

Here is a link to one of the failed jobs: http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1677/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/
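
To make the suspected flow concrete, here is a minimal, purely illustrative Python sketch (all names are hypothetical stand-ins, not engine or vdsm code) of how one missed stats cycle can turn a successful migration into an "already exists" failure:

    # Purely illustrative sketch of the suspected race; every name here
    # (DestinationHost, migration_create, VMExists) is a hypothetical
    # stand-in modelling the flow described above, not engine/vdsm code.
    class VMExists(Exception):
        """The destination host already has a VM with this id."""

    class DestinationHost(object):
        def __init__(self):
            self.vms = set()

        def migration_create(self, vm_id):
            # The destination rejects a create for a vm id that is
            # already registered -- the "already exists" error.
            if vm_id in self.vms:
                raise VMExists(vm_id)
            self.vms.add(vm_id)

    def migrate(dest, vm_id, stats_ok):
        dest.migration_create(vm_id)      # first migration: succeeds
        if not stats_ok:
            # The engine misses a stats cycle, concludes the migration
            # failed, and re-issues the command against the same host...
            dest.migration_create(vm_id)  # ...which raises VMExists

    try:
        migrate(DestinationHost(), '5acb0866', stats_ok=False)
    except VMExists:
        print('test reports "already exists" although the VM migrated fine')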

Comment 1 Dafna Ron 2018-04-12 15:15:19 UTC
2018-04-12 08:05:04,744-0400 INFO  (jsonrpc/0) [api.host] FINISH getAllVmIoTunePolicies return={'status': {'message': 'Done', 'code': 0}, 'io_tune_policies_dict': {'5acb0866-f66c-4071-a737-f766e720b948': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/11d01bac-fe22-42c2-b33a-ed0a63af8033/images/86834940-1070-4b17-8cf8-3afa03c21566/df490fff-ffb1-420b-9cce-71cbbc292136', 'name': 'vda'}, {'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': u'/rhev/data-center/mnt/192.168.202.2:_exports_nfs_share1/f9f8e933-9154-41a5-ad82-2ecc81e51805/images/f5329064-bf9d-475e-9468-79e65907b670/64aa2961-2379-46b7-8a37-fe040a990a83', 'name': 'vdb'}]}}} from=::1,54678 (api:52)
2018-04-12 08:05:04,744-0400 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmIoTunePolicies succeeded in 0.00 seconds (__init__:573)
2018-04-12 08:05:11,357-0400 INFO  (monitor/b5e6b24) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:733)
2018-04-12 08:05:12,994-0400 ERROR (periodic/1) [virt.vmstats] VM metrics collection failed (vmstats:264)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 197, in send_metrics
    data[prefix + '.cpu.usage'] = stat['cpuUsage']
KeyError: 'cpuUsage'
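
For context, the failing line above assumes every stats sample carries a 'cpuUsage' key. A minimal defensive sketch of that assignment (illustrative only; this is not the actual fix referenced in comment 2) would skip the field instead of aborting the whole cycle:

    # Illustrative only, not the actual vdsm fix: tolerate samples that
    # lack 'cpuUsage' (as can happen around migration) instead of
    # raising KeyError and failing the entire metrics cycle.
    def send_cpu_usage(data, prefix, stat):
        cpu_usage = stat.get('cpuUsage')
        if cpu_usage is not None:
            data[prefix + '.cpu.usage'] = cpu_usage

    data = {}
    send_cpu_usage(data, 'vm.5acb0866', {})                  # missing key: no error
    send_cpu_usage(data, 'vm.5acb0866', {'cpuUsage': 12.5})  # normal sample
    print(data)  # {'vm.5acb0866.cpu.usage': 12.5}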


2018-04-12 08:06:26,148-0400 ERROR (periodic/0) [virt.vm] (vmId='5acb0866-f66c-4071-a737-f766e720b948') Failed to set block threshold on 'vdb' (/rhev/data-center/mnt/192.168.202.2:_exports_nfs_share1/f9f8e933-9154-41a5-ad82-2ecc81e51805/images/f5329064-bf9d-475e-9468-79e65907b670/6d468235-6cd4-403e-8253-5899cadff482): Operation not supported: threshold currently can't be set for block device 'vdb' (drivemonitor:120)

2018-04-12 08:11:48,336-0400 ERROR (jsonrpc/6) [api] FINISH create error=Virtual machine already exists (api:129)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 191, in create
    raise exception.VMExists()
VMExists: Virtual machine already exists
2018-04-12 08:11:48,338-0400 INFO  (jsonrpc/6) [api.virt] FINISH create return={'status': {'message': 'Virtual machine already exists', 'code': 4}} from=::ffff:192.168.200.3,57506, vmId=5acb0866-f66c-4071-a737-f766e720b948 (api:52)
2018-04-12 08:11:48,339-0400 INFO  (jsonrpc/6) [api.virt] FINISH migrationCreate return={'status': {'message': 'Virtual machine already exists', 'code': 4}} from=::ffff:192.168.200.3,57506, vmId=5acb0866-f66c-4071-a737-f766e720b948 (api:52)
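
For reference, the caller decides success or failure of migrationCreate solely from the status code in a response like the one above; a tiny sketch of that interpretation (hypothetical helper, not actual engine code):

    # Hypothetical sketch: status code 0 means 'Done'; any non-zero
    # code, including 4 (VMExists), reads as a failed migrationCreate,
    # even though here the VM had in fact already migrated successfully.
    resp = {'status': {'message': 'Virtual machine already exists', 'code': 4}}

    def call_succeeded(resp):
        return resp['status']['code'] == 0

    assert not call_succeeded(resp)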

Comment 2 Michal Skrivanek 2018-04-16 11:57:21 UTC
IIUC this should be solved by https://gerrit.ovirt.org/#/c/90166/; please confirm.

Comment 3 Dafna Ron 2018-04-16 12:30:17 UTC
Please leave this open until the end of the week.
Since the failures were random, I would like it to run for a few days to make sure that the issue is resolved.

Comment 4 Michal Skrivanek 2018-04-20 11:32:31 UTC
(In reply to Dafna Ron from comment #3)
> Please leave this open until the end of the week.
> Since the failures were random, I would like it to run for a few days to
> make sure that the issue is resolved.

sure

Comment 5 Dafna Ron 2018-04-20 12:28:39 UTC
Martin, I cannot be sure, since we have been having some issues with the CQ alerts, but I have not seen migration-related issues since the merge.

Let's close this as not a bug, since Gal's fix is not related to your component.

