Bug 1981307 - Skip setting disk thresholds on the destination host during migration
Summary: Skip setting disk thresholds on the destination host during migration
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.40.80
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.4.8
Assignee: Vojtech Juranek
QA Contact: sshmulev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-07-12 11:00 UTC by Vojtech Juranek
Modified: 2021-11-04 19:28 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-19 06:23:16 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.4+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 115643 0 master MERGED virt: don't raise when setting disk threshold and VM is not running 2021-07-15 13:48:23 UTC

Description Vojtech Juranek 2021-07-12 11:00:50 UTC
Description of problem:
Recently we added refreshing the disk on the destination host when it is extended during migration, along with setting new disk thresholds (see BZ #1883399). However, if the VM has not been started on the destination host yet, setting the disk threshold fails, and the corresponding error in the vdsm log can be confusing. On the destination, only refresh the disk and skip setting thresholds during migration. Once the migration is finished, set new thresholds for all disks on the destination host.
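
For illustration, a minimal sketch of the intended flow; the helper names below are hypothetical and do not correspond to vdsm's actual code:

def refresh_disk_on_destination(vm, drive):
    # During migration the destination host only refreshes the volume so
    # the new LV size becomes visible; the block threshold is not touched
    # because the VM is not running on this host yet.
    vm.refresh_volume(drive)
    if vm.is_migration_destination():
        return
    vm.set_block_threshold(drive)

def on_migration_finished(vm):
    # Once the VM is running on the destination, set thresholds for all
    # disks so further extensions keep working after the migration.
    for drive in vm.disk_drives():
        vm.set_block_threshold(drive)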

Comment 1 Nir Soffer 2021-07-12 15:05:45 UTC
It would be useful to show here the errors we get during migration, and
explain how to reproduce this issue. I think the steps to reproduce are
already in bug 1883399.

Comment 2 Vojtech Juranek 2021-07-13 14:03:17 UTC
Example exception, taken from BZ #1883399; the reproducer is there as well (also extracted into BZ #1981079):

2021-07-08 17:19:38,852+0300 ERROR (jsonrpc/1) [api] FINISH refresh_disk error=Failed to refresh drive: {'vm_id': '2eb53807-6c20-4976-9a8b-31531b5fdef1', 'drive': <Drive name=sda, type=block, path=/rhev/data-center/mnt/blockSD/aa73b4d0-a22d-4f65-b254-594b34f0e6a8/images/10883a68-06ec-460b-a115-d23260482fa9/6f350411-34ce-4b13-8214-17aa0ea829b9 threshold=unset at 0x7f0604788da0>, 'reason': "VM '2eb53807-6c20-4976-9a8b-31531b5fdef1' was not defined yet or was undefined"} (api:131)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 1415, in refresh_disk
    self._update_drive_volume_size(drive, vol_size)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 1607, in _update_drive_volume_size
    self.drive_monitor.set_threshold(drive, volsize.apparentsize)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/drivemonitor.py", line 121, in set_threshold
    self._vm._dom.setBlockThreshold(drive.name, threshold)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 50, in __getattr__
    % self.vmid)
vdsm.virt.virdomain.NotConnectedError: VM '2eb53807-6c20-4976-9a8b-31531b5fdef1' was not defined yet or was undefined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 124, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/API.py", line 795, in refresh_disk
    return self.vm.refresh_disk(drive)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 1425, in refresh_disk
    drive=drive)
vdsm.common.exception.DriveRefreshError: Failed to refresh drive: {'vm_id': '2eb53807-6c20-4976-9a8b-31531b5fdef1', 'drive': <Drive name=sda, type=block, path=/rhev/data-center/mnt/blockSD/aa73b4d0-a22d-4f65-b254-594b34f0e6a8/images/10883a68-06ec-460b-a115-d23260482fa9/6f350411-34ce-4b13-8214-17aa0ea829b9 threshold=unset at 0x7f0604788da0>, 'reason': "VM '2eb53807-6c20-4976-9a8b-31531b5fdef1' was not defined yet or was undefined"}
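
The merged gerrit patch linked in this bug ("virt: don't raise when setting disk threshold and VM is not running") is about making this failure non-fatal. As a rough illustration of such a guard (set_threshold_if_running is a hypothetical helper, not the actual patch; NotConnectedError and drive_monitor.set_threshold are the names shown in the traceback above):

from vdsm.virt import virdomain

def set_threshold_if_running(vm, drive, apparentsize):
    try:
        vm.drive_monitor.set_threshold(drive, apparentsize)
    except virdomain.NotConnectedError:
        # The VM was not defined/started on this host yet (e.g. the
        # migration destination); skip for now, the threshold will be
        # set once the migration finishes.
        vm.log.info("VM not running, skipping block threshold for %s",
                    drive.name)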

Comment 3 Nir Soffer 2021-07-14 11:09:03 UTC
This issue can cause migration failure or lead to data corruption.

Comment 4 Nir Soffer 2021-07-15 23:11:59 UTC
Based on the analysis in https://bugzilla.redhat.com/1883399#c82,
this bug is likely to cause migration failure, but not data
corruption, because the failure happens after the volume was
extended on the destination host.

Lowering severity since a failed migration can be retried without
downtime.

But since this is a regression introduced by the fix for bug 1883399
and the fix is already merged, there is no reason to delay this bug.

If needed, we can backport the trivial fix to 4.4.7.

Comment 6 sshmulev 2021-07-25 10:31:02 UTC
According to the steps to reproduce from https://bugzilla.redhat.com/show_bug.cgi?id=1883399#c88, are there any additional important details for this test, for example:
1) Should the migration be from the SPM host or to the SPM host, or does it not matter?
2) Does the destination host have to be on the latest version that supports the refresh, or not?

Vojtech, please add these notes here, or any other important remarks that I didn't mention.

Comment 7 Vojtech Juranek 2021-07-26 07:51:24 UTC
(In reply to sshmulev from comment #6)
> According to the steps to reproduce from
> https://bugzilla.redhat.com/show_bug.cgi?id=1883399#c88, are there any
> additional important details for this test, for example:
> 1) Should the migration be from the SPM host or to the SPM host, or does
> it not matter?

It doesn't matter.

> 2) Does the destination host have to be on the latest version that
> supports the refresh, or not?

Yes, the destination host has to support disk refresh (and the source as well; otherwise it won't even try to refresh the disk on the destination upon extension).

> Vojtech, please add these notes here, or any other important remarks that
> I didn't mention.

Nothing else comes to my mind.

Comment 8 sshmulev 2021-07-26 11:56:11 UTC
After testing with the script from https://bugzilla.redhat.com/show_bug.cgi?id=1883399#c8, the migration succeeded and the refresh happened on the destination from the beginning of the migration. However, when I tried to test it with the old script, I found a new problem which caused the VM to pause.

From the UI log: "VM New_VM has been paused due to no Storage space error."

This is the exception I got on the destination host:
2021-07-26 13:58:01,498+0300 ERROR (vm/427a6cb7) [virt.vm] (vmId='427a6cb7-8e29-4ff2-bca6-3fa1cf512085') The vm start process failed (vm:991)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 918, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2861, in _run
    self._devices = self._make_devices()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2741, in _make_devices
    disk_objs = self._perform_host_local_adjustment()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2779, in _perform_host_local_adjustment
    self._preparePathsForDrives(disk_params)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 1100, in _preparePathsForDrives
    drive, self.id, path=path
  File "/usr/lib/python3.6/site-packages/vdsm/clientIF.py", line 430, in prepareVolumePath
    raise vm.VolumeError(drive)
vdsm.virt.vm.VolumeError: Bad volume specification {'device': 'disk', 'type': 'disk', 'diskType': 'block', 'specParams': {'pinToIoThread': '1'}, 'alias': 'ua-98f25d38-2d09-4efb-997d-60c865878f5e', 'address': {'type': 'pci', 'domain': '0x0000', 'bus': '0x05', 'slot': '0x00', 'function': '0x0'}, 'domainID': '4c3e9f6e-048e-4398-b161-e5168a5c0bbb', 'guestName': '/dev/vda', 'imageID': '98f25d38-2d09-4efb-997d-60c865878f5e', 'poolID': '67491ffb-097f-4858-8c8f-f815f6b362b3', 'volumeID': '663104c5-6cba-49a9-9996-240283ffaa33', 'managed': False, 'volumeChain': [{'domainID': '4c3e9f6e-048e-4398-b161-e5168a5c0bbb', 'imageID': '98f25d38-2d09-4efb-997d-60c865878f5e', 'leaseOffset': 112197632, 'leasePath': '/dev/4c3e9f6e-048e-4398-b161-e5168a5c0bbb/leases', 'path': '/rhev/data-center/mnt/blockSD/4c3e9f6e-048e-4398-b161-e5168a5c0bbb/images/98f25d38-2d09-4efb-997d-60c865878f5e/663104c5-6cba-49a9-9996-240283ffaa33', 'volumeID': '663104c5-6cba-49a9-9996-240283ffaa33'}], 'path': '/rhev/data-center/mnt/blockSD/4c3e9f6e-048e-4398-b161-e5168a5c0bbb/images/98f25d38-2d09-4efb-997d-60c865878f5e/663104c5-6cba-49a9-9996-240283ffaa33', 'discard': False, 'format': 'cow', 'propagateErrors': 'off', 'cache': 'none', 'iface': 'virtio', 'name': 'vda', 'bootOrder': '1', 'serial': '98f25d38-2d09-4efb-997d-60c865878f5e', 'index': 0, 'reqsize': '0', 'truesize': '10871635968', 'apparentsize': '10871635968'}
2021-07-26 13:58:01,498+0300 INFO  (vm/427a6cb7) [virt.vm] (vmId='427a6cb7-8e29-4ff2-bca6-3fa1cf512085') Changed state to Down: Bad volume specification {'device': 'disk', 'type': 'disk', 'diskType': 'block', 'specParams': {'pinToIoThread': '1'}, 'alias': 'ua-98f25d38-2d09-4efb-997d-60c865878f5e', 'address': {'type': 'pci', 'domain': '0x0000', 'bus': '0x05', 'slot': '0x00', 'function': '0x0'}, 'domainID': '4c3e9f6e-048e-4398-b161-e5168a5c0bbb', 'guestName': '/dev/vda', 'imageID': '98f25d38-2d09-4efb-997d-60c865878f5e', 'poolID': '67491ffb-097f-4858-8c8f-f815f6b362b3', 'volumeID': '663104c5-6cba-49a9-9996-240283ffaa33', 'managed': False, 'volumeChain': [{'domainID': '4c3e9f6e-048e-4398-b161-e5168a5c0bbb', 'imageID': '98f25d38-2d09-4efb-997d-60c865878f5e', 'leaseOffset': 112197632, 'leasePath': '/dev/4c3e9f6e-048e-4398-b161-e5168a5c0bbb/leases', 'path': '/rhev/data-center/mnt/blockSD/4c3e9f6e-048e-4398-b161-e5168a5c0bbb/images/98f25d38-2d09-4efb-997d-60c865878f5e/663104c5-6cba-49a9-9996-240283ffaa33', 'volumeID': '663104c5-6cba-49a9-9996-240283ffaa33'}], 'path': '/rhev/data-center/mnt/blockSD/4c3e9f6e-048e-4398-b161-e5168a5c0bbb/images/98f25d38-2d09-4efb-997d-60c865878f5e/663104c5-6cba-49a9-9996-240283ffaa33', 'discard': False, 'format': 'cow', 'propagateErrors': 'off', 'cache': 'none', 'iface': 'virtio', 'name': 'vda', 'bootOrder': '1', 'serial': '98f25d38-2d09-4efb-997d-60c865878f5e', 'index': 0, 'reqsize': '0', 'truesize': '10871635968', 'apparentsize': '10871635968'} (code=1) (vm:1911)


......'refresh_disk_supported': True, 'netConfigDirty': 'False', 'openstack_binding_host_ids': {'OVIRT_PROVIDER_OVN': 'f18b0538-c8ea-457f-8a57-2c9679df6423'}}} from=::ffff:10.46.12.145,41176 (api:54)
2021-07-26 13:58:12,348+0300 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.getCapabilities took more than 1.00 seconds to succeed: 2.66 (__init__:316)
2021-07-26 13:58:12,355+0300 INFO  (jsonrpc/5) [api.virt] START refresh_disk(drive={'device': 'disk', 'poolID': '67491ffb-097f-4858-8c8f-f815f6b362b3', 'domainID': '4c3e9f6e-048e-4398-b161-e5168a5c0bbb', 'imageID': '25b03192-dc71-429d-9d28-f2b49dc74413', 'volumeID': 'c52a818e-52e8-46fe-8ff4-88e884ec20f7'}) from=::ffff:10.46.12.145,41176, vmId=427a6cb7-8e29-4ff2-bca6-3fa1cf512085 (api:48)
2021-07-26 13:58:12,355+0300 INFO  (jsonrpc/5) [api] FINISH refresh_disk error=Virtual machine does not exist: {'vmId': '427a6cb7-8e29-4ff2-bca6-3fa1cf512085'} (api:129)
2021-07-26 13:58:12,355+0300 INFO  (jsonrpc/5) [api.virt] FINISH refresh_disk return={'status': {'code': 1, 'message': "Virtual machine does not exist: {'vmId': '427a6cb7-8e29-4ff2-bca6-3fa1cf512085'}"}} from=::ffff:10.46.12.145,41176, vmId=427a6cb7-8e29-4ff2-bca6-3fa1cf512085 (api:54)
2021-07-26 13:58:12,355+0300 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call VM.refresh_disk failed (error 1) in 0.00 seconds (__init__:312)
2021-07-26 13:58:16,928+0300 INFO  (jsonrpc/6) [api.host] START getStats() from=::ffff:10.46.12.144,48386 (api:48)



Another exception occurred in earlier tries:
2021-07-26 11:29:09,179+0300 ERROR (migsrc/427a6cb7) [virt.vm] (vmId='427a6cb7-8e29-4ff2-bca6-3fa1cf512085') Failed to migrate (migration:503)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 477, in _regular_run
    time.time(), machineParams
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 578, in _startUnderlyingMigration
    self._perform_with_conv_schedule(duri, muri)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 667, in _perform_with_conv_schedule
    self._perform_migration(duri, muri)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 596, in _perform_migration
    self._migration_flags)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 159, in call
    return getattr(self._vm._dom, name)(*a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2119, in migrateToURI3
    raise libvirtError('virDomainMigrateToURI3() failed')
libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2021-07-26T08:29:07.074391Z qemu-kvm: check_section_footer: Read section footer failed: -5
2021-07-26T08:29:07.074965Z qemu-kvm: load of migration failed: Invalid argument



This is the script I used on the guest:
for i in $(seq -w 11); do
    echo "$(date '+%H:%M:%S.%N')  writing..."
    dd if=/dev/zero bs=1M count=128 of=data.$i oflag=direct conv=fsync
    echo "$(date '+%H:%M:%S.%N')  waiting..."
    sleep 6
done

(Before running it, I changed the volume-utilization conf file on both vdsm hosts:
vi /etc/vdsm/vdsm.conf.d/volume-utilization.conf
    [irs]
    volume_utilization_percent = 50
    volume_utilization_chunk_mb = 128
)

Comment 10 Nir Soffer 2021-07-26 12:03:17 UTC
(In reply to sshmulev from comment #8)
...
> This is the script I used on the guest:
> for i in $(seq -w 11); do
>     echo "$(date '+%H:%M:%S.%N')  writing..."
>     dd if=/dev/zero bs=1M count=128 of=data.$i oflag=direct conv=fsync

This is likely to cause the VM to pause.

>     echo "$(date '+%H:%M:%S.%N')  waiting..."
>     sleep 6
> done

A more robust way would be to write the first 64 MiB, which would trigger
an extend, and then wait to give the system time to extend without pausing.
Then write the next 64 MiB.

See how this is done in the attached scripts in the related bugs.
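
A rough Python sketch of that write pattern (illustrative only; it uses buffered writes plus fsync instead of dd's oflag=direct, and the chunk size and sleep are just examples):

import os
import time

CHUNK = 64 * 1024 * 1024  # 64 MiB per step, enough to trigger an extend

for i in range(1, 12):
    path = "data.%02d" % i
    with open(path, "wb") as f:
        # Write one chunk and force it to disk so the allocation happens now.
        f.write(b"\0" * CHUNK)
        f.flush()
        os.fsync(f.fileno())
    print(time.strftime("%H:%M:%S"), "wrote", path, "- waiting for extend...")
    # Give vdsm time to extend the thin volume before the next chunk, so the
    # VM is not paused with an ENOSPC error.
    time.sleep(10)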

Comment 11 sshmulev 2021-07-26 13:36:58 UTC
(In reply to Nir Soffer from comment #10)
> (In reply to sshmulev from comment #8)
> ...
> > This is the script I used on the guest:
> > for i in $(seq -w 11); do
> >     echo "$(date '+%H:%M:%S.%N')  writing..."
> >     dd if=/dev/zero bs=1M count=128 of=data.$i oflag=direct conv=fsync
> 
> This is likely to cause the VM to pause.
> 
> >     echo "$(date '+%H:%M:%S.%N')  waiting..."
> >     sleep 6
> > done
> 
> A more robust way would be to write the first 64 MiB, which would trigger
> an extend, and then wait to give the system time to extend without pausing.
> Then write the next 64 MiB.
> 
> See how this is done in the attached scripts in the related bugs.


I did as you suggested but I still get this error log and the migration fails.
I used the reproducer script from bug 1883399 (just changed 512 to 64 there).
(I noticed that it happens when the destination host is the SPM.)

2021-07-26 16:21:33,149+0300 DEBUG (check/loop) [storage.check] START check '/dev/870bbec7-f7b0-4955-883c-52428822a19a/metadata' (delay=0.00) (check:289)
2021-07-26 16:21:33,200+0300 DEBUG (check/loop) [storage.asyncevent] Process <subprocess.Popen object at 0x7fa4e02b5e48> terminated (count=1) (asyncevent:660)
2021-07-26 16:21:33,201+0300 DEBUG (check/loop) [storage.check] FINISH check '/dev/870bbec7-f7b0-4955-883c-52428822a19a/metadata' (rc=0, elapsed=0.05) (check:359)
2021-07-26 16:21:33,224+0300 INFO  (migsrc/0a110347) [virt.vm] (vmId='0a110347-2076-4872-9a9c-5a1d6f27505a') Switching from State.STARTED to State.FAILED (migration:234)
2021-07-26 16:21:33,226+0300 ERROR (migsrc/0a110347) [virt.vm] (vmId='0a110347-2076-4872-9a9c-5a1d6f27505a') Failed to migrate (migration:503)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 477, in _regular_run
    time.time(), machineParams
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 578, in _startUnderlyingMigration
    self._perform_with_conv_schedule(duri, muri)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 667, in _perform_with_conv_schedule
    self._perform_migration(duri, muri)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 596, in _perform_migration
    self._migration_flags)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 159, in call
    return getattr(self._vm._dom, name)(*a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2119, in migrateToURI3
    raise libvirtError('virDomainMigrateToURI3() failed')
libvirt.libvirtError: operation aborted: migration out job: canceled by client

Comment 12 Nir Soffer 2021-07-26 14:17:46 UTC
(In reply to sshmulev from comment #11)
...
> I did as you suggested but I still get this error log and the migration
> fails.
>
> I used the reproducer script from bug 1883399 (just changed 512 to 64
> there).
> (I noticed that it happens when the destination host is the SPM.)
...
> migrateToURI3
>     raise libvirtError('virDomainMigrateToURI3() failed')
> libvirt.libvirtError: operation aborted: migration out job: canceled by
> client

This happens when the migration is stressed too much and takes too long.
We want to test here that refreshes do not fail on the destination,
so there is no reason to stress the system too much. Increase the sleep
so the migration does not fail.

There is a better reproducer here:
https://bugzilla.redhat.com/show_bug.cgi?id=1983882#c3

It stops the stress process right after the extend, so it is less likely
to cause trouble.

Comment 13 sshmulev 2021-07-26 15:47:17 UTC
(In reply to Nir Soffer from comment #12)

Thanks Nir.

Verified it with the following results:

Versions: 
vdsm-4.40.70.2-1.el8ev.x86_64
ovirt-engine-4.4.8.1-0.9.el8ev

Source:
2021-07-26 18:06:58,938+0300 INFO  (migsrc/397b45a9) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') Migration semaphore: acquiring (migration:460)
2021-07-26 18:06:58,938+0300 INFO  (migsrc/397b45a9) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') Migration semaphore: acquired (migration:462)
2021-07-26 18:06:58,939+0300 INFO  (migsrc/397b45a9) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') Switching from State.INITIALIZED to State.PREPARED (migration:234)
2021-07-26 18:06:58,978+0300 INFO  (libvirt/events) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') Block threshold 1946157056 exceeded by 720896 for drive 'sda' (/rhev/data-center/mnt/blockSD/870bbec7-f7b0-4955-883c-52428822a19a/images/268e8fd4-7350-4cdf-b536-efad23eb28ad/d22724aa-99f9-492a-a642-6832e403d16d) (drivemonitor:163)

2021-07-26 18:06:59,650+0300 INFO  (periodic/0) [vdsm.api] START sendExtendMsg(spUUID='67491ffb-097f-4858-8c8f-f815f6b362b3', volDict={'domainID': '870bbec7-f7b0-4955-883c-52428822a19a', 'imageID': '268e8fd4-7350-4cdf-b536-efad23eb28ad', 'internal': False, 'name': 'sda', 'newSize': 2147483648, 'poolID': '67491ffb-097f-4858-8c8f-f815f6b362b3', 'volumeID': 'd22724aa-99f9-492a-a642-6832e403d16d', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'callback': None}, newSize=2147483648, callbackFunc=<bound method Vm.after_volume_extension of <vdsm.virt.vm.Vm object at 0x7f1644325518>>) from=internal, task_id=36fa19de-ee22-4a9f-91d6-ef933bca458b (api:48)

2021-07-26 18:07:01,009+0300 INFO  (migsrc/397b45a9) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') Creation of destination VM took: 2 seconds (migration:526)
2021-07-26 18:07:01,009+0300 INFO  (migsrc/397b45a9) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') Switching from State.PREPARED to State.STARTED (migration:234)
2021-07-26 18:07:01,009+0300 INFO  (migsrc/397b45a9) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') starting migration to ....

2021-07-26 18:07:01,707+0300 INFO  (mailbox-hsm/4) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') Volume sda (domainID: 870bbec7-f7b0-4955-883c-52428822a19a, volumeID: d22724aa-99f9-492a-a642-6832e403d16d) was extended during migration, refreshing it on destination host. (vm:1444)

2021-07-26 18:07:03,512+0300 INFO  (jsonrpc/5) [api.virt] START refresh_disk(drive={'device': 'disk', 'poolID': '67491ffb-097f-4858-8c8f-f815f6b362b3', 'domainID': '870bbec7-f7b0-4955-883c-52428822a19a', 'imageID': '268e8fd4-7350-4cdf-b536-efad23eb28ad', 'volumeID': 'd22724aa-99f9-492a-a642-6832e403d16d'}) from=::ffff:10.46.12.147,36810, vmId=397b45a9-c159-4f5d-8f0c-aa73cb985c5b (api:48)
2021-07-26 18:07:03,512+0300 INFO  (jsonrpc/5) [vdsm.api] START refreshVolume(sdUUID='870bbec7-f7b0-4955-883c-52428822a19a', spUUID='67491ffb-097f-4858-8c8f-f815f6b362b3', imgUUID='268e8fd4-7350-4cdf-b536-efad23eb28ad', volUUID='d22724aa-99f9-492a-a642-6832e403d16d') from=::ffff:10.46.12.147,36810, task_id=d3031c8e-621b-4476-a716-f7cd356ad554 (api:48)
2021-07-26 18:07:03,673+0300 INFO  (jsonrpc/5) [vdsm.api] FINISH refreshVolume return=None from=::ffff:10.46.12.147,36810, task_id=d3031c8e-621b-4476-a716-f7cd356ad554 (api:54)
2021-07-26 18:07:03,677+0300 INFO  (jsonrpc/5) [api.virt] FINISH refresh_disk return={'result': {'apparentsize': '2147483648', 'truesize': '2147483648'}, 'status': {'code': 0, 'message': 'Done'}} from=::ffff:10.46.12.147,36810, vmId=397b45a9-c159-4f5d-8f0c-aa73cb985c5b (api:54)

2021-07-26 18:07:05,777+0300 INFO  (mailbox-hsm/0) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') Volume sda (domainID: 870bbec7-f7b0-4955-883c-52428822a19a, volumeID: d22724aa-99f9-492a-a642-6832e403d16d) was extended during migration, refreshing it on destination host. (vm:1444)

2021-07-26 18:07:05,780+0300 INFO  (jsonrpc/1) [api.virt] START refresh_disk(drive={'device': 'disk', 'poolID': '67491ffb-097f-4858-8c8f-f815f6b362b3', 'domainID': '870bbec7-f7b0-4955-883c-52428822a19a', 'imageID': '268e8fd4-7350-4cdf-b536-efad23eb28ad', 'volumeID': 'd22724aa-99f9-492a-a642-6832e403d16d'}) from=::ffff:10.46.12.147,36810, vmId=397b45a9-c159-4f5d-8f0c-aa73cb985c5b (api:48)
2021-07-26 18:07:05,781+0300 INFO  (jsonrpc/1) [vdsm.api] START refreshVolume(sdUUID='870bbec7-f7b0-4955-883c-52428822a19a', spUUID='67491ffb-097f-4858-8c8f-f815f6b362b3', imgUUID='268e8fd4-7350-4cdf-b536-efad23eb28ad', volUUID='d22724aa-99f9-492a-a642-6832e403d16d') from=::ffff:10.46.12.147,36810, task_id=ec03592a-c409-4704-9a97-398d54040c5c (api:48)

2021-07-26 18:07:06,058+0300 INFO  (jsonrpc/1) [vdsm.api] FINISH refreshVolume return=None from=::ffff:10.46.12.147,36810, task_id=ec03592a-c409-4704-9a97-398d54040c5c (api:54)
2021-07-26 18:07:06,062+0300 INFO  (jsonrpc/1) [api.virt] FINISH refresh_disk return={'result': {'apparentsize': '2147483648', 'truesize': '2147483648'}, 'status': {'code': 0, 'message': 'Done'}} from=::ffff:10.46.12.147,36810, vmId=397b45a9-c159-4f5d-8f0c-aa73cb985c5b (api:54)

2021-07-26 18:07:11,037+0300 INFO  (migmon/397b45a9) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') Migration Progress: 10.016 seconds elapsed, 32% of data processed, total data: 1096MB, processed data: 327MB, remaining data: 754MB, transfer speed 412Mbps, zero pages: 3791MB, compressed: 0MB, dirty rate: 0, memory iteration: 1 (migration:886)

2021-07-26 18:07:21,726+0300 INFO  (migsrc/397b45a9) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') migration took 22 seconds to complete (migration:580)
2021-07-26 18:07:21,726+0300 INFO  (migsrc/397b45a9) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') Changed state to Down: Migration succeeded (code=4) (vm:1911)

This is a known issue (https://bugzilla.redhat.com/show_bug.cgi?id=1981079):
2021-07-26 18:07:21,728+0300 ERROR (mailbox-hsm/4) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') Failed to set block threshold for drive 'sda[1]': Requested operation is not valid: domain is not running (drivemonitor:122)
2021-07-26 18:07:21,728+0300 ERROR (mailbox-hsm/4) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') cannot cont while Down (vm:1725)



Destination:
2021-07-26 18:07:03,512+0300 INFO  (jsonrpc/5) [api.virt] START refresh_disk(drive={'device': 'disk', 'poolID': '67491ffb-097f-4858-8c8f-f815f6b362b3', 'domainID': '870bbec7-f7b0-4955-883c-52428822a19a', 'imageID': '268e8fd4-7350-4cdf-b536-efad23eb28ad', 'volumeID': 'd22724aa-99f9-492a-a642-6832e403d16d'}) from=::ffff:10.46.12.147,36810, vmId=397b45a9-c159-4f5d-8f0c-aa73cb985c5b (api:48)
2021-07-26 18:07:03,512+0300 INFO  (jsonrpc/5) [vdsm.api] START refreshVolume(sdUUID='870bbec7-f7b0-4955-883c-52428822a19a', spUUID='67491ffb-097f-4858-8c8f-f815f6b362b3', imgUUID='268e8fd4-7350-4cdf-b536-efad23eb28ad', volUUID='d22724aa-99f9-492a-a642-6832e403d16d') from=::ffff:10.46.12.147,36810, task_id=d3031c8e-621b-4476-a716-f7cd356ad554 (api:48)
2021-07-26 18:07:03,673+0300 INFO  (jsonrpc/5) [vdsm.api] FINISH refreshVolume return=None from=::ffff:10.46.12.147,36810, task_id=d3031c8e-621b-4476-a716-f7cd356ad554 (api:54)
2021-07-26 18:07:03,677+0300 INFO  (jsonrpc/5) [api.virt] FINISH refresh_disk return={'result': {'apparentsize': '2147483648', 'truesize': '2147483648'}, 'status': {'code': 0, 'message': 'Done'}} from=::ffff:10.46.12.147,36810, vmId=397b45a9-c159-4f5d-8f0c-aa73cb985c5b (api:54)
2021-07-26 18:07:05,780+0300 INFO  (jsonrpc/1) [api.virt] START refresh_disk(drive={'device': 'disk', 'poolID': '67491ffb-097f-4858-8c8f-f815f6b362b3', 'domainID': '870bbec7-f7b0-4955-883c-52428822a19a', 'imageID': '268e8fd4-7350-4cdf-b536-efad23eb28ad', 'volumeID': 'd22724aa-99f9-492a-a642-6832e403d16d'}) from=::ffff:10.46.12.147,36810, vmId=397b45a9-c159-4f5d-8f0c-aa73cb985c5b (api:48)
2021-07-26 18:07:05,781+0300 INFO  (jsonrpc/1) [vdsm.api] START refreshVolume(sdUUID='870bbec7-f7b0-4955-883c-52428822a19a', spUUID='67491ffb-097f-4858-8c8f-f815f6b362b3', imgUUID='268e8fd4-7350-4cdf-b536-efad23eb28ad', volUUID='d22724aa-99f9-492a-a642-6832e403d16d') from=::ffff:10.46.12.147,36810, task_id=ec03592a-c409-4704-9a97-398d54040c5c (api:48)
2021-07-26 18:07:06,058+0300 INFO  (jsonrpc/1) [vdsm.api] FINISH refreshVolume return=None from=::ffff:10.46.12.147,36810, task_id=ec03592a-c409-4704-9a97-398d54040c5c (api:54)
2021-07-26 18:07:06,062+0300 INFO  (jsonrpc/1) [api.virt] FINISH refresh_disk return={'result': {'apparentsize': '2147483648', 'truesize': '2147483648'}, 'status': {'code': 0, 'message': 'Done'}} from=::ffff:10.46.12.147,36810, vmId=397b45a9-c159-4f5d-8f0c-aa73cb985c5b (api:54)
2021-07-26 18:06:59,650+0300 INFO  (periodic/0) [vdsm.api] START sendExtendMsg(spUUID='67491ffb-097f-4858-8c8f-f815f6b362b3', volDict={'domainID': '870bbec7-f7b0-4955-883c-52428822a19a', 'imageID': '268e8fd4-7350-4cdf-b536-efad23eb28ad', 'internal': False, 'name': 'sda', 'newSize': 2147483648, 'poolID': '67491ffb-097f-4858-8c8f-f815f6b362b3', 'volumeID': 'd22724aa-99f9-492a-a642-6832e403d16d', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'callback': None}, newSize=2147483648, callbackFunc=<bound method Vm.after_volume_extension of <vdsm.virt.vm.Vm object at 0x7f1644325518>>) from=internal, task_id=36fa19de-ee22-4a9f-91d6-ef933bca458b (api:48)

2021-07-26 18:07:21,942+0300 INFO  (periodic/231) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') Setting block threshold to 10804527104 bytes for drive 'vda[2]' apparentsize 10871635968 (drivemonitor:111)
2021-07-26 18:07:21,963+0300 INFO  (periodic/231) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') Setting block threshold to 2080374784 bytes for drive 'sda[1]' apparentsize 2147483648 (drivemonitor:111)

2021-07-26 18:07:20,300+0300 INFO  (vm/397b45a9) [virt.vm] (vmId='397b45a9-c159-4f5d-8f0c-aa73cb985c5b') End of migration (vm:4090)
2021-07-26 18:07:21,322+0300 INFO  (jsonrpc/7) [api.virt] START getMigrationStatus() from=::ffff:10.46.12.144,48386, vmId=397b45a9-c159-4f5d-8f0c-aa73cb985c5b (api:48)
2021-07-26 18:07:21,322+0300 INFO  (jsonrpc/7) [api.virt] FINISH getMigrationStatus return={'status': {'code': 0, 'message': 'Done'}, 'migrationStats': {'status': {'code': 0, 'message': 'Migration in progress'}, 'progress': 0}} from=::ffff:10.46.12.144,48386, vmId=397b45a9-c159-4f5d-8f0c-aa73cb985c5b (api:54)

Comment 14 Sandro Bonazzola 2021-08-19 06:23:16 UTC
This bugzilla is included in oVirt 4.4.8 release, published on August 19th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.8 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

