Bug 1983882 - Guest data corruption after migration
Summary: Guest data corruption after migration
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.40.80
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-4.4.9
Target Release: 4.40.90.1
Assignee: Vojtech Juranek
QA Contact: sshmulev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-07-20 05:15 UTC by Vojtech Juranek
Modified: 2021-12-23 16:33 UTC
6 users

Fixed In Version: vdsm-4.40.90.1
Clone Of:
Environment:
Last Closed: 2021-10-21 07:27:14 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.4+
pelauter: blocker+
pelauter: planning_ack+


Attachments (Terms of Use)
reproducer (1.42 KB, application/x-shellscript)
2021-07-20 05:15 UTC, Vojtech Juranek
no flags Details
reproducer-logs (22.32 KB, application/gzip)
2021-07-20 05:16 UTC, Vojtech Juranek
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 116449 0 master MERGED storage: don't update device mapper size on SPM during LV extend 2021-09-10 08:01:12 UTC
oVirt gerrit 116618 0 ovirt-4.4.z MERGED storage: don't update device mapper size on SPM during LV extend 2021-09-13 07:26:25 UTC

Description Vojtech Juranek 2021-07-20 05:15:38 UTC
Created attachment 1803484 [details]
reproducer

Description of problem:
When the disk of a VM is extended during migration and the source host is the SPM, guest data can be corrupted. The fix for BZ #1883399 prevents data corruption in most cases by refreshing storage first on the destination and then on the source, and by failing the extension when the destination host doesn't support disk refresh. However, when the source host is the SPM, it performs the disk extension itself and storage is refreshed on the source immediately after extending the disk, regardless of whether the destination host supports disk refresh. As a result, the disk can end up refreshed only on the source, and guest data can be corrupted after migration if the VM wrote to the extended area on the source before the migration finished.

This was originally reported by Nir in https://bugzilla.redhat.com/show_bug.cgi?id=1883399#c88
Reproducer and logs are also taken from BZ #1883399.



Steps to Reproduce:
1. In the guest, start the script:

[root@f32 ~]# ./reproduce.sh
17:26:27.660876580 Starting stress
Press Enter when migration was started:


2. In engine, start the migration

3. Wait until the migration show first non-zero progress message
   Clicking refresh on engine UI every few seconds helps

4. Press Enter in the guest script to continue the reproducer

17:27:03.735988728 Writing to disk...
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 2.21757 s, 242 MB/s
17:27:05.966949840 Waiting until disk is extended
17:27:16.040376014 Writing more data to disk...
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 2.84322 s, 189 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0197118 s, 53.2 MB/s
17:27:19.041140885 Computing checksum...
17:27:29.890113434 Disk checksum 1a5350b04ce65e0ffac9a10d46f282769865e3db
17:27:29.908828362 Stopping stress
17:27:29.944121095 Computing checksum...
17:27:37.706975020 Disk checksum 1a5350b04ce65e0ffac9a10d46f282769865e3db
17:27:37.708425142 Computing checksum...
17:27:43.754118454 Disk checksum 91d50642dd930e9542c39d36f0516d45f4e1af0d
17:27:43.756070975 Disk was corrupted


The script writes data to the disk, triggering extension. Then it writes
more data, including the first MiB after the end of the original disk size.

When writing is done, the script computes the first checksum.

When done, it kills the stress processes so the migration can complete
quickly.

Then the script loops, computing the checksum of the disk. When the migration
is done, the checksum on the destination will be different because the
destination volume was never extended.


Attached log analysis:

## src.log

1. Migration started

2021-07-16 00:26:38,357+0300 INFO  (jsonrpc/5) [api.virt] START migrate(params={'abortOnError': 'true', 'autoConverge': 'true', 'dst': 'host4:54321', 'method': 'online', 'vmId': '9c4050b1-8070-429f-ae36-9eae0935ccab', 'src': 'host3', 'dstqemu': '192.168.122.23', 'convergenceSchedule': {'init': [{'name': 'setDowntime', 'params': ['100']}], 'stalling': [{'limit': 1, 'action': {'name': 'setDowntime', 'params': ['150']}}, {'limit': 2, 'action': {'name': 'setDowntime', 'params': ['200']}}, {'limit': 3, 'action': {'name': 'setDowntime', 'params': ['300']}}, {'limit': 4, 'action': {'name': 'setDowntime', 'params': ['400']}}, {'limit': 6, 'action': {'name': 'setDowntime', 'params': ['500']}}, {'limit': -1, 'action': {'name': 'abort', 'params': []}}]}, 'outgoingLimit': 2, 'enableGuestEvents': True, 'tunneled': 'false', 'encrypted': False, 'compressed': 'false', 'incomingLimit': 2}) from=::ffff:192.168.122.11,48102, flow_id=199fded5-d0fb-45dd-b53a-e3326ecc1aa9, vmId=9c4050b1-8070-429f-ae36-9eae0935ccab (api:48)
2021-07-16 00:26:38,360+0300 INFO  (jsonrpc/5) [api.virt] FINISH migrate return={'status': {'code': 0, 'message': 'Migration in progress'}, 'progress': 0} from=::ffff:192.168.122.11,48102, flow_id=199fded5-d0fb-45dd-b53a-e3326ecc1aa9, vmId=9c4050b1-8070-429f-ae36-9eae0935ccab (api:54)

2. Migration progress started

2021-07-16 00:27:01,532+0300 INFO  (migmon/9c4050b1) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Migration Progress: 20.012 seconds elapsed, 83% of data processed, total data: 1096MB, processed data: 858MB, remaining data: 188MB, transfer speed 416Mbps, zero pages: 13331MB, compressed: 0MB, dirty rate: 0, memory iteration: 1 (migration:886)


3. Block threshold exceeded

2021-07-16 00:27:05,718+0300 INFO  (libvirt/events) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Block threshold 536870912 exceeded by 327680 for drive 'sdg' (/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/53594314-c33d-410e-a3c3-910f405b5131/c7be3c53-ce6d-4cd5-92cd-fe810b6aa2aa) (drivemonitor:173)

4. Drive extended

2021-07-16 00:27:07,137+0300 INFO  (mailbox-spm/0) [storage.LVM] Extending LV feab3738-c158-4d48-8a41-b5a95c057a50/c7be3c53-ce6d-4cd5-92cd-fe810b6aa2aa to 2048 megabytes (lvm:1668)

5. Detecting that the destination does not support extension, disabling drive monitoring

2021-07-16 00:27:09,493+0300 WARN  (mailbox-hsm/2) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Migration destination host does not support extending disk during migration, disabling disk extension during migration (vm:1593)

6. Migration continues after drive was extended

2021-07-16 00:27:11,535+0300 INFO  (migmon/9c4050b1) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Migration Progress: 30.014 seconds elapsed, 90% of data processed, total data: 1096MB, processed data: 1373MB, remaining data: 118MB, transfer speed 416Mbps, zero pages: 43933MB, compressed: 0MB, dirty rate: 13953, memory iteration: 3 (migration:886)
...
2021-07-16 00:27:31,545+0300 INFO  (migmon/9c4050b1) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Migration Progress: 50.022 seconds elapsed, 86% of data processed, total data: 1096MB, processed data: 2413MB, remaining data: 157MB, transfer speed 416Mbps, zero pages: 45896MB, compressed: 0MB, dirty rate: 12900, memory iteration: 7 (migration:886)

7. Migration completes successfully

2021-07-16 00:27:39,048+0300 INFO  (libvirt/events) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') CPU stopped: onSuspend (vm:6018)
2021-07-16 00:27:40,778+0300 INFO  (libvirt/events) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') new computed progress 86 < than old value 90, discarded (migration:255)
2021-07-16 00:27:40,780+0300 INFO  (migsrc/9c4050b1) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') migration took 62 seconds to complete (migration:580)
2021-07-16 00:27:40,780+0300 INFO  (migsrc/9c4050b1) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Changed state to Down: Migration succeeded (code=4) (vm:1902)


## dst.log

1. Migration started

2021-07-16 00:26:38,398+0300 INFO  (jsonrpc/2) [api.virt] START migrationCreate...

2. Destination disk volume size is 1G

2021-07-16 00:26:39,435+0300 INFO  (vm/9c4050b1) [vdsm.api] START getVolumeSize(sdUUID='feab3738-c158-4d48-8a41-b5a95c057a50', spUUID='3f8e08e6-5485-429e-a686-63754906b27b', imgUUID='53594314-c33d-410e-a3c3-910f405b5131', volUUID='c7be3c53-ce6d-4cd5-92cd-fe810b6aa2aa', options=None) from=internal, task_id=31c71d10-c09f-4e7f-ab67-35a8f3343e3d (api:48)

2021-07-16 00:26:39,435+0300 INFO  (vm/9c4050b1) [vdsm.api] FINISH getVolumeSize return={'apparentsize': '1073741824', 'truesize': '1073741824'} from=internal, task_id=31c71d10-c09f-4e7f-ab67-35a8f3343e3d (api:54)

3. Migration completed

2021-07-16 00:27:39,199+0300 INFO  (libvirt/events) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') CPU running: onResume (vm:6018)
2021-07-16 00:27:39,214+0300 INFO  (libvirt/events) [vds] Channel state for vm_id=9c4050b1-8070-429f-ae36-9eae0935ccab changed from=disconnected(2) to=connected(1) (qemuguestagent:289)
2021-07-16 00:27:39,221+0300 INFO  (vm/9c4050b1) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Starting connection (guestagent:252)
2021-07-16 00:27:39,226+0300 INFO  (vm/9c4050b1) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') CPU running: domain initialization (vm:6018)
2021-07-16 00:27:39,236+0300 INFO  (vm/9c4050b1) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') End of migration (vm:4081)

4. Setting block threshold on the extended disk

2021-07-16 00:27:40,365+0300 INFO  (periodic/3) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Setting block threshold to 536870912 bytes for drive 'sdg' (/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/53594314-c33d-410e-a3c3-910f405b5131/c7be3c53-ce6d-4cd5-92cd-fe810b6aa2aa) apparentsize 1073741824 (drivemonitor:116)

The destination host still sees a 1 GiB disk, since the disk was not refreshed
on the destination.

The disk on the destination is not extended after the migration:

$ egrep -A1 getVolumeSize.+c7be3c53-ce6d-4cd5-92cd-fe810b6aa2aa vdsm.log 
2021-07-16 00:26:39,435+0300 INFO  (vm/9c4050b1) [vdsm.api] START getVolumeSize(sdUUID='feab3738-c158-4d48-8a41-b5a95c057a50', spUUID='3f8e08e6-5485-429e-a686-63754906b27b', imgUUID='53594314-c33d-410e-a3c3-910f405b5131', volUUID='c7be3c53-ce6d-4cd5-92cd-fe810b6aa2aa', options=None) from=internal, task_id=31c71d10-c09f-4e7f-ab67-35a8f3343e3d (api:48)
2021-07-16 00:26:39,435+0300 INFO  (vm/9c4050b1) [vdsm.api] FINISH getVolumeSize return={'apparentsize': '1073741824', 'truesize': '1073741824'} from=internal, task_id=31c71d10-c09f-4e7f-ab67-35a8f3343e3d (api:54)
...
2021-07-16 00:42:33,940+0300 INFO  (periodic/0) [vdsm.api] START getVolumeSize(sdUUID='feab3738-c158-4d48-8a41-b5a95c057a50', spUUID='3f8e08e6-5485-429e-a686-63754906b27b', imgUUID='53594314-c33d-410e-a3c3-910f405b5131', volUUID='c7be3c53-ce6d-4cd5-92cd-fe810b6aa2aa', options=None) from=internal, task_id=be57bda6-143c-44fb-bf7a-f5bcb5382aa7 (api:48)
2021-07-16 00:42:33,940+0300 INFO  (periodic/0) [vdsm.api] FINISH getVolumeSize return={'apparentsize': '1073741824', 'truesize': '1073741824'} from=internal, task_id=be57bda6-143c-44fb-bf7a-f5bcb5382aa7 (api:54)

This will cause guest data corruption as soon as the guest tries to read
from the missing parts of the disk.

Comment 1 Vojtech Juranek 2021-07-20 05:16:25 UTC
Created attachment 1803485 [details]
reproducer-logs

Comment 2 Nir Soffer 2021-07-20 09:55:52 UTC
Here is the possible flow of events:
1. Disk extended on the source host
2. QEMU writes data past the end of the old device
3. Migration completes, seeing the old disk size

Data written past the end of the device will read as zeroes on the destination host,
corrupting the disk.

We have 2 cases:

1. Destination does not support disk refresh (vdsm < 4.4.7)

In this case we disable automatic disk extension on the first extend attempt, but
the source disk was already extended on the SPM. Here we have a pretty large window
during which the guest can write data past the end of the old device.

The reproducer script (attachment 1803484 [details]) reproduces this corruption reliably.

2. Destination supports disk refresh (vdsm >= 4.4.7)

In this case we refresh the disk after every extend, avoiding the data corruption.
But there is still a tiny window when data corruption can occur: after the disk was
extended, but before it was refreshed. If the migration finishes during this
tiny window, the disk will be corrupted.

Upgrading to vdsm >= 4.4.7 avoids the first case.

We don't have a fix for the second case.

Updating severity; corrupting user data is the worst kind of bug.

Comment 3 Nir Soffer 2021-07-20 11:03:45 UTC
Created attachment 1803651 [details]
reproducer script v2

This version is more reliable, creating a checksum only for block 1024 instead of the entire disk. This is 2048 times faster than computing a checksum of the entire disk, preventing migration failure with a slow network.
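The single-block checksum can be sketched like this (a simplified stand-in working on a plain file; the 1 MiB block size and block number 1024 match the reproducer's output, but the helper name is invented, not taken from the attached script):

```python
import hashlib

BLOCK_SIZE = 1024 * 1024  # 1 MiB blocks, matching the reproducer output

def block_checksum(path, block_number, block_size=BLOCK_SIZE):
    """SHA-1 of a single block, roughly equivalent to
    `dd if=DEV bs=1M skip=N count=1 | sha1sum` but in Python."""
    with open(path, "rb") as f:
        f.seek(block_number * block_size)
        return hashlib.sha1(f.read(block_size)).hexdigest()
```

Checking only block 1024 (the first MiB past the original 1 GiB device) is enough, because that is precisely the region that was written after the extension and that reads back as zeroes when the destination volume keeps the old size.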

Here is example run:

# ./reproduce.sh 
13:53:58.942847860 Starting stress
Press Enter when migration was started:
13:54:33.498894131 Writing to disk...
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 2.31383 s, 232 MB/s
13:54:35.830913298 Waiting until disk is extended
13:54:45.847324723 Writing more data to disk...
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 2.5595 s, 210 MB/s
0+1 records in
0+1 records out
17 bytes copied, 0.0225126 s, 0.8 kB/s
13:54:48.571706100 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
13:54:48.576898723 Stopping stress
13:54:49.638947169 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
13:54:50.648707636 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
13:54:51.662211024 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
13:54:52.677077952 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
13:54:53.696843911 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
13:54:54.715865137 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
13:54:55.725831834 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
13:54:56.753593973 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
13:54:57.774374789 Block 1024 checksum 3b71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3
13:54:57.776767720 Disk was corrupted

Note: this cannot reproduce the issue with new hosts (vdsm >= 4.4.7), since it
waits after writing the first 512 MiB. To reproduce the corruption with a new host
we must write to block 1024 immediately after the disk is extended, and the
migration must finish quickly, before the source host refreshes the disk on the
destination. I don't think we have a chance to reproduce this case.

Comment 4 Nir Soffer 2021-07-20 18:58:26 UTC
Created attachment 1803850 [details]
Reproducer script for testing new host

This script runs 2 threads:
- One writes data, triggering disk extensions
- The other compares disk checksums, detecting disk corruption

How to use:
1. Set the chunk size to 128 MiB; see the script for an example
   configuration
2. Add a 3g thin disk on block storage
3. In the guest, run the script:

    # ./reproduce-new-host.py 

4. Migrate the VM from the SPM and back

The bug may be reproduced when migrating from the SPM.
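The two-thread structure can be sketched as follows (a simplified stand-in operating on a plain file instead of the guest disk; names, sizes, and helper functions are illustrative, not taken from the attached script):

```python
import hashlib
import threading

EXTENT = 1024 * 1024  # stand-in for the 128 MiB chunk size

def run(path, extents):
    """Writer thread fills extents with known patterns; checker thread
    re-reads written extents and records any checksum mismatch."""
    errors = []
    expected = {}             # extent number -> expected SHA-1
    lock = threading.Lock()
    done = threading.Event()
    open(path, "wb").close()  # make sure the file exists before the checker starts

    def write():
        with open(path, "wb") as f:
            for n in range(extents):
                data = bytes([n % 256]) * EXTENT
                f.seek(n * EXTENT)
                f.write(data)
                f.flush()
                with lock:
                    expected[n] = hashlib.sha1(data).hexdigest()
        done.set()

    def verify():
        with lock:
            snapshot = dict(expected)
        with open(path, "rb") as f:
            for n, digest in snapshot.items():
                f.seek(n * EXTENT)
                if hashlib.sha1(f.read(EXTENT)).hexdigest() != digest:
                    errors.append(n)  # disk corruption detected

    def check():
        while not done.wait(0.01):
            verify()
        verify()  # final pass after all writes completed

    w = threading.Thread(target=write)
    c = threading.Thread(target=check)
    w.start(); c.start(); w.join(); c.join()
    return errors
```

On a plain file this always returns an empty list; against a thin disk being extended during migration, a mismatch after the migration completes indicates the corruption described in this bug.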


Example log:

2021-07-20 21:24:50,265 INFO [write] Writing extent 0...
2021-07-20 21:24:58,912 INFO [write] Writing extent 1...
2021-07-20 21:25:07,615 INFO [write] Writing extent 2...
2021-07-20 21:25:16,424 INFO [write] Writing extent 3...
2021-07-20 21:25:25,256 INFO [write] Writing extent 4...
2021-07-20 21:25:34,198 INFO [write] Writing extent 5...
2021-07-20 21:25:43,102 INFO [write] Writing extent 6...
2021-07-20 21:25:51,926 INFO [write] Writing extent 7...
2021-07-20 21:26:00,747 INFO [write] Writing extent 8...
2021-07-20 21:26:09,552 INFO [write] Writing extent 9...
2021-07-20 21:26:18,596 INFO [write] Writing extent 10...
2021-07-20 21:26:27,418 INFO [write] Writing extent 11...
2021-07-20 21:26:36,370 INFO [write] Writing extent 12...
2021-07-20 21:26:45,221 INFO [write] Writing extent 13...
2021-07-20 21:26:54,107 INFO [write] Writing extent 14...
2021-07-20 21:27:02,872 INFO [write] Writing extent 15...
2021-07-20 21:27:11,664 INFO [write] Writing extent 16...
2021-07-20 21:27:20,561 INFO [write] Writing extent 17...
2021-07-20 21:27:29,264 INFO [write] Writing extent 18...
2021-07-20 21:27:37,923 INFO [write] Writing extent 19...
2021-07-20 21:27:46,845 INFO [write] Writing extent 20...
2021-07-20 21:27:55,638 INFO [write] Writing extent 21...
2021-07-20 21:28:04,370 INFO [write] Writing extent 22...
2021-07-20 21:28:13,210 INFO [write] Writing extent 23...


I could not reproduce the issue. Maybe using a stress test
running 100s of migrations we have a chance to reproduce
this.

But the script is useful for testing migration flows.

Comment 5 Nir Soffer 2021-07-20 19:02:12 UTC
Created attachment 1803851 [details]
Reproducer script for testing new host

New version fixes a bad logging call when corruption was detected.

Comment 6 Nir Soffer 2021-07-20 20:13:54 UTC
There are several directions to solve this issue:

- Cancel migration on the SPM if a disk is extended. The user will have to retry
  the migration. If the VM is very active and the migration takes a lot of time,
  this may never succeed. Bad for user experience.

- Change libvirt/qemu to give vdsm control at the end of migration, so vdsm can
  refresh lvs extended on the source before libvirt starts the guest on the
  destination host. I think this will be very complicated, and adding a refresh
  will increase the downtime.

- Change libvirt/qemu so vdsm can specify the device size when using a block device.
  qemu will treat writes after the specified size as ENOSPC without submitting
  any I/O, and will treat reads in the same way, returning zeroes without doing
  actual I/O. Libvirt will add a new API to control the device's actual size. Vdsm
  will update libvirt about the lv size after refreshing the lv. This does not
  affect migration downtime, but may be very complicated.

- Change LVM to allow extending an lv *without* updating the device size, so extend
  when running on the SPM will behave exactly like extend when running on another
  host. I think this is the most promising direction since it is very close to
  what we already do and how lvm works.

The root problem is that on the SPM we run:

    lvextend ... vg/lv

This command changes lvm metadata in the pv, and updates the kernel state
(resizes the /dev/dm-N device).

When we migrate from the SPM, we want to separate these changes, so
lvextend will only change lvm metadata without modifying /dev/dm-N.

Then, after we refresh the disk on the destination, the device mapper
device will be updated by:

    lvchange --refresh vg/lv

Turns out that lvextend has the option:

       --driverloaded y|n
              If set to no, the command will not attempt to use device-mapper.
              For testing and debugging.

Which may do what we need.

We can add a refresh=True flag to extend. When False, call

    lvextend --driverloaded n ... vg/lv

to avoid the device update.

This will affect automatic extend running on the SPM, since we don't know if
a migration will be started right after we start an extend, and waiting
for an in-flight extend when starting migration is complicated and bad for the
user experience.
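The proposed flag could look roughly like this (a hypothetical sketch of the command-line construction only, not vdsm's actual lvm wrapper; function and parameter names are invented):

```python
def build_extend_cmd(vg, lv, size_mb, refresh=True):
    """Build the lvextend command line. With refresh=False, lvm metadata in
    the pv is updated but device-mapper is not touched (--driverloaded n),
    so the /dev/dm-N size stays unchanged until an explicit refresh."""
    cmd = ["lvextend"]
    if not refresh:
        cmd += ["--driverloaded", "n"]
    cmd += ["--size", "+%dm" % size_mb, "%s/%s" % (vg, lv)]
    return cmd

def build_refresh_cmd(vg, lv):
    """Refresh reloads the device-mapper table to match lvm metadata,
    which is what already happens on non-SPM hosts after an extend."""
    return ["lvchange", "--refresh", "%s/%s" % (vg, lv)]
```

With this split, extending on the SPM during migration behaves like extending on any other host: the device size changes only where and when a refresh is explicitly run.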

Comment 7 Nir Soffer 2021-07-20 20:34:25 UTC
Initial testing of the --driverloaded option shows that it works:

# lvcreate -n test --size 1g 313e6d78-80f7-41ab-883b-d1bddf77a5da
  Logical volume "test" created.

# lsblk | grep test
  └─313e6d78--80f7--41ab--883b--d1bddf77a5da-test                                     253:29   0    1G  0 lvm   

# lvextend --driverloaded n --size +1g 313e6d78-80f7-41ab-883b-d1bddf77a5da/test
  WARNING: Activation disabled. No device-mapper interaction will be attempted.
  Size of logical volume 313e6d78-80f7-41ab-883b-d1bddf77a5da/test changed from 1.00 GiB (8 extents) to 2.00 GiB (16 extents).
  Logical volume 313e6d78-80f7-41ab-883b-d1bddf77a5da/test successfully resized.

# lsblk | grep test
  └─313e6d78--80f7--41ab--883b--d1bddf77a5da-test                                     253:29   0    1G  0 lvm   

# lvchange --refresh 313e6d78-80f7-41ab-883b-d1bddf77a5da/test

# lsblk | grep test
  └─313e6d78--80f7--41ab--883b--d1bddf77a5da-test                                     253:29   0    2G  0 lvm

We need to test this with a running guest during migration.

Comment 12 sshmulev 2021-10-03 10:34:05 UTC
I tried to test it with the latest script Nir attached (Reproducer script for testing new host).
Since the script doesn't include the stress tool, I started it manually before the migration and stopped it when I saw the first extension.


Here are the logs from the source host (SPM):
2021-10-03 12:39:27,655+0300 INFO  (periodic/4) [virt.vm] (vmId='4838c7f8-13b0-4605-966a-3bccbe30ac22') Requesting extension for volume bc7c81db-abb8-4ae5-b772-95fc1c366fbe on domain f76ddb17-7c14-4cfc-abed-ea1349ef0ef1 (apparent: 1207959552, capacity: 3221225472, allocated: 1141309440, physical: 1207959552 threshold_state: exceeded) (vm:1350)

2021-10-03 12:39:27,655+0300 INFO  (periodic/4) [vdsm.api] START sendExtendMsg(spUUID='84ba839f-92fa-4bb2-8a55-a4b771e7eb1d', volDict={'domainID': 'f76ddb17-7c14-4cfc-abed-ea1349ef0ef1', 'imageID': 'c5ab2a91-b0c3-48a2-a653-61c80f32aacc', 'internal': False, 'name': 'sdb', 'newSize': 1342177280, 'poolID': '84ba839f-92fa-4bb2-8a55-a4b771e7eb1d', 'volumeID': 'bc7c81db-abb8-4ae5-b772-95fc1c366fbe', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'callback': None}, newSize=1342177280, callbackFunc=<bound method Vm.after_volume_extension of <vdsm.virt.vm.Vm object at 0x7fb4785b5278>>) from=internal, task_id=94e85d42-a7db-458a-8ab0-6d651e95c009 (api:48)

2021-10-03 12:39:29,659+0300 INFO  (periodic/0) [virt.vm] (vmId='4838c7f8-13b0-4605-966a-3bccbe30ac22') Requesting extension for volume bc7c81db-abb8-4ae5-b772-95fc1c366fbe on domain f76ddb17-7c14-4cfc-abed-ea1349ef0ef1 (apparent: 1207959552, capacity: 3221225472, allocated: 1141309440, physical: 1207959552 threshold_state: exceeded) (vm:1350)

2021-10-03 12:39:29,660+0300 INFO  (periodic/0) [vdsm.api] START sendExtendMsg(spUUID='84ba839f-92fa-4bb2-8a55-a4b771e7eb1d', volDict={'domainID': 'f76ddb17-7c14-4cfc-abed-ea1349ef0ef1', 'imageID': 'c5ab2a91-b0c3-48a2-a653-61c80f32aacc', 'internal': False, 'name': 'sdb', 'newSize': 1342177280, 'poolID': '84ba839f-92fa-4bb2-8a55-a4b771e7eb1d', 'volumeID': 'bc7c81db-abb8-4ae5-b772-95fc1c366fbe', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'callback': None}, newSize=1342177280, callbackFunc=<bound method Vm.after_volume_extension of <vdsm.virt.vm.Vm object at 0x7fb4785b5278>>) from=internal, task_id=3bd9358c-2864-48a1-a81a-8b568f3d2a6d (api:48)

2021-10-03 12:39:29,701+0300 INFO  (mailbox-hsm/0) [storage.ThreadPool.WorkerThread] START task 6e2ffb70-bb2b-427d-be45-97b345abb1f0 (cmd=<function runTask at 0x7fb4d5e122f0>, args=(<bound method Vm.after_volume_extension of <vdsm.virt.vm.Vm object at 0x7fb4785b5278>>, {'domainID': 'f76ddb17-7c14-4cfc-abed-ea1349ef0ef1', 'imageID': 'c5ab2a91-b0c3-48a2-a653-61c80f32aacc', 'internal': False, 'name': 'sdb', 'newSize': 1342177280, 'poolID': '84ba839f-92fa-4bb2-8a55-a4b771e7eb1d', 'volumeID': 'bc7c81db-abb8-4ae5-b772-95fc1c366fbe', 'clock': <Clock(total=2.05*, extend-volume=2.05, refresh-destination-volume=0.00*)>, 'callback': None})) (threadPool:146)

2021-10-03 12:39:30,125+0300 WARN  (mailbox-hsm/0) [virt.vm] (vmId='4838c7f8-13b0-4605-966a-3bccbe30ac22') Migration destination host does not support extending disk during migration, disabling disk extension during migration (vm:1597)


Before the extension (data from the source):
2021-10-03 12:39:23,410+0300 INFO  (periodic/5) [vdsm.api] START getVolumeSize(sdUUID='f76ddb17-7c14-4cfc-abed-ea1349ef0ef1', spUUID='84ba839f-92fa-4bb2-8a55-a4b771e7eb1d', imgUUID='c5ab2a91-b0c3-48a2-a653-61c80f32aacc', volUUID='bc7c81db-abb8-4ae5-b772-95fc1c366fbe') from=internal, task_id=6412041d-14fe-46e6-9a3a-953a22af32f3 (api:48)
2021-10-03 12:39:23,410+0300 INFO  (periodic/5) [vdsm.api] FINISH getVolumeSize return={'apparentsize': '1207959552', 'truesize': '1207959552'} from=internal, task_id=6412041d-14fe-46e6-9a3a-953a22af32f3 (api:54)


After the extension (data from the destination):
2021-10-03 12:43:24,974+0300 INFO  (periodic/2) [vdsm.api] START getVolumeSize(sdUUID='f76ddb17-7c14-4cfc-abed-ea1349ef0ef1', spUUID='84ba839f-92fa-4bb2-8a55-a4b771e7eb1d', imgUUID='c5ab2a91-b0c3-48a2-a653-61c80f32aacc', volUUID='bc7c81db-abb8-4ae5-b772-95fc1c366fbe', options=None) from=internal, task_id=148b5b3d-5d48-4951-962a-5f0735f441ee (api:48)
2021-10-03 12:43:24,974+0300 INFO  (periodic/2) [vdsm.api] FINISH getVolumeSize return={'apparentsize': '1342177280', 'truesize': '1342177280'} from=internal, task_id=148b5b3d-5d48-4951-962a-5f0735f441ee (api:54)

Versions:
SPM - vdsm-4.40.90.1-1.el8ev.x86_64
NON SPM - vdsm-4.40.60.6-1.el8ev.x86_64



1) I didn't have any data corruption in any of my tries to reproduce the issue, is that enough to say this is ok?
2) As mentioned in the fix for the bug, after the first extension in this scenario, further extensions will be blocked. After the migration is fully done, should the disk stay blocked for extensions as well? I tried with and without migration and it seems to still be turned off.

Comment 15 Vojtech Juranek 2021-10-04 09:50:54 UTC
> 1) I didn't have any data corruption in any of my tries to reproduce the
> issue, is that enough to say this is ok?

You should be able to reproduce the issue. Nir wrote in comment #4 that he wasn't able to reproduce the issue with "Reproducer script for testing new host". IMHO you should use "reproducer script v2"; I was able to reproduce the issue with it very reliably. Once reproduced, you should apply the patch and verify that the issue doesn't appear any more.

> 2) As mentioned in the fix for the bug, after the first extension in this
> scenario, further extensions will be blocked. After the migration is fully
> done, should the disk stay blocked for extensions as well? I tried with and
> without migration and it seems to still be turned off.

No, once migrated, if there are more writes the disk should be extended as usual (and the VM resumed if it was suspended due to lack of disk space during migration).

Comment 16 sshmulev 2021-10-04 16:04:31 UTC
Verified using the "reproducer script v2" attached.


Tried to reproduce the issue with a non-fixed version:
SPM with vdsm-4.40.80.5-1.el8ev.x86_64
Non-SPM - vdsm-4.40.60.7-1.el8ev.x86_64

Problem reproduced:
15:25:48.776940681 Starting stress
Press Enter when migration was started:
15:26:17.571062005 Writing to disk...
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 6.02058 s, 89.2 MB/s
15:26:23.643644764 Waiting until disk is extended
15:26:33.986112410 Writing more data to disk...
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 6.01468 s, 89.3 MB/s
0+1 records in
0+1 records out
17 bytes copied, 0.036696 s, 0.5 kB/s
15:26:40.546513745 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:40.573175317 Stopping stress
./rep.sh: line 39:  5258 Terminated              stress --quiet --vm 1 --vm-bytes 256M
15:26:41.785037050 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:42.807564796 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:43.830035774 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:44.856188442 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:45.880841733 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:46.918733875 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:47.943472268 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:48.968195296 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:49.985555331 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:51.009182605 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:52.034402910 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:53.056241145 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:54.088545379 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:55.113889349 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:56.140813197 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
15:26:57.222328483 Block 1024 checksum 3b71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3
15:26:57.227912105 Disk was corrupted
[1]+  Terminated              stress --quiet --vm 1 --vm-bytes 256M



On the fixed version it passed:
16:23:38.376413592 Starting stress
Press Enter when migration was started:
16:24:05.723303394 Writing to disk...
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 6.07161 s, 88.4 MB/s
16:24:11.849690679 Waiting until disk is extended
16:24:21.879880452 Writing more data to disk...
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 6.18676 s, 86.8 MB/s
0+1 records in
0+1 records out
17 bytes copied, 0.0224842 s, 0.8 kB/s
16:24:28.203507895 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
16:24:28.216593750 Stopping stress
./rep.sh: line 39:  7436 Terminated              stress --quiet --vm 1 --vm-bytes 256M
16:24:29.369323173 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
16:24:30.393301942 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
16:24:31.411367575 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
16:24:32.430409806 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
16:24:33.449047009 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
16:24:34.553433788 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
16:24:35.597597104 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
16:24:36.617205793 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
16:24:37.640333386 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
16:24:38.665031176 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
16:24:39.689913731 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
16:24:40.706734709 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
16:24:41.723192333 Block 1024 checksum 7623926e2157f135ec276d28b13943f9d6d2e5ca
......
The script continued to loop for a long time after the migration was done and the checksum stayed identical, which means the disk was not corrupted.

Comment 17 Sandro Bonazzola 2021-10-21 07:27:14 UTC
This bugzilla is included in oVirt 4.4.9 release, published on October 20th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.9 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

