Bug 1892403 - Image download via SDK broken with older engines
Summary: Image download via SDK broken with older engines
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.40.35
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.4
Target Release: 4.40.40
Assignee: Nir Soffer
QA Contact: Ilan Zuckerman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-28 16:58 UTC by Nir Soffer
Modified: 2023-09-15 00:50 UTC (History)

Fixed In Version: vdsm-4.40.40
Doc Type: Bug Fix
Doc Text:
Cause: Incorrect handling of a missing argument when vdsm is accessed by older engines (engine < 4.4.3). Consequence: Downloading a disk with snapshots using NBD returns data only from the top snapshot. Fix: Vdsm handles the missing argument properly when used with older engine versions. Result: Downloading a disk with snapshots returns data from all snapshots, as expected.
Clone Of:
Environment:
Last Closed: 2021-01-12 16:23:51 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.4+
aoconnor: blocker-


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 111970 0 master MERGED tests: Modernize nbd tests 2021-01-03 14:55:28 UTC
oVirt gerrit 111971 0 master MERGED tests: Test NBD.start_server backing chain support 2021-01-03 14:55:31 UTC
oVirt gerrit 111972 0 master MERGED nbd: Fix compatibility with older engines 2021-01-03 14:55:31 UTC
oVirt gerrit 111980 0 ovirt-4.4.3 ABANDONED tests: Modernize nbd tests 2021-01-03 14:56:06 UTC
oVirt gerrit 111981 0 ovirt-4.4.3 ABANDONED tests: Test NBD.start_server backing chain support 2021-01-03 14:55:28 UTC
oVirt gerrit 111982 0 ovirt-4.4.3 ABANDONED nbd: Fix compatibility with older engines 2021-01-03 14:55:29 UTC

Description Nir Soffer 2020-10-28 16:58:16 UTC
Description of problem:

Vdsm 4.40.32 added support for downloading a single volume from a volume chain
by specifying the new backing_chain argument. Engine 4.4.3.7 or later always
specifies backing_chain=False or backing_chain=True.

When using newer vdsm >= 4.40.32 with an older engine < 4.4.3.7 that does not
specify the backing_chain argument, the NBD server is configured with
backing_chain=None, which is treated as backing_chain=False instead of
backing_chain=True.
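
The defaulting mistake can be sketched in a few lines of Python. This is an
illustration only, not the vdsm code: the function names and the shape of the
config dict are assumptions, and only the backing_chain argument and its
None/False/True semantics come from the description above.

```python
def backing_chain_buggy(config):
    # Hypothetical: a missing key yields None, which is falsy, so an older
    # engine that omits the argument gets the single-volume behaviour.
    return bool(config.get("backing_chain"))


def backing_chain_fixed(config):
    # Hypothetical: treat a missing (or None) value as True, the behaviour
    # older engines expect. An explicit False is still honoured.
    value = config.get("backing_chain")
    return True if value is None else bool(value)


# A request from an older engine: no backing_chain key at all.
old_engine_request = {"vol_id": "0a44d697-41bd-4f38-81f2-e955f51799c4"}
print(backing_chain_buggy(old_engine_request))  # False
print(backing_chain_fixed(old_engine_request))  # True
```

The point of the sketch is that None has to be distinguished from False
explicitly. A plain truthiness check cannot tell "not specified" apart from
"explicitly disabled".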

Users downloading images with multiple snapshots will get only the data
of the last snapshot.

Users downloading a specific snapshot will get only the specified snapshot
instead of the entire chain starting at the snapshot.

Here are example commands showing the wrong NBD server configuration:

Original chain:

# qemu-img info --backing-chain /rhev/data-center/mnt/nfs1\:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/0a44d697-41bd-4f38-81f2-e955f51799c4
image: /rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/0a44d697-41bd-4f38-81f2-e955f51799c4
file format: qcow2
virtual size: 6 GiB (6442450944 bytes)
disk size: 72.4 MiB
cluster_size: 65536
backing file: 20391fbc-fa77-4fef-9aea-cf59f27f90b5 (actual path: /rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/20391fbc-fa77-4fef-9aea-cf59f27f90b5)
backing file format: qcow2
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/20391fbc-fa77-4fef-9aea-cf59f27f90b5
file format: qcow2
virtual size: 6 GiB (6442450944 bytes)
disk size: 1.65 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Starting NBD server:

# cat nbd.json 
{
  "server_id": "test",
  "config": {
    "sd_id": "f5915245-0ac5-4712-b8b2-dd4d4be7cdc4",
    "img_id": "4b62aa6d-3bdd-4db3-b26f-0484c4124631",
    "vol_id": "0a44d697-41bd-4f38-81f2-e955f51799c4",
    "readonly": true
  }
}

# vdsm-client -f nbd.json NBD start_server
"nbd:unix:/run/vdsm/nbd/test.sock"

The qemu-nbd process:

# ps -ef | grep qemu-nbd | grep -v grep
vdsm      252402       1  0 18:49 ?        00:00:00 /usr/bin/qemu-nbd --socket /run/vdsm/nbd/test.sock --persistent --shared=8 --export-name= --cache=none --aio=native --read-only json:{"driver": "qcow2", "file": {"driver": "file", "filename": "/rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/0a44d697-41bd-4f38-81f2-e955f51799c4"}, "backing": null}

Note: "backing": null - the server exports only the top volume
(this is the root cause).
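
To make the effect of "backing": null concrete, here is a minimal Python
sketch that builds a qemu "json:" image description of the same shape as the
one in the ps output above. The helper name image_spec and its parameters are
hypothetical and this is not the vdsm implementation. It relies only on the
qemu behaviour that an explicit null backing node stops qemu from opening the
backing file recorded in the qcow2 header.

```python
import json


def image_spec(path, backing_chain=True, fmt="qcow2"):
    # Build a qemu "json:" image description (hypothetical helper).
    spec = {
        "driver": fmt,
        "file": {"driver": "file", "filename": path},
    }
    if not backing_chain:
        # An explicit null backing node makes qemu ignore the backing file,
        # so only the data stored in this volume is exported.
        spec["backing"] = None
    return "json:" + json.dumps(spec)


# Placeholder path, not a real volume.
print(image_spec("/path/to/top-volume", backing_chain=False))
print(image_spec("/path/to/top-volume", backing_chain=True))
```

With backing_chain=True the "backing" key is omitted, so qemu follows the
backing file from the image header and exports the whole chain.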

Downloading the image:

# qemu-img convert -f raw -O qcow2 nbd:unix:/run/vdsm/nbd/test.sock download.qcow2

The downloaded image:

# qemu-img info download.qcow2 
image: download.qcow2
file format: qcow2
virtual size: 6 GiB (6442450944 bytes)
disk size: 42.9 MiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

The download includes only data from the top volume, instead of
the entire chain.
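
A rough way to compare the original chain with the download is to read the
machine-readable output of qemu-img info. The sketch below assumes qemu-img is
installed and uses its --output=json and --backing-chain options. The helper
names chain_info and summarize are made up for this example. Allocated sizes
are only an approximate check, since sparseness and compression can differ
between source and destination.

```python
import json
import subprocess


def chain_info(path):
    # Ask qemu-img for JSON info on every image in the backing chain.
    out = subprocess.run(
        ["qemu-img", "info", "--output=json", "--backing-chain", path],
        check=True,
        stdout=subprocess.PIPE,
    ).stdout
    return json.loads(out)


def summarize(chain):
    # One (filename, allocated bytes) pair per image, top volume first.
    return [(node["filename"], node.get("actual-size")) for node in chain]


def total_allocated(chain):
    # Sum of allocated bytes over the chain, skipping images without a size.
    return sum(node.get("actual-size") or 0 for node in chain)
```

For example, total_allocated(chain_info(top_volume_path)) can be compared
with total_allocated(chain_info("download.qcow2")). A download that is far
smaller than the whole source chain suggests that only the top volume was
exported.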

Version-Release number of selected component (if applicable):
v4.40.32

How reproducible:
Always

Steps to Reproduce:
1. Use engine < 4.4.3.7 with newer vdsm >= 4.40.32
2. Create VM with one disk with some data
3. Stop the VM
4. Create a snapshot including the disk
5. Download the disk using download_disk.py

Actual results:
The download includes only the snapshot. Since the snapshot is empty,
the download will be an empty image.

Expected results:
The download includes the entire disk.

Comment 1 Nir Soffer 2020-10-28 17:00:06 UTC
This is a regression introduced by the fix for bug 1847090.

Comment 2 RHEL Program Management 2020-10-28 17:00:59 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 5 Nir Soffer 2020-10-29 14:48:34 UTC
Tested with engine 4.4.3.2.

Without this fix (vdsm 4.40.35.1):

$ ./download_disk.py -c engine3 8f2ce956-47f3-49d7-ae68-bcf810addee5 download.qcow2
[   0.0 ] Connecting...
[   0.3 ] Creating image transfer...
[   1.8 ] Transfer ID: b5f93f9d-13f1-4823-80ed-9534e7da278c
[   1.8 ] Transfer host name: host3
[   1.8 ] Downloading image...
[ 100.00% ] 6.00 GiB, 0.13 seconds, 46.24 GiB/s                                
[   1.9 ] Finalizing image transfer...

$ qemu-img info download.qcow2 
image: download.qcow2
file format: qcow2
virtual size: 6 GiB (6442450944 bytes)
disk size: 196 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

With the fix:

$ ./download_disk.py -c engine3 8f2ce956-47f3-49d7-ae68-bcf810addee5 download.qcow2
[   0.0 ] Connecting...
[   0.1 ] Creating image transfer...
[   1.4 ] Transfer ID: 568f95e1-bff7-42c2-84ff-b3ae474a42d3
[   1.4 ] Transfer host name: host3
[   1.4 ] Downloading image...
[ 100.00% ] 6.00 GiB, 12.83 seconds, 478.81 MiB/s                              
[  14.2 ] Finalizing image transfer...

$ qemu-img info download.qcow2 
image: download.qcow2
file format: qcow2
virtual size: 6 GiB (6442450944 bytes)
disk size: 2.51 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Comment 10 Ilan Zuckerman 2021-01-06 13:42:44 UTC
Tested on:

rhv-release-4.4.3-7-001.noarch
ovirt-engine-4.4.3.12-0.1.el8ev.noarch
vdsm-4.40.35.1-1.el8ev.x86_64

This setup complies with the requirement from the description:
"1. Use engine < 4.4.3.7 with newer vdsm >= 4.40.32"

Steps:
1. Use engine < 4.4.3.7 with newer vdsm >= 4.40.32
2. Create VM with one disk with some data
3. Stop the VM
4. Create a snapshot including the disk
5. Download the disk using download_disk.py

[root@storage-ge10-vdsm1 examples]# python3 download_disk.py 5bc1c7c6-c6b5-4936-8893-f400fe4993c7 /tmp/downloaded -c engine
[   0.0 ] Connecting...
[   0.2 ] Creating image transfer...
[   2.9 ] Transfer ID: 4dc9e7e7-6a0f-4f1c-ba8f-b76731696e41
[   2.9 ] Transfer host name: host_mixed_1
[   2.9 ] Downloading image...
[ 100.00% ] 10.00 GiB, 11.81 seconds, 866.90 MiB/s                             
[  14.7 ] Finalizing image transfer...


Expected:
The download includes the entire disk, not just the snapshot.

Actual result: 
As expected

[root@storage-ge10-vdsm1 examples]# qemu-img info /tmp/downloaded 
image: /tmp/downloaded
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 2.3 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Comment 11 Sandro Bonazzola 2021-01-12 16:23:51 UTC
This bugzilla is included in oVirt 4.4.4 release, published on December 21st 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.4 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 12 Red Hat Bugzilla 2023-09-15 00:50:16 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

