Bug 1657713 - RFE: virt-v2v: Implement NBD support for -o rhv-upload mode
Summary: RFE: virt-v2v: Implement NBD support for -o rhv-upload mode
Keywords:
Status: NEW
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libguestfs
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Richard W.M. Jones
QA Contact:
URL:
Whiteboard:
Depends On: 1751212
Blocks:
 
Reported: 2018-12-10 10:29 UTC by Richard W.M. Jones
Modified: 2020-06-19 15:41 UTC (History)
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1751212 (view as bug list)
Environment:
Last Closed:



Description Richard W.M. Jones 2018-12-10 10:29:13 UTC
Description of problem:

Nir has added NBD support to oVirt/imageio.  We should support
this for -o rhv-upload as soon as possible as it is likely
to be better than using the current HTTP path (and easier to
maintain).

For full details see this email:

https://www.redhat.com/archives/libguestfs/2018-December/msg00111.html

Comment 1 Nir Soffer 2018-12-10 12:07:07 UTC
Martin, can we consider this for 4.3.0? 4.3.z?

Comment 4 Nir Soffer 2019-08-15 17:03:45 UTC
Most of the work is already done on RHV side as part of incremental backup
work.

The missing parts are:
- Provide a protocol parameter to image transfer
- Update the transfer flow to support an NBD-only protocol, mainly
  skipping steps that are not needed when using NBD, like installing
  tickets and monitoring progress.

Comment 6 Nir Soffer 2020-06-18 13:09:33 UTC
This change requires:

- Small change in the oVirt API to enable NBD transfer. In this mode the engine
  transfer_url will be an NBD URL (e.g. "nbd:unix:/path/to/socket")

- Adapt the engine monitoring logic for NBD transfer, where we don't have
  any information on progress. Basically this means that the engine will not
  pause the session when the client is not active, since client activity is
  not visible in this mode.

- A rewrite of the virt-v2v rhv-upload-plugin internals,
  without changing the user-visible behavior.
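As a rough sketch of the first item, the rewritten plugin would need to branch on the form of the transfer_url it gets back from engine. The helper name and the exact URL forms here are assumptions for illustration, not actual virt-v2v code:

```python
# Hypothetical helper: decide between the NBD and HTTP upload paths from
# the transfer_url returned by the image transfer. Assumes the qemu-style
# "nbd:unix:/path/to/socket[:exportname=NAME]" form mentioned above.

def parse_transfer_url(url):
    """Return (scheme, socket_or_url, exportname)."""
    prefix = "nbd:unix:"
    if url.startswith(prefix):
        # "nbd:unix:/run/sock:exportname=disk" -> ("/run/sock", "disk")
        path, _, export = url[len(prefix):].partition(":exportname=")
        return ("nbd", path, export)
    if url.startswith("https://"):
        # Current HTTP path: pass the URL through unchanged.
        return ("https", url, "")
    raise ValueError("unsupported transfer_url: %r" % url)
```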

So from a QE point of view this requires retesting the virt-v2v flow to
make sure there are no regressions.

If QE has negative-flow tests for client inactivity, these tests may
break when using the NBD protocol, since in this mode we don't have any
visibility into client activity.

Comment 7 Richard W.M. Jones 2020-06-19 07:49:54 UTC
(In reply to Nir Soffer from comment #6)
> This change requires:
> 
> - Small change in the oVirt API to enable NBD transfer. In this mode the engine
>   transfer_url will be an NBD URL (e.g. "nbd:unix:/path/to/socket")

There's now an actual specification for NBD URIs, so best to use it
rather than making up one or using the deprecated QEMU format:

https://github.com/NetworkBlockDevice/nbd/blob/master/doc/uri.md
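For illustration, the mapping from the deprecated qemu syntax to a standard NBD URI could look roughly like this (a hedged sketch assuming a unix-socket transfer; this helper is hypothetical and not part of virt-v2v or imageio):

```python
from urllib.parse import quote

def qemu_nbd_to_uri(qemu_url):
    """Translate the deprecated qemu syntax
    "nbd:unix:/path/to/socket[:exportname=NAME]" into a standard NBD URI
    of the form "nbd+unix:///NAME?socket=/path/to/socket" per uri.md."""
    prefix = "nbd:unix:"
    if not qemu_url.startswith(prefix):
        raise ValueError("not a qemu unix-socket NBD URL: %r" % qemu_url)
    path, _, export = qemu_url[len(prefix):].partition(":exportname=")
    # An empty export name selects the default export in the URI scheme.
    return "nbd+unix:///%s?socket=%s" % (quote(export), quote(path))
```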

> - Adapt the engine monitoring logic for NBD transfer, where we don't have
>   any information on progress. Basically this means that the engine will not
>   pause the session when the client is not active, since client activity is
>   not visible in this mode.

Can't progress be approximated by something like the highest byte offset
written (not zeroed) by the client?  It depends if you need only
rough progress or something that is always perfectly accurate.
Activity in general is just whether or not the NBD client has sent
anything recently.

(Of course if the problem is that qemu-nbd isn't providing this information,
then use nbdkit + log filter, or a custom filter).
http://libguestfs.org/nbdkit-log-filter.1.html
http://libguestfs.org/nbdkit-filter.3.html
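The approximation suggested above might be sketched like this (hypothetical names; a real implementation would more likely live server-side, e.g. in an nbdkit filter):

```python
import time

class WriteTracker:
    """Estimate progress from the highest byte offset written by the NBD
    client, and activity from the time of the last request.  This is a
    toy model of the idea, not code from any of the projects involved."""

    def __init__(self, disk_size):
        self.disk_size = disk_size
        self.high_water = 0
        self.last_request = time.monotonic()

    def on_write(self, offset, length):
        # Called for each client write (zeroes would be excluded).
        self.high_water = max(self.high_water, offset + length)
        self.last_request = time.monotonic()

    def progress(self):
        # Only rough: sparse or out-of-order writes skew the estimate.
        return min(1.0, self.high_water / self.disk_size)

    def idle_seconds(self):
        return time.monotonic() - self.last_request
```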

Comment 8 Nir Soffer 2020-06-19 15:41:10 UTC
(In reply to Richard W.M. Jones from comment #7)
> (In reply to Nir Soffer from comment #6)
> > This change requires:
> > 
> > - Small change in the oVirt API to enable NBD transfer. In this mode the engine
> >   transfer_url will be an NBD URL (e.g. "nbd:unix:/path/to/socket")
> 
> There's now an actual specification for NBD URIs, so best to use it
> rather than making up one or using the deprecated QEMU format:
> 
> https://github.com/NetworkBlockDevice/nbd/blob/master/doc/uri.md

The URL we use is supported by qemu-nbd, so it is valid even if
it is non-standard; the qemu syntax predates the standard.

Currently we can use only the qemu syntax since imageio does not support
the new syntax. I filed bug 1849091 for imageio, and bug 1849097 for vdsm,
which generates these URLs.

I'm not sure when the new syntax will be supported. Vdsm can generate
the new URL only when all hosts in the DC that may be selected for
image transfer support the new syntax, meaning they run an imageio
version that understands it. Since we allow mixing different versions
of vdsm and imageio on different hosts, I'm not sure we have a good
way to switch to the new syntax before we introduce cluster compatibility
version 4.5. Using the cluster compatibility version, we can ensure that
all hosts are running a compatible vdsm version, which implies a
compatible imageio version.

So most likely if we introduce NBD transport in 4.4.z, virt-v2v will
have to use the existing nbd:unix:/socket:exportname=disk syntax.

> > - Adapt the engine monitoring logic for NBD transfer, where we don't have
> >   any information on progress. Basically this means that the engine will not
> >   pause the session when the client is not active, since client activity is
> >   not visible in this mode.
> 
> Can't progress be approximated by something like the highest byte offset
> written (not zeroed) by the client?  It depends if you need only
> rough progress or something that is always perfectly accurate.
> Activity in general is just whether or not the NBD client has sent
> anything recently.
> 
> (Of course if the problem is that qemu-nbd isn't providing this information,
> then use nbdkit + log filter, or a custom filter).
> http://libguestfs.org/nbdkit-log-filter.1.html
> http://libguestfs.org/nbdkit-filter.3.html

Progress is nice to have but not required. More important is being able to
detect client activity, which in imageio means extending the ticket on
every client request. If a client stops sending requests, the engine will
pause the transfer after the inactivity timeout defined in the transfer.

Without visibility into client activity, we will have to disable the
inactivity timeout, or use a very high value, since we are not in the
loop when transferring via NBD.
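The ticket logic being described can be modeled roughly like this (a toy sketch, not the imageio implementation):

```python
import time

class TransferTicket:
    """Toy model of the inactivity logic: every client request extends
    the ticket; if no request arrives within the timeout, the engine
    would pause the transfer.  In NBD mode no per-request callback
    exists, which is why the timeout must be disabled or set very high."""

    def __init__(self, inactivity_timeout):
        self.inactivity_timeout = inactivity_timeout
        self.extend()

    def extend(self):
        # Called on every client request in the HTTP path.
        self.deadline = time.monotonic() + self.inactivity_timeout

    def inactive(self):
        return time.monotonic() > self.deadline
```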

We cannot use nbdkit for two reasons:
- It does not support qcow2
- It does not support direct I/O.

Not using direct I/O is very bad in oVirt, and can lead to sanlock
renewal timeouts, which may lead to expired leases and to killing vdsm
or VMs holding leases.

For example see:
https://bugzilla.redhat.com/show_bug.cgi?id=1247135#c30

This is the reason all oVirt disk operations use direct I/O.

When converting images with qemu-img convert we always use
"-t none -T none".

We run qemu-nbd with "--cache=none".

And the imageio file backend always uses direct I/O.

Direct I/O + fsync() is also usually faster than buffered I/O
and gives more predictable behavior.
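The direct I/O + fsync pattern looks roughly like this (an illustrative sketch, not oVirt code; it assumes Linux and falls back to buffered I/O where the filesystem rejects O_DIRECT, e.g. tmpfs):

```python
import mmap
import os

def write_direct(path, data, block=4096):
    """Write data using O_DIRECT when available, then fsync.
    O_DIRECT requires a page-aligned buffer and block-aligned length,
    so we stage the data in an anonymous mmap and pad it."""
    padded = max(block, -(-len(data) // block) * block)  # round up
    buf = mmap.mmap(-1, padded)          # anonymous map: page-aligned
    buf[:len(data)] = data
    flags = os.O_WRONLY | os.O_CREAT
    try:
        fd = os.open(path, flags | getattr(os, "O_DIRECT", 0), 0o644)
    except OSError:
        fd = os.open(path, flags, 0o644)  # filesystem without O_DIRECT
    try:
        os.write(fd, buf)                 # aligned buffer and length
        os.ftruncate(fd, len(data))       # drop the alignment padding
        os.fsync(fd)                      # flush volatile device caches
    finally:
        os.close(fd)
        buf.close()
```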

