1819240 – RFE: virt-v2v should use qemu's nbd reconnection parameter

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1819240 - RFE: virt-v2v should use qemu's nbd reconnection parameter

Summary: RFE: virt-v2v should use qemu's nbd reconnection parameter

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 9
Classification:	Red Hat
Component:	virt-v2v
Sub Component:
Version:	unspecified
Hardware:	x86_64
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	beta
Target Release:	---
Assignee:	Virtualization Maintenance
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2020-03-31 14:12 UTC by mxie@redhat.com
Modified:	2021-08-11 16:34 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-08-11 16:34:30 UTC
Type:	Feature Request
Target Upstream Version:
Embargoed:
Dependent Products:
Flags:	pm-rhel: mirror+

Attachments	(Terms of Use)
retry-filter-default.log (110.52 KB, text/plain) 2020-03-31 14:12 UTC, mxie@redhat.com	no flags	Details
qemu-m-1-salvage.log (869.65 KB, text/plain) 2020-04-01 03:51 UTC, mxie@redhat.com	no flags	Details
View All

Description mxie@redhat.com 2020-03-31 14:12:01 UTC

Created attachment 1675091 [details]
retry-filter-default.log

Description of problem:
nbdkit connection can't be disconnected with retry filter after trying specified times

Version-Release number of selected component (if applicable):
nbdkit-1.16.2-2.module+el8.2.0+5664+dd92f997.x86_64

How reproducible:
100%

Steps to Reproduce:
1.Check nbdkit-retry-filter man page, will retry 5 times before fail the operation by default

# man nbdkit-retry-filter
....     PARAMETERS
         retries=N
         The number of times any single operation will be retried before we give up and fail the operation.  The default is 5.


2.Convert a image from Xen server to rhel8.2 conversion using below command
# nbdkit -f -v -U - ssh host=10.73.224.33 /var/lib/xen/images/xen-hvm-rhel7.7-x86_64.img --run 'qemu-img convert -p -f raw $nbd /tmp/out.img' --filter=retry  retry-exponential=no
....
nbdkit: ssh[1]: debug: retry: pread count=2097152 offset=2097152
nbdkit: ssh[1]: debug: ssh: get_size
nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=2097152
nbdkit: ssh[1]: debug: retry: pread count=2097152 offset=4194304


2.Stop sshd service and kill the ssh process of nbdkit on Xen server during converting, Found nbdkit operation will not stop after retrying 5 times, details pls refer to log"retry-filter-default.log"
# nbdkit -f -v -U - ssh host=10.73.224.33 /var/lib/xen/images/xen-hvm-rhel7.7-x86_64.img --run 'qemu-img convert -p -f raw $nbd /tmp/out.img' --filter=retry  retry-exponential=no
.....
nbdkit: ssh[1]: error: failed to connect to remote host: 10.73.224.33: Connection refused
nbdkit: ssh[1]: debug: ssh: open returned handle (nil)
nbdkit: ssh[1]: debug: ssh: close
nbdkit: ssh[1]: debug: retry 5: original errno = 108
nbdkit: ssh[1]: debug: waiting 2 seconds before retrying
nbdkit: ssh[1]: debug: ssh: reopen readonly=0
nbdkit: ssh[1]: debug: ssh: finalize
nbdkit: ssh[1]: debug: ssh: close
nbdkit: ssh[1]: debug: ssh: open readonly=0
nbdkit: ssh[1]: error: failed to connect to remote host: 10.73.224.33: Connection refused
nbdkit: ssh[1]: debug: ssh: open returned handle (nil)
nbdkit: ssh[1]: debug: ssh: close
nbdkit: ssh[1]: debug: retry 6: original errno = 108
nbdkit: ssh[1]: debug: could not recover after 5 retries
nbdkit: ssh[1]: debug: sending error reply: Cannot send after transport endpoint shutdown
nbdkit: ssh[1]: debug: retry: pread count=2097152 offset=1084227584
nbdkit: ssh[1]: debug: retry 1: original errno = 0qemu-img: error while reading sector 2113536: Cannot send after transport endpoint shutdown

nbdkit: ssh[1]: debug: waiting 2 seconds before retrying
nbdkit: ssh[1]: debug: ssh: reopen readonly=0
nbdkit: ssh[1]: debug: ssh: finalize
nbdkit: ssh[1]: debug: ssh: close
nbdkit: ssh[1]: debug: ssh: open readonly=0
....


Actual results:
As above description

Expected results:
nbdkit connection can be disconnected with retry filter successfully after trying specified times

Additional info:

Comment 1 Eric Blake 2020-03-31 14:30:03 UTC

Rich Jones noted:
An easier way to reproduce this is as follows:

$ rm -f /tmp/inject ; (sleep 5s; touch /tmp/inject ) & nbdkit -f -v -U - null size=1G --filter=retry --filter=noextents --filter=error error-rate=100% error-file=/tmp/inject --filter=delay rdelay=1 --run 'qemu-img convert $nbd /var/tmp/out.img'

Comment 2 Eric Blake 2020-03-31 20:18:36 UTC

(In reply to mxie from comment #0)
> Created attachment 1675091 [details]
> retry-filter-default.log
> 
> Description of problem:
> nbdkit connection can't be disconnected with retry filter after trying
> specified times
> 

> 2.Stop sshd service and kill the ssh process of nbdkit on Xen server during
> converting, Found nbdkit operation will not stop after retrying 5 times,
> details pls refer to log"retry-filter-default.log"

Rather, the 'qemu-img convert' process is not giving up, even after an EIO after retry gives up.

> # nbdkit -f -v -U - ssh host=10.73.224.33
> /var/lib/xen/images/xen-hvm-rhel7.7-x86_64.img --run 'qemu-img convert -p -f
> raw $nbd /tmp/out.img' --filter=retry  retry-exponential=no
> .....
> nbdkit: ssh[1]: error: failed to connect to remote host: 10.73.224.33:
> Connection refused
> nbdkit: ssh[1]: debug: ssh: open returned handle (nil)
> nbdkit: ssh[1]: debug: ssh: close
> nbdkit: ssh[1]: debug: retry 5: original errno = 108
> nbdkit: ssh[1]: debug: waiting 2 seconds before retrying
> nbdkit: ssh[1]: debug: ssh: reopen readonly=0
> nbdkit: ssh[1]: debug: ssh: finalize
> nbdkit: ssh[1]: debug: ssh: close
> nbdkit: ssh[1]: debug: ssh: open readonly=0
> nbdkit: ssh[1]: error: failed to connect to remote host: 10.73.224.33:
> Connection refused
> nbdkit: ssh[1]: debug: ssh: open returned handle (nil)
> nbdkit: ssh[1]: debug: ssh: close
> nbdkit: ssh[1]: debug: retry 6: original errno = 108
> nbdkit: ssh[1]: debug: could not recover after 5 retries
> nbdkit: ssh[1]: debug: sending error reply: Cannot send after transport
> endpoint shutdown

Okay, the retry filter's output is a bit confusing (there were 6 attempts, but only 5 retries), but it IS correctly trying 5 additional times beyond the initial failure.  I'll probably post a trivial wording patch to improve that message, but it won't change behavior.

> nbdkit: ssh[1]: debug: retry: pread count=2097152 offset=1084227584
> nbdkit: ssh[1]: debug: retry 1: original errno = 0qemu-img: error while
> reading sector 2113536: Cannot send after transport endpoint shutdown
> 
> nbdkit: ssh[1]: debug: waiting 2 seconds before retrying

Still, what you are seeing here is that qemu-nbd then moves on to the next portion of the image (trying to recover as much as possible, rather than giving up on the first EIO).  That's not nbdkit's fault.  Other than a confusing log message, I don't see nbdkit doing anything wrong.  We may want to reassign this to qemu to have a mode where qemu-img convert gives up on the first error rather than blindly carrying on even after the first EIO.

Comment 3 Eric Blake 2020-03-31 21:38:06 UTC

(In reply to Eric Blake from comment #2)

> Okay, the retry filter's output is a bit confusing (there were 6 attempts,
> but only 5 retries), but it IS correctly trying 5 additional times beyond
> the initial failure.  I'll probably post a trivial wording patch to improve
> that message, but it won't change behavior.

https://www.redhat.com/archives/libguestfs/2020-March/msg00304.html

Comment 4 Eric Blake 2020-03-31 22:19:12 UTC

Other than the fact that qemu-img reports EIO message in sectors instead of byte offsets, I see nothing wrong with qemu-img's behavior. The documentation for qemu 4.2 states that there is a --salvage option that tells qemu-img to ignore ALL I/O errors, but that even when you are not using that, qemu-img defaults to spawning parallel workers (default 8). Since nbdkit's retry filter forces sequential processing (because retrying in parallel is tricky), you have to wait for all 8 of qemu-img's parallel workers to see an EIO failure before qemu-img will quit. You can speed up your test by using 'qemu-img convert -m 1' to limit qemu-img to a single worker (to match that nbdkit with the retry filter will only permit progress for a single worker). Also, your test with the retry filter's default (5 exponential retries, resulting in > 1 minute per failed I/O) requires over 8 minutes to exit when using qemu-img's default 8 coroutines; you can speed things up by tweaking the retry filter to use fewer retries or use linear rather than exponential retries.

Another thing to note is that qemu recently added built-in support for NBD client retries (different than nbdkit, which is NBD server retries). The qemu implementation DOES support parallel retries, so you may actually get better retry behavior by connecting to NBD with qemu doing the retries (especially if the NBD server can run in parallel). We may someday figure out how to improve the nbdkit retry filter to allow parallel retries, but that's more of a QoI issue than a bug fix.

Overall, though, I'm not seeing any behavioral bugs in nbdkit or qemu-img; at most, it is a matter for potentially improving documentation so that users can have more informed expectations.

Comment 5 mxie@redhat.com 2020-04-01 03:50:10 UTC

(In reply to Eric Blake from comment #4)
> Other than the fact that qemu-img reports EIO message in sectors instead of
> byte offsets, I see nothing wrong with qemu-img's behavior.  The
> documentation for qemu 4.2 states that there is a --salvage option that
> tells qemu-img to ignore ALL I/O errors, but that even when you are not
> using that, qemu-img defaults to spawning parallel workers (default 8). 
> Since nbdkit's retry filter forces sequential processing (because retrying
> in parallel is tricky), you have to wait for all 8 of qemu-img's parallel
> workers to see an EIO failure before qemu-img will quit.  You can speed up
> your test by using 'qemu-img convert -m 1' to limit qemu-img to a single
> worker (to match that nbdkit with the retry filter will only permit progress
> for a single worker).  Also, your test with the retry filter's default (5
> exponential retries, resulting in > 1 minute per failed I/O) requires over 8
> minutes to exit when using qemu-img's default 8 coroutines; you can speed
> things up by tweaking the retry filter to use fewer retries or use linear
> rather than exponential retries.

Hi Eric,

   I saw the situation get worse after adding --salvage option,using 'qemu-img convert -m 1' and using fewer retries, the connection can't stop after waiting for a long time(I have to cancel the operation by manual because it can't stop after waiting for more than 20mins), pls refer to log'qemu-m-1-salvage.log'

# nbdkit -f -v -U - ssh host=10.73.224.33 /var/lib/xen/images/xen-hvm-rhel7.7-x86_64.img --run 'qemu-img convert -p -f raw $nbd /tmp/out.img -m 1 --salvage' --filter=retry  retries=2  retry-exponential=no

Comment 6 mxie@redhat.com 2020-04-01 03:51:51 UTC

Created attachment 1675289 [details]
qemu-m-1-salvage.log

Comment 7 Richard W.M. Jones 2020-04-01 10:02:37 UTC

Adding --salvage is the wrong thing to do here.  We want qemu-img to stop
when it encounters an unrecoverable error.

The question on my mind is what we should do in virt-v2v.  The problem which
nbdkit-retry-filter was originally attempting to fix is that during long-
running cold conversions, if the network drops even briefly then the
conversion would completely fail.  By adding the retry filter we thought
we could make virt-v2v resilient against that.  Virt-v2v now uses the
filter unconditionally:
https://github.com/libguestfs/virt-v2v/blob/f8e4e8643f105bc2abeb6b80b17a0ded3b636b7c/v2v/nbdkit_sources.ml#L94

Unfortunately the thread model of the retry filter (SERIALIZE_REQUESTS)
probably affects performance.

If qemu can now do retries while keeping the ability to do parallel
requests, then it may be preferable to do it there.

Eric: Where is the support for NBD client retries?  I don't see it in the
code.  How can it be enabled?

Comment 8 Eric Blake 2020-04-01 13:22:50 UTC

qemu 4.2 added the reconnect-delay parameter to any NBD client:

# @reconnect-delay: On an unexpected disconnect, the nbd client tries to
#                   connect again until succeeding or encountering a serious
#                   error.  During the first @reconnect-delay seconds, all
#                   requests are paused and will be rerun on a successful
#                   reconnect. After that time, any delayed requests and all
#                   future requests before a successful reconnect will
#                   immediately fail. Default 0 (Since 4.2)

The retry loop is in qemu/block/nbd.c:nbd_co_reconnect_loop(), it starts with a probe at 1, 2, 4, 8, and then every 16 seconds until exceeding reconnect-delay (if I'm reading the loop correctly, you can wait several minutes, probing ~4 times per minute, if you are comfortable with a reconnect-delay that large).

Virtuozzo submitted the code, and I haven't played with it much myself.  So here's a quick attempt I made that shows it in action (prepare the following two command lines in different terminals; have the qemu command ready to execute within the first 5 seconds after starting the nbdkit command.  I used localhost:10809, but it's easy enough to alter the command line to use a Unix socket instead).  In my case, I saw the qemu progress meter pause at 32% during the window when nbdkit was not running, then successfully resume and go all the way to 100%.  I also observed that there were definitely 8 reads running in parallel, which is more efficient than nbdkit --filter=retry.

$ timeout -s KILL 5s ./nbdkit -fv --filter=delay --filter=noextents null 200M delay-read=1s; sleep 5; ./nbdkit -fv --filter=exitlast --filter=delay --filter=noextents null 200M delay-read=1s

$ qemu-img convert -p -O raw --image-opts driver=nbd,server.type=inet,server.host=localhost,server.port=10809,reconnect-delay=60 out.img

Things I noticed:
- qemu should be taught how to allow up to reconnect-delay seconds to make the initial connection to the NBD server before failing (it's rather annoying that you can't start the qemu command before the NBD command)
- qemu is not doing a very good job of treating ESHUTDOWN errors from the server as indicative of something that is worth retrying.  If I drop the '-s KILL' on the nbdkit command line (to use SIGTERM instead), nbdkit informs qemu about its impending doom with ESHUTDOWN, and qemu treats those as fatal rather than as a reason to retry.
I'll see if I can get those fixed for qemu 5.0.

Comment 9 Richard W.M. Jones 2020-04-02 09:43:24 UTC

I believe the retry filter is really working as intended.  Also qemu-img
is not broken.

However virt-v2v ought to be using qemu's NBD reconnect-delay mechanism,
and we should probably drop unconditional use of the retry filter from
virt-v2v.

Comment 10 Pino Toscano 2020-04-02 10:07:49 UTC

(In reply to Richard W.M. Jones from comment #9)
> However virt-v2v ought to be using qemu's NBD reconnect-delay mechanism,
> and we should probably drop unconditional use of the retry filter from
> virt-v2v.

virt-v2v in RHEL AV does not use the retry filter of nbdkit.

Comment 14 Klaus Heinrich Kiwi 2021-08-09 20:40:45 UTC

Eric,

 any updates to this but that you are aware of? Can I assign this one to you?

Comment 15 Richard W.M. Jones 2021-08-09 20:58:51 UTC

Virt-v2v 2.0 will use nbdcopy so it's possible this bug is no longer applicable.
Also the retry filter is working well since Eric rewrote the filter and a lot
of the server internals that it depends on.

Comment 16 Klaus Heinrich Kiwi 2021-08-11 13:20:13 UTC

Can we tentatively collapse this one into the v2v 2.0 RFE then? I couldn't find a tracker bug for it yet, though

Comment 17 Richard W.M. Jones 2021-08-11 16:34:30 UTC

Virt-v2v 2.0 won't use qemu so there's no overlap.  I think we should just
close this bug.

Note You need to log in before you can comment on or make changes to this bug.