Bug 1819240
| Summary: | RFE: virt-v2v should use qemu's nbd reconnection parameter | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | mxie <mxie> | ||||||
| Component: | virt-v2v | Assignee: | Virtualization Maintenance <virt-maint> | ||||||
| Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | unspecified | CC: | eblake, juzhou, kkiwi, mzhan, ptoscano, rjones, tyan, tzheng, virt-maint, xiaodwan | ||||||
| Target Milestone: | beta | Keywords: | FutureFeature, RFE, Triaged | ||||||
| Target Release: | --- | Flags: | pm-rhel:
mirror+
|
||||||
| Hardware: | x86_64 | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2021-08-11 16:34:30 UTC | Type: | Feature Request | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
Rich Jones noted: An easier way to reproduce this is as follows: $ rm -f /tmp/inject ; (sleep 5s; touch /tmp/inject ) & nbdkit -f -v -U - null size=1G --filter=retry --filter=noextents --filter=error error-rate=100% error-file=/tmp/inject --filter=delay rdelay=1 --run 'qemu-img convert $nbd /var/tmp/out.img' (In reply to mxie from comment #0) > Created attachment 1675091 [details] > retry-filter-default.log > > Description of problem: > nbdkit connection can't be disconnected with retry filter after trying > specified times > > 2.Stop sshd service and kill the ssh process of nbdkit on Xen server during > converting, Found nbdkit operation will not stop after retrying 5 times, > details pls refer to log"retry-filter-default.log" Rather, the 'qemu-img convert' process is not giving up, even after an EIO after retry gives up. > # nbdkit -f -v -U - ssh host=10.73.224.33 > /var/lib/xen/images/xen-hvm-rhel7.7-x86_64.img --run 'qemu-img convert -p -f > raw $nbd /tmp/out.img' --filter=retry retry-exponential=no > ..... > nbdkit: ssh[1]: error: failed to connect to remote host: 10.73.224.33: > Connection refused > nbdkit: ssh[1]: debug: ssh: open returned handle (nil) > nbdkit: ssh[1]: debug: ssh: close > nbdkit: ssh[1]: debug: retry 5: original errno = 108 > nbdkit: ssh[1]: debug: waiting 2 seconds before retrying > nbdkit: ssh[1]: debug: ssh: reopen readonly=0 > nbdkit: ssh[1]: debug: ssh: finalize > nbdkit: ssh[1]: debug: ssh: close > nbdkit: ssh[1]: debug: ssh: open readonly=0 > nbdkit: ssh[1]: error: failed to connect to remote host: 10.73.224.33: > Connection refused > nbdkit: ssh[1]: debug: ssh: open returned handle (nil) > nbdkit: ssh[1]: debug: ssh: close > nbdkit: ssh[1]: debug: retry 6: original errno = 108 > nbdkit: ssh[1]: debug: could not recover after 5 retries > nbdkit: ssh[1]: debug: sending error reply: Cannot send after transport > endpoint shutdown Okay, the retry filter's output is a bit confusing (there were 6 attempts, but only 5 retries), but it IS correctly trying 5 additional times beyond the initial failure. I'll probably post a trivial wording patch to improve that message, but it won't change behavior. > nbdkit: ssh[1]: debug: retry: pread count=2097152 offset=1084227584 > nbdkit: ssh[1]: debug: retry 1: original errno = 0qemu-img: error while > reading sector 2113536: Cannot send after transport endpoint shutdown > > nbdkit: ssh[1]: debug: waiting 2 seconds before retrying Still, what you are seeing here is that qemu-nbd then moves on to the next portion of the image (trying to recover as much as possible, rather than giving up on the first EIO). That's not nbdkit's fault. Other than a confusing log message, I don't see nbdkit doing anything wrong. We may want to reassign this to qemu to have a mode where qemu-img convert gives up on the first error rather than blindly carrying on even after the first EIO. (In reply to Eric Blake from comment #2) > Okay, the retry filter's output is a bit confusing (there were 6 attempts, > but only 5 retries), but it IS correctly trying 5 additional times beyond > the initial failure. I'll probably post a trivial wording patch to improve > that message, but it won't change behavior. https://www.redhat.com/archives/libguestfs/2020-March/msg00304.html Other than the fact that qemu-img reports EIO message in sectors instead of byte offsets, I see nothing wrong with qemu-img's behavior. The documentation for qemu 4.2 states that there is a --salvage option that tells qemu-img to ignore ALL I/O errors, but that even when you are not using that, qemu-img defaults to spawning parallel workers (default 8). Since nbdkit's retry filter forces sequential processing (because retrying in parallel is tricky), you have to wait for all 8 of qemu-img's parallel workers to see an EIO failure before qemu-img will quit. You can speed up your test by using 'qemu-img convert -m 1' to limit qemu-img to a single worker (to match that nbdkit with the retry filter will only permit progress for a single worker). Also, your test with the retry filter's default (5 exponential retries, resulting in > 1 minute per failed I/O) requires over 8 minutes to exit when using qemu-img's default 8 coroutines; you can speed things up by tweaking the retry filter to use fewer retries or use linear rather than exponential retries. Another thing to note is that qemu recently added built-in support for NBD client retries (different than nbdkit, which is NBD server retries). The qemu implementation DOES support parallel retries, so you may actually get better retry behavior by connecting to NBD with qemu doing the retries (especially if the NBD server can run in parallel). We may someday figure out how to improve the nbdkit retry filter to allow parallel retries, but that's more of a QoI issue than a bug fix. Overall, though, I'm not seeing any behavioral bugs in nbdkit or qemu-img; at most, it is a matter for potentially improving documentation so that users can have more informed expectations. (In reply to Eric Blake from comment #4) > Other than the fact that qemu-img reports EIO message in sectors instead of > byte offsets, I see nothing wrong with qemu-img's behavior. The > documentation for qemu 4.2 states that there is a --salvage option that > tells qemu-img to ignore ALL I/O errors, but that even when you are not > using that, qemu-img defaults to spawning parallel workers (default 8). > Since nbdkit's retry filter forces sequential processing (because retrying > in parallel is tricky), you have to wait for all 8 of qemu-img's parallel > workers to see an EIO failure before qemu-img will quit. You can speed up > your test by using 'qemu-img convert -m 1' to limit qemu-img to a single > worker (to match that nbdkit with the retry filter will only permit progress > for a single worker). Also, your test with the retry filter's default (5 > exponential retries, resulting in > 1 minute per failed I/O) requires over 8 > minutes to exit when using qemu-img's default 8 coroutines; you can speed > things up by tweaking the retry filter to use fewer retries or use linear > rather than exponential retries. Hi Eric, I saw the situation get worse after adding --salvage option,using 'qemu-img convert -m 1' and using fewer retries, the connection can't stop after waiting for a long time(I have to cancel the operation by manual because it can't stop after waiting for more than 20mins), pls refer to log'qemu-m-1-salvage.log' # nbdkit -f -v -U - ssh host=10.73.224.33 /var/lib/xen/images/xen-hvm-rhel7.7-x86_64.img --run 'qemu-img convert -p -f raw $nbd /tmp/out.img -m 1 --salvage' --filter=retry retries=2 retry-exponential=no Created attachment 1675289 [details]
qemu-m-1-salvage.log
Adding --salvage is the wrong thing to do here. We want qemu-img to stop when it encounters an unrecoverable error. The question on my mind is what we should do in virt-v2v. The problem which nbdkit-retry-filter was originally attempting to fix is that during long- running cold conversions, if the network drops even briefly then the conversion would completely fail. By adding the retry filter we thought we could make virt-v2v resilient against that. Virt-v2v now uses the filter unconditionally: https://github.com/libguestfs/virt-v2v/blob/f8e4e8643f105bc2abeb6b80b17a0ded3b636b7c/v2v/nbdkit_sources.ml#L94 Unfortunately the thread model of the retry filter (SERIALIZE_REQUESTS) probably affects performance. If qemu can now do retries while keeping the ability to do parallel requests, then it may be preferable to do it there. Eric: Where is the support for NBD client retries? I don't see it in the code. How can it be enabled? qemu 4.2 added the reconnect-delay parameter to any NBD client: # @reconnect-delay: On an unexpected disconnect, the nbd client tries to # connect again until succeeding or encountering a serious # error. During the first @reconnect-delay seconds, all # requests are paused and will be rerun on a successful # reconnect. After that time, any delayed requests and all # future requests before a successful reconnect will # immediately fail. Default 0 (Since 4.2) The retry loop is in qemu/block/nbd.c:nbd_co_reconnect_loop(), it starts with a probe at 1, 2, 4, 8, and then every 16 seconds until exceeding reconnect-delay (if I'm reading the loop correctly, you can wait several minutes, probing ~4 times per minute, if you are comfortable with a reconnect-delay that large). Virtuozzo submitted the code, and I haven't played with it much myself. So here's a quick attempt I made that shows it in action (prepare the following two command lines in different terminals; have the qemu command ready to execute within the first 5 seconds after starting the nbdkit command. I used localhost:10809, but it's easy enough to alter the command line to use a Unix socket instead). In my case, I saw the qemu progress meter pause at 32% during the window when nbdkit was not running, then successfully resume and go all the way to 100%. I also observed that there were definitely 8 reads running in parallel, which is more efficient than nbdkit --filter=retry. $ timeout -s KILL 5s ./nbdkit -fv --filter=delay --filter=noextents null 200M delay-read=1s; sleep 5; ./nbdkit -fv --filter=exitlast --filter=delay --filter=noextents null 200M delay-read=1s $ qemu-img convert -p -O raw --image-opts driver=nbd,server.type=inet,server.host=localhost,server.port=10809,reconnect-delay=60 out.img Things I noticed: - qemu should be taught how to allow up to reconnect-delay seconds to make the initial connection to the NBD server before failing (it's rather annoying that you can't start the qemu command before the NBD command) - qemu is not doing a very good job of treating ESHUTDOWN errors from the server as indicative of something that is worth retrying. If I drop the '-s KILL' on the nbdkit command line (to use SIGTERM instead), nbdkit informs qemu about its impending doom with ESHUTDOWN, and qemu treats those as fatal rather than as a reason to retry. I'll see if I can get those fixed for qemu 5.0. I believe the retry filter is really working as intended. Also qemu-img is not broken. However virt-v2v ought to be using qemu's NBD reconnect-delay mechanism, and we should probably drop unconditional use of the retry filter from virt-v2v. (In reply to Richard W.M. Jones from comment #9) > However virt-v2v ought to be using qemu's NBD reconnect-delay mechanism, > and we should probably drop unconditional use of the retry filter from > virt-v2v. virt-v2v in RHEL AV does not use the retry filter of nbdkit. Eric, any updates to this but that you are aware of? Can I assign this one to you? Virt-v2v 2.0 will use nbdcopy so it's possible this bug is no longer applicable. Also the retry filter is working well since Eric rewrote the filter and a lot of the server internals that it depends on. Can we tentatively collapse this one into the v2v 2.0 RFE then? I couldn't find a tracker bug for it yet, though Virt-v2v 2.0 won't use qemu so there's no overlap. I think we should just close this bug. |
Created attachment 1675091 [details] retry-filter-default.log Description of problem: nbdkit connection can't be disconnected with retry filter after trying specified times Version-Release number of selected component (if applicable): nbdkit-1.16.2-2.module+el8.2.0+5664+dd92f997.x86_64 How reproducible: 100% Steps to Reproduce: 1.Check nbdkit-retry-filter man page, will retry 5 times before fail the operation by default # man nbdkit-retry-filter .... PARAMETERS retries=N The number of times any single operation will be retried before we give up and fail the operation. The default is 5. 2.Convert a image from Xen server to rhel8.2 conversion using below command # nbdkit -f -v -U - ssh host=10.73.224.33 /var/lib/xen/images/xen-hvm-rhel7.7-x86_64.img --run 'qemu-img convert -p -f raw $nbd /tmp/out.img' --filter=retry retry-exponential=no .... nbdkit: ssh[1]: debug: retry: pread count=2097152 offset=2097152 nbdkit: ssh[1]: debug: ssh: get_size nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=2097152 nbdkit: ssh[1]: debug: retry: pread count=2097152 offset=4194304 2.Stop sshd service and kill the ssh process of nbdkit on Xen server during converting, Found nbdkit operation will not stop after retrying 5 times, details pls refer to log"retry-filter-default.log" # nbdkit -f -v -U - ssh host=10.73.224.33 /var/lib/xen/images/xen-hvm-rhel7.7-x86_64.img --run 'qemu-img convert -p -f raw $nbd /tmp/out.img' --filter=retry retry-exponential=no ..... nbdkit: ssh[1]: error: failed to connect to remote host: 10.73.224.33: Connection refused nbdkit: ssh[1]: debug: ssh: open returned handle (nil) nbdkit: ssh[1]: debug: ssh: close nbdkit: ssh[1]: debug: retry 5: original errno = 108 nbdkit: ssh[1]: debug: waiting 2 seconds before retrying nbdkit: ssh[1]: debug: ssh: reopen readonly=0 nbdkit: ssh[1]: debug: ssh: finalize nbdkit: ssh[1]: debug: ssh: close nbdkit: ssh[1]: debug: ssh: open readonly=0 nbdkit: ssh[1]: error: failed to connect to remote host: 10.73.224.33: Connection refused nbdkit: ssh[1]: debug: ssh: open returned handle (nil) nbdkit: ssh[1]: debug: ssh: close nbdkit: ssh[1]: debug: retry 6: original errno = 108 nbdkit: ssh[1]: debug: could not recover after 5 retries nbdkit: ssh[1]: debug: sending error reply: Cannot send after transport endpoint shutdown nbdkit: ssh[1]: debug: retry: pread count=2097152 offset=1084227584 nbdkit: ssh[1]: debug: retry 1: original errno = 0qemu-img: error while reading sector 2113536: Cannot send after transport endpoint shutdown nbdkit: ssh[1]: debug: waiting 2 seconds before retrying nbdkit: ssh[1]: debug: ssh: reopen readonly=0 nbdkit: ssh[1]: debug: ssh: finalize nbdkit: ssh[1]: debug: ssh: close nbdkit: ssh[1]: debug: ssh: open readonly=0 .... Actual results: As above description Expected results: nbdkit connection can be disconnected with retry filter successfully after trying specified times Additional info: