Description of problem: Upstream we have added a new filter to nbdkit called "retry" which transparently reopens connections if there's a network timeout or network interruption. We should backport this filter to nbdkit in RHEL AV. Steps to Reproduce: The upstream test suite contains an automated test. One way to test this by hand is to create a large file on an SSH server, and copy it to your local machine: nbdkit -f -v -U - ssh host=server /var/tmp/test.img \ --run 'qemu-img convert -p -f raw $nbd /var/tmp/out.img' While this is copying, log into the remote SSH server and look for the "sshd" process associated with the nbdkit connection, and kill it. If the patch is working then the nbdkit process should recover after a few seconds, and the copy should complete successfully. Upstream commits: https://github.com/libguestfs/nbdkit/commit/fb72fcd1d8dff02a02e4c473ff09c813bef09b65 https://github.com/libguestfs/nbdkit/commit/f0f0ec497d36ba7753fdfd0bd322507613d1e276 https://github.com/libguestfs/nbdkit/commit/75434982a1a80f29652b7bc72156084a16ef6fc9
This has been done by the rebase of nbdkit to 1.16, bug 1756307.
Test the bug with builds: nbdkit-1.16.2-1.module+el8.2.0+5579+d71178e0.x86_64 Steps: 1.Prepare a large file on remote rhel7.8 server, #qemu-img info disk1.img image: disk1.img file format: raw virtual size: 3.6G (3901456384 bytes) disk size: 3.6G 2.Use nbdkit command to copy it to a local rhel8.2 server: #nbdkit -f -v -U - ssh host=10.66.87.212 /tmp/disk1.img --run 'qemu-img convert -p -f qcow2 $nbd /var/tmp/out.img' 3.When the local server start to copy, log into the remote SSH server and kill the sshd process which associated with the nbdkit connection. # ps -ef|grep sshd root 14797 1 0 09:53 ? 00:00:02 sshd: root@pts/2 root 19871 1 0 17:01 ? 00:00:00 /usr/sbin/sshd -D root 20167 19871 10 17:38 ? 00:00:00 sshd: root@notty root 20188 14806 0 17:38 pts/2 00:00:00 grep --color=auto sshd # kill -9 20167 4.Log into the local server and find the nbdkit process failed. # nbdkit -f -v -U - ssh host=10.66.87.212 /tmp/disk1.img --run 'qemu-img convert -p -f raw $nbd /var/tmp/out.img' nbdkit: debug: TLS disabled: could not load TLS certificates nbdkit: debug: registering plugin /usr/lib64/nbdkit/plugins/nbdkit-ssh-plugin.so nbdkit: debug: registered plugin /usr/lib64/nbdkit/plugins/nbdkit-ssh-plugin.so (name ssh) nbdkit: debug: ssh: load nbdkit: debug: ssh: config key=host, value=10.66.87.212 nbdkit: debug: ssh: config key=path, value=/tmp/disk1.img nbdkit: debug: ssh: config_complete nbdkit: debug: using thread model: serialize_requests nbdkit: debug: bound to unix socket /tmp/nbdkitmZKDjp/socket nbdkit: debug: forked into background (new pid = 28605) nbdkit: debug: accepted connection nbdkit: ssh[1]: debug: newstyle negotiation: flags: global 0x3 nbdkit: ssh[1]: debug: newstyle negotiation: client flags: 0x3 nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_STRUCTURED_REPLY: client requested structured replies nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_SET_META_CONTEXT: client requested export '' nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_SET_META_CONTEXT: set count: 1 nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_SET_META_CONTEXT: set base:allocation nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_SET_META_CONTEXT: replying with base:allocation id 1 nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_SET_META_CONTEXT: reply complete nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_GO: client requested export '' nbdkit: ssh[1]: debug: ssh: open readonly=0 nbdkit: ssh[1]: debug: opened libssh handle nbdkit: ssh[1]: debug: ssh: open returned handle 0x7fc0e4001d70 nbdkit: ssh[1]: debug: ssh: prepare readonly=0 nbdkit: ssh[1]: debug: ssh: get_size nbdkit: ssh[1]: debug: ssh: can_write nbdkit: ssh[1]: debug: ssh: can_zero nbdkit: ssh[1]: debug: ssh: can_write nbdkit: ssh[1]: debug: ssh: can_fast_zero nbdkit: ssh[1]: debug: ssh: can_zero nbdkit: ssh[1]: debug: ssh: can_trim nbdkit: ssh[1]: debug: ssh: can_write nbdkit: ssh[1]: debug: ssh: can_fua nbdkit: ssh[1]: debug: ssh: can_write nbdkit: ssh[1]: debug: ssh: can_flush nbdkit: ssh[1]: debug: ssh: is_rotational nbdkit: ssh[1]: debug: ssh: can_multi_conn nbdkit: ssh[1]: debug: ssh: can_cache nbdkit: ssh[1]: debug: ssh: can_extents nbdkit: ssh[1]: debug: newstyle negotiation: flags: export 0x8cd nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_GO: ignoring NBD_INFO_* request 3 (NBD_INFO_BLOCK_SIZE) nbdkit: ssh[1]: debug: handshake complete, processing requests serially nbdkit: ssh[1]: debug: ssh: get_size nbdkit: ssh[1]: debug: ssh: extents count=2147483136 offset=0 req_one=1 nbdkit: ssh[1]: debug: ssh: get_size nbdkit: ssh[1]: debug: ssh: extents count=1753973248 offset=2147483136 req_one=1 nbdkit: ssh[1]: debug: ssh: get_size nbdkit: ssh[1]: debug: ssh: extents count=2147483136 offset=0 req_one=1 nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=0 nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=2097152 [...] nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1935671296 nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1) nbdkit: ssh[1]: debug: sending error reply: Input/output error nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1937768448 nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1) qemu-img: error while reading sector 3780608: Input/output error nbdkit: ssh[1]: debug: sending error reply: Input/output error nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1939865600 nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1) nbdkit: ssh[1]: debug: sending error reply: Input/output error nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1941962752 nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1) nbdkit: ssh[1]: debug: sending error reply: Input/output error nbdkit: ssh[1]qemu-img: error while reading sector 3784704: Input/output error : debug: ssh: pread count=2097152 offset=1944059904 nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1) nbdkit: ssh[1]: debug: sending error reply: Input/output error nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1946157056 nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1) nbdkit: ssh[1]: debug: sending error reply: Input/output error nbdkit: sshqemu-img: error while reading sector 3788800: Input/output error [1]: debug: ssh: pread count=2097152 offset=1948254208 nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1) nbdkit: ssh[1]: debug: sending error reply: Input/output error nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1950351360 nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1) nbdkit: ssh[1]: debug: sending error reply: Input/output error qemu-img: error while reading sector 3792896: Input/output error qemu-img: error while reading sector 3796992: Input/output error qemu-img: error while reading sector 3801088: Input/output error qemu-img: error while reading sector 3805184: Input/output error qemu-img: error while reading sector 3809280: Input/output error nbdkit: ssh[1]: debug: client sent NBD_CMD_DISC, closing connection nbdkit: ssh[1]: debug: ssh: finalize nbdkit: ssh[1]: debug: ssh: close nbdkit: ssh[1]: error: cannot close file: Socket error: disconnected nbdkit: debug: ssh: unload plugin Hi,Rjones, 1.As above testing,I think the result is not expected.Could you help to review my test steps? 2.when I want to create an NBD connection over SSH,Should I setup Passwordless SSH Login in advance?
The retry action isn't automatic, you have to enable the retry filter. Just modify your nbdkit command line by adding: --filter=retry Although actually the test you have done in comment 3 was still worthwhile because you've demonstrated that the connection does break without the filter.
Thanks for your kindly reply. Verify the bug with build: nbdkit-1.16.2-1.module+el8.2.0+5579+d71178e0.x86_64 Steps: 1.Prepare a large file on remote rhel7.8 server, #qemu-img info disk1.img image: disk1.img file format: raw virtual size: 3.6G (3901456384 bytes) disk size: 3.6G 2.Check nbdkit new filter's man page and use nbdkit command to copy it to a local rhel8.2 server and set filter=retry: # man nbdkit-retry-filter Scenario1 Keep all paraments about "retry" as default. 1.1 # nbdkit -f -v -U - ssh host=10.66.87.212 /tmp/disk1.img --run 'qemu-img convert -p -f raw $nbd /tmp/out.img' --filter=retry 1.2 When the local server start to copy, log into the remote SSH server and kill the sshd process which associated with the nbdkit connection. # ps -ef|grep sshd root 19871 1 0 Feb05 ? 00:00:00 /usr/sbin/sshd -D root 27693 19871 0 10:17 ? 00:00:00 sshd: root@pts/0 root 27903 19871 9 10:38 ? 00:00:01 sshd: root@notty root 27928 27704 0 10:38 pts/0 00:00:00 grep --color=auto sshd # kill -9 27903 1.3.Log into the local server and check the result. 1.3.1 THe nbdkit process recovered after a few seconds, and the copy completed successfully. [...] nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1178599424 nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1) nbdkit: ssh[1]: debug: retry 1: original errno = 5 nbdkit: ssh[1]: debug: waiting 2 seconds before retrying nbdkit: ssh[1]: debug: ssh: reopen readonly=0 nbdkit: ssh[1]: debug: ssh: finalize nbdkit: ssh[1]: debug: ssh: close nbdkit: ssh[1]: error: cannot close file: Socket error: disconnected nbdkit: ssh[1]: debug: ssh: open readonly=0 nbdkit: ssh[1]: debug: opened libssh handle nbdkit: ssh[1]: debug: ssh: open returned handle 0x7f9a34001db0 nbdkit: ssh[1]: debug: ssh: prepare readonly=0 nbdkit: ssh[1]: debug: ssh: get_size nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1178599424 nbdkit: ssh[1]: debug: retry: pread count=2097152 offset=1180696576 nbdkit: ssh[1]: debug: ssh: get_size [...] (100.00/100%) nbdkit: ssh[1]: debug: client sent NBD_CMD_DISC, closing connection nbdkit: ssh[1]: debug: retry: finalize nbdkit: ssh[1]: debug: ssh: finalize nbdkit: ssh[1]: debug: retry: close nbdkit: ssh[1]: debug: reopens needed: 1 nbdkit: ssh[1]: debug: ssh: close nbdkit: debug: ssh: unload plugin nbdkit: debug: retry: unload filter 1.3.2 Check the output file info and compare it with the original file. # qemu-img info out.img image: out.img file format: raw virtual size: 3.63 GiB (3901456384 bytes) disk size: 3.63 GiB # qemu-img compare -f raw -F raw disk1.img out.img Images are identical. Result 1: When use new filter "retry",nbdkit can reopen connections if there's a network timeout or network interruption. Scenario 2 Change parameters default value about retry filter 2.1 # nbdkit -f -v -U - ssh host=10.66.87.212 /tmp/disk1.img --run 'qemu-img convert -p -f raw $nbd /tmp/out1.img' --filter=retry retries=1 retry-readonly=yes retry-delay=15 retry-exponential=no 2.2 When the local server start to copy, log into the remote SSH server and kill the sshd process which associated with the nbdkit connection. 2.3 Log into the local server and check the nbdkit new connection info. nbdkit: debug: TLS disabled: could not load TLS certificates nbdkit: debug: registering plugin /usr/lib64/nbdkit/plugins/nbdkit-ssh-plugin.so nbdkit: debug: registered plugin /usr/lib64/nbdkit/plugins/nbdkit-ssh-plugin.so (name ssh) nbdkit: debug: ssh: load nbdkit: debug: registering filter /usr/lib64/nbdkit/filters/nbdkit-retry-filter.so nbdkit: debug: registered filter /usr/lib64/nbdkit/filters/nbdkit-retry-filter.so (name retry) nbdkit: debug: retry: load nbdkit: debug: retry: config key=host, value=10.66.87.212 nbdkit: debug: ssh: config key=host, value=10.66.87.212 nbdkit: debug: retry: config key=path, value=/tmp/disk1.img nbdkit: debug: ssh: config key=path, value=/tmp/disk1.img nbdkit: debug: retry: config key=retries, value=1 nbdkit: debug: retry: config key=retry-readonly, value=yes nbdkit: debug: retry: config key=retry-delay, value=15 nbdkit: debug: retry: config key=retry-exponential, value=no nbdkit: debug: retry: config_complete nbdkit: debug: ssh: config_complete nbdkit: debug: using thread model: serialize_requests nbdkit: debug: bound to unix socket /tmp/nbdkitpbbBJP/socket nbdkit: debug: forked into background (new pid = 20271) nbdkit: debug: accepted connection [...] nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=801112064 nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1) nbdkit: ssh[1]: debug: retry 1: original errno = 5 nbdkit: ssh[1]: debug: waiting 15 seconds before retrying nbdkit: ssh[1]: debug: ssh: reopen readonly=1 nbdkit: ssh[1]: debug: ssh: finalize nbdkit: ssh[1]: debug: ssh: close nbdkit: ssh[1]: error: cannot close file: Socket error: disconnected nbdkit: ssh[1]: debug: ssh: open readonly=1 nbdkit: ssh[1]: debug: opened libssh handle nbdkit: ssh[1]: debug: ssh: open returned handle 0x7f9510001db0 nbdkit: ssh[1]: debug: ssh: prepare readonly=1 [...] (100.00/100%) nbdkit: ssh[1]: debug: client sent NBD_CMD_DISC, closing connection nbdkit: ssh[1]: debug: retry: finalize nbdkit: ssh[1]: debug: ssh: finalize nbdkit: ssh[1]: debug: retry: close nbdkit: ssh[1]: debug: reopens needed: 1 nbdkit: ssh[1]: debug: ssh: close nbdkit: debug: ssh: unload plugin nbdkit: debug: retry: unload filter Result 2: When change new filter "retry" related parameters' values,nbdkit can reopen connections correctly and the copy completed successfully. Result: As comment3 and comment5 testing,now change the bug from ON_QA to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2017