Bug 1754897 - Backport nbdkit retry filter to RHEL AV
Summary: Backport nbdkit retry filter to RHEL AV
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: nbdkit
Version: 8.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: 8.1
Assignee: Richard W.M. Jones
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1756307
Blocks: 1754900
TreeView+ depends on / blocked
 
Reported: 2019-09-24 10:36 UTC by Richard W.M. Jones
Modified: 2020-05-05 09:50 UTC (History)
7 users (show)

Fixed In Version: nbdkit-1.16.2-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-05 09:49:41 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:2017 0 None None None 2020-05-05 09:50:27 UTC

Description Richard W.M. Jones 2019-09-24 10:36:04 UTC
Description of problem:

Upstream we have added a new filter to nbdkit called "retry" which
transparently reopens connections if there's a network timeout
or network interruption.  We should backport this filter to
nbdkit in RHEL AV.

Steps to Reproduce:

The upstream test suite contains an automated test.

One way to test this by hand is to create a large file on an SSH
server, and copy it to your local machine:

  nbdkit -f -v -U - ssh host=server /var/tmp/test.img \
         --run 'qemu-img convert -p -f raw $nbd /var/tmp/out.img'

While this is copying, log into the remote SSH server and look for
the "sshd" process associated with the nbdkit connection, and kill it.

If the patch is working then the nbdkit process should recover after
a few seconds, and the copy should complete successfully.

Upstream commits:

https://github.com/libguestfs/nbdkit/commit/fb72fcd1d8dff02a02e4c473ff09c813bef09b65
https://github.com/libguestfs/nbdkit/commit/f0f0ec497d36ba7753fdfd0bd322507613d1e276
https://github.com/libguestfs/nbdkit/commit/75434982a1a80f29652b7bc72156084a16ef6fc9

Comment 1 Richard W.M. Jones 2019-11-26 10:13:33 UTC
This has been done by the rebase of nbdkit to 1.16, bug 1756307.

Comment 3 liuzi 2020-02-05 09:57:11 UTC
Test the bug with builds:
nbdkit-1.16.2-1.module+el8.2.0+5579+d71178e0.x86_64

Steps:
1.Prepare a large file on remote rhel7.8 server,
#qemu-img info disk1.img 
image: disk1.img
file format: raw
virtual size: 3.6G (3901456384 bytes)
disk size: 3.6G
 
2.Use nbdkit command to copy it to a local rhel8.2 server:
#nbdkit -f -v -U - ssh host=10.66.87.212 /tmp/disk1.img --run 'qemu-img convert -p -f qcow2 $nbd /var/tmp/out.img'

3.When the local server start to copy, log into the remote SSH server and kill the sshd process which associated with the nbdkit connection.
# ps -ef|grep sshd
root     14797     1  0 09:53 ?        00:00:02 sshd: root@pts/2
root     19871     1  0 17:01 ?        00:00:00 /usr/sbin/sshd -D
root     20167 19871 10 17:38 ?        00:00:00 sshd: root@notty
root     20188 14806  0 17:38 pts/2    00:00:00 grep --color=auto sshd
# kill -9 20167

4.Log into the local server and find the nbdkit process failed.
# nbdkit -f -v -U - ssh host=10.66.87.212 /tmp/disk1.img --run 'qemu-img convert -p -f raw $nbd /var/tmp/out.img'
nbdkit: debug: TLS disabled: could not load TLS certificates
nbdkit: debug: registering plugin /usr/lib64/nbdkit/plugins/nbdkit-ssh-plugin.so
nbdkit: debug: registered plugin /usr/lib64/nbdkit/plugins/nbdkit-ssh-plugin.so (name ssh)
nbdkit: debug: ssh: load
nbdkit: debug: ssh: config key=host, value=10.66.87.212
nbdkit: debug: ssh: config key=path, value=/tmp/disk1.img
nbdkit: debug: ssh: config_complete
nbdkit: debug: using thread model: serialize_requests
nbdkit: debug: bound to unix socket /tmp/nbdkitmZKDjp/socket
nbdkit: debug: forked into background (new pid = 28605)
nbdkit: debug: accepted connection
nbdkit: ssh[1]: debug: newstyle negotiation: flags: global 0x3
nbdkit: ssh[1]: debug: newstyle negotiation: client flags: 0x3
nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_STRUCTURED_REPLY: client requested structured replies
nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_SET_META_CONTEXT: client requested export ''
nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_SET_META_CONTEXT: set count: 1
nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_SET_META_CONTEXT: set base:allocation
nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_SET_META_CONTEXT: replying with base:allocation id 1
nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_SET_META_CONTEXT: reply complete
nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_GO: client requested export ''
nbdkit: ssh[1]: debug: ssh: open readonly=0
nbdkit: ssh[1]: debug: opened libssh handle
nbdkit: ssh[1]: debug: ssh: open returned handle 0x7fc0e4001d70
nbdkit: ssh[1]: debug: ssh: prepare readonly=0
nbdkit: ssh[1]: debug: ssh: get_size
nbdkit: ssh[1]: debug: ssh: can_write
nbdkit: ssh[1]: debug: ssh: can_zero
nbdkit: ssh[1]: debug: ssh: can_write
nbdkit: ssh[1]: debug: ssh: can_fast_zero
nbdkit: ssh[1]: debug: ssh: can_zero
nbdkit: ssh[1]: debug: ssh: can_trim
nbdkit: ssh[1]: debug: ssh: can_write
nbdkit: ssh[1]: debug: ssh: can_fua
nbdkit: ssh[1]: debug: ssh: can_write
nbdkit: ssh[1]: debug: ssh: can_flush
nbdkit: ssh[1]: debug: ssh: is_rotational
nbdkit: ssh[1]: debug: ssh: can_multi_conn
nbdkit: ssh[1]: debug: ssh: can_cache
nbdkit: ssh[1]: debug: ssh: can_extents
nbdkit: ssh[1]: debug: newstyle negotiation: flags: export 0x8cd
nbdkit: ssh[1]: debug: newstyle negotiation: NBD_OPT_GO: ignoring NBD_INFO_* request 3 (NBD_INFO_BLOCK_SIZE)
nbdkit: ssh[1]: debug: handshake complete, processing requests serially
nbdkit: ssh[1]: debug: ssh: get_size
nbdkit: ssh[1]: debug: ssh: extents count=2147483136 offset=0 req_one=1
nbdkit: ssh[1]: debug: ssh: get_size
nbdkit: ssh[1]: debug: ssh: extents count=1753973248 offset=2147483136 req_one=1
nbdkit: ssh[1]: debug: ssh: get_size
nbdkit: ssh[1]: debug: ssh: extents count=2147483136 offset=0 req_one=1
nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=0
nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=2097152
[...]
nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1935671296
nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1)
nbdkit: ssh[1]: debug: sending error reply: Input/output error
nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1937768448
nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1)
qemu-img: error while reading sector 3780608: Input/output error
nbdkit: ssh[1]: debug: sending error reply: Input/output error
nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1939865600
nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1)
nbdkit: ssh[1]: debug: sending error reply: Input/output error
nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1941962752
nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1)
nbdkit: ssh[1]: debug: sending error reply: Input/output error
nbdkit: ssh[1]qemu-img: error while reading sector 3784704: Input/output error
: debug: ssh: pread count=2097152 offset=1944059904
nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1)
nbdkit: ssh[1]: debug: sending error reply: Input/output error
nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1946157056
nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1)
nbdkit: ssh[1]: debug: sending error reply: Input/output error
nbdkit: sshqemu-img: error while reading sector 3788800: Input/output error
[1]: debug: ssh: pread count=2097152 offset=1948254208
nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1)
nbdkit: ssh[1]: debug: sending error reply: Input/output error
nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1950351360
nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1)
nbdkit: ssh[1]: debug: sending error reply: Input/output error
qemu-img: error while reading sector 3792896: Input/output error
qemu-img: error while reading sector 3796992: Input/output error
qemu-img: error while reading sector 3801088: Input/output error
qemu-img: error while reading sector 3805184: Input/output error
qemu-img: error while reading sector 3809280: Input/output error

nbdkit: ssh[1]: debug: client sent NBD_CMD_DISC, closing connection
nbdkit: ssh[1]: debug: ssh: finalize
nbdkit: ssh[1]: debug: ssh: close
nbdkit: ssh[1]: error: cannot close file: Socket error: disconnected
nbdkit: debug: ssh: unload plugin

Hi,Rjones,
1.As above testing,I think the result is not expected.Could you help to review my test steps?
2.when I want to create an NBD connection over SSH,Should I setup Passwordless SSH Login in advance?

Comment 4 Richard W.M. Jones 2020-02-05 13:18:27 UTC
The retry action isn't automatic, you have to enable the retry filter.

Just modify your nbdkit command line by adding:
  --filter=retry

Although actually the test you have done in comment 3 was still worthwhile
because you've demonstrated that the connection does break without the filter.

Comment 5 liuzi 2020-02-06 06:48:45 UTC
Thanks for your kindly reply.

Verify the bug with build:
nbdkit-1.16.2-1.module+el8.2.0+5579+d71178e0.x86_64

Steps:
1.Prepare a large file on remote rhel7.8 server,
#qemu-img info disk1.img 
image: disk1.img
file format: raw
virtual size: 3.6G (3901456384 bytes)
disk size: 3.6G
 
2.Check nbdkit new filter's man page and use nbdkit command to copy it to a local rhel8.2 server and set filter=retry:
# man nbdkit-retry-filter

Scenario1 Keep all paraments about "retry" as default.

1.1 # nbdkit -f -v -U - ssh host=10.66.87.212 /tmp/disk1.img --run 'qemu-img convert -p -f raw $nbd /tmp/out.img' --filter=retry

1.2 When the local server start to copy, log into the remote SSH server and kill the sshd process which associated with the nbdkit connection.
# ps -ef|grep sshd
root     19871     1  0 Feb05 ?        00:00:00 /usr/sbin/sshd -D
root     27693 19871  0 10:17 ?        00:00:00 sshd: root@pts/0
root     27903 19871  9 10:38 ?        00:00:01 sshd: root@notty
root     27928 27704  0 10:38 pts/0    00:00:00 grep --color=auto sshd
# kill -9 27903

1.3.Log into the local server and check the result.
1.3.1 THe nbdkit process recovered after a few seconds, and the copy completed successfully.
[...]
nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1178599424
nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1)
nbdkit: ssh[1]: debug: retry 1: original errno = 5
nbdkit: ssh[1]: debug: waiting 2 seconds before retrying
nbdkit: ssh[1]: debug: ssh: reopen readonly=0
nbdkit: ssh[1]: debug: ssh: finalize
nbdkit: ssh[1]: debug: ssh: close
nbdkit: ssh[1]: error: cannot close file: Socket error: disconnected
nbdkit: ssh[1]: debug: ssh: open readonly=0
nbdkit: ssh[1]: debug: opened libssh handle
nbdkit: ssh[1]: debug: ssh: open returned handle 0x7f9a34001db0
nbdkit: ssh[1]: debug: ssh: prepare readonly=0
nbdkit: ssh[1]: debug: ssh: get_size
nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=1178599424
nbdkit: ssh[1]: debug: retry: pread count=2097152 offset=1180696576
nbdkit: ssh[1]: debug: ssh: get_size
[...]
(100.00/100%)
nbdkit: ssh[1]: debug: client sent NBD_CMD_DISC, closing connection
nbdkit: ssh[1]: debug: retry: finalize
nbdkit: ssh[1]: debug: ssh: finalize
nbdkit: ssh[1]: debug: retry: close
nbdkit: ssh[1]: debug: reopens needed: 1
nbdkit: ssh[1]: debug: ssh: close
nbdkit: debug: ssh: unload plugin
nbdkit: debug: retry: unload filter

1.3.2 Check the output file info and compare it with the original file.
# qemu-img info out.img
image: out.img
file format: raw
virtual size: 3.63 GiB (3901456384 bytes)
disk size: 3.63 GiB

# qemu-img compare -f raw -F raw disk1.img out.img 
Images are identical.

Result 1:
When use new filter "retry",nbdkit can reopen connections if there's a network timeout
or network interruption.

Scenario 2 Change parameters default value about retry filter
2.1 #  nbdkit -f -v -U - ssh host=10.66.87.212 /tmp/disk1.img --run 'qemu-img convert -p -f raw $nbd /tmp/out1.img' --filter=retry retries=1  retry-readonly=yes  retry-delay=15 retry-exponential=no

2.2 When the local server start to copy, log into the remote SSH server and kill the sshd process which associated with the nbdkit connection.

2.3 Log into the local server and check the nbdkit new connection info.
nbdkit: debug: TLS disabled: could not load TLS certificates
nbdkit: debug: registering plugin /usr/lib64/nbdkit/plugins/nbdkit-ssh-plugin.so
nbdkit: debug: registered plugin /usr/lib64/nbdkit/plugins/nbdkit-ssh-plugin.so (name ssh)
nbdkit: debug: ssh: load
nbdkit: debug: registering filter /usr/lib64/nbdkit/filters/nbdkit-retry-filter.so
nbdkit: debug: registered filter /usr/lib64/nbdkit/filters/nbdkit-retry-filter.so (name retry)
nbdkit: debug: retry: load
nbdkit: debug: retry: config key=host, value=10.66.87.212
nbdkit: debug: ssh: config key=host, value=10.66.87.212
nbdkit: debug: retry: config key=path, value=/tmp/disk1.img
nbdkit: debug: ssh: config key=path, value=/tmp/disk1.img
nbdkit: debug: retry: config key=retries, value=1
nbdkit: debug: retry: config key=retry-readonly, value=yes
nbdkit: debug: retry: config key=retry-delay, value=15
nbdkit: debug: retry: config key=retry-exponential, value=no
nbdkit: debug: retry: config_complete
nbdkit: debug: ssh: config_complete
nbdkit: debug: using thread model: serialize_requests
nbdkit: debug: bound to unix socket /tmp/nbdkitpbbBJP/socket
nbdkit: debug: forked into background (new pid = 20271)
nbdkit: debug: accepted connection
[...]
nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=801112064
nbdkit: ssh[1]: error: read failed: Socket error: disconnected (-1)
nbdkit: ssh[1]: debug: retry 1: original errno = 5
nbdkit: ssh[1]: debug: waiting 15 seconds before retrying
nbdkit: ssh[1]: debug: ssh: reopen readonly=1
nbdkit: ssh[1]: debug: ssh: finalize
nbdkit: ssh[1]: debug: ssh: close
nbdkit: ssh[1]: error: cannot close file: Socket error: disconnected
nbdkit: ssh[1]: debug: ssh: open readonly=1
nbdkit: ssh[1]: debug: opened libssh handle
nbdkit: ssh[1]: debug: ssh: open returned handle 0x7f9510001db0
nbdkit: ssh[1]: debug: ssh: prepare readonly=1
[...]
(100.00/100%)
nbdkit: ssh[1]: debug: client sent NBD_CMD_DISC, closing connection
nbdkit: ssh[1]: debug: retry: finalize
nbdkit: ssh[1]: debug: ssh: finalize
nbdkit: ssh[1]: debug: retry: close
nbdkit: ssh[1]: debug: reopens needed: 1
nbdkit: ssh[1]: debug: ssh: close
nbdkit: debug: ssh: unload plugin
nbdkit: debug: retry: unload filter

Result 2:
When change new filter "retry" related parameters' values,nbdkit can reopen connections correctly and the copy completed successfully.

Result:
As comment3 and comment5 testing,now change the bug from ON_QA to VERIFIED.

Comment 7 errata-xmlrpc 2020-05-05 09:49:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017


Note You need to log in before you can comment on or make changes to this bug.