Bug 1692018

Summary: qemu-img: Protocol error: simple reply when structured reply chunk was expected
Product: Red Hat Enterprise Linux 7 Reporter: Richard W.M. Jones <rjones>
Component: qemu-kvm-rhev Assignee: John Snow <jsnow>
Status: CLOSED ERRATA QA Contact: Tingting Mao <timao>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.7 CC: eblake, jinzhao, jsnow, juzhang, ngu, qzhang, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.12.0-29.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-22 09:20:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Richard W.M. Jones 2019-03-23 11:40:56 UTC
Description of problem:

qemu-img convert fails against nbdkit 1.11 because nbdkit
recently added support for NBD Structured Replies.

Version-Release number of selected component (if applicable):

qemu-kvm-rhev-2.12.0-25.el7.x86_64
nbdkit 1.11.9

How reproducible:

100%

Steps to Reproduce:

Using nbdkit 1.11.9 compiled from source, in the nbdkit source
directory, do:

$ ./nbdkit memory size=64M --run 'qemu-img convert $nbd /var/tmp/out'
nbdkit: memory.0: error: invalid request: unknown command (7) ignored
qemu-img: Protocol error: simple reply when structured reply chunk was expected

Additional info:

It's a bug in qemu which was fixed a long time ago.  I bisected
the fix to:

89aa0d87634e2cb98517509dc8bdb876f26ecf8b is the first bad commit
commit 89aa0d87634e2cb98517509dc8bdb876f26ecf8b
Author: Vladimir Sementsov-Ogievskiy <vsementsov>
Date:   Fri Apr 27 17:20:01 2018 +0300

    nbd/client: fix nbd_negotiate_simple_meta_context
    
    Initialize received variable. Otherwise, it is possible for server to
    answer without any contexts, but we will set context_id to something
    random (received_id is not initialized too) and return 1, which is
    wrong.
    
    To solve it, just initialize received to false. Initialize received_id
    too, just to make all possible checkers happy.
    
    Bug was introduced in 78a33ab58782efdb206de14 "nbd: BLOCK_STATUS for
    standard get_block_status function: client part" with the whole
    function.
    
    Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov>
    Message-Id: <20180427142002.21930-2-vsementsov>
    Reviewed-by: Eric Blake <eblake>
    CC: qemu-stable
    Signed-off-by: Eric Blake <eblake>

:040000 040000 9993feb118af1a9a59dbc8fe92015a324e93e557 14db90d621d6b7f1ee5ff97a0ca2cb92f6f2f7e9 M	nbd


The fix is an obvious two-liner, just initializing some
variables which were accidentally left uninitialized.
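
To illustrate the class of bug that commit fixes, here is a minimal C sketch.
It is not the QEMU source: struct meta_context_reply and this simplified
negotiate_simple_meta_context(), including its parameters, are hypothetical
stand-ins that only model "the server may answer with zero contexts".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one meta context reply sent by the server. */
struct meta_context_reply {
    uint32_t context_id;
};

/*
 * Simplified model of negotiating a single meta context.
 * Returns 1 and stores the id in *context_id if the server offered a
 * context, or 0 if the server answered without any contexts.
 */
static int negotiate_simple_meta_context(const struct meta_context_reply *replies,
                                         size_t nreplies,
                                         uint32_t *context_id)
{
    /*
     * The fix: both variables start with a defined value. Without these
     * initializers, a server that sends no contexts leaves them
     * indeterminate, and the check below may wrongly report success.
     */
    bool received = false;
    uint32_t received_id = 0;

    assert(context_id != NULL);

    for (size_t i = 0; i < nreplies; i++) {
        /* This model only cares about the first context offered. */
        received_id = replies[i].context_id;
        received = true;
        break;
    }

    if (received) {
        *context_id = received_id;
        return 1;
    }
    return 0;
}
```

With no replies, the function returns 0 and leaves *context_id untouched.
Without the two initializers, that same path reads indeterminate values, which
is presumably how qemu came to believe a context had been negotiated and went
on to send command 7 (NBD_CMD_BLOCK_STATUS) to a server that does not know it.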

Comment 2 Richard W.M. Jones 2019-03-23 11:43:20 UTC
Upstream information:
https://www.redhat.com/archives/libguestfs/2019-March/msg00115.html

Comment 3 Eric Blake 2019-03-23 14:36:08 UTC
If we backport that fix, we also want to backport the first hunk of:
https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg06496.html
"nbd: Permit simple error to NBD_CMD_BLOCK_STATUS"

to ensure that a server sending a simple error reply does not kill the connection. (The second hunk of that email is only applicable to code added after 3.2)
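
A rough C sketch of the client-side policy described above, under stated
assumptions: it is not the upstream patch. The two magic numbers are the NBD
protocol's simple and structured reply magics, while enum reply_verdict,
classify_reply() and the only_structured flag are hypothetical names used
only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NBD_SIMPLE_REPLY_MAGIC     0x67446698u
#define NBD_STRUCTURED_REPLY_MAGIC 0x668e33efu

/* Hypothetical outcome of inspecting one reply header. */
enum reply_verdict {
    REPLY_OK,              /* acceptable reply, continue processing */
    REPLY_COMMAND_FAILED,  /* fail this request, keep the connection */
    REPLY_PROTOCOL_ERROR   /* malformed exchange, drop the connection */
};

/*
 * only_structured is true for a request whose successful reply must be
 * structured (NBD_CMD_BLOCK_STATUS in this bug). Parsing of structured
 * chunk contents is omitted here.
 */
static enum reply_verdict classify_reply(uint32_t magic, uint32_t error,
                                         bool only_structured)
{
    if (magic == NBD_STRUCTURED_REPLY_MAGIC) {
        return REPLY_OK;
    }
    if (magic != NBD_SIMPLE_REPLY_MAGIC) {
        return REPLY_PROTOCOL_ERROR;
    }
    if (error != 0) {
        /* A simple *error* reply only fails the command. */
        return REPLY_COMMAND_FAILED;
    }
    /* A simple *success* reply is still wrong if a structured one was required. */
    return only_structured ? REPLY_PROTOCOL_ERROR : REPLY_OK;
}
```

Under this policy, a server that rejects an unknown command with a simple error
reply causes that one request to fail, rather than producing "Protocol error:
simple reply when structured reply chunk was expected" and ending the session.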

Comment 4 Tingting Mao 2019-03-25 06:54:28 UTC
Reproduced this issue as follows:

Tested with:
qemu-kvm-rhev-2.12.0-25.el7
kernel-3.10.0-1014.el7

Steps:

1. Clone the upstream source code
# git clone https://github.com/libguestfs/nbdkit.git

2. Compile the source code
# autoreconf -i
# ./configure
# make
# make check
......
============================================================================
Testsuite summary for nbdkit 1.11.9
============================================================================
# TOTAL: 57
# PASS:  45
# SKIP:  7
# XFAIL: 0
# FAIL:  5
# XPASS: 0
# ERROR: 0
============================================================================
See tests/test-suite.log
============================================================================
make[3]: *** [test-suite.log] Error 1
make[3]: Leaving directory `/home/nbdkit/nbdkit/tests'
make[2]: *** [check-TESTS] Error 2
make[2]: Leaving directory `/home/nbdkit/nbdkit/tests'
make[1]: *** [check-am] Error 2
make[1]: Leaving directory `/home/nbdkit/nbdkit/tests'
make: *** [check-recursive] Error 1

3. Convert images over nbd
# ./nbdkit memory size=64M --run 'qemu-img convert $nbd /var/tmp/out'
nbdkit: memory.1: error: invalid request: unknown command (7) ignored
qemu-img: Protocol error: simple reply when structured reply chunk was expected

Comment 7 John Snow 2019-03-25 16:37:35 UTC
Hi, I currently anticipate that the needed fixes here will be included as part of RHBZ #1691563.

As for the hunk mentioned by Eric, it's still pending upstream though I do intend to eventually backport it as part of an NBD roundup. I'll leave this bug open until then.

Comment 8 Eric Blake 2019-03-25 17:26:42 UTC
(In reply to Eric Blake from comment #3)
> If we backport that fix, we also want to backport the first hunk of:
> https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg06496.html
> "nbd: Permit simple error to NBD_CMD_BLOCK_STATUS"
> 
> to ensure that a server sending a simple error reply does not kill the
> connection. (The second hunk of that email is only applicable to code added
> after 3.2)

3.2 doesn't exist. The second half is applicable to builds that backport commit 7f86068d (not part of 3.1, will be part of 4.0 but so will the fix). At any rate, I'm posting a v2 to upstream that splits the fix into two commits (where it is more obvious which patch(es) to backport based on which commit id they fix).

Comment 10 John Snow 2019-05-01 21:26:38 UTC
Backports are ready, but there are some build infrastructure issues that are holding this up.

Comment 12 Miroslav Rezanina 2019-05-17 15:47:33 UTC
Fix included in qemu-kvm-rhev-2.12.0-29.el7

Comment 14 Tingting Mao 2019-05-21 07:19:44 UTC
Verified this issue as follows:


Tested with:
qemu-kvm-rhev-2.12.0-29.el7
kernel-3.10.0-1048.el7


Steps:
1. Clone the upstream source code
# git clone https://github.com/libguestfs/nbdkit.git

2. Compile the source code
# autoreconf -i
# ./configure
# make
# make check
......
============================================================================
Testsuite summary for nbdkit 1.13.3
============================================================================
# TOTAL: 66
# PASS:  52
# SKIP:  13
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See tests/test-suite.log
============================================================================
make[3]: *** [test-suite.log] Error 1
make[3]: Leaving directory `/home/test/nbdkit/tests'
make[2]: *** [check-TESTS] Error 2
make[2]: Leaving directory `/home/test/nbdkit/tests'
make[1]: *** [check-am] Error 2
make[1]: Leaving directory `/home/test/nbdkit/tests'
make: *** [check-recursive] Error 1

3. Convert images over nbd
# ./nbdkit memory size=64M --run 'qemu-img convert $nbd /var/tmp/out'
# echo $?
0


Result:
No errors any more.

Comment 15 Richard W.M. Jones 2019-05-21 07:25:47 UTC
I don't know what the nbdkit test failure is because you cut out the test failures, but
the command in step 3 looks fine.

Comment 16 Tingting Mao 2019-05-22 07:37:15 UTC
Regression testing for NBD found no new bugs. My JIRA issue is linked below:

https://projects.engineering.redhat.com/browse/XKVMSEVEN-726

Comment 17 Tingting Mao 2019-05-22 07:38:45 UTC
Based on comment 14 and comment 16, setting this bug to verified. Thanks.

Comment 25 errata-xmlrpc 2019-08-22 09:20:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2553