| Summary: | Migration with copy-storage-all failed in drive-mirror phase from RHEL7.3 to RHEL7.1 | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Yang Yang <yanyang> | |
| Component: | qemu-kvm-rhev | Assignee: | Dr. David Alan Gilbert <dgilbert> | |
| Status: | CLOSED WONTFIX | QA Contact: | Qianqian Zhu <qizhu> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 7.3 | CC: | berrange, chayang, dyuan, eblake, fjin, hhuang, huding, juzhang, knoel, michal.skrivanek, pkrempa, rbalakri, virt-maint, yafu, yanyang | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | | | | |
| : | 1376053 | Environment: | | |
| Last Closed: | 2016-09-23 19:03:43 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | | |
| Cloudforms Team: | --- | Target Upstream Version: | | |
Cross migration test results are as follows:

Source host -> Target host    Test Results
RHEL7.0 -> RHEL7.3            PASSED
RHEL7.1 -> RHEL7.3            PASSED
RHEL7.2 -> RHEL7.3            PASSED
RHEL7.3 -> RHEL7.2            PASSED
RHEL7.3 -> RHEL7.1            FAILED
RHEL7.3 -> RHEL7.0            FAILED
RHEL6.8 -> RHEL7.3            FAILED

Test results for migration from RHEL7.3 to RHEL7.0:

# virsh migrate vm-mig qemu+ssh://10.66.144.76/system --live --verbose --unsafe --copy-storage-all
root.144.76's password:
Migration: [100 %]error: Domain not found: no domain with matching name 'vm-mig'

Additional info:
VM log on the target host:
red_dispatcher_loadvm_commands:
qemu-kvm: qemu-coroutine-lock.c:147: qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed.
2016-09-01 03:01:31.545+0000: shutting down

Test results for migration from RHEL6.8 to RHEL7.3:

Migrate vm with the copy-storage-all flag to the target host:
# virsh migrate vm-mig qemu+ssh://10.66.4.190/system --live --verbose --unsafe --copy-storage-all
root.4.190's password:
error: Unable to read from monitor: Connection reset by peer

Actual results:
Migration failed and the vm is killed on the source host

Expected results:
Migration passed

Additional info:
VM log on the target host:
2016-09-01 06:05:33.778+0000: 7482: debug : virCommandHandshakeChild:461 : Handshake with parent is done
char device redirected to /dev/pts/3 (label charserial0)
RHEL-6 compat: ich9-usb-uhci1: irq_pin = 3
RHEL-6 compat: ich9-usb-uhci2: irq_pin = 3
RHEL-6 compat: ich9-usb-uhci3: irq_pin = 3
Receiving block device images
Completed 0 % ... Completed 40 % [progress output trimmed]
2016-09-01T06:05:39.828328Z qemu-kvm: error while loading state section id 1(block)
copying E and F segments from pc.bios to pc.ram
copying C and D segments from pc.rom to pc.ram
2016-09-01T06:05:39.828927Z qemu-kvm: load of migration failed: Input/output error
2016-09-01 06:05:40.062+0000: shutting down

(In reply to yangyang from comment #1)
> Cross migration test results are as follows: [...]

Correction to the description: these are the test results for cross migration with the copy-storage-all flag.

According to the log output, the error appears to originate from qemu. Reassigning for further analysis.

Can you try RHEL7.2 -> RHEL7.1, please?

This can be reproduced outside of libvirt as follows.
Check out qemu 2.1.2 and apply the following patch to qemu-nbd:
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 626e584..8537242 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -369,7 +369,7 @@ static void nbd_accept(void *opaque)
         return;
     }
-    if (nbd_client_new(exp, fd, nbd_client_closed)) {
+    if (nbd_client_new(NULL, fd, nbd_client_closed)) {
         nb_fds++;
     } else {
         shutdown(fd, 2);
@@ -698,6 +698,8 @@ int main(int argc, char **argv)
     exp = nbd_export_new(bs, dev_offset, fd_size, nbdflags, nbd_export_closed);
+    nbd_export_set_name(exp, "main");
+
     if (sockpath) {
         fd = unix_socket_incoming(sockpath);
     } else {
Now run qemu-nbd
$ ./qemu-img create -f qcow2 demo.img 1G
$ ./qemu-nbd demo.img -p 9000
On a second host, check out QEMU 2.6.0 and run qemu-io:
$ ./qemu-io --image-opts driver=nbd,host=domokun.gsslab.fab.redhat.com,port=9000,export=main -c "readv 0 1M"
can't open: Failed to read export length
no file open, try 'help open'
The problem is that the QEMU NBD server has fubar handling of NBD options when there is more than one option sent by the client, causing it to totally scramble its protocol parsing.
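For reference, the wire format involved is easy to sketch. Per the published NBD protocol, each client option request in the new-style handshake is a 64-bit magic (the ASCII string "IHAVEOPT"), a 32-bit option number, and a 32-bit payload length followed by the payload. A minimal Python sketch of building those frames (illustrative only; the helper name and the "main" export name are mine, not QEMU's code):

```python
import struct

# Constants below are from the published NBD protocol spec.
NBD_OPTS_MAGIC = 0x49484156454F5054   # ASCII "IHAVEOPT"
NBD_OPT_EXPORT_NAME = 1
NBD_OPT_LIST = 3

def option_request(option: int, data: bytes = b"") -> bytes:
    """Serialize one client option request: magic, option id, payload length, payload."""
    return struct.pack(">QII", NBD_OPTS_MAGIC, option, len(data)) + data

# A QEMU 2.6.0-era client sends two requests back to back, which is
# exactly the sequence the buggy 2.1.x server fails to parse:
handshake = option_request(NBD_OPT_LIST) + option_request(NBD_OPT_EXPORT_NAME, b"main")
```

A server that mis-tracks its position in this stream after the first request will misread the second request's magic, which is how the parsing gets scrambled.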
This bug was fixed in QEMU 2.3.0 in this commit
commit 9c122adadbf4377eb77195b3944be10a59d9484f
Author: Max Reitz <mreitz>
Date: Wed Feb 25 13:08:31 2015 -0500
nbd: Fix nbd_receive_options()
The client flags are sent exactly once overall, not once per option.
Signed-off-by: Max Reitz <mreitz>
Message-Id: <1424887718-10800-19-git-send-email-mreitz>
Signed-off-by: Paolo Bonzini <pbonzini>
The problem was previously invisible because the NBD client only ever sent a single NBD option (NBD_OPT_EXPORTNAME). As of QEMU 2.6.0, we are now sending multiple options (first NBD_OPT_LIST and then NBD_OPT_EXPORTNAME) when we connect and so hit the NBD server bug in old QEMU.
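The fix is conceptually simple: in the fixed new-style handshake the 32-bit client flags appear exactly once, before the first option, so the server must consume them once and then loop over option requests. A hedged Python sketch of that parsing order (the function name and structure are mine, not QEMU's nbd_receive_options()):

```python
import io
import struct

NBD_OPTS_MAGIC = 0x49484156454F5054  # ASCII "IHAVEOPT", per the NBD spec

def read_client_options(stream):
    """Parse the client half of a fixed-newstyle handshake: the 32-bit
    client flags exactly once, then zero or more (option, payload)
    requests. Illustrative sketch only."""
    (flags,) = struct.unpack(">I", stream.read(4))  # consumed ONCE, up front
    options = []
    while True:
        header = stream.read(16)
        if len(header) < 16:
            break
        magic, opt, length = struct.unpack(">QII", header)
        if magic != NBD_OPTS_MAGIC:
            raise ValueError("client desynchronized")
        options.append((opt, stream.read(length)))
    return flags, options

# Two back-to-back options parse cleanly when flags are read only once:
wire = struct.pack(">I", 1)
wire += struct.pack(">QII", NBD_OPTS_MAGIC, 3, 0)            # NBD_OPT_LIST
wire += struct.pack(">QII", NBD_OPTS_MAGIC, 1, 4) + b"main"  # NBD_OPT_EXPORT_NAME
flags, opts = read_client_options(io.BytesIO(wire))
```

The pre-2.3.0 server instead re-read the client flags on every loop iteration, so from the second option onward it was four bytes out of step with the client and the handshake fell apart; the client surfaced this as "Failed to read export length".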
This behaviour was introduced in
commit 9344e5f554690d5e379b5426daebadef7c87baf5
Author: Daniel P. Berrange <berrange>
Date: Wed Feb 10 18:41:09 2016 +0000
nbd: always query export list in fixed new style protocol
With the new style protocol, the NBD client will currently
send NBD_OPT_EXPORT_NAME as the first (and indeed only)
option it wants. The problem is that the NBD protocol spec
does not allow for returning an error message with the
NBD_OPT_EXPORT_NAME option. So if the server mandates use
of TLS, the client will simply see an immediate connection
close after issuing NBD_OPT_EXPORT_NAME which is not user
friendly.
To improve this situation, if we have the fixed new style
protocol, we can send NBD_OPT_LIST as the first option
to query the list of server exports. We can check for our
named export in this list and raise an error if it is not
found, instead of going ahead and sending NBD_OPT_EXPORT_NAME
with a name that we know will be rejected.
This improves the error reporting both in the case that the
server required TLS, and in the case that the client requested
export name does not exist on the server.
If the server does not support NBD_OPT_LIST, we just ignore
that and carry on with NBD_OPT_EXPORT_NAME as before.
Signed-off-by: Daniel P. Berrange <berrange>
Message-Id: <1455129674-17255-12-git-send-email-berrange>
Signed-off-by: Paolo Bonzini <pbonzini>
Reverting that change would fix the problem, at the cost of being unable to report a clear error to the user when the server mandates use of TLS. For example, this change fixes it:
diff --git a/nbd/client.c b/nbd/client.c
index 48f2a21..91cffce 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -503,7 +503,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         TRACE("Using default NBD export name \"\"");
         name = "";
     }
-    if (fixedNewStyle) {
+    if (fixedNewStyle && 0) {
         /* Check our desired export is present in the
          * server export list. Since NBD_OPT_EXPORT_NAME
          * cannot return an error message, running this
Another related patch is:
commit 156f6a10c21c3501aa3938badf5c3f1339c509a2
Author: Eric Blake <eblake>
Date: Wed Apr 6 16:48:38 2016 -0600
nbd: Don't kill server when client requests unknown option
nbd-server.c currently fails to handle unsupported options properly.
If during option haggling the client sends an unknown request, the
server kills the connection instead of letting the client try to
fall back to something older. This is precisely what advertising
NBD_FLAG_FIXED_NEWSTYLE was supposed to fix.
Signed-off-by: Eric Blake <eblake>
Message-Id: <1459982918-32229-1-git-send-email-eblake>
Signed-off-by: Paolo Bonzini <pbonzini>
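The behaviour that commit mandates amounts to sending one reply frame instead of closing the socket. Per the NBD protocol, an option reply carries a 64-bit reply magic, the echoed option number, a 32-bit reply type, and a payload length; a fixed-newstyle server answers an unknown option with NBD_REP_ERR_UNSUP so the client can fall back. A hedged sketch (constants from the NBD spec; the helper name is mine):

```python
import struct

NBD_REP_MAGIC = 0x0003E889045565A9   # option-reply magic, per the NBD spec
NBD_REP_ERR_UNSUP = 0x80000001       # (1 << 31) | 1: "option unsupported"

def reply_unsupported(option: int) -> bytes:
    """Build the reply a fixed-newstyle server should send for an
    unknown option, instead of dropping the connection."""
    return struct.pack(">QIII", NBD_REP_MAGIC, option, NBD_REP_ERR_UNSUP, 0)
```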
(In reply to Eric Blake from comment #8)
> Another related patch is:
> commit 156f6a10c21c3501aa3938badf5c3f1339c509a2
> nbd: Don't kill server when client requests unknown option

Fortunately I don't think we're hitting that scenario - the old versions *do* understand the options we're sending, they just can't process multiple options in a row correctly - they keep returning to an earlier phase of the protocol :-(

(In reply to Daniel Berrange from comment #9)
> Fortunately I don't think we're hitting that scenario - the old versions
> *do* understand the options we're sending, they just can't process multiple
> options in a row correctly - they keep returning to an earlier phase of
> the protocol :-(

But we WILL be hitting that scenario with 7.4 -> 7.1 migration, since 7.4 will be using NBD_OPT_GO (instead of NBD_OPT_EXPORTNAME) as its preferred first command, at least if I get my way at getting the upstream patches sorted out (posted since 2.6, but waiting for 2.8 to land).

The 6.8 -> 7.3 case is separate; that is now https://bugzilla.redhat.com/show_bug.cgi?id=1376053
Description of problem:
Migration with copy-storage-all failed in drive-mirror phase from RHEL7.3 to RHEL7.1

Version-Release number of selected component (if applicable):
Source host:
libvirt-2.0.0-6.el7.x86_64
qemu-kvm-rhev-2.6.0-22.el7.x86_64
Target host:
libvirt-1.2.8-16.el7_1.5.x86_64
qemu-kvm-rhev-2.1.2-23.el7_1.12.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start vm with a local image on the source host
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/RHEL-7.3-latest.raw'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x0d' function='0x0'/>
</disk>
2. Migrate vm with the copy-storage-all flag to the target host
# virsh migrate vm-mig qemu+ssh://intel-5205-32-1.englab.nay.redhat.com/system --live --verbose --unsafe --copy-storage-all
root.nay.redhat.com's password:
error: internal error: unable to execute QEMU command 'drive-mirror': Failed to read export length

Actual results:
Migration failed

Expected results:
Migration passed

Additional info:
Relevant libvirtd.log on the source host:
2016-08-31 09:21:45.245+0000: 21776: info : qemuMonitorIOProcess:426 : QEMU_MONITOR_IO_PROCESS: mon=0x7fec70024320 buf={"id": "libvirt-21", "error": {"class": "GenericError", "desc": "Failed to read export length"}}^M len=98
2016-08-31 09:21:45.245+0000: 21776: debug : qemuMonitorJSONIOProcessLine:191 : Line [{"id": "libvirt-21", "error": {"class": "GenericError", "desc": "Failed to read export length"}}]
2016-08-31 09:21:45.245+0000: 21776: debug : virJSONValueFromString:1604 : string={"id": "libvirt-21", "error": {"class": "GenericError", "desc": "Failed to read export length"}}
2016-08-31 09:21:45.246+0000: 21781: debug : virJSONValueToString:1795 : result={"id":"libvirt-21","error":{"class":"GenericError","desc":"Failed to read export length"}}
2016-08-31 09:21:45.246+0000: 21776: debug : virEventPollCleanupHandles:574 : Cleanup 13
2016-08-31 09:21:45.246+0000: 21781: debug : qemuMonitorJSONCheckError:376 : unable to execute QEMU command {"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"nbd:intel-5205-32-1.englab.nay.redhat.com:49153:exportname=drive-virtio-disk0","speed":9223372036853727232,"sync":"full","mode":"existing","format":"raw"},"id":"libvirt-21"}: {"id":"libvirt-21","error":{"class":"GenericError","desc":"Failed to read export length"}}
2016-08-31 09:21:45.246+0000: 21776: debug : virEventPollMakePollFDs:401 : Prepare n=0 w=1, f=6 e=1 d=0
2016-08-31 09:21:45.246+0000: 21776: debug : virEventPollMakePollFDs:401 : Prepare n=1 w=2, f=8 e=1 d=0
2016-08-31 09:21:45.246+0000: 21781: error : qemuMonitorJSONCheckError:387 : internal error: unable to execute QEMU command 'drive-mirror': Failed to read export length
2016-08-31 09:21:45.246+0000: 21776: debug : virEventPollMakePollFDs:401 : Prepare n=2 w=3, f=11 e=1 d=0