Description of problem:
Migration with the --copy-storage-all flag fails in the drive-mirror phase when migrating from RHEL7.3 to RHEL7.1.

Version-Release number of selected component (if applicable):
source host:
  libvirt-2.0.0-6.el7.x86_64
  qemu-kvm-rhev-2.6.0-22.el7.x86_64
target host:
  libvirt-1.2.8-16.el7_1.5.x86_64
  qemu-kvm-rhev-2.1.2-23.el7_1.12.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a vm with a local image on the source host

<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/RHEL-7.3-latest.raw'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x0d' function='0x0'/>
</disk>

2. Migrate the vm with the --copy-storage-all flag to the target host

# virsh migrate vm-mig qemu+ssh://intel-5205-32-1.englab.nay.redhat.com/system --live --verbose --unsafe --copy-storage-all
root@intel-5205-32-1.englab.nay.redhat.com's password:
error: internal error: unable to execute QEMU command 'drive-mirror': Failed to read export length

Actual results:
Migration failed

Expected results:
Migration passed

Additional info:
Relevant libvirtd.log excerpt on the source host:

2016-08-31 09:21:45.245+0000: 21776: info : qemuMonitorIOProcess:426 : QEMU_MONITOR_IO_PROCESS: mon=0x7fec70024320 buf={"id": "libvirt-21", "error": {"class": "GenericError", "desc": "Failed to read export length"}} len=98
2016-08-31 09:21:45.245+0000: 21776: debug : qemuMonitorJSONIOProcessLine:191 : Line [{"id": "libvirt-21", "error": {"class": "GenericError", "desc": "Failed to read export length"}}]
2016-08-31 09:21:45.245+0000: 21776: debug : virJSONValueFromString:1604 : string={"id": "libvirt-21", "error": {"class": "GenericError", "desc": "Failed to read export length"}}
2016-08-31 09:21:45.246+0000: 21781: debug : virJSONValueToString:1795 : result={"id":"libvirt-21","error":{"class":"GenericError","desc":"Failed to read export length"}}
2016-08-31 09:21:45.246+0000: 21776: debug : virEventPollCleanupHandles:574 : Cleanup 13
2016-08-31 09:21:45.246+0000: 21781: debug : qemuMonitorJSONCheckError:376 : unable to execute QEMU command {"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"nbd:intel-5205-32-1.englab.nay.redhat.com:49153:exportname=drive-virtio-disk0","speed":9223372036853727232,"sync":"full","mode":"existing","format":"raw"},"id":"libvirt-21"}: {"id":"libvirt-21","error":{"class":"GenericError","desc":"Failed to read export length"}}
2016-08-31 09:21:45.246+0000: 21776: debug : virEventPollMakePollFDs:401 : Prepare n=0 w=1, f=6 e=1 d=0
2016-08-31 09:21:45.246+0000: 21776: debug : virEventPollMakePollFDs:401 : Prepare n=1 w=2, f=8 e=1 d=0
2016-08-31 09:21:45.246+0000: 21781: error : qemuMonitorJSONCheckError:387 : internal error: unable to execute QEMU command 'drive-mirror': Failed to read export length
2016-08-31 09:21:45.246+0000: 21776: debug : virEventPollMakePollFDs:401 : Prepare n=2 w=3, f=11 e=1 d=0
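For reference, "Failed to read export length" is the NBD client's error when it cannot read the export-size field the server should send once negotiation finishes. Below is a minimal sketch of that final read under the new-style NBD protocol layout; this is not QEMU source, and read_full() is a hypothetical helper that loops until all requested bytes arrive.

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper that reads exactly 'len' bytes from the socket. */
extern bool read_full(int sock, void *buf, size_t len);

/* After the client sends NBD_OPT_EXPORT_NAME, the server replies with
 * the export size (8 bytes, big-endian), transmission flags (2 bytes)
 * and 124 bytes of zero padding.  If the server has already lost
 * protocol sync, nothing sensible arrives here and the client gives
 * up with "Failed to read export length". */
static int read_export_info(int sock, uint64_t *size, uint16_t *flags)
{
    uint8_t sizebuf[8], rest[2 + 124];
    uint64_t sz = 0;

    if (!read_full(sock, sizebuf, sizeof(sizebuf))) {
        return -1;   /* this is the "Failed to read export length" case */
    }
    for (int i = 0; i < 8; i++) {
        sz = (sz << 8) | sizebuf[i];
    }
    if (!read_full(sock, rest, sizeof(rest))) {
        return -1;   /* flags + 124 bytes of zero padding */
    }
    *size = sz;
    *flags = ((uint16_t)rest[0] << 8) | rest[1];
    return 0;
}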
Cross migration test results are as follows:

Source host -> Target host    Test Results
RHEL7.0 -> RHEL7.3            PASSED
RHEL7.1 -> RHEL7.3            PASSED
RHEL7.2 -> RHEL7.3            PASSED
RHEL7.3 -> RHEL7.2            PASSED
RHEL7.3 -> RHEL7.1            FAILED
RHEL7.3 -> RHEL7.0            FAILED
RHEL6.8 -> RHEL7.3            FAILED
Test results for migration from RHEL7.3 to RHEL7.0:

# virsh migrate vm-mig qemu+ssh://10.66.144.76/system --live --verbose --unsafe --copy-storage-all
root@10.66.144.76's password:
Migration: [100 %]error: Domain not found: no domain with matching name 'vm-mig'

Additional info:
Relevant vm log on the target host:

red_dispatcher_loadvm_commands:
qemu-kvm: qemu-coroutine-lock.c:147: qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed.
2016-09-01 03:01:31.545+0000: shutting down
Test results for migration from RHEL6.8 to RHEL7.3:

Migrate the vm with the --copy-storage-all flag to the target host

# virsh migrate vm-mig qemu+ssh://10.66.4.190/system --live --verbose --unsafe --copy-storage-all
root@10.66.4.190's password:
error: Unable to read from monitor: Connection reset by peer

Actual results:
Migration failed and the vm is killed on the source host

Expected results:
Migration passed

Additional info:
Relevant vm log on the target host:

2016-09-01 06:05:33.778+0000: 7482: debug : virCommandHandshakeChild:461 : Handshake with parent is done
char device redirected to /dev/pts/3 (label charserial0)
RHEL-6 compat: ich9-usb-uhci1: irq_pin = 3
RHEL-6 compat: ich9-usb-uhci2: irq_pin = 3
RHEL-6 compat: ich9-usb-uhci3: irq_pin = 3
Receiving block device images
Completed 0 % ... Completed 40 %
2016-09-01T06:05:39.828328Z qemu-kvm: error while loading state section id 1(block)
copying E and F segments from pc.bios to pc.ram
copying C and D segments from pc.rom to pc.ram
2016-09-01T06:05:39.828927Z qemu-kvm: load of migration failed: Input/output error
2016-09-01 06:05:40.062+0000: shutting down
(In reply to yangyang from comment #1)
> Cross migration test results are as follows:
>
> Source host -> Target host    Test Results
> RHEL7.0 -> RHEL7.3            PASSED
> RHEL7.1 -> RHEL7.3            PASSED
> RHEL7.2 -> RHEL7.3            PASSED
> RHEL7.3 -> RHEL7.2            PASSED
> RHEL7.3 -> RHEL7.1            FAILED
> RHEL7.3 -> RHEL7.0            FAILED
> RHEL6.8 -> RHEL7.3            FAILED

Correction to the description above: these are the test results for cross migration with the --copy-storage-all flag.
According to the log output, the error appears to originate from QEMU. Reassigning for further analysis.
Can you try RHEL7.2 -> RHEL7.1, please?
This can be reproduced outside of libvirt as follows.

Check out qemu 2.1.2 and apply the following patch to qemu-nbd:

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 626e584..8537242 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -369,7 +369,7 @@ static void nbd_accept(void *opaque)
         return;
     }
 
-    if (nbd_client_new(exp, fd, nbd_client_closed)) {
+    if (nbd_client_new(NULL, fd, nbd_client_closed)) {
         nb_fds++;
     } else {
         shutdown(fd, 2);
@@ -698,6 +698,8 @@ int main(int argc, char **argv)
     exp = nbd_export_new(bs, dev_offset, fd_size, nbdflags,
                          nbd_export_closed);
 
+    nbd_export_set_name(exp, "main");
+
     if (sockpath) {
         fd = unix_socket_incoming(sockpath);
     } else {

Now run qemu-nbd:

$ ./qemu-img create -f qcow2 demo.img 1G
$ ./qemu-nbd demo.img -p 9000

On a second host, check out QEMU 2.6.0 and run qemu-io:

$ ./qemu-io --image-opts driver=nbd,host=domokun.gsslab.fab.redhat.com,port=9000,export=main -c "readv 0 1M"
can't open: Failed to read export length
no file open, try 'help open'

The problem is that the QEMU NBD server has fubar handling of NBD options when there is more than one option sent by the client, causing it to totally scramble its protocol parsing. This bug was fixed in QEMU 2.3.0 in this commit:

commit 9c122adadbf4377eb77195b3944be10a59d9484f
Author: Max Reitz <mreitz>
Date:   Wed Feb 25 13:08:31 2015 -0500

    nbd: Fix nbd_receive_options()

    The client flags are sent exactly once overall, not once per option.

    Signed-off-by: Max Reitz <mreitz>
    Message-Id: <1424887718-10800-19-git-send-email-mreitz>
    Signed-off-by: Paolo Bonzini <pbonzini>

The problem was previously invisible because the NBD client only ever sent a single NBD option (NBD_OPT_EXPORTNAME). As of QEMU 2.6.0, we are now sending multiple options (first NBD_OPT_LIST and then NBD_OPT_EXPORTNAME) when we connect, and so we hit the NBD server bug in old QEMU. This behaviour was introduced in:

commit 9344e5f554690d5e379b5426daebadef7c87baf5
Author: Daniel P. Berrange <berrange>
Date:   Wed Feb 10 18:41:09 2016 +0000

    nbd: always query export list in fixed new style protocol

    With the new style protocol, the NBD client will currently send
    NBD_OPT_EXPORT_NAME as the first (and indeed only) option it wants.
    The problem is that the NBD protocol spec does not allow for
    returning an error message with the NBD_OPT_EXPORT_NAME option.
    So if the server mandates use of TLS, the client will simply see
    an immediate connection close after issuing NBD_OPT_EXPORT_NAME,
    which is not user friendly.

    To improve this situation, if we have the fixed new style protocol,
    we can send NBD_OPT_LIST as the first option to query the list of
    server exports. We can check for our named export in this list and
    raise an error if it is not found, instead of going ahead and
    sending NBD_OPT_EXPORT_NAME with a name that we know will be
    rejected. This improves the error reporting both in the case that
    the server requires TLS, and in the case that the client-requested
    export name does not exist on the server.

    If the server does not support NBD_OPT_LIST, we just ignore that
    and carry on with NBD_OPT_EXPORT_NAME as before.

    Signed-off-by: Daniel P. Berrange <berrange>
    Message-Id: <1455129674-17255-12-git-send-email-berrange>
    Signed-off-by: Paolo Bonzini <pbonzini>

Reverting that change would fix the problem, at the cost of being unable to report a clear error to the user when the server mandates use of TLS. E.g., this change fixes it:

diff --git a/nbd/client.c b/nbd/client.c
index 48f2a21..91cffce 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -503,7 +503,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         TRACE("Using default NBD export name \"\"");
         name = "";
     }
-    if (fixedNewStyle) {
+    if (fixedNewStyle && 0) {
         /* Check our desired export is present in the
          * server export list. Since NBD_OPT_EXPORT_NAME
          * cannot return an error message, running this
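To illustrate the scrambling, here is a minimal sketch (not QEMU source; read_full() and the option handling are simplified stand-ins) of the negotiation loop shape that the 2.3.0 fix restores: the 4 bytes of client flags are consumed exactly once, before the per-option loop, whereas the pre-2.3.0 server effectively consumed another 4 "flag" bytes for each option, so from the second option onwards every header field was parsed out of alignment.

#include <stdint.h>
#include <stdbool.h>
#include <unistd.h>
#include <arpa/inet.h>

static bool read_full(int sock, void *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(sock, (char *)buf + done, len - done);
        if (n <= 0) {
            return false;
        }
        done += n;
    }
    return true;
}

#define NBD_OPT_EXPORT_NAME 1
#define NBD_OPT_LIST        3

static int receive_options(int sock)
{
    uint32_t client_flags;

    /* Correct: the client sends its 4 flag bytes exactly once. */
    if (!read_full(sock, &client_flags, sizeof(client_flags))) {
        return -1;
    }

    for (;;) {
        uint8_t magic[8];   /* option haggling magic "IHAVEOPT" */
        uint32_t opt, len;

        /* The buggy pre-2.3.0 loop also re-read 4 "client flag"
         * bytes at the top of each iteration, shifting every field
         * below by 4 bytes from the second option onwards. */
        if (!read_full(sock, magic, sizeof(magic)) ||
            !read_full(sock, &opt, sizeof(opt)) ||
            !read_full(sock, &len, sizeof(len))) {
            return -1;
        }
        opt = ntohl(opt);
        len = ntohl(len);

        switch (opt) {
        case NBD_OPT_LIST:
            /* ... send the export list, then keep haggling ... */
            break;
        case NBD_OPT_EXPORT_NAME:
            /* ... read the name, reply with export size + flags ... */
            return 0;   /* EXPORT_NAME ends negotiation */
        default: {
            char scratch[512];   /* skip the unknown option payload */
            while (len > 0) {
                size_t chunk = len < sizeof(scratch) ? len : sizeof(scratch);
                if (!read_full(sock, scratch, chunk)) {
                    return -1;
                }
                len -= chunk;
            }
            break;
        }
        }
    }
}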
Another related patch is:

commit 156f6a10c21c3501aa3938badf5c3f1339c509a2
Author: Eric Blake <eblake>
Date:   Wed Apr 6 16:48:38 2016 -0600

    nbd: Don't kill server when client requests unknown option

    nbd-server.c currently fails to handle unsupported options properly.
    If during option haggling the client sends an unknown request, the
    server kills the connection instead of letting the client try to
    fall back to something older. This is precisely what advertising
    NBD_FLAG_FIXED_NEWSTYLE was supposed to fix.

    Signed-off-by: Eric Blake <eblake>
    Message-Id: <1459982918-32229-1-git-send-email-eblake>
    Signed-off-by: Paolo Bonzini <pbonzini>
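In the fixed newstyle protocol, the graceful behaviour is to answer an unknown option with a structured error reply rather than dropping the connection. A minimal sketch of such a reply, assuming a hypothetical send_full() helper (the constants are from the NBD protocol spec):

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Constants from the NBD protocol spec. */
#define NBD_REP_MAGIC     0x0003e889045565a9ULL
#define NBD_REP_ERR_UNSUP 0x80000001u

/* Hypothetical helper that writes exactly 'len' bytes to the socket. */
extern bool send_full(int sock, const void *buf, size_t len);

static void put_be32(uint8_t *out, uint32_t v)
{
    out[0] = v >> 24; out[1] = v >> 16; out[2] = v >> 8; out[3] = v;
}

static void put_be64(uint8_t *out, uint64_t v)
{
    put_be32(out, v >> 32);
    put_be32(out + 4, (uint32_t)v);
}

/* Instead of closing the socket, tell the client the option is
 * unsupported so it can fall back to an older option. */
static bool reject_option(int sock, uint32_t opt)
{
    uint8_t rep[20];   /* magic(8) + option(4) + reply type(4) + length(4) */

    put_be64(rep, NBD_REP_MAGIC);
    put_be32(rep + 8, opt);
    put_be32(rep + 12, NBD_REP_ERR_UNSUP);
    put_be32(rep + 16, 0);   /* no error payload */
    return send_full(sock, rep, sizeof(rep));
}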
(In reply to Eric Blake from comment #8)
> Another related patch is:
>
> commit 156f6a10c21c3501aa3938badf5c3f1339c509a2
> Author: Eric Blake <eblake>
> Date:   Wed Apr 6 16:48:38 2016 -0600
>
>     nbd: Don't kill server when client requests unknown option

Fortunately I don't think we're hitting that scenario - the old versions *do* understand the options we're sending, they just can't process multiple options in a row correctly - they keep returning to an earlier phase of the protocol :-(
(In reply to Daniel Berrange from comment #9)
> (In reply to Eric Blake from comment #8)
> > Another related patch is:
> >
> > commit 156f6a10c21c3501aa3938badf5c3f1339c509a2
> > Author: Eric Blake <eblake>
> > Date:   Wed Apr 6 16:48:38 2016 -0600
> >
> >     nbd: Don't kill server when client requests unknown option
>
> Fortunately I don't think we're hitting that scenario - the old versions
> *do* understand the options we're sending, they just can't process multiple
> options in a row correctly - they keep returning to an earlier phase of the
> protocol :-(

But we WILL be hitting that scenario with 7.4 -> 7.1 migration, since 7.4 will be using NBD_OPT_GO (instead of NBD_OPT_EXPORTNAME) as its preferred first command, at least if I get my way at getting the upstream patches sorted out (posted since 2.6, but waiting for 2.8 to land).
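For context, the client-side fallback that makes NBD_OPT_GO safe against old servers looks roughly like the following sketch. This is not the actual QEMU patch; the constants come from the NBD protocol spec, and send_option()/read_option_reply() are hypothetical helpers standing in for the real wire I/O.

#include <stdint.h>

/* Option and reply-type constants from the NBD protocol spec. */
#define NBD_OPT_EXPORT_NAME 1
#define NBD_OPT_GO          7
#define NBD_REP_ACK         1
#define NBD_REP_ERR_UNSUP   0x80000001u

/* Hypothetical helpers: send_option() writes one option request with
 * the export name as payload; read_option_reply() consumes replies
 * for it until an ACK or error type arrives and returns that final
 * reply type (negative on I/O error). */
extern int send_option(int sock, uint32_t opt, const char *export);
extern int64_t read_option_reply(int sock, uint32_t opt);

static int negotiate_export(int sock, const char *export)
{
    /* Prefer NBD_OPT_GO: unlike NBD_OPT_EXPORT_NAME, it lets the
     * server report a structured error (bad export, TLS required). */
    if (send_option(sock, NBD_OPT_GO, export) < 0) {
        return -1;
    }
    int64_t rep = read_option_reply(sock, NBD_OPT_GO);
    if (rep == NBD_REP_ACK) {
        return 0;               /* negotiation complete */
    }
    if (rep == NBD_REP_ERR_UNSUP) {
        /* Server predates NBD_OPT_GO: fall back to the old option,
         * which ends negotiation without an option reply.  This
         * fallback only works if the server answers unknown options
         * with NBD_REP_ERR_UNSUP instead of dropping the connection,
         * which is what commit 156f6a10 fixes. */
        return send_option(sock, NBD_OPT_EXPORT_NAME, export);
    }
    return -1;                  /* other error or protocol violation */
}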
The 6.8->7.3 case is separate; that's now https://bugzilla.redhat.com/show_bug.cgi?id=1376053