Bug 1018695
| Summary: | qemu live migration port conflicts with other users of ephemeral port(s) | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Stefan Hajnoczi <stefanha> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 6.5 | CC: | aavati, areis, barumuga, berrange, cdhouch, chorn, clalancette, dani-rh, dshetty, dyuan, gianluca.cecchi, herrold, itamar, jforbes, jherrman, juzhang, kkeithle, laine, libvirt-maint, mzhan, rbalakri, rhodain, s.kieske, veillard, virt-maint, ydu, zpeng |
| Target Milestone: | rc | Keywords: | Upstream, ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-0.10.2-37.el6 | Doc Type: | Bug Fix |
| Doc Text: | Prior to this update, migrating a virtual machine failed when the libvirtd service used a Transmission Control Protocol (TCP) port that was already in use. Now, it is possible to predefine a custom migration TCP port range in case the default port is in use. In addition, libvirtd now ensures that the port it chooses from the custom range is not used by another process. | | |
| Story Points: | --- | | |
| Clone Of: | 1018530 | | |
| Clones: | 1019237 1340368 1340479 (view as bug list) | Environment: | |
| Last Closed: | 2014-10-14 04:17:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1018178, 1018383, 1045196, 1061468, 1340368, 1340479 | | |
Description
Stefan Hajnoczi
2013-10-14 08:11:09 UTC
QEMU is not choosing the port number. It is libvirt that builds the QEMU command line, including the -incoming tcp:[::]:49152 option: http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_migration.c;h=38edadb9742dd787f1cc58008f45ae1da6c032ac;hb=HEAD#l2553

The problem is that libvirt uses an internal variable ('port') to keep track of which ephemeral port to use. (It also seems like there might be a problem if more than 64 incoming guests are migrating at the same time.)

Note that it may be possible to override the incoming migration URI, including the port number, in the virsh migrate command. See the --desturi and --migrateuri options. This may be a usable temporary workaround.

As a long-term fix, libvirt and QEMU should do a real search for a free port number by binding to a port. To avoid race conditions, either QEMU needs to do this or libvirt must use file descriptor passing to hand QEMU the already-bound socket.

*** Bug 1019058 has been marked as a duplicate of this bug. ***

This upstream patch looks like it would probably solve this issue: https://www.redhat.com/archives/libvir-list/2013-October/msg00652.html

(In reply to Daniel Berrange from comment #11)
> Libvirt can *not* change the port range used for migration by default,
> because that will likely cause regressions for existing customers, due to
> the need for them to now change their firewall to open a different range of
> ports.

Odds are users running gluster will have to change their firewall settings anyway. Can libvirt selectively change migration ports only if the current ones are busy (my understanding is that this is what the current patch does, no?), or maybe only when using gluster?

In summary: you presented the problem; any ideas for the solution or workarounds?

The solution is the upstream patch to libvirt that makes the port range configurable and makes libvirt check whether the port is in use. This explicitly does not change the default range, so it is backward-compatibility safe.
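The bind-based search suggested above can be sketched in Python. This is an illustrative model, not libvirt or QEMU code: the function name and the 64-port range (49152-49215) are assumptions taken from the report. The key point is that the socket stays bound, so the port remains claimed until the descriptor is handed to the target process.

```python
import socket

def find_free_migration_port(start=49152, end=49215):
    """Claim a free TCP port by actually binding it.

    Returns (bound_socket, port). Keeping the socket open, rather than
    closing it and reporting only the number, is what avoids the race:
    no other process can bind the port while we still hold it.
    """
    for port in range(start, end + 1):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("", port))  # fails with EADDRINUSE if already taken
        except OSError:
            s.close()
            continue
        return s, port
    raise RuntimeError(f"no free port in {start}-{end}")
```

The bound descriptor could then be inherited by the child process, e.g. `subprocess.Popen(cmd, pass_fds=(s.fileno(),))`, with QEMU pointed at the inherited socket via its fd: incoming-migration syntax, so QEMU never has to bind the port itself.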
This is now fixed upstream by v1.1.3-188-g0196845 and v1.1.3-189-ge3ef20d:

commit 0196845d3abd0d914cf11f7ad6c19df8b47c32ed
Author: Wang Yufei <james.wangyufei>
Date: Fri Oct 11 11:27:13 2013 +0800

    qemu: Avoid assigning unavailable migration ports

    https://bugzilla.redhat.com/show_bug.cgi?id=1019053

    When we migrate vms concurrently, there's a chance that libvirtd on
    destination assigns the same port for different migrations, which
    will lead to migration failure during prepare phase on destination.
    So we use virPortAllocator here to solve the problem.

    Signed-off-by: Wang Yufei <james.wangyufei>
    Signed-off-by: Jiri Denemark <jdenemar>

commit e3ef20d7f7fee595ac4fc6094e04b7d65ee0583a
Author: Jiri Denemark <jdenemar>
Date: Tue Oct 15 15:26:52 2013 +0200

    qemu: Make migration port range configurable

    https://bugzilla.redhat.com/show_bug.cgi?id=1019053

One more patch is needed to fully support configurable migration ports:

commit d9be5a7157515eeae99379e9544c34b34c5e5198
Author: Michal Privoznik <mprivozn>
Date: Fri Oct 18 18:28:14 2013 +0200

    qemu: Fix augeas support for migration ports

    Commit e3ef20d7 allows user to configure migration ports range via
    qemu.conf. However, it forgot to update augeas definition file and
    even the test data was malicious.

    Signed-off-by: Michal Privoznik <mprivozn>

And one more upstream commit is required:

commit c92ca769af2bacefdd451802d7eb1adac5e6597c
Author: Zeng Junliang <zengjunliang>
Date: Wed Nov 6 11:36:57 2013 +0800

    qemu: clean up migration ports when migration cancelled

    If there's a migration cancelled, the bitmap of migration port
    should be cleaned up too.

    Signed-off-by: Zeng Junliang <zengjunliang>
    Signed-off-by: Jiri Denemark <jdenemar>

Any news on this, so we are able to test? Thanks, Gianluca

As you can see in the three comments above, this bug has been fixed upstream. This bug will get further updates when appropriate for the RHEL release in which this bug will be addressed.
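The combined effect of these commits can be modeled with a small allocator: a shared set tracks ports already handed out, acquisition probes that a port is genuinely bindable (so a port held by another process, such as glusterfsd on 49152, is skipped), and a cancelled migration releases its port. This is a hedged Python sketch of virPortAllocator-style behavior, with invented names, not libvirt's actual implementation.

```python
import socket

class PortAllocator:
    """Toy model of migration-port allocation, not libvirt code."""

    def __init__(self, start, end):
        self.start, self.end = start, end
        self.used = set()  # ports handed out and not yet released

    def _bindable(self, port):
        # Probe the OS so ports held by other processes are skipped too.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("", port))
            return True
        except OSError:
            return False
        finally:
            s.close()

    def acquire(self):
        for port in range(self.start, self.end + 1):
            if port not in self.used and self._bindable(port):
                self.used.add(port)
                return port
        raise RuntimeError("no migration port available")

    def release(self, port):
        # Must run on completion *and* cancellation; forgetting the
        # cancel path is the kind of leak commit c92ca769 fixes.
        self.used.discard(port)
```

Conceptually, the configurable range from commit e3ef20d7 would correspond to constructing this allocator from migration_port_min/migration_port_max in qemu.conf. Note the probe-then-close bind check still leaves a small race window with unrelated processes, which is why the report also discusses file descriptor passing.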
Hi, as this is fixed upstream and target release is 6.5 and it's marked as "urgent": when will this get backported? Thanks for your work.

Target release is not 6.5 and never was, because the bug came in too late to be incorporated in 6.5. The "Version" Bugzilla field says in what version of the product the issue was observed. If you want to have this bug fixed earlier than in the next minor update (6.6), please talk to Red Hat customer support and provide them with a business justification for putting this bug in an Extended Update Support release. Note that running a gluster node and VMs on the same host is not an officially supported configuration, so another justification will likely be requested.

I'll take that as a "no, we won't backport this for EL 6.5", as I don't run direct RH EL 6.5 but a clone you might know.

I can reproduce this with build libvirt-0.10.2-35.el6.x86_64 and get this error message:

    # virsh migrate --live rhel qemu+ssh://10.66.100.102/system --verbose
    error: internal error Process exited while reading console log output: char device redirected to /dev/pts/2
    qemu-kvm: Migrate: socket bind failed: Address already in use
    Migration failed. Exit code tcp:[::]:49152(-1), exiting.

Verified with build libvirt-0.10.2-37.el6.x86_64. Steps:

S1:
1. Prepare a gluster server and client (on the migration source and destination).
2. Mount the gluster pool on both source and destination:

       10.66.100.103:/gluster-vol1 on /var/lib/libvirt/migrate type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)

3. Prepare a guest with gluster storage.
4. Check the ports on the destination:

       tcp 0 0 0.0.0.0:49152 0.0.0.0:* LISTEN 0 194240 29866/glusterfsd
       tcp 0 0 10.66.100.102:49152 10.66.100.103:1015 ESTABLISHED 0 194475 29866/glusterfsd
       tcp 0 0 10.66.100.102:49152 10.66.100.102:1021 ESTABLISHED 0 194462 29866/glusterfsd
       tcp 0 0 10.66.100.102:1016 10.66.100.102:49152 ESTABLISHED 0 194748 30008/glusterfs
       tcp 0 0 10.66.100.102:1018 10.66.100.103:49152 ESTABLISHED 0 194464 29879/glusterfs
       tcp 0 0 10.66.100.102:1015 10.66.100.103:49152 ESTABLISHED 0 194751 30008/glusterfs
       tcp 0 0 10.66.100.102:49152 10.66.100.103:1017 ESTABLISHED 0 194466 29866/glusterfsd
       tcp 0 0 10.66.100.102:49152 10.66.100.102:1016 ESTABLISHED 0 194749 29866/glusterfsd
       tcp 0 0 10.66.100.102:1021 10.66.100.102:49152 ESTABLISHED 0 194461 29879/glusterfs

5. Do the migration.
6. Repeat 20 times; no error occurred.

S2:
1. Prepare a guest the same as in S1.
2. Do a live migration, then cancel it:

       # virsh migrate rhel qemu+ssh://10.66.100.102/system --verbose
       Migration: [ 2 %]^Cerror: operation aborted: migration job: canceled by client

3. Before the migration is canceled, check the port on the destination:

       tcp 0 0 :::49153 :::* LISTEN 107 212413 931/qemu-kvm
       tcp 0 0 ::ffff:10.66.100.102:49153 ::ffff:10.66.100.103:52244 ESTABLISHED 107 212487 931/qemu-kvm

   After it is canceled, check again:

       # netstat -laputen | grep 49153

   No output. If the job is cancelled, the port is cleaned up and can be reused in the next migration.

S3:
1. Edit /etc/libvirt/qemu.conf and restart libvirtd:

       migration_port_min = 51152
       migration_port_max = 51251

2. Do the migration.
3. Check the destination port:

       # netstat -laputen | grep 51
       tcp 0 0 10.66.100.102:1015 10.66.100.103:49152 ESTABLISHED 0 194751 30008/glusterfs
       tcp 0 0 :::51152 :::* LISTEN 107 214187 1179/qemu-kvm
       tcp 0 0 ::ffff:10.66.100.102:51152 ::ffff:10.66.100.103:56922 ESTABLISHED 107 214260 1179/qemu-kvm

       # virsh migrate rhel qemu+ssh://10.66.100.102/system --verbose
       Migration: [100 %]

   Migration worked well.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1374.html

*** Bug 1340368 has been marked as a duplicate of this bug. ***