+++ This bug was initially created as a clone of Bug #1018530 +++
+++ This bug was initially created as a clone of Bug #987555 +++

Description of problem:
Starting with GlusterFS 3.4, glusterfsd uses the IANA-defined ephemeral port range (49152 and upward). If you happen to use the same network for storage and qemu-kvm live migration, you sometimes get a port conflict, and live migration aborts.

Here's a log of a failed live migration on the destination host:

2013-07-23 15:54:32.619+0000: starting up
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name ipasserelle -S -M rhel6.4.0 -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -uuid 8505958b-8227-0a46-91a7-41d3247544e2 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ipasserelle.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/gluster/ipasserelle.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:2b:14:d7,bus=pci.0,addr=0x3 -netdev tap,fd=24,id=hostnet1,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6d:4f:52,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x5 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming tcp:[::]:49152 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7
char device redirected to /dev/pts/2
inet_listen_opts: bind(ipv6,::,49152): Address already in use
inet_listen_opts: FAILED
Migrate: Failed to bind socket
Migration failed. Exit code tcp:[::]:49152(-1), exiting.
2013-07-23 15:54:33.016+0000: shutting down

[root@dd9 ~]# netstat -laputen | grep :49152
tcp 0 0 0.0.0.0:49152 0.0.0.0:* LISTEN 0 82349 1927/glusterfsd
tcp 0 0 127.0.0.1:1015 127.0.0.1:49152 ESTABLISHED 0 82555 1952/glusterfs
tcp 0 0 10.90.25.138:49152 10.90.25.137:1016 ESTABLISHED 0 82473 1927/glusterfsd
tcp 0 0 10.90.25.138:1021 10.90.25.137:49152 ESTABLISHED 0 82344 1952/glusterfs
tcp 0 0 127.0.0.1:49152 127.0.0.1:1008 ESTABLISHED 0 82725 1927/glusterfsd
tcp 0 0 127.0.0.1:49152 127.0.0.1:1015 ESTABLISHED 0 82556 1927/glusterfsd
tcp 0 0 10.90.25.138:49152 10.90.25.137:1010 ESTABLISHED 0 89092 1927/glusterfsd
tcp 0 0 127.0.0.1:1008 127.0.0.1:49152 ESTABLISHED 0 82724 2069/glusterfs
tcp 0 0 10.90.25.138:1018 10.90.25.137:49152 ESTABLISHED 0 82784 2115/glusterfs

The exact same setup with GlusterFS 3.3.2 works like a charm.

Version-Release number of selected component (if applicable):
Host is CentOS 6.4 x86_64
gluster 3.4.0-2 (glusterfs glusterfs-server glusterfs-fuse), from the gluster.org RHEL repo
libvirt 0.10.2-18
qemu-kvm-rhev 0.12.1.2-2.355.el6.5

How reproducible:
Not always, but frequently enough

Steps to Reproduce:
- Two hosts with a replicated GlusterFS volume (both are gluster server and client)
- Libvirt on both nodes
- One private network used for gluster and live migration
- While GlusterFS is working, try to live migrate a qemu-kvm VM, using the standard migration (virsh migrate --live vm qemu+ssh://user@other_node/system)
- From time to time (not always), the migration will fail because the qemu process on the destination host cannot bind to the chosen port

Actual results:
Live migration fails

Expected results:
Live migration shouldn't be bothered by Gluster

Additional info:
An option to configure the first port, or the port range used by Gluster, would avoid this situation.

--- Additional comment from Daniel on 2013-07-24 05:41:31 EDT ---

Just one more piece of info: I have three GlusterFS volumes between the two nodes, and the first three migrations fail. As qemu (or libvirt, not sure which one chooses the incoming migration port) increments the port number at each migration attempt, the fourth migration succeeds (and the following migrations succeed too).

--- Additional comment from Caerie Houchins on 2013-10-02 17:18:14 EDT ---

We just hit this bug in a new setup today. Verifying this still exists.
qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
glusterfs-3.4.0-8.el6.x86_64
CentOS release 6.4 (Final)
Linux SERVERNAME 2.6.32-358.18.1.el6.x86_64 #1 SMP Wed Aug 28 17:19:38 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

--- Additional comment from Gianluca Cecchi on 2013-10-09 18:52:29 EDT ---

Same problem with oVirt 3.3 and Fedora 19 as hypervisors.
See here: http://wiki.libvirt.org/page/FAQ
The range 49152-49215 was already used by libvirt years before the Gluster change from 3.3 to 3.4... How could you miss it and, worse, not be able to change it at least for 3.4.1, given that this bugzilla was opened in July? At least, could you provide a way to configure gluster to use another range, so that two nodes that are both servers and clients can use another range? You are limiting GlusterFS adoption itself, as no one would implement oVirt on GlusterFS without migration available...
Thanks for reading
Gianluca

--- Additional comment from Kaleb KEITHLEY on 2013-10-11 07:54:00 EDT ---

Out of curiosity, why isn't this a bug in qemu-kvm? Shouldn't qemu-kvm be trying another port if 49152 (or any other port) is in use? And using portmapper to register the port it does end up using?
--- Additional comment from Anand Avati on 2013-10-11 08:07:18 EDT ---

REVIEW: http://review.gluster.org/6076 (xlators/mgmt/glusterd: ports conflict with qemu live migration) posted (#1) for review on release-3.4 by Kaleb KEITHLEY (kkeithle)

--- Additional comment from Gianluca Cecchi on 2013-10-11 08:34:15 EDT ---

From http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
"Dynamic, private or ephemeral ports
The range 49152-65535 (2^15+2^14 to 2^16-1), above the registered ports, contains dynamic or private ports that cannot be registered with IANA.[133] This range is used for custom or temporary purposes and for automatic allocation of ephemeral ports."
[133] https://tools.ietf.org/html/rfc6335

If a new project starts to use a range, in my opinion it has to consider that it is not the only project in the world, and think of the future... ;-)
Why couldn't libvirt and GlusterFS reserve ports via IANA, so that /etc/services could be updated and other projects could query the current status before picking a new range? It seems a lot like the 192.168.1.x private network used by everyone: the last reserved port is 49151, so why not start at 49152... ;-) There are quite a few ranges available up to 65535, aren't there?
Just my two eurocents

--- Additional comment from Gianluca Cecchi on 2013-10-11 08:35:59 EDT ---

So, as 49152-65535 cannot be registered with IANA, why not try any range below 49151 that is still free, or ask IANA to extend the registry, or at least coordinate in a way that avoids overlap?
Thanks,
Gianluca
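Regarding the review posted above (http://review.gluster.org/6076): it proposes making the brick base port configurable on the Gluster side. A minimal sketch of what that knob could look like, assuming an option named base-port in /etc/glusterfs/glusterd.vol (the option name is taken from the proposed patch and the value is illustrative, neither is confirmed in this bug):

# /etc/glusterfs/glusterd.vol -- add inside the existing "volume management" block,
# then restart glusterd (option name assumed from the proposed patch; value illustrative)
option base-port 50152

With something like this, bricks would start allocating ports at 50152 instead of 49152, leaving libvirt's default migration range untouched.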
QEMU is not choosing the port number. It is libvirt that builds the QEMU command line, including the -incoming tcp:[::]:49152 option:

http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_migration.c;h=38edadb9742dd787f1cc58008f45ae1da6c032ac;hb=HEAD#l2553

The problem is that libvirt uses an internal variable ('port') to keep track of which ephemeral port to use. (It also seems like there might be a problem if more than 64 incoming guests are migrating at the same time.)

Note that it may be possible to override the incoming migration URI, including the port number, in the virsh migrate command. See the --desturi and --migrateuri options. This may be a usable temporary workaround.

As a long-term fix, libvirt and QEMU should do a real search for a free port number by binding to a port. To avoid race conditions, either QEMU needs to do this itself or libvirt must use file descriptor passing to hand QEMU the already-bound socket.
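As a concrete illustration of that workaround, a hypothetical invocation (the host name and port are placeholders; the chosen port must be free on the destination and open in its firewall, and older libvirt releases may expect the tcp:host:port spelling instead of tcp://host:port):

# virsh migrate --live vm qemu+ssh://user@other_node/system --migrateuri tcp://other_node:49216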
*** Bug 1019058 has been marked as a duplicate of this bug. ***
This upstream patch looks like it would probably solve this issue https://www.redhat.com/archives/libvir-list/2013-October/msg00652.html
(In reply to Daniel Berrange from comment #11)
> Libvirt can *not* change the port range used for migration by default,
> because that will likely cause regressions for existing customers, due to
> the need for them to now change their firewall to open a different range of
> ports.

Odds are users running gluster will have to change their firewall settings anyway. Can libvirt selectively change migration ports only if the current ones are busy (my understanding is that this is what the current patch does, no?), or maybe only when using gluster?

In summary: you have presented the problem; any ideas for a solution or workarounds?
The solution is the upstream patch to libvirt that makes the port range configurable and makes libvirt check whether a port is in use. It explicitly does not change the default range, so it is backwards compatible.
This is now fixed upstream by v1.1.3-188-g0196845 and v1.1.3-189-ge3ef20d:

commit 0196845d3abd0d914cf11f7ad6c19df8b47c32ed
Author: Wang Yufei <james.wangyufei>
Date:   Fri Oct 11 11:27:13 2013 +0800

    qemu: Avoid assigning unavailable migration ports

    https://bugzilla.redhat.com/show_bug.cgi?id=1019053

    When we migrate vms concurrently, there's a chance that libvirtd on
    destination assigns the same port for different migrations, which will
    lead to migration failure during prepare phase on destination. So we
    use virPortAllocator here to solve the problem.

    Signed-off-by: Wang Yufei <james.wangyufei>
    Signed-off-by: Jiri Denemark <jdenemar>

commit e3ef20d7f7fee595ac4fc6094e04b7d65ee0583a
Author: Jiri Denemark <jdenemar>
Date:   Tue Oct 15 15:26:52 2013 +0200

    qemu: Make migration port range configurable

    https://bugzilla.redhat.com/show_bug.cgi?id=1019053
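With the second commit above (and as exercised in the verification steps later in this bug), the migration port range can be moved out of Gluster's way in /etc/libvirt/qemu.conf. The values below are illustrative, and libvirtd must be restarted for them to take effect:

# /etc/libvirt/qemu.conf
migration_port_min = 51152
migration_port_max = 51251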
One more patch is needed to fully support configurable migration ports:

commit d9be5a7157515eeae99379e9544c34b34c5e5198
Author: Michal Privoznik <mprivozn>
Date:   Fri Oct 18 18:28:14 2013 +0200

    qemu: Fix augeas support for migration ports

    Commit e3ef20d7 allows user to configure migration ports range via
    qemu.conf. However, it forgot to update augeas definition file and even
    the test data was malicious.

    Signed-off-by: Michal Privoznik <mprivozn>
And one more upstream commit is required:

commit c92ca769af2bacefdd451802d7eb1adac5e6597c
Author: Zeng Junliang <zengjunliang>
Date:   Wed Nov 6 11:36:57 2013 +0800

    qemu: clean up migration ports when migration cancelled

    If there's a migration cancelled, the bitmap of migration port should be
    cleaned up too.

    Signed-off-by: Zeng Junliang <zengjunliang>
    Signed-off-by: Jiri Denemark <jdenemar>
Any news on this, so that we can test it?
Thanks,
Gianluca
As you can see in the three comments above, this bug has been fixed upstream. This bug will get further updates when appropriate for the RHEL release in which this bug will be addressed.
Hi, as this is fixed upstream and target release is 6.5 and it's marked as "urgent": When will this get backported? Thanks for your work.
Target release is not 6.5 and never was, because the bug came in too late to be incorporated into 6.5. The "Version" bugzilla field indicates the version of the product in which the issue was observed. If you want to have this bug fixed earlier than in the next minor update (6.6), please talk to Red Hat customer support and provide them with a business justification for putting this bug in an Extended Update Support release. Note that running a gluster node and VMs on the same host is not an officially supported configuration, so another justification will likely be requested.
I'll take that as a "no, we won't backport this for EL 6.5", as I don't run RHEL 6.5 directly but a clone you might know of.
I can reproduce this with build: libvirt-0.10.2-35.el6.x86_64

Error message:
# virsh migrate --live rhel qemu+ssh://10.66.100.102/system --verbose
error: internal error Process exited while reading console log output: char device redirected to /dev/pts/2
qemu-kvm: Migrate: socket bind failed: Address already in use
Migration failed. Exit code tcp:[::]:49152(-1), exiting.

Verified with build: libvirt-0.10.2-37.el6.x86_64

Steps:

S1:
1: Prepare gluster server and client (on migration source and destination).
2: Mount the gluster pool on both source and destination:
10.66.100.103:/gluster-vol1 on /var/lib/libvirt/migrate type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
3: Prepare a guest with gluster storage.
4: Check the ports on the destination:
tcp 0 0 0.0.0.0:49152 0.0.0.0:* LISTEN 0 194240 29866/glusterfsd
tcp 0 0 10.66.100.102:49152 10.66.100.103:1015 ESTABLISHED 0 194475 29866/glusterfsd
tcp 0 0 10.66.100.102:49152 10.66.100.102:1021 ESTABLISHED 0 194462 29866/glusterfsd
tcp 0 0 10.66.100.102:1016 10.66.100.102:49152 ESTABLISHED 0 194748 30008/glusterfs
tcp 0 0 10.66.100.102:1018 10.66.100.103:49152 ESTABLISHED 0 194464 29879/glusterfs
tcp 0 0 10.66.100.102:1015 10.66.100.103:49152 ESTABLISHED 0 194751 30008/glusterfs
tcp 0 0 10.66.100.102:49152 10.66.100.103:1017 ESTABLISHED 0 194466 29866/glusterfsd
tcp 0 0 10.66.100.102:49152 10.66.100.102:1016 ESTABLISHED 0 194749 29866/glusterfsd
tcp 0 0 10.66.100.102:1021 10.66.100.102:49152 ESTABLISHED 0 194461 29879/glusterfs
5: Do the migration.
6: Repeat 20 times; no error occurred.

S2:
1: Prepare a guest, same as in S1.
2: Do a live migration, then cancel it:
# virsh migrate rhel qemu+ssh://10.66.100.102/system --verbose
Migration: [ 2 %]^Cerror: operation aborted: migration job: canceled by client
3: Before the migration is cancelled, check the port on the destination:
tcp 0 0 :::49153 :::* LISTEN 107 212413 931/qemu-kvm
tcp 0 0 ::ffff:10.66.100.102:49153 ::ffff:10.66.100.103:52244 ESTABLISHED 107 212487 931/qemu-kvm
After cancelling, check the port again:
# netstat -laputen | grep 49153
no output
If the job is cancelled, the port is cleaned up and can be reused in the next migration.

S3:
1: Configure /etc/libvirt/qemu.conf (edit and restart libvirtd):
migration_port_min = 51152
migration_port_max = 51251
2: Do the migration.
3: Check the port on the destination:
# netstat -laputen | grep 51
tcp 0 0 10.66.100.102:1015 10.66.100.103:49152 ESTABLISHED 0 194751 30008/glusterfs
tcp 0 0 :::51152 :::* LISTEN 107 214187 1179/qemu-kvm
tcp 0 0 ::ffff:10.66.100.102:51152 ::ffff:10.66.100.103:56922 ESTABLISHED 107 214260 1179/qemu-kvm

# virsh migrate rhel qemu+ssh://10.66.100.102/system --verbose
Migration: [100 %]

The migration worked well.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-1374.html
*** Bug 1340368 has been marked as a duplicate of this bug. ***