Bug 1018530

Summary: qemu live migration port conflicts with other users of ephemeral port(s)
Product: Fedora
Reporter: Kaleb KEITHLEY <kkeithle>
Component: libvirt
Assignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: urgent
Version: 19
CC: berrange, cdhouch, clalancette, crobinso, dani-rh, eblake, gianluca.cecchi, gluster-bugs, herrold, itamar, jforbes, jherrman, jtomko, jyang, kkeithle, laine, libvirt-maint, veillard, virt-maint
Hardware: Unspecified
OS: Linux
Fixed In Version: libvirt-1.0.5.9-1.fc19
Doc Type: Bug Fix
Doc Text:
Prior to this update, migrating a virtual machine failed when the libvirtd service used a transmission control protocol (TCP) port that was already in use. Now, it is possible to predefine a custom migration TCP port range in case the default port is in use. In addition, libvirtd now ensures that the port it chooses from the custom range is not used by another process.
Clone Of: 987555
: 1018695 (view as bug list)
Last Closed: 2014-01-26 11:54:11 UTC
Type: Bug

Description Kaleb KEITHLEY 2013-10-13 00:02:00 UTC
+++ This bug was initially created as a clone of Bug #987555 +++

Description of problem:
Starting with GlusterFS 3.4, glusterfsd uses the IANA-defined ephemeral port range (49152 and upward). If you happen to use the same network for storage and qemu-kvm live migration, you sometimes get a port conflict, and live migration aborts.

Here's a log of a failed live migration on the destination host:

2013-07-23 15:54:32.619+0000: starting up
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name ipasserelle -S -M rhel6.4.0 -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -uuid 8505958b-8227-0a46-91a7-41d3247544e2 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ipasserelle.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/gluster/ipasserelle.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:2b:14:d7,bus=pci.0,addr=0x3 -netdev tap,fd=24,id=hostnet1,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6d:4f:52,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x5 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming tcp:[::]:49152 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7
char device redirected to /dev/pts/2
inet_listen_opts: bind(ipv6,::,49152): Address already in use
inet_listen_opts: FAILED
Migrate: Failed to bind socket
Migration failed. Exit code tcp:[::]:49152(-1), exiting.
2013-07-23 15:54:33.016+0000: shutting down

[root@dd9 ~]# netstat -laputen | grep :49152
tcp        0      0 0.0.0.0:49152               0.0.0.0:*                   LISTEN      0          82349      1927/glusterfsd     
tcp        0      0 127.0.0.1:1015              127.0.0.1:49152             ESTABLISHED 0          82555      1952/glusterfs      
tcp        0      0 10.90.25.138:49152          10.90.25.137:1016           ESTABLISHED 0          82473      1927/glusterfsd     
tcp        0      0 10.90.25.138:1021           10.90.25.137:49152          ESTABLISHED 0          82344      1952/glusterfs      
tcp        0      0 127.0.0.1:49152             127.0.0.1:1008              ESTABLISHED 0          82725      1927/glusterfsd     
tcp        0      0 127.0.0.1:49152             127.0.0.1:1015              ESTABLISHED 0          82556      1927/glusterfsd     
tcp        0      0 10.90.25.138:49152          10.90.25.137:1010           ESTABLISHED 0          89092      1927/glusterfsd     
tcp        0      0 127.0.0.1:1008              127.0.0.1:49152             ESTABLISHED 0          82724      2069/glusterfs      
tcp        0      0 10.90.25.138:1018           10.90.25.137:49152          ESTABLISHED 0          82784      2115/glusterfs



The exact same setup with GlusterFS 3.3.2 works like a charm.

Version-Release number of selected component (if applicable):

Host is CentOS 6.4 x86_64

gluster 3.4.0-2 (glusterfs glusterfs-server glusterfs-fuse), from the gluster.org RHEL repo
libvirt 0.10.2-18
qemu-kvm-rhev 0.12.1.2-2.355.el6.5


How reproducible:

Not always, but frequently enough


Steps to Reproduce:

- Two hosts with a replicated GlusterFS volume (both are gluster servers and clients)
- Libvirt on both nodes
- One private network used for gluster and live migration
- while glusterFS is working, try to live migrate a qemu-kvm VM, using the standard migration (virsh migrate --live vm qemu+ssh://user@other_node/system)
- From time to time (not always), the migration will fail because the qemu process on the destination host cannot bind to the chosen port


Actual results:
Live migration fails


Expected results:
Live migration shouldn't be bothered by Gluster

Additional info:
An option to configure the first port, or the port range, that Gluster uses would avoid this situation.
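For illustration only (a sketch, not an official recommendation): a base-port option for glusterd.vol is exactly the kind of knob meant here, and it is what comment 8 below ends up using. Assuming the usual /etc/glusterfs/glusterd.vol location (the path may differ per distribution), the setting would be added to the "volume management" block on every node, followed by a glusterd restart, to push brick ports above libvirt's 49152-49215 migration range:

    # /etc/glusterfs/glusterd.vol, inside the "volume management" block
    option base-port 50152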

--- Additional comment from Daniel on 2013-07-24 05:41:31 EDT ---

Just one more piece of info: I have three GlusterFS volumes between the two nodes, and the first three migrations fail.

As qemu (or libvirt, I'm not sure which one chooses the incoming migration port) increments the port number at each migration attempt, the fourth migration succeeds (and the following migrations succeed too).
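For illustration (an editorial sketch, not what libvirt or qemu actually do internally): the net effect is landing on the first port in the 49152+ range that nothing is listening on; with three gluster volumes holding 49152-49154, that is 49155, which matches the fourth attempt succeeding. A rough shell check for that first free port:

    # print the first port in 49152-49215 with no TCP listener (sketch only)
    for p in $(seq 49152 49215); do
        netstat -ltn | grep -q ":$p " || { echo "$p"; break; }
    done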

--- Additional comment from Caerie Houchins on 2013-10-02 17:18:14 EDT ---

We just hit this bug in a new setup today.  Verifying this still exists.  
qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
glusterfs-3.4.0-8.el6.x86_64
CentOS release 6.4 (Final)
Linux SERVERNAME 2.6.32-358.18.1.el6.x86_64 #1 SMP Wed Aug 28 17:19:38 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

--- Additional comment from Gianluca Cecchi on 2013-10-09 18:52:29 EDT ---

Same problem with oVirt 3.3 and Fedora 19 as hypervisors.
See here:
http://wiki.libvirt.org/page/FAQ
The range 49152-49215 was already used by libvirt years before the Gluster change from 3.3 to 3.4...
How could you miss it, and worse, not be able to change it at least for 3.4.1, given that this bugzilla was opened in July?

At least, could you provide a way to configure gluster to use another range, so that two nodes that are both servers and clients can use a different range?

You are limiting GlusterFS adoption itself, as no one would implement oVirt on GlusterFS without migration available...

Thanks for reading
Gianluca

--- Additional comment from Kaleb KEITHLEY on 2013-10-11 07:54:00 EDT ---

Out of curiosity, why isn't this a bug in qemu-kvm? Shouldn't qemu-kvm be trying another port if 49152 (or any other port) is in use? And using portmapper to register the port it does end up using?

--- Additional comment from Anand Avati on 2013-10-11 08:07:18 EDT ---

REVIEW: http://review.gluster.org/6076 (xlators/mgmt/glusterd: ports conflict with qemu live migration) posted (#1) for review on release-3.4 by Kaleb KEITHLEY (kkeithle)

--- Additional comment from Gianluca Cecchi on 2013-10-11 08:34:15 EDT ---

From
http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
"
Dynamic, private or ephemeral ports

The range 49152–65535 (2^15+2^14 to 2^16−1) – above the registered ports – contains dynamic or private ports that cannot be registered with IANA.[133] This range is used for custom or temporary purposes and for automatic allocation of ephemeral ports.
"

[133]
https://tools.ietf.org/html/rfc6335

If a new project starts to use a range, in my opinion it has to consider that it is not the only project in the world, now or in the future... ;-)

Why couldn't libvirt and GlusterFS reserve ports via IANA, so that /etc/services could be updated and other projects could check the current status before setting a new range?

It seems very much like the 192.168.1.x private network used by everyone.
The last registered port is 49151, so why not just start at 49152... ;-)

There are quite a few ranges available up to 65535, aren't there?

Just my two eurocent

--- Additional comment from Gianluca Cecchi on 2013-10-11 08:35:59 EDT ---

So, as

49152-65535 cannot be registered with IANA,

why not try a range below 49151 that is still free, or ask IANA to extend the registered range, or at least coordinate in a way that avoids overlap?
Thanks,
Gianluca

Comment 2 Ján Tomko 2014-01-09 13:35:16 UTC
Now fixed on v1.0.5-maint as of:
commit 2dd4b3939407f4216892e0c461f615b3dae53c4f
Author:     Zeng Junliang <zengjunliang>
    qemu: clean up migration ports when migration cancelled
    (cherry picked from commit c92ca769af2bacefdd451802d7eb1adac5e6597c)

Comment 3 Gianluca Cecchi 2014-01-09 14:28:19 UTC
Well.
Is there any package to test and give feedback on in koji or updates-testing?
What does -maint in the version name mean?
Thanks 
Gianluca

Comment 4 Eric Blake 2014-01-09 16:33:47 UTC
(In reply to Gianluca Cecchi from comment #3)
> Well.
> Any package to test and give feedback in koji or updates-testing?

Not yet.  We are waiting for a CVE embargo to be lifted, to fix multiple BZ in one build.

> What does it mean -maint in version name?

That is the libvirt.git branch name that contains the patches that will be incorporated into the next Fedora build.  You can test from a direct checkout of libvirt.git, if desired:

git clone git://libvirt.org/libvirt.git
cd libvirt
git checkout v1.0.5-maint
./autogen.sh --system
make rpc

Comment 5 Eric Blake 2014-01-09 19:09:25 UTC
(In reply to Eric Blake from comment #4)
> 
> git clone git://libvirt.org/libvirt.git
> cd libvirt
> git checkout v1.0.5-maint
> ./autogen.sh --system
> make rpc

Correction: 'make rpm'

Comment 6 Fedora Update System 2014-01-17 14:02:56 UTC
libvirt-1.0.5.9-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/libvirt-1.0.5.9-1.fc19

Comment 7 Fedora Update System 2014-01-18 04:29:21 UTC
Package libvirt-1.0.5.9-1.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libvirt-1.0.5.9-1.fc19'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-1090/libvirt-1.0.5.9-1.fc19
then log in and leave karma (feedback).

Comment 8 Gianluca Cecchi 2014-01-21 16:26:05 UTC
Hello, I have successfully tested libvirt-1.0.5.9-1.fc19, both for setting the port range and for skipping a port that is busy.
My environment has oVirt 3.3.3beta1 with 2 f19 hosts and the oVirt stable+beta repos defined.
My DC is of Gluster type, with glusterfs 3.4.2-1.fc19.x86_64 and two bricks.
Initial config is gluster configured with range starting at 50152:
    option base-port 50152

So ports occupied are 50152 and 50153

Initial configuration: libvirt 1.0.5.8-1.fc19
libvirt with its own base config, so bound to 49152-xxx
During live migration, netstat gives:
On source
tcp        0      0 10.4.4.58:42501         10.4.4.59:49152         ESTABLISHED

On destination
tcp6  135382	  0 10.4.4.59:49152         10.4.4.58:42501         ESTABLISHED

After updating and setting 51152-51251 for libvirt:
/etc/libvirt/qemu.conf
migration_port_min = 51152
migration_port_max = 51251

I'm able to successfully live migrate

src
tcp        0  57344 10.4.4.59:53761         10.4.4.58:51152         ESTABLISHED

dest
tcp6	   0	  0 10.4.4.58:51152         10.4.4.59:53761         ESTABLISHED


I also tested libvirt on the same range as gluster, 50152-50251.
My gluster has two bricks, and live migration completed on the first attempt, silently adapting to the first available port (50154):

On destination netstat gives:
tcp        0	  0 0.0.0.0:50152           0.0.0.0:*               LISTEN
tcp        0	  0 0.0.0.0:50153           0.0.0.0:*               LISTEN
tcp        0	  0 192.168.3.3:1008        192.168.3.3:50153       ESTABLISHED
tcp        0	  0 192.168.3.3:50153       192.168.3.1:1009        ESTABLISHED
tcp        0	  0 192.168.3.3:1007        192.168.3.1:50152       ESTABLISHED
tcp        0	  0 192.168.3.3:1014        192.168.3.3:50152       ESTABLISHED
tcp        0	  0 192.168.3.3:50152       192.168.3.1:1012        ESTABLISHED
tcp        0	  0 192.168.3.3:1006        192.168.3.3:50152       ESTABLISHED
tcp        0	  0 192.168.3.3:50152       192.168.3.3:1014        ESTABLISHED
tcp        0	  0 192.168.3.3:50152       192.168.3.3:1018        ESTABLISHED
tcp        0	  0 192.168.3.3:1017        192.168.3.1:50153       ESTABLISHED
tcp        0	  0 192.168.3.3:50153       192.168.3.3:1020        ESTABLISHED
tcp        0	  0 192.168.3.3:50152       192.168.3.1:1000        ESTABLISHED
tcp        0	  0 192.168.3.3:50152       192.168.3.3:1006        ESTABLISHED
tcp        0	  0 192.168.3.3:1020        192.168.3.3:50153       ESTABLISHED
tcp        0	  0 192.168.3.3:1018        192.168.3.3:50152       ESTABLISHED
tcp        0	  0 192.168.3.3:50153       192.168.3.1:1001        ESTABLISHED
tcp        0	  0 192.168.3.3:1019        192.168.3.1:50152       ESTABLISHED
tcp        0	  0 192.168.3.3:1005        192.168.3.1:50153       ESTABLISHED
tcp        0	  0 192.168.3.3:50153       192.168.3.3:1008        ESTABLISHED
tcp        0	  0 192.168.3.3:1013        192.168.3.1:50153       ESTABLISHED
tcp        0	  0 192.168.3.3:50152       192.168.3.1:1006        ESTABLISHED
tcp6	   0	  0 10.4.4.59:50154         10.4.4.58:37323         ESTABLISHED
udp        0	  0 0.0.0.0:35011           0.0.0.0:*

I don't remember if there was a specific bugzilla for the busy-port handling, or if it is OK to report it here...

Gianluca

Comment 9 Cole Robinson 2014-01-22 13:22:48 UTC
Thanks for the detailed results, Gianluca! They are fine here; there isn't any other bug that I know of.

Comment 10 Fedora Update System 2014-01-26 11:54:11 UTC
libvirt-1.0.5.9-1.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.