Bug 1235846 - vnc + websocket = websocket autoport not working right at live migration
Summary: vnc + websocket = websocket autoport not working right at live migration
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: All
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-06-25 21:50 UTC by piotr.rybicki
Modified: 2016-12-12 22:13 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-12 22:13:12 UTC
Embargoed:



Description piotr.rybicki 2015-06-25 21:50:13 UTC
Description of problem:

When I start qemu via libvirt with both vnc and websocket defined, it is not possible to live migrate the guest to another host where another qemu process is already running with the same display id.

Version-Release number of selected component (if applicable):

1.2.16 (and many previous versions)

How reproducible:
Always

Steps to Reproduce:
1. have 2 libvirtd hosts
2. start one qemu guest on each host, with vnc display and websocket defined and with identical display id
3. live migrate one qemu guest from one host to the other

Actual results:

migration error is:

error: internal error: early end of file from monitor: possible problem:
            [1] => 2015-06-23T11:54:25.590506Z qemu-system-x86_64: -vnc 0.0.0.0:1,websocket=5700,password: Failed to start VNC server on `(null)': Failed to bind socket: Address already in use

(please note vnc display id=1 and websocket=5700 - where it should be 5701) 


Expected results:

live migration is possible.

Additional info:

in libvirt's xml i have:
(...)
    <graphics type='vnc' port='-1' autoport='yes' websocket='-1' listen='0.0.0.0' passwd='xxx'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
(...)

for first and only qemu process on host, this creates qemu commandline:
(...) -vnc 0.0.0.0:0,websocket=5700,password (...)

for second qemu process on the same host:
(...)  -vnc 0.0.0.0:1,websocket=5701,password (...)

There is no problem with migration when there is no websocket configuration.

Solution:

I believe that, to solve this problem, libvirt has to omit the websocket port from the command line string ('websocket=5700' => 'websocket') when autoport is set in the domain XML.
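
To make the proposal concrete, here is a minimal Python sketch of the two forms of the -vnc option value. The helper name vnc_arg is hypothetical and is not libvirt code. It relies on QEMU's documented behaviour that a bare 'websocket' option listens on 5700 + display, so the websocket port would follow whatever display id the destination picks.

```python
def vnc_arg(listen, display, websocket_port=None, password=True):
    """Build the value passed to QEMU's -vnc option (illustrative helper).

    websocket_port=None emits a bare 'websocket', which QEMU documents as
    listening on 5700 + display. Any other value pins the port explicitly.
    """
    parts = [f"{listen}:{display}"]
    if websocket_port is None:
        parts.append("websocket")
    else:
        parts.append(f"websocket={websocket_port}")
    if password:
        parts.append("password")
    return ",".join(parts)


# The failing command line from the report: display 1 but websocket pinned to 5700.
explicit = vnc_arg("0.0.0.0", 1, websocket_port=5700)

# The proposed form: no port, so QEMU derives it from the display id.
bare = vnc_arg("0.0.0.0", 1)

print(explicit)
print(bare)
```

With the bare form, display 1 would end up on websocket port 5701 without libvirt having to carry a port number across hosts.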

Martin has some additional ideas here:
https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg06752.html

Best regards
Piotr Rybicki

Comment 1 piotr.rybicki 2015-08-12 13:04:55 UTC
I have to report that this bug still exists in libvirt 1.2.18

Comment 2 mabrothier 2016-03-02 14:44:43 UTC
Any update on a fix for this bug? We have the same problem trying to do live migration of our VMs.

Marco

Comment 3 Cole Robinson 2016-04-12 20:57:48 UTC
I investigated this a bit. The issue is that the destination host thinks the auto allocated websocket port was an explicit port request by the user.

It comes down to the fact that we don't have an explicit autoport= knob for websocket, and used the old websocket=-1 pattern. What happens is that the runtime XML then does not fully describe the requested VM config: the runtime XML has websocket=5700, and the websocket=-1/autoport data isn't listed. It _is_ listed in the inactive XML, which is how we preserve it on a single machine.

With migration, however, we send the runtime XML to the new host, which parses it internally with the INACTIVE flag. Operating only on the runtime XML, it has no way of knowing whether websocket=5700 was an explicitly requested port or an autoallocated one. It assumes explicitly requested, and then collides with the VM running on the destination host.
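
The collision described above can be reproduced with a small Python simulation. This is only a toy model, not libvirt's allocator: the function names are hypothetical, and the range start of 5700 is taken from the ports seen in this report.

```python
WEBSOCKET_PORT_MIN = 5700  # first websocket port seen in this report


def first_free_port(used, start=WEBSOCKET_PORT_MIN):
    """Return the lowest port >= start that is not in 'used'."""
    port = start
    while port in used:
        port += 1
    return port


def start_guest(used_ports, websocket):
    """Return the websocket port a guest binds on a host, or raise if taken.

    websocket == -1 means 'auto-allocate'. Any other value is treated as an
    explicit request, which is what the destination assumes when it only
    sees the runtime XML.
    """
    if websocket == -1:
        port = first_free_port(used_ports)
    elif websocket in used_ports:
        raise OSError(f"Failed to bind socket: port {websocket} already in use")
    else:
        port = websocket
    used_ports.add(port)
    return port


source_used, dest_used = set(), set()
migrating = start_guest(source_used, -1)  # auto-allocated on the source
resident = start_guest(dest_used, -1)     # guest already running on the destination

# Migration with the runtime XML: the allocated port travels as if explicit.
try:
    start_guest(dest_used, migrating)
    collided = False
except OSError:
    collided = True

# If the auto-allocation request were preserved, the destination would re-allocate.
reallocated = start_guest(dest_used, -1)

print(migrating, resident, collided, reallocated)
```

Both hosts hand out 5700 to their first guest, so carrying that number to the destination fails, while carrying -1 lets the destination pick the next free port.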

The proper fix is to add a websocket_autoport setting or similar, and handle the backcompat like we do with port=-1. Then it should all just work, but only for fixed libvirt on both the source and dest.
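
A rough Python sketch of that idea, under stated assumptions: the source remembers whether the websocket port was auto-allocated (the websocket_generated flag below is hypothetical) and writes websocket='-1' back into the XML it sends to the destination, the same way port=-1 signals autoport. This is not libvirt's implementation, and the port='5900' value in the sample XML is only illustrative.

```python
import xml.etree.ElementTree as ET


def to_migratable(graphics_xml, websocket_generated):
    """Rewrite a <graphics> element before sending it to another host.

    websocket_generated is a hypothetical flag meaning 'this websocket port
    was auto-allocated at startup'. When set, the concrete port is replaced
    by -1 so the receiving host allocates its own port.
    """
    elem = ET.fromstring(graphics_xml)
    if websocket_generated and elem.get("websocket") is not None:
        elem.set("websocket", "-1")
    return ET.tostring(elem, encoding="unicode")


# Runtime view of the guest: the auto-allocated websocket port is filled in.
runtime = (
    "<graphics type='vnc' port='5900' autoport='yes' "
    "websocket='5700' listen='0.0.0.0'/>"
)

auto = ET.fromstring(to_migratable(runtime, websocket_generated=True))
pinned = ET.fromstring(to_migratable(runtime, websocket_generated=False))

print(auto.get("websocket"))
print(pinned.get("websocket"))
```

An explicitly requested port is left untouched, so users who pin websocket=5700 on purpose keep that behaviour.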

FWIW I don't plan to work on this. If anyone affected by this is a RHEL (or openstack) customer, I suggest re-assigning to that product since it'll get more attention

Comment 4 Cole Robinson 2016-12-12 22:13:12 UTC
Sounds like this was fixed upstream with:

commit 61a0026a941c0bd8c4e916b4cf62e53e77152713
Author: Nikolay Shirokovskiy <nshirokovskiy>
Date:   Tue Nov 22 14:09:33 2016 +0300

    qemu: Fix xml dump of autogenerated websocket

