Bug 1751969 - libvirt port allocator does not take into account IP addresses
Summary: libvirt port allocator does not take into account IP addresses
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-09-13 10:08 UTC by mkeedlinger
Modified: 2024-12-17 12:32 UTC
CC List: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2024-12-17 12:32:00 UTC
Embargoed:



Description mkeedlinger 2019-09-13 10:08:46 UTC
Description of problem:
When 2 VMs listen on the same port but on different addresses for SPICE (or, as I understand it, for anything else), starting the second VM fails with an error stating that the port is taken.

Helpful information can be found here: https://unix.stackexchange.com/questions/541127/how-to-listen-on-same-port-separate-address-using-spice-from-libvirt

These lines of code seem to be related: https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virportallocator.c;h=1ffd446f40b2c1bfb5bc85db566fe5707a76d5cf;hb=HEAD#l213

(Thanks to @teuf from both the #libvirt and #spice IRC channels; he was very helpful.)


How reproducible:
100%


Steps to Reproduce:
1. Create and start a VM with a SPICE display listening on port 5900 at address 10.0.0.2.
2. Create and start a VM with a SPICE display listening on port 5900 at address 10.0.0.3.
3. Observe that the second VM fails to start, claiming a port collision.

Actual results:
Second VM fails to start, claiming a port collision.

Expected results:
The second VM should start, listening on its configured address.
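For comparison, the host kernel itself does not treat this as a collision. The minimal Python sketch below assumes Linux, where every 127.x.x.x address reaches the loopback interface, so 127.0.0.2 and 127.0.0.3 stand in for 10.0.0.2 and 10.0.0.3. It binds two listeners to the same port on different addresses, then shows that a wildcard bind on that port is refused:

```python
import errno
import socket

def bind_listener(addr: str, port: int) -> socket.socket:
    """Create a TCP listener on (addr, port); raise OSError if the kernel refuses."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((addr, port))
        s.listen()
    except OSError:
        s.close()
        raise
    return s

# Let the kernel choose a free port for the first listener.
first = bind_listener("127.0.0.2", 0)
port = first.getsockname()[1]

# Same port, different address: the kernel accepts this.
second = bind_listener("127.0.0.3", port)
second_addr = second.getsockname()

# The IPv4 wildcard overlaps both listeners, so this bind is refused.
try:
    bind_listener("0.0.0.0", port).close()
    wildcard_refused = False
except OSError as exc:
    wildcard_refused = exc.errno == errno.EADDRINUSE

first.close()
second.close()
```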

Additional info:

Comment 1 Daniel Berrangé 2019-09-13 10:27:40 UTC
This is actually a really difficult problem to fix because there are lots of special cases to deal with. You can't simply compare addresses to see if they are going to clash. The wildcard addresses 0.0.0.0 / :: need special handling, as they'll clash with all other addresses. IPv6 addrs also need to be canonicalized, as there are many ways to write the same address. When binding to IPv6 you may or may not also silently get bound to IPv4; this depends on the host OS kernel, on Linux also on sysctl settings, and on the socket options the application used, which have changed across QEMU versions. Finally, if the address is in fact a hostname, you have to resolve it, possibly to many addresses.
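The special cases listed above can be illustrated with a hypothetical clash check built on Python's standard `ipaddress` module. This is a sketch, not libvirt code: `addresses_clash` and its `v6_also_binds_v4` flag are invented names, and the flag stands in for whatever the kernel, sysctl and QEMU socket options actually decide. Treating IPv4-mapped IPv6 addresses as their IPv4 equivalents is an assumption, and hostnames are rejected rather than resolved (resolving them, e.g. with `socket.getaddrinfo`, would mean checking every resulting pair).

```python
import ipaddress
from typing import Union

IPAddr = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

def canonical(text: str) -> IPAddr:
    """Parse a literal address; ip_address() normalizes IPv6 spellings,
    so '::0:1' and '0:0:0:0:0:0:0:1' both become ::1.
    Hostnames raise ValueError (resolution is out of scope here)."""
    addr = ipaddress.ip_address(text.strip("[]"))
    # Assumption: treat ::ffff:a.b.c.d as the IPv4 address it maps to.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        return addr.ipv4_mapped
    return addr

def addresses_clash(a: str, b: str, v6_also_binds_v4: bool) -> bool:
    """Would listeners on a and b, using the same port, conflict?

    v6_also_binds_v4 models whether a wildcard IPv6 listener also covers
    IPv4 (on Linux: net.ipv6.bindv6only plus the IPV6_V6ONLY option)."""
    x, y = canonical(a), canonical(b)
    if x == y:
        return True
    for p, q in ((x, y), (y, x)):
        if p.is_unspecified:
            if p.version == q.version:
                return True  # 0.0.0.0 covers all IPv4, :: covers all IPv6
            if p.version == 6 and v6_also_binds_v4:
                return True  # a dual-stack :: listener also covers IPv4
    return False
```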

Having multiple VMs on the same port number but different IP addresses is also potentially confusing / error-prone, so in some sense I think it is in fact desirable that libvirt has this "bug" and treats port reservations as global across all addresses.

Do you have a compelling / strong need to have multiple VMs running on the same port with different addresses, or was this simply something you hit by chance ?

My preference is really to not fix this bug and thus avoid the significant complexity in tracking reservations per address.

Comment 2 Cole Robinson 2019-09-13 16:13:01 UTC
Maybe, if the user requests a manual port, we shouldn't attempt to validate it and should just pass it through to qemu, which will fail for us if there's a collision. We would still need to mark the port as occupied so that autoport= usage doesn't attempt to use it.

Comment 3 Daniel Berrangé 2019-09-13 16:24:40 UTC
To reserve the port we would need to introduce reference counting into the reservation system, which means we can't use the bitmap concept, so that adds complexity too, though admittedly not as much.
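A rough sketch of Cole's suggestion combined with the reference counting it would need. `PortReservations` is a hypothetical class, not the API in `virportallocator.c`; it shows why a count per port, rather than a single bit, is required once two VMs may hold the same manual port: with a bitmap, the first VM's exit would clear the bit while the second VM still uses the port.

```python
from collections import Counter

class PortReservations:
    """Hypothetical refcounted port table covering ports start..end."""

    def __init__(self, start: int, end: int) -> None:
        self.start, self.end = start, end
        self.holders = Counter()  # port -> number of VMs holding it

    def reserve_manual(self, port: int) -> None:
        # Manual ports are not validated; QEMU's own bind() would report a
        # real conflict. We only record one more holder.
        self.holders[port] += 1

    def reserve_auto(self) -> int:
        # autoport only hands out ports that nobody holds at all.
        for port in range(self.start, self.end + 1):
            if self.holders[port] == 0:
                self.holders[port] = 1
                return port
        raise RuntimeError("no free port in range")

    def release(self, port: int) -> None:
        # Drop one holder; forget the port once the last holder is gone.
        if self.holders[port] > 1:
            self.holders[port] -= 1
        else:
            self.holders.pop(port, None)

    def in_use(self, port: int) -> bool:
        return self.holders[port] > 0

ports = PortReservations(5900, 5910)
ports.reserve_manual(5900)        # VM listening on 10.0.0.2
ports.reserve_manual(5900)        # VM listening on 10.0.0.3
ports.release(5900)               # first VM exits
still_held = ports.in_use(5900)   # second VM still holds 5900
next_auto = ports.reserve_auto()  # autoport skips 5900
```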

Comment 4 Michal Privoznik 2019-09-16 05:54:55 UTC
This looks like a duplicate of bug 1209959.
I think this bug would be technically fixed if libvirt opened the listen socket and then just passed FD to qemu. I even think there's a bug opened just for that, but I'm unable to find it now.
Anyway, the approach I'm proposing would solve many more problems than just this one. For instance, there is an inherent race that is unfixable with virportallocator, a typical TOCTOU: when constructing a cmd line, virportallocator finds an unused port and puts it onto the generated qemu cmd line. Then, when qemu is executed, it tries to bind() to that port, but obviously the check and the bind() are not atomic.
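A minimal sketch of the FD-passing idea, assuming a POSIX host. The parent (playing libvirt) binds and listens, so a conflict surfaces as an error at that single step with no gap between check and use; the child (playing QEMU) adopts the already-open socket by descriptor number. How QEMU would actually be told to use a pre-opened FD for a SPICE or VNC display is not shown here and is not assumed.

```python
import socket
import subprocess
import sys

# The child stands in for QEMU: it wraps the inherited descriptor
# instead of binding a port itself, then reports the port it got.
CHILD = (
    "import socket, sys\n"
    "s = socket.socket(fileno=int(sys.argv[1]))\n"
    "print(s.getsockname()[1])\n"
)

# Parent: bind + listen in one place; any EADDRINUSE is raised right here.
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
fd = listener.fileno()

result = subprocess.run(
    [sys.executable, "-c", CHILD, str(fd)],
    pass_fds=(fd,),  # keep this descriptor open in the child
    capture_output=True,
    text=True,
    check=True,
)
child_port = int(result.stdout)
listener.close()
```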

Comment 5 Jiri Lunacek 2019-09-16 11:03:17 UTC
There are several possible solutions as I see it, though I don't understand libvirt internals in that much depth.

As Daniel suggests, reference counting is one.

If the main goal is to track which ports are available for automatic assignment, another option (when a manual port is specified) would be to reserve the port in the bitmap if it is free, then, when the qemu process ends, check whether the port is still bound and only make it available again if it is not (i.e. the exiting process was the only one holding it). This may be a viable variant of what Cole suggests.
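Jiri's release-on-exit variant could be sketched like this. The function names are hypothetical, and the probe binds the IPv4 wildcard address on purpose: on Linux a wildcard bind fails while any IPv4 address still listens on that port, so it also detects holders on other addresses (IPv6 listeners are not checked). The probe is itself a check-then-act step, so it shares the race described in Comment 4, and without `SO_REUSEADDR` lingering TIME_WAIT connections can also make the port look busy.

```python
import socket
from typing import Set

def port_has_no_holders(port: int) -> bool:
    """Probe by binding 0.0.0.0:port; this fails while any IPv4 address
    on the host still has a socket bound to that port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind(("0.0.0.0", port))
        except OSError:
            return False
    return True

def release_if_unbound(reserved: Set[int], port: int) -> None:
    """Hypothetical exit hook: free the reservation only if no other
    process still holds the port."""
    if port_has_no_holders(port):
        reserved.discard(port)
```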

As I originally suggested, the reservation map could also be keyed by (address, port) tuples, since that pair is what actually identifies a listening network port.
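A minimal sketch of an (address, port)-keyed map, assuming literal IP addresses only. `AddrPortReservations` is a hypothetical name. Each family's wildcard is treated as conflicting with every address of that family, while IPv4/IPv6 dual-stack binding and hostname resolution (the harder cases from Comment 1) are deliberately left out:

```python
import ipaddress
from collections import defaultdict

class AddrPortReservations:
    """Hypothetical reservation map: port -> set of canonical addresses."""

    def __init__(self) -> None:
        self.by_port = defaultdict(set)

    def reserve(self, addr_text: str, port: int) -> bool:
        # ip_address() gives a canonical value, so equivalent IPv6
        # spellings compare equal.
        addr = ipaddress.ip_address(addr_text)
        for held in self.by_port[port]:
            if held.version != addr.version:
                continue  # dual-stack overlap ignored in this sketch
            if held == addr or held.is_unspecified or addr.is_unspecified:
                return False  # same address, or a wildcard overlaps it
        self.by_port[port].add(addr)
        return True

    def release(self, addr_text: str, port: int) -> None:
        self.by_port[port].discard(ipaddress.ip_address(addr_text))
```

With this layout, 10.0.0.2:5900 and 10.0.0.3:5900 can both be reserved, while a later request for 0.0.0.0:5900 is refused.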


Personal note:
I am glad that after four and a half years this is getting some attention and that I am not the only one who has a problem with it. This dates back to the introduction of libvirt 3.9. Prior to that, the autoport feature tried to bind the port to see whether it could be used. I don't know why this was removed in the first place. We had a patch, which we ported across versions, that disabled autoport altogether.

Anyway, the autoport bitmap probably has more roles than just reserving ports for virtual displays, as the latest version of libvirt in CentOS (patched with autoport disabled) ended up terminating qemu processes that used the same port number (while reporting that the qemu process could not be reconnected).
We ended up removing the patch, enabling autoport with all VMs and using hooks to set up NAT rules to bind the display ports to where we want them.
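A rough sketch of the hook-based workaround, under stated assumptions. libvirt runs `/etc/libvirt/hooks/qemu` with the domain name and operation as arguments and passes the domain XML on stdin; the helpers below (hypothetical names) read the autoport-assigned SPICE port from that XML and build an iptables DNAT rule redirecting a chosen public address and port to it. Which public address and port each VM gets, and at which hook operation the rule is added, are assumptions; the comment does not give the exact rules used. A matching rule with `-D` instead of `-A` would be needed when the VM stops.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

def spice_port(domain_xml: str) -> Optional[int]:
    """Return the SPICE port recorded in the domain XML, or None if there is
    no SPICE display or the port is still unassigned (port='-1')."""
    root = ET.fromstring(domain_xml)
    graphics = root.find("./devices/graphics[@type='spice']")
    if graphics is None or graphics.get("port") in (None, "-1"):
        return None
    return int(graphics.get("port"))

def dnat_rule(public_addr: str, public_port: int,
              host_addr: str, vm_port: int) -> List[str]:
    """iptables arguments redirecting public_addr:public_port to the
    display actually listening on host_addr:vm_port."""
    return [
        "iptables", "-t", "nat", "-A", "PREROUTING",
        "-p", "tcp", "-d", public_addr, "--dport", str(public_port),
        "-j", "DNAT", "--to-destination", f"{host_addr}:{vm_port}",
    ]
```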

Comment 6 mkeedlinger 2019-10-01 22:57:04 UTC
Hi, sorry I'm late to reply. The good experiences I have with the open source community always amaze me. Thanks for that!

To me it sounds easy enough to disable the check if a port is manually set and autoport= is not set, so in that respect I agree with Cole Robinson. However, I admit ignorance about actually implementing this fix, and am not sure what other downsides it might bring (if any).

Daniel Berrangé mentions this might be a hard issue to fix, and I can see why the use case may be limited. For me it's just about not having to specify a port when I connect (it makes connecting with just a hostname easier). At the very least, this situation should produce a clearer error message; claiming a port is taken when it isn't is obviously buggy / unwanted behaviour.

Thanks!

Comment 7 Daniel Berrangé 2024-12-17 12:32:00 UTC
Thank you for reporting this issue to the libvirt project. Unfortunately we have been unable to resolve this issue due to insufficient maintainer capacity and it will now be closed. This is not a reflection on the possible validity of the issue, merely the lack of resources to investigate and address it, for which we apologise. If you none the less feel the issue is still important, you may choose to report it again at the new project issue tracker https://gitlab.com/libvirt/libvirt/-/issues The project also welcomes contribution from anyone who believes they can provide a solution.

