Bug 2184967 - RFE: support passt for appliance networking, as QEMU userspace networking backend
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libguestfs
Version: unspecified
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Laszlo Ersek
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-06 10:22 UTC by Laszlo Ersek
Modified: 2023-07-14 19:52 UTC
6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-07-14 16:00:49 UTC
Embargoed:




Links
Red Hat Bugzilla 2222766 (Private: 0, Priority: unspecified, Status: CLOSED): passt integration incomplete
Last updated: 2023-09-22 16:10:09 UTC

Description Laszlo Ersek 2023-04-06 10:22:17 UTC
passt is the next-gen userspace networking component, eventually obsoleting / replacing slirp. Both QEMU (>= 7.2) and libvirtd (>= 9.2) expose passt.

In libguestfs, if appliance networking is being requested, we should detect whether passt is made available by the appliance back-end chosen by the user (i.e., QEMU vs libvirtd), and if so, favor passt over slirp. Fall back to slirp if networking is asked for but passt is not available.
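A minimal sketch of the preference/fallback on the QEMU command line, assuming the direct backend connects QEMU (>= 7.2) to passt through the `-netdev stream` backend over passt's unix domain socket, and otherwise uses slirp via `-netdev user`. The helper name `make_netdev_arg`, the netdev id `usernet` and the slirp subnet `169.254.0.0/16` are illustrative assumptions, not taken from libguestfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the value of QEMU's "-netdev" option into BUF.
 * With passt available, use the "stream" netdev (QEMU >= 7.2) as a client
 * of passt's unix domain socket; otherwise fall back to slirp ("user").
 * Returns 0 on success, -1 if BUF is too small. */
int
make_netdev_arg (char *buf, size_t len, bool passt_available,
                 const char *passt_socket)
{
  int n;

  assert (buf != NULL);
  assert (!passt_available || passt_socket != NULL);

  if (passt_available)
    n = snprintf (buf, len,
                  "stream,id=usernet,server=off,"
                  "addr.type=unix,addr.path=%s", passt_socket);
  else
    n = snprintf (buf, len, "user,id=usernet,net=169.254.0.0/16");

  /* snprintf() returns the length it needed; treat truncation as failure. */
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

The caller would pass the result as the argument of `-netdev` and attach the guest NIC to the same id, for example with `-device virtio-net-pci,netdev=usernet`.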

For now this ticket aims at evaluating whether such a feature is feasible; it's not a hard RFE.

Regarding the passt availability detection ("feature test"), Stefano (CCd) mentions:

"""
For QEMU capabilities, see libvirt commit 5af6134e7068 ("qemu: new
capability QEMU_CAPS_NETDEV_STREAM").

For libvirt itself, <backend type="passt"/> needs to be supported in
<interface type='user'/>, if you check the XML schema
"""

I think for the latter point, we might want "virsh domcapabilities" to actually output the available <interface> types and subtypes (CC Michal). As far as I can tell, in "lib/launch-libvirt.c" of libguestfs, we already fetch all four pieces of information: qemu version number, libvirt version number, capabilities, domcapabilities.
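One coarse feature test is a plain version check against the minimum versions named in the description. This sketch assumes the integer encoding libvirt uses for `virConnectGetVersion()` and `virConnectGetLibVersion()` (major * 1,000,000 + minor * 1,000 + release); the macro and function names are hypothetical, and a passing check still says nothing about whether the passt binary itself is installed.

```c
#include <assert.h>
#include <stdbool.h>

/* Version encoding assumed from libvirt's version-reporting APIs:
 * major * 1000000 + minor * 1000 + release. */
#define VERSION_NUM(major, minor, release) \
  ((major) * 1000000UL + (minor) * 1000UL + (release))

/* Hypothetical coarse check: passt needs both QEMU >= 7.2 (for the
 * "stream" netdev) and libvirt >= 9.2 (for <backend type='passt'/>). */
bool
passt_versions_ok (unsigned long qemu_version, unsigned long libvirt_version)
{
  return qemu_version >= VERSION_NUM (7, 2, 0) &&
         libvirt_version >= VERSION_NUM (9, 2, 0);
}
```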

Comment 1 Laszlo Ersek 2023-07-11 10:44:29 UTC
Unfortunately this looks complicated for the direct backend. The passt utility needs to be started in a separate child process, as far as I can tell. I'm not sure if we want an open-coded fork() for that, in launch_direct() [lib/launch-direct.c], or else if we can reuse guestfs_int_cmd_pipe_run() for this purpose.

Then, composing the command line for passt is an exercise in tedium, judging by libvirt's qemuPasstStart() function [src/qemu/qemu_passt.c].

Yet another open question: if the libguestfs parent process crashes, we have a "recovery" process (another child) in place for killing qemu (= the appliance). It's unclear whether the exit of that qemu process (which means a disconnect from the unix domain socket that passt reads) will trigger passt to exit as well, or whether we need to kill passt separately from the recovery process.

The weight of libvirt really shows here. QEMU does not start (or stop) passt itself, it's either libvirtd or the user (similarly to how swtpm is started/stopped). So if we want to enable passt for the direct backend, we have to duplicate what libvirt does.

Comment 2 Laszlo Ersek 2023-07-11 12:23:20 UTC
Hi Laine, is the QEMU capabilities list available to libvirt clients? (See libvirt commit 5af6134e7068, "qemu: new capability QEMU_CAPS_NETDEV_STREAM", 2023-01-09.) Thanks.

Comment 3 Laine Stump 2023-07-11 18:58:00 UTC
The QEMU_CAPS_* flags were only ever meant for internal consumption (used to implement underlying bits differently), and aren't available anywhere in the public libvirt API. This does have a silver lining, because periodically flags are deprecated, and the code beneath them cleaned up, when some feature reaches the stage that it's available in all currently-supported QEMUs, but it does make the situation more difficult for a consumer like libguestfs during the period when there is a mix of "supporting" and "unsupporting" QEMUs available.

There are some features that are advertised in virConnectGetDomainCapabilities (the function behind "virsh domcapabilities"), but passt support isn't one of those. I guess we could add it there, but that doesn't help much with existing releases of libvirt/QEMU/passt. :-/ (Peter pointed out earlier today that domcapabilities has no information about support for *any* type of network interface.)

For all practical purposes, I think that passt became available on most platforms at or near the same time that versions of libvirt and qemu supporting passt became available; I'm not sure there were any OS releases new enough to have one without the other two. It may be "good enough" to just see if passt is installed, or if libvirt accepts the new XML options in virDomainDefineXML and spits them back out in a following virDomainGetXMLDesc.

Comment 4 Laszlo Ersek 2023-07-12 13:42:13 UTC
I've enabled passt for the libvirt backend.

It results in the following passt commandline (from libvirt):

/usr/bin/passt --one-off --socket /run/user/1000/libvirt/qemu/run/passt/1-guestfs-6liaxjtmow95-net0.socket --mac-addr 52:54:00:41:b4:8e --pid /run/user/1000/libvirt/qemu/run/passt/1-guestfs-6liaxjtmow95-net0-passt.pid --address 169.254.0.0 --netmask 16

That is consistent with my prior reading of the passt manual. In particular, with regard to the direct backend, (1) the "--one-off" option, and (2) the fact that passt daemonizes unless told otherwise, are promising. (1) means that in the direct backend, we need not worry about "recovery" -- as soon as qemu exits (either gracefully, or because our "recovery" logic kills it), passt will exit too, upon seeing EOF on the socket. And (2) means that we should be able to use guestfs_int_cmd_run() for launching passt -- the daemonization will be handled by passt. Also I believe we'll have no use for the "--pid" option (PID file not needed).
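A sketch of the passt command line this plan implies, built as a NULL-terminated argument vector of the kind `execv()` expects. Only `--one-off`, `--socket` and (optionally) `--pid` are emitted; the helper name, the array size and the paths are hypothetical, and the other options libvirt passes (`--mac-addr`, `--address`, `--netmask`) are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PASST_MAX_ARGS 8

/* Hypothetical helper: fill ARGV with a minimal passt command line.
 * PID_PATH may be NULL to omit "--pid".  Returns the number of
 * arguments, not counting the terminating NULL. */
size_t
build_passt_argv (const char *argv[PASST_MAX_ARGS],
                  const char *passt_path, const char *socket_path,
                  const char *pid_path)
{
  size_t i = 0;

  assert (passt_path != NULL && socket_path != NULL);

  argv[i++] = passt_path;
  argv[i++] = "--one-off";        /* exit once the qemu side disconnects */
  argv[i++] = "--socket";
  argv[i++] = socket_path;
  if (pid_path != NULL) {
    argv[i++] = "--pid";          /* where the daemonized passt records its PID */
    argv[i++] = pid_path;
  }
  argv[i] = NULL;
  return i;
}
```

Because `execv()` is declared with a `char *const argv[]` parameter, passing this array to it requires a cast such as `(char *const *) argv`.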

Comment 5 Richard W.M. Jones 2023-07-12 13:49:28 UTC
Setting prctl(PR_SET_PDEATHSIG) before we exec passt might be an idea too.
(In fact it might be an idea for qemu, and get rid of the recovery process).
Apparently it is preserved across exec.

See also:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/common/utils/exit-with-parent.c
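A minimal Linux-only sketch of the `PR_SET_PDEATHSIG` idea: the child asks the kernel to send it `SIGTERM` when the parent goes away, then execs the target program. The function name is hypothetical, and `SIGTERM` is an arbitrary choice of signal.

```c
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and exec ARGV[0] (searched in PATH) so that the child gets
 * SIGTERM if the parent dies.  Returns the child PID, or -1 if fork()
 * fails. */
pid_t
spawn_with_parent_death_signal (char *const argv[])
{
  pid_t parent = getpid ();
  pid_t pid;

  assert (argv != NULL && argv[0] != NULL);

  pid = fork ();
  if (pid != 0)
    return pid;                 /* parent, or -1 on error */

  /* Child: request SIGTERM on parent death.  The setting survives
   * execve(), except for set-user-ID/set-group-ID binaries or binaries
   * with file capabilities. */
  if (prctl (PR_SET_PDEATHSIG, SIGTERM) == -1)
    _exit (126);
  /* The parent may have exited between fork() and prctl(). */
  if (getppid () != parent)
    _exit (126);

  execvp (argv[0], argv);
  _exit (127);                  /* exec failed */
}
```

Two caveats from prctl(2) apply: the "parent" is the thread that called fork(), not the whole process, and the setting is cleared in the child of any further fork(), so it would not follow a program that daemonizes itself, as passt does by default.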

Comment 6 Laszlo Ersek 2023-07-12 14:40:55 UTC
Welp, that won't work.

- We still need to kill passt if we start it up successfully, but we don't get as far as launching QEMU, for some reason.

- That means we either need to pass --foreground to passt (so it remains our direct child, and we can kill it), or pass --pid to it (so we can learn the PID of the daemonized process, and kill *that*).

- The problem with --foreground is that (a) libguestfs currently has no nice internal API for forking & execing *long-lived* children, and (b) --foreground does not work with --one-off, as far as I can tell.

- The problem with --pid is that passt is *only* willing to write its PID file to $XDG_RUNTIME_DIR, and not to (e.g.) /tmp. (See also the cmdline generated by libvirtd, in comment 4.) This is probably due to dropping privileges, or whatever. However, libguestfs has two "temp dir" concepts: a real "temp dir" for regular files, and a "sock dir" for unix domain sockets. The former is usually under /tmp (unless overridden), and the latter is by default under $XDG_RUNTIME_DIR. So for the PID file of passt, we need a *regular file* under the "sock dir", and that violates the guestfs APIs / documentation. In effect we'd have to call guestfs_int_create_socketname() -- or at least guestfs_int_lazy_make_sockdir() -- and place the PID file, a *regular file*, there.

Comment 7 Laszlo Ersek 2023-07-12 14:42:46 UTC
Ugh, my comment 6 was an update for comment 4, not a response to comment 5...

Comment 8 Laszlo Ersek 2023-07-13 09:16:52 UTC
Laine, Stefano,

I don't understand the "--mac-addr" option that libvirt generates for
passt (see libvirt commit a56f0168d576, "qemu: hook up passt config to
qemu domains", 2023-01-10).

As far as I can tell, libvirt passes the virtio-net device's MAC address
to passt. See in qemuPasstStart():

+int
+qemuPasstStart(virDomainObj *vm,
+               virDomainNetDef *net)
+{
...
+    virCommandAddArgList(cmd,
+                         "--one-off",
+                         "--socket", passtSocketName,
+                         "--mac-addr", virMacAddrFormat(&net->mac, macaddr),
+                         NULL);

That seems wrong to me. The virtio-net device's MAC address identifies
the virtual NIC that the guest owns. Whereas passt's "--mac-addr" is
documented as follows:

> -M, --mac-addr addr
>   Use source MAC address addr when communicating to the guest or to
>   the target namespace. Default is to use the MAC address of the
>   interface with the first IPv4 default route on the host.

In other words, it is supposed to impersonate a NIC that's *different*
from the guest's, but is on the same Ethernet subnet. Considering that
passt provides a DHCP server, this is the MAC addr that DHCP replies
should appear from. Considering also that passt effectively enables
routing for the guest (see the "IPv4 default route on the host" language
in the above quote!), it stands to reason that this MAC address is
supposed to belong to the *default gateway* / router on the Ethernet
subnet that the guest's virtual NIC is attached to.

Consider also SLIRP -- i.e., the facility that passt is replacing. If
you look up "-netdev user" in the QEMU manual, it doesn't offer any
property for setting the MAC address.

Further note that libvirt does not pass any --gateway option to passt.

> -g, --gateway addr
>   Assign IPv4 addr as default gateway via DHCP (option 3), or IPv6
>   addr as source for NDP Router Advertisement and DHCPv6 messages.
>   This option can be specified zero (for defaults) to two times (once
>   for IPv4, once for IPv6). By default, IPv4 and IPv6 addresses are
>   taken from the host interface with the first default route for the
>   corresponding IP version.
>
>   Note: these addresses are also used as source address for packets
>   directed to the guest or to the target namespace having a loopback
>   or local source address, to allow mapping of local traffic to guest
>   and target namespace. See the NOTES below for more details about
>   this mechanism.

Then

> NOTES
>
>   Handling of traffic with local destination and source addresses
>
>     Both passt and pasta can bind on ports with a local address,
>     depending on the configuration. Local destination or source
>     addresses need to be changed before packets are delivered to the
>     guest or target namespace: most operating systems would drop
>     packets received from non-loopback interfaces with local
>     addresses, and it would also be impossible for guest or target
>     namespace to route answers back.
>
>     For convenience, and somewhat arbitrarily, the source address on
>     these packets is translated to the address of the default IPv4 or
>     IPv6 gateway -- this is known to be an existing, valid address on
>     the same subnet.
>
>     Loopback destination addresses are instead translated to the
>     observed external address of the guest or target namespace.
>     For IPv6 packets, if usage of a link-local address by guest or
>     namespace has ever been observed, and the original destination
>     address is also a link-local address, the observed link-local
>     address is used. Otherwise, the observed global address is used.
>     For both IPv4 and IPv6, if no addresses have been seen yet, the
>     configured addresses will be used instead.
>
>     For example, if passt or pasta receive a connection from
>     127.0.0.1, with destination 127.0.0.10, and the default IPv4
>     gateway is 192.0.2.1, while the last observed source address from
>     guest or namespace is 192.0.2.2, this will be translated to a
>     connection from 192.0.2.1 to 192.0.2.2.
>
>     Similarly, for traffic coming from guest or namespace, packets
>     with destination address corresponding to the default gateway will
>     have their destination address translated to a loopback address,
>     if and only if a packet, in the opposite direction, with a
>     loopback destination or source address, port-wise matching for
>     UDP, or connection-wise for TCP, has been recently forwarded to
>     guest or namespace. This behaviour can be disabled with
>     --no-map-gw.

Thus, I believe that what libvirt should do is *not* pass "--mac-addr"
at all -- that would be consistent with the *default* behavior of
--gateway (which libvirt also doesn't pass). That means the guest will
see itself to be on the same Ethernet subnet as the first routable NIC
of the host, and will consider the host's first routable NIC as its
gateway / router, meaning *both* IP *and* MAC addresses.

Here's why this matters to me: I'm about to start "passt" manually, from
the libguestfs direct backend. I intend to stay close to the passt
command line that libvirt generates (see comment 4). For the
"--mac-addr" option, *assuming* I want to pass it, I need to create
*some* MAC address. Doing what libvirt does at the moment (see above)
does not seem ideal, because (1) IMO that practice is simply wrong (see
above), and (2) because in the libguestfs direct backend, we don't
generate a MAC address for the *guest-side* NIC (= virtio-net) anyway;
so if I wanted those two to match, I'd have to generate the MAC for the
virtio-net NIC at first.

My opinion is that neither libvirt nor the libguestfs direct backend
should generate "--mac-addr" at all, for passt.

(NB technically I could do the MAC generation -- read three bytes from
/dev/urandom, then take 52:54:00:(random_byte[0] &
0x7F):random_byte[1]:random_byte[2]. There is prior art in libguestfs
for reading /dev/urandom -- namely guestfs_int_random_string() --, but I
think it's just unnecessary here.)
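For illustration only, the MAC generation described in the parenthesis above might be sketched like this with plain C stdio (it does not use guestfs_int_random_string()); the function name `random_qemu_mac` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Generate "52:54:00:XX:YY:ZZ" from three bytes of /dev/urandom, with
 * the top bit of the first random byte cleared.  BUF needs at least 18
 * bytes (17 characters plus NUL).  Returns 0 on success, -1 on failure. */
int
random_qemu_mac (char *buf, size_t len)
{
  unsigned char rnd[3];
  FILE *fp;
  size_t got;

  assert (buf != NULL);
  if (len < 18)
    return -1;

  fp = fopen ("/dev/urandom", "rb");
  if (fp == NULL)
    return -1;
  got = fread (rnd, 1, sizeof rnd, fp);
  fclose (fp);
  if (got != sizeof rnd)
    return -1;

  snprintf (buf, len, "52:54:00:%02x:%02x:%02x",
            (unsigned) (rnd[0] & 0x7F), (unsigned) rnd[1], (unsigned) rnd[2]);
  return 0;
}
```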

Please comment! Thanks.

Comment 9 Laszlo Ersek 2023-07-13 12:32:55 UTC
The "--address 169.254.0.0" option (as generated by libvirt) seems wrong as well. It does not mean "assign the guest an IP address from this range with DHCP". Instead, it means "assign the guest *specifically* this IP address". And 169.254.0.0 is wrong for that: with the /16 netmask, its host part is all zero bits, so it is the network address of the subnet, not a usable host address. (I've verified with virt-rescue from within the guest, using the libvirt backend -- the guest gets IP address 169.254.0.0 precisely. When passt is not in use, it gets 169.254.2.15 or something like that.)

There are stark, guest-visible differences between slirp and passt. The passt manual says, "giving the illusion that application processes residing on the guest are running on the local host, from a networking perspective. Built-in ARP, DHCP, NDP, and DHCPv6 implementations are designed to provide the guest with a network configuration that tightly resembles the host native configuration. With the default options, guest and host share IP addresses, routes, and port bindings". That's very different from slirp / qemu user networking.

In fact, the "tests/rsync/test-rsync.sh" test case is failing now. I'm learning of this only now, because with the libvirt backend, the test is skipped. It is enabled with the direct backend however, and then the assumption that slirp makes the host appear to the guest as IP addr 169.254.2.2 breaks down, and then the rsync-in operation fails -- the guest-side rsync client cannot connect to 169.254.2.2, because the host is not reachable at 169.254.2.2.

Comment 10 Laszlo Ersek 2023-07-13 16:31:45 UTC
I've figured it out -- I've filed the following BZ about the libvirt issues:

https://bugzilla.redhat.com/show_bug.cgi?id=2222766

Comment 11 Laine Stump 2023-07-13 16:45:20 UTC
Laszlo - sorry, I just spent a bunch of time typing up a response to your questions, and before I hit send I somehow must have reloaded the page or something, and the entire comment is now gone.

Definitely you're correct about --mac-addr, and I just sent a patch upstream for it:

https://listman.redhat.com/archives/libvir-list/2023-July/240713.html

As for --address, although I don't understand why slirp would want to have a configuration that allowed you to request "some random address on a specific subnet" when there will only ever be 2 IP addresses in that namespace (the guest's single IP address, and the default route), I do see that's what it's doing, so yes this is a semantic change when replacing SLIRP with passt. Sorry for not noticing this way back in the beginning (although I don't know if it would have changed the implementation in this case; more likely it would instead have led to an addition to the documentation).

Anyway, I'll put any further discussion on the new BZ.

Comment 12 Laszlo Ersek 2023-07-13 17:11:36 UTC
[libguestfs PATCH 0/7] lib: support networking with passt
Message-Id: <20230713171052.123365-1-lersek>
https://listman.redhat.com/archives/libguestfs/2023-July/031984.html

Comment 13 Laszlo Ersek 2023-07-13 17:15:35 UTC
(In reply to Laine Stump from comment #11)
> Laszlo - sorry, I just spent a bunch of time typing up a response to your
> questions, and before I hit send I somehow must have reloaded the page or
> something, and the entire comment is now gone.

Sorry to hear that; I've gotten into the habit of editing any nontrivial Bugzilla comment in an external editor. Both Firefox and Bugzilla have been very adept at destroying my work. Editing the comment text externally helps quite a bit; it still does not protect against Bugzilla throwing away metadata changes (non-comment fields cannot be edited externally).

I recommend the <https://github.com/jlebon/textern> extension for external comment editing.

Comment 14 Laszlo Ersek 2023-07-13 17:22:19 UTC
(In reply to Laine Stump from comment #11)

> Definitely you're correct about --mac-addr, and I just sent a patch upstream
> for it:
> 
> https://listman.redhat.com/archives/libvir-list/2023-July/240713.html

Thanks for CC'ing me on the patch, I'll comment in-thread.

Comment 15 Laszlo Ersek 2023-07-14 13:22:43 UTC
[libguestfs PATCH v2 0/7] lib: support networking with passt
Message-Id: <20230714132213.96616-1-lersek>
https://listman.redhat.com/archives/libguestfs/2023-July/032018.html

Comment 16 Laszlo Ersek 2023-07-14 16:00:49 UTC
(In reply to Laszlo Ersek from comment #15)
> [libguestfs PATCH v2 0/7] lib: support networking with passt
> Message-Id: <20230714132213.96616-1-lersek>
> https://listman.redhat.com/archives/libguestfs/2023-July/032018.html

Merged upstream as commit range 13c7052ff96d..02bbc9daa742.

Comment 17 Richard W.M. Jones 2023-07-14 19:52:12 UTC
This is available for testing in libguestfs-1.51.5-1.fc39 in Fedora Rawhide.

