Bug 2184967
Summary: RFE: support passt for appliance networking, as QEMU userspace networking backend
Product: [Community] Virtualization Tools
Component: libguestfs
Status: CLOSED UPSTREAM
Severity: unspecified
Priority: unspecified
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Laszlo Ersek <lersek>
Assignee: Laszlo Ersek <lersek>
CC: laine, mhicks, mprivozn, ptoscano, rjones, sbrivio
Keywords: FutureFeature
Doc Type: Enhancement
Type: Bug
Last Closed: 2023-07-14 16:00:49 UTC
Description
Laszlo Ersek
2023-04-06 10:22:17 UTC
Unfortunately this looks complicated for the direct backend.

The passt utility needs to be started in a separate child process, as far as I can tell. I'm not sure if we want an open-coded fork() for that, in launch_direct() [lib/launch-direct.c], or else if we can reuse guestfs_int_cmd_pipe_run() for this purpose.

Then, composing the command line for passt is an exercise in tedium, if I am to judge after libvirt's qemuPasstStart() function [src/qemu/qemu_passt.c].

Yet another white spot: assuming the libguestfs parent process crashes, we have a "recovery" process (another child) in place for killing qemu (= the appliance). It's unclear whether the exiting of that qemu process (which means a disconnect from the unix domain socket that passt reads) will trigger passt to exit as well, or else we need to kill passt separately from the recovery process.

The weight of libvirt really shows here. QEMU does not start (or stop) passt itself, it's either libvirtd or the user (similarly to how swtpm is started/stopped). So if we want to enable passt for the direct backend, we have to duplicate what libvirt does.

Hi Laine, is the QEMU capabilities list available to libvirt clients? (See libvirt commit 5af6134e7068, "qemu: new capability QEMU_CAPS_NETDEV_STREAM", 2023-01-09.) Thanks.

The QEMU_CAPS_* flags were only ever meant for internal consumption (used to implement underlying bits differently), and aren't available anywhere in the public libvirt API. This does have a silver lining, because periodically flags are deprecated, and the code beneath them cleaned up, when some feature reaches the stage that it's available in all currently-supported QEMUs, but it does make the situation more difficult for a consumer like libguestfs during that time that there are mixes of "supporting" and "unsupporting" QEMUs available.

There are some features that are advertised in virConnectDomainCapabilities (the function behind "virsh domcapabilities"), but passt support isn't one of those.
I guess we could add it in there, but that doesn't help much with existing releases of libvirt/QEMU/passt. :-/ (Peter pointed out earlier today that domcapabilities has 0 information about support for *any* type of network interface.)

For all practical purposes, I think that passt became available on most platforms at or near the same time that versions of libvirt and qemu supporting passt became available; I'm not sure what OS releases there were that were new enough to have one without having the other two. It may be "good enough" to just see if passt is installed, or if libvirt XML accepts the new options to virDomainDefineXML and spits it back out in a following virDomainGetXMLDesc.

I've enabled passt for the libvirt backend. It results in the following passt commandline (from libvirt):

/usr/bin/passt --one-off --socket /run/user/1000/libvirt/qemu/run/passt/1-guestfs-6liaxjtmow95-net0.socket --mac-addr 52:54:00:41:b4:8e --pid /run/user/1000/libvirt/qemu/run/passt/1-guestfs-6liaxjtmow95-net0-passt.pid --address 169.254.0.0 --netmask 16

That is consistent with my prior reading of the passt manual. In particular, with regard to the direct backend, (1) the "--one-off" option, and (2) the fact that passt daemonizes unless told otherwise, are promising.

(1) means that in the direct backend, we need not worry about "recovery" -- as soon as qemu exits (either gracefully, or because our "recovery" logic kills it), passt will exit too, upon seeing EOF on the socket.

And (2) means that we should be able to use guestfs_int_cmd_run() for launching passt -- the daemonization will be handled by passt.

Also I believe we'll have no use for the "--pid" option (PID file not needed).

Setting prctl(PR_SET_PDEATHSIG) before we exec passt might be an idea too. (In fact it might be an idea for qemu, and get rid of the recovery process.) Apparently it is preserved across exec.
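The prctl(PR_SET_PDEATHSIG) idea can be tried out with a minimal sketch like the one below. It is written in Python with ctypes purely for illustration; libguestfs itself is C and would make the call between fork() and exec(). The helper names (set_parent_death_signal, get_parent_death_signal, spawn_tied_to_parent) are hypothetical and not part of libguestfs or nbdkit. The option numbers come from <linux/prctl.h>, so this is Linux-only.

```python
import ctypes
import os
import signal
import subprocess
import sys

# Option numbers from <linux/prctl.h>; prctl(2) is Linux-specific.
PR_SET_PDEATHSIG = 1
PR_GET_PDEATHSIG = 2

IS_LINUX = sys.platform.startswith("linux")


def _prctl(option, arg2):
    """Call prctl(option, arg2, 0, 0, 0) and raise OSError on failure."""
    if not IS_LINUX:
        raise OSError("prctl(2) is only available on Linux")
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    if libc.prctl(ctypes.c_int(option), arg2, zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def set_parent_death_signal(sig):
    """Ask the kernel to send `sig` to this process when its parent dies (0 clears)."""
    _prctl(PR_SET_PDEATHSIG, ctypes.c_ulong(int(sig)))


def get_parent_death_signal():
    """Return the currently configured parent-death signal, or 0 if none is set."""
    current = ctypes.c_int(0)
    _prctl(PR_GET_PDEATHSIG, ctypes.byref(current))
    return current.value


def spawn_tied_to_parent(argv, sig=signal.SIGTERM):
    """Fork and exec `argv`; the child sets the parent-death signal just before exec."""
    return subprocess.Popen(argv, preexec_fn=lambda: set_parent_death_signal(sig))
```

Two caveats from the prctl(2) manual are worth keeping in mind: the "parent" is the thread that created the process, and the setting is cleared in children created by fork(2), so it would not follow a program that daemonizes itself.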
See also: https://gitlab.com/nbdkit/nbdkit/-/blob/master/common/utils/exit-with-parent.c

Welp, that won't work.

- We still need to kill passt if we start it up successfully, but we don't get as far as launching QEMU, for some reason.

- That means we either need to pass --foreground to passt (so it remains our direct child, and we can kill it), or pass --pid to it (so we can learn the PID of the daemonized process, and kill *that*).

- The problem with --foreground is that (a) libguestfs currently has no nice internal API for forking & execing *long-lived* children, and (b) --foreground does not work with --one-off, as far as I can tell.

- The problem with --pid is that passt is *only* willing to write its PID file to $XDG_RUNTIME_DIR, and not to (e.g.) /tmp. (See also the cmdline generated by libvirtd, in comment 4.) This is probably due to dropping privileges, or whatever. However, libguestfs has two "temp dir" concepts: a real "temp dir" for regular files, and a "sock dir" for unix domain sockets. The former is usually under /tmp (unless overridden), and the latter is by default under $XDG_RUNTIME_DIR. So for the PID file of passt, we need a *regular file* under the "sock dir", and that violates the guestfs APIs / documentation. In effect we'd have to call guestfs_int_create_socketname() -- or at least guestfs_int_lazy_make_sockdir() -- and place the PID file, a *regular file*, there.

Laine, Stefano,

I don't understand the "--mac-addr" option that libvirt generates for passt (see libvirt commit a56f0168d576, "qemu: hook up passt config to qemu domains", 2023-01-10).

As far as I can tell, libvirt passes the virtio-net device's MAC address to passt. See in qemuPasstStart():

+int
+qemuPasstStart(virDomainObj *vm,
+               virDomainNetDef *net)
+{
...
+    virCommandAddArgList(cmd,
+                         "--one-off",
+                         "--socket", passtSocketName,
+                         "--mac-addr", virMacAddrFormat(&net->mac, macaddr),
+                         NULL);

That seems wrong to me.
The virtio-net device's MAC address identifies the virtual NIC that the guest owns. Whereas passt's "--mac-addr" is documented as follows:

> -M, --mac-addr addr
>        Use source MAC address addr when communicating to the guest or to
>        the target namespace. Default is to use the MAC address of the
>        interface with the first IPv4 default route on the host.

In other words, it is supposed to impersonate a NIC that's *different* from the guest's, but is on the same Ethernet subnet. Considering that passt provides a DHCP server, this is the MAC addr that DHCP replies should appear from. Considering also that passt effectively enables routing for the guest (see the "IPv4 default route on the host" language in the above quote!), it stands to reason that this MAC address is supposed to belong to the *default gateway* / router on the Ethernet subnet that the guest's virtual NIC is attached to.

Consider also SLIRP -- i.e., the facility that passt is replacing. If you look up "-netdev user" in the QEMU manual, it doesn't offer any property for setting the MAC address.

Further note that libvirt does not pass any --gateway option to passt.

> -g, --gateway addr
>        Assign IPv4 addr as default gateway via DHCP (option 3), or IPv6
>        addr as source for NDP Router Advertisement and DHCPv6 messages.
>        This option can be specified zero (for defaults) to two times (once
>        for IPv4, once for IPv6). By default, IPv4 and IPv6 addresses are
>        taken from the host interface with the first default route for the
>        corresponding IP version.
>
>        Note: these addresses are also used as source address for packets
>        directed to the guest or to the target namespace having a loopback
>        or local source address, to allow mapping of local traffic to guest
>        and target namespace. See the NOTES below for more details about
>        this mechanism.
Then

> NOTES
>
>    Handling of traffic with local destination and source addresses
>
>        Both passt and pasta can bind on ports with a local address,
>        depending on the configuration. Local destination or source
>        addresses need to be changed before packets are delivered to the
>        guest or target namespace: most operating systems would drop
>        packets received from non-loopback interfaces with local
>        addresses, and it would also be impossible for guest or target
>        namespace to route answers back.
>
>        For convenience, and somewhat arbitrarily, the source address on
>        these packets is translated to the address of the default IPv4 or
>        IPv6 gateway -- this is known to be an existing, valid address on
>        the same subnet.
>
>        Loopback destination addresses are instead translated to the
>        observed external address of the guest or target namespace.
>        For IPv6 packets, if usage of a link-local address by guest or
>        namespace has ever been observed, and the original destination
>        address is also a link-local address, the observed link-local
>        address is used. Otherwise, the observed global address is used.
>        For both IPv4 and IPv6, if no addresses have been seen yet, the
>        configured addresses will be used instead.
>
>        For example, if passt or pasta receive a connection from
>        127.0.0.1, with destination 127.0.0.10, and the default IPv4
>        gateway is 192.0.2.1, while the last observed source address from
>        guest or namespace is 192.0.2.2, this will be translated to a
>        connection from 192.0.2.1 to 192.0.2.2.
>
>        Similarly, for traffic coming from guest or namespace, packets
>        with destination address corresponding to the default gateway will
>        have their destination address translated to a loopback address,
>        if and only if a packet, in the opposite direction, with a
>        loopback destination or source address, port-wise matching for
>        UDP, or connection-wise for TCP, has been recently forwarded to
>        guest or namespace. This behaviour can be disabled with
>        --no-map-gw.
Thus, I believe that what libvirt should do is *not* pass "--mac-addr" at all -- that would be consistent with the *default* behavior of --gateway (which libvirt also doesn't pass). That means the guest will see itself to be on the same Ethernet subnet as the first routable NIC of the host, and will consider the host's first routable NIC as its gateway / router, meaning *both* IP *and* MAC addresses.

Here's why this matters to me: I'm about to start "passt" manually, from the libguestfs direct backend. I intend to stay close to the passt command line that libvirt generates (see comment 4). For the "--mac-addr" option, *assuming* I want to pass it, I need to create *some* MAC address. Doing what libvirt does at the moment (see above) does not seem ideal, because (1) IMO that practice is simply wrong (see above), and (2) because in the libguestfs direct backend, we don't generate a MAC address for the *guest-side* NIC (= virtio-net) anyway; so if I wanted those two to match, I'd have to generate the MAC for the virtio-net NIC first.

My opinion is that neither libvirt nor the libguestfs direct backend should generate "--mac-addr" at all, for passt.

(NB technically I could do the MAC generation -- read three bytes from /dev/urandom, then take 52:54:00:(random_byte[0] & 0x7F):random_byte[1]:random_byte[2]. There is prior art in libguestfs for reading /dev/urandom -- namely guestfs_int_random_string() --, but I think it's just unnecessary here.)

Please comment! Thanks.

The "--address 169.254.0.0" option (as generated by libvirt) seems wrong as well. It does not mean "assign the guest an IP address from this range with DHCP". Instead, it means "assign the guest *specifically* this IP address". And using an IP address ending in .0.0 is wrong for that: that's a network address, host addresses never terminate with 0 bits.

(I've verified with virt-rescue from within the guest, using the libvirt backend -- the guest gets IP address 169.254.0.0 precisely.
When passt is not in use, it gets 169.254.2.15 or something like that.)

There are stark, guest-visible differences between slirp and passt. The passt manual says, "giving the illusion that application processes residing on the guest are running on the local host, from a networking perspective. Built-in ARP, DHCP, NDP, and DHCPv6 implementations are designed to provide the guest with a network configuration that tightly resembles the host native configuration. With the default options, guest and host share IP addresses, routes, and port bindings".

That's very different from slirp / qemu user networking. In fact, the "tests/rsync/test-rsync.sh" test case is failing now. I'm learning of this only now, because with the libvirt backend, the test is skipped. It is enabled with the direct backend however, and then the assumption that slirp makes the host appear to the guest as IP addr 169.254.2.2 breaks down, and then the rsync-in operation fails -- the guest-side rsync client cannot connect to 169.254.2.2, because the host is not called 169.254.2.2.

I've figured it out -- I've filed the following BZ about the libvirt issues:

https://bugzilla.redhat.com/show_bug.cgi?id=2222766

Laszlo - sorry, I just spent a bunch of time typing up a response to your questions, and before I hit send I somehow must have reloaded the page or something, and the entire comment is now gone.

Definitely you're correct about --mac-addr, and I just sent a patch upstream for it:

https://listman.redhat.com/archives/libvir-list/2023-July/240713.html

As for --address, although I don't understand why slirp would want to have a configuration that allowed you to request "some random address on a specific subnet" when there will only ever be 2 IP addresses in that namespace (the guest's single IP address, and the default route), I do see that's what it's doing, so yes this is a semantic change when replacing SLIRP with passt.
Sorry for not noticing this way back in the beginning (although I don't know if it would have changed the implementation in this case, more likely would have instead led to an addition to the documentation).

Anyway, I'll put any further discussion on the new BZ.

[libguestfs PATCH 0/7] lib: support networking with passt
Message-Id: <20230713171052.123365-1-lersek>
https://listman.redhat.com/archives/libguestfs/2023-July/031984.html

(In reply to Laine Stump from comment #11)
> Laszlo - sorry, I just spent a bunch of time typing up a response to your
> questions, and before I hit send I somehow must have reloaded the page or
> something, and the entire comment is now gone.

Sorry to hear that; I've now grown to edit any nontrivial comment for bugzilla in an external editor. Both Firefox and Bugzilla have been very adept at destroying my work. Editing the comment text externally helps quite a bit; it still does not protect against Bugzilla throwing away metadata changes (non-comment fields cannot be edited externally). I recommend the <https://github.com/jlebon/textern> extension for external comment editing.

(In reply to Laine Stump from comment #11)
> Definitely you're correct about --mac-addr, and I just sent a patch upstream
> for it:
>
> https://listman.redhat.com/archives/libvir-list/2023-July/240713.html

Thanks for CC'ing me on the patch, I'll comment in-thread.

[libguestfs PATCH v2 0/7] lib: support networking with passt
Message-Id: <20230714132213.96616-1-lersek>
https://listman.redhat.com/archives/libguestfs/2023-July/032018.html

(In reply to Laszlo Ersek from comment #15)
> [libguestfs PATCH v2 0/7] lib: support networking with passt
> Message-Id: <20230714132213.96616-1-lersek>
> https://listman.redhat.com/archives/libguestfs/2023-July/032018.html

Merged upstream as commit range 13c7052ff96d..02bbc9daa742.

This is available for testing in libguestfs-1.51.5-1.fc39 in Fedora Rawhide.
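As a footnote to the "--address 169.254.0.0 --netmask 16" observation earlier in the thread, the following minimal sketch shows why that value reads as a network address and not as a host address. It uses Python's standard ipaddress module; the function name classify_address is hypothetical, and nothing here is taken from passt, libvirt or libguestfs.

```python
import ipaddress


def classify_address(address, prefix_len):
    """Classify an IPv4 address within its subnet as 'network', 'broadcast' or 'host'.

    Illustrative only; very short subnets (/31, /32) have special rules
    that this sketch does not try to model.
    """
    iface = ipaddress.ip_interface(f"{address}/{prefix_len}")
    subnet = iface.network
    if iface.ip == subnet.network_address:
        return "network"    # all host bits are zero
    if iface.ip == subnet.broadcast_address:
        return "broadcast"  # all host bits are one
    return "host"
```

With a 16-bit netmask, classify_address("169.254.0.0", 16) returns "network", whereas classify_address("169.254.2.15", 16), the kind of address the guest gets when passt is not in use, returns "host".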