Bug 1657041 - libvirt-daemon-config-network %post expects network access, which won't work on silverblue/rpm-ostree
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 31
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1352154
Reported: 2018-12-06 22:21 UTC by Colin Walters
Modified: 2019-08-13 19:29 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:


Description Colin Walters 2018-12-06 22:21:40 UTC
For rpm-ostree, we are trying to support generating OSTree commits server side which are replicated to client systems.

This implies that everything done in %post must be system-independent and ideally predictable and bitwise reproducible.

What the current %post scriptlet does at https://src.fedoraproject.org/rpms/libvirt/blob/master/f/libvirt.spec#_1400
violates this.

The general fix is to move this kind of work to daemon startup.  For example, you could create a `libvirt-daemon-config-network-init.service` systemd unit with ConditionPathExists=!/etc/libvirt/qemu/networks/default.xml, and introspect the network there.
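
[Editor's sketch, not part of the original report] The "only run when default.xml does not exist yet" behavior that ConditionPathExists=! would give such a unit can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions: `init_default_network` and the `render_config` callback are hypothetical names invented here, not anything libvirt or systemd ships, and the real unit would rely on systemd's own condition check rather than on code like this.

```python
from pathlib import Path
from typing import Callable


def init_default_network(config_path: Path, render_config: Callable[[], str]) -> bool:
    """Write the network config once; return True only if it was written.

    Hypothetical helper: mirrors the effect of a unit guarded by
    ConditionPathExists=!<config_path>, so a config chosen on an earlier
    boot is never overwritten.
    """
    if config_path.exists():
        # Already initialized on a previous start: leave it alone.
        return False
    config_path.parent.mkdir(parents=True, exist_ok=True)
    # render_config() is where the network probing would happen, at
    # daemon startup on the real host rather than in %post.
    config_path.write_text(render_config())
    return True
```

Because the probe runs only when the file is missing, the work happens on the machine that will actually run libvirtd, and the result stays stable across later starts.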

Comment 1 Colin Walters 2018-12-06 22:24:42 UTC
Taking this to the next level, it doesn't make sense to encode *dynamic* networking state into a persistent data store in /etc.  Rather, the systemd unit could do the probing and then write the config file into /run/libvirt or so.

Comment 2 Steve Milner 2018-12-06 22:26:27 UTC
I hit this trying to do some virt based testing on Fedora Silverblue.

Comment 3 Laine Stump 2018-12-07 22:28:38 UTC
It's not really dynamic - once it's set, it should remain consistent across host reboots. The only reason that it's not set in stone in the original source files is that the "factory" choice of subnet may not work for some hosts.

For example (and this is the specific situation that led to the current code in the specfile %post - for a *very* long history, see Bug 1146232), if you install libvirt in a virtual machine whose network connection is via the libvirt default network on the L0 host, then it will already have a network connection using 192.168.122.0/24, and if the L1 host creates a new bridge in the virtual machine with address 192.168.122.1/24 (i.e. the same bridge as is created in the L0 host), this will lead to network connectivity being lost for the guest. So we have to do *something* to make sure the choice for subnet of the network is usable.

But if we try to make the choice at the time libvirtd is started, that sometimes works and sometimes doesn't, because libvirtd.service is started before the network is guaranteed to be fully up. We might *think* it's okay to use a particular subnet because libvirtd happened to start up more quickly during the first start after install, but then the host network config would later start up a conflicting interface (or add a conflicting route).

So we added the code that is in the specfile's %post - because the install is usually running when the system is fully started up, it's more likely that any conflicting interface/route would have already been started.

However, this method has its own problems, since there are situations when the network environment at libvirt install time is different from the network environment at the time it is run  - the most annoying example is the Fedora Live CD image, which is created in some sort of container somewhere, and could later be run in a virtual machine connected to a host's libvirt virtual network.

But we don't really want to make libvirtd.service wait until the networking subsystem is fully up before it starts - someone might have networking infrastructure that uses virtual machines, and that would fail miserably if libvirtd.service couldn't start up until networking was fully up.

In spite of that, I've toyed with the idea of the config having a "super double secret probationary default" option (props to Animal House) that could be initially set for a network, and then the first time that network was started, it would delay until "networking is up" (whatever that means - I think the systemd target is different depending on whether or not NetworkManager is enabled, and we definitely don't want to make libvirt require NetworkManager!)

Anyway, TL;DR - 1) we can't have a dynamic address stored in /var/run because once chosen, the subnet must remain consistent across subsequent reboots, but 2) we have thought about the idea of somehow delaying the selection of subnet until the first run of libvirtd. 3) In the past we hadn't done that because it *still* doesn't solve the problem for everyone, but 4) this BZ may give us reason to look into it again.
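
[Editor's sketch, not part of the original comment] The subnet conflict described above (an L1 host reusing 192.168.122.0/24 while its own uplink already sits on that subnet) comes down to choosing a /24 that does not overlap any network the host already uses. The following is a minimal illustration, not the logic in the spec file: `pick_free_subnet` is a hypothetical name, the 192.168.N.0/24 candidate range is an assumption based on the addresses mentioned in the comment, and the caller is assumed to supply the in-use IPv4 networks as CIDR strings (for example, parsed from the routing table).

```python
import ipaddress
from typing import Iterable, Optional


def pick_free_subnet(in_use: Iterable[str],
                     first: int = 122,
                     last: int = 254) -> Optional[ipaddress.IPv4Network]:
    """Return the first 192.168.N.0/24 (N in first..last) that overlaps
    none of the given networks, or None if every candidate conflicts.

    Hypothetical helper; `in_use` holds IPv4 CIDR strings. Host bits are
    tolerated, so "192.168.122.5/24" is treated as 192.168.122.0/24.
    """
    used = [ipaddress.ip_network(net, strict=False) for net in in_use]
    for third_octet in range(first, last + 1):
        candidate = ipaddress.ip_network(f"192.168.{third_octet}.0/24")
        if not any(candidate.overlaps(net) for net in used):
            return candidate
    return None
```

With an empty `in_use` list, as a %post script would see when rpm-ostree runs it with networking disabled, the function returns the factory default 192.168.122.0/24 even when that subnet conflicts on the real host. The check only helps when it runs with the host's actual network state visible.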

Comment 4 Colin Walters 2018-12-10 15:54:06 UTC
> However, this method has its own problems, since there are situations when the network environment at libvirt install time is different from the network environment at the time it is run  - the most annoying example is the Fedora Live CD image, 

That's what this bug is about, yes.  rpm-ostree based systems (Fedora Atomic Host, Fedora Silverblue) *always* work this way.

In fact, when rpm-ostree runs scripts today, we disable networking:
https://github.com/projectatomic/rpm-ostree/blob/f811828543d46ca7264e6616dca29f39d715d4e1/src/libpriv/rpmostree-bwrap.c#L305

Note this even occurs on the *client* side.  I'm typing this from a Fedora Silverblue system which doesn't have libvirt by default, but I `rpm-ostree install libvirt`.

When the libvirt %post script runs on my local system, it won't see any network interfaces at all.

Comment 5 Daniel Berrangé 2018-12-10 15:56:05 UTC
(In reply to Colin Walters from comment #4)
> Note this even occurs on the *client* side.  I'm typing this from a Fedora
> Silverblue system which doesn't have libvirt by default, but I `rpm-ostree
> install libvirt`.
> 
> When the libvirt %post script runs on my local system, it won't see any
> network interfaces at all.

I can understand not having network when building images, but how is libvirt going to provide network connectivity for guests if it isn't given any network interfaces on client systems ?

Comment 6 Colin Walters 2018-12-14 13:42:07 UTC
> but how is libvirt going to provide network connectivity for guests if it isn't given any network interfaces on client systems ?

I was talking about the %post script.  rpm-ostree doesn't change how systemd units are run in any way.

https://bugzilla.redhat.com/show_bug.cgi?id=1657041#c0
specifically mentions moving the network detection to a systemd unit.

Comment 7 Ben Cotton 2019-08-13 16:56:15 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 8 Ben Cotton 2019-08-13 19:29:40 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to 31.
