In openQA testing of yesterday's Rawhide compose, all the kickstart tests failed. So did the tests that use an updates image that is hosted on a network server. In each case, the test failed because the system failed to boot to the installer, instead booting to the dracut rescue prompt.
I think what's going on here is that any scenario which requires the network to be brought up during the initramfs phase - which includes the use of a kickstart or updates image retrieved over the network - causes the boot to fail.
Here are the failed tests:
I'm blaming this on NetworkManager because the tests passed on the previous compose (20190604.n.0) and neither anaconda nor dbus nor any other obvious suspect changed in the 0609.n.1 compose. NetworkManager *did* change, and the changelog looks a bit suspicious for this bug:
* Tue Jun 04 2019 Lubomir Rintel <email@example.com> - 1:1.20.0-0.2
- Update the 1.20.0 snapshot
- Re-enable the initrd generator
Those sure look like relevant changes to me.
This is pretty easy to reproduce: just download an installer image from the 20190609.n.1 compose - e.g. https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190609.n.1/compose/Server/x86_64/iso/Fedora-Server-dvd-x86_64-Rawhide-20190609.n.1.iso - boot it, and add a kickstart or updates.img from a network server to the boot options, e.g. add 'inst.ks=http://fedorapeople.org/groups/qa/kickstarts/firewall-configured-net.ks'. That should be enough to trigger the bug.
Proposing as a Beta blocker as a violation of "The installer must be able to use all available kickstart delivery methods" - https://fedoraproject.org/wiki/Fedora_30_Beta_Release_Criteria#Kickstart_delivery .
Looking at the journal from the rescue shell, there seems to be a cycle of NetworkManager starting up, running into three dbus errors because dbus is not running (I'm not sure whether that's expected or not in the initramfs environment), exiting with the network device in state 'disconnected', then restarting and going through the whole cycle again. It does this hundreds of times. The end of the process looks like this:
device (ens3): carrier: link connected
manager: (ens3): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
device (ens3): state change: unmanaged -> unavailable
sleep-monitor-sd: failed to acquire D-Bus proxy: Could not connect: No such file or directory
firewall: could not connect to system D-Bus (Could not connect: No such file or directory)
ifcfg-rh: dbus: couldn't initialize system bus: Could not connect: No such file or directory
device (ens3): state change: unavailable -> disconnected
manager: startup complete
quitting now that startup is complete
Then a half second later it starts up again:
NetworkManager (version 1.20.0-0.2.fc31) is starting... (after a restart)
and goes through the same process.
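For anyone wanting to confirm the same restart loop on their own system, here is a rough sketch of how the cycle could be counted from a saved copy of the journal. It assumes the journal was dumped to a plain text file from the rescue shell; the function name summarize_nm_cycles and the file handling are made up for illustration, and the patterns are taken only from the log lines quoted above, so they may need adjusting for other NetworkManager versions.

```python
import re
import sys

# Patterns copied from the log excerpt in this report. The journal prefix
# (timestamp, hostname, PID) is not matched, so any prefix is tolerated.
START_RE = re.compile(r"NetworkManager \(version [^)]+\) is starting")
QUIT_RE = re.compile(r"quitting now that startup is complete")
DBUS_ERR_RE = re.compile(r"Could not connect: No such file or directory")


def summarize_nm_cycles(journal_text):
    """Count NetworkManager starts, early quits and D-Bus connection errors.

    Hypothetical helper: it only counts matching lines and does not try
    to pair each start with the quit that follows it.
    """
    counts = {"starts": 0, "quits": 0, "dbus_errors": 0}
    for line in journal_text.splitlines():
        if START_RE.search(line):
            counts["starts"] += 1
        elif QUIT_RE.search(line):
            counts["quits"] += 1
        elif DBUS_ERR_RE.search(line):
            counts["dbus_errors"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: pass the path of the saved journal text file.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(summarize_nm_cycles(handle.read()))
```

If the bug is present, the start and quit counts should both be in the hundreds and the D-Bus error count should be roughly three times the number of starts.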
Thanks for the report. The fix for dracut is here: https://github.com/dracutdevs/dracut/pull/578
If the dracut maintainers are willing to review and apply the patch, I'd prefer that we not revert the change in NetworkManager.
It just so happens I'm a proven packager. Soo...;)
Let's see how the next compose goes.
(In reply to Adam Williamson from comment #3)
> It just so happens I'm a proven packager. Soo...;)
Ah, okay, me too, but I thought this sort of thing should get an upstream ack.
Guess this is all right, thanks for doing that.
eh, if upstream doesn't like it he can take it out again. :P I like composes that work!
Unfortunately we're not getting any composes at all ATM, I think partly because of the libgit2 module drama, so I don't know if this is fixed yet.
I suspect bug #1725872 might be another variant of this one ...
This one was actually fixed by the change I made back on June 12, thanks for the reminder to close it :)