Created attachment 1625359 [details] dnf.librepo.log Description of problem: upgrade to Fed30 never boots to new kernel , does not install Fed30 pkgs, remains at Fed29 Version-Release number of selected component (if applicable): How reproducible: always Steps to Reproduce: 1.install Fed29 from LXDE spin 2.dnf update 3.dnf system-upgrade download --refresh --releasever=30 --allowerasing 4.dnf system-upgrade reboot Actual results: reboot to Fed29 kernel before reaching user runlevel does shutdown and reboots still no Fed30 no delay while installing new packages boots to Fed 29 as usual Expected results: after initial reboot , reboot to Fed30 kernel , install new F30 pkgs with msg "this may take a while". User login to F30. Additional info:
Created attachment 1625360 [details] dnf.log adding dnf.log
This may be a dumb question, but you did actually run dnf system-upgrade reboot, right? I'm only asking because there's no indication of that in the logs.
Yes, I've been round this loop at least 5 times now trying to see something I missed. here's a snip from root history: 996 dnf update 997 dnf system-upgrade download --refresh --releasever=30 --allowerasing 998 dnf --refresh upgrade 999 dnf system-upgrade download --refresh --releasever=30 --allowerasing 1000 dnf system-upgrade reboot I reiterate , as reported, it does a partial reboot and then restarts itself, it is not just doing a normal reboot.
Hang on, I'll system-upgrade reboot again. I think that log was since the last reboot since you said it was not sufficient. I'll reboot and post again.
Created attachment 1625374 [details] new dnf.librepo.log
That log should now include output for : dnf system-upgrade download --refresh --releasever=30 --allowerasing dnf system-upgrade reboot
It seems that the reboot only produced a couple of added lines. As with the log file I posted on the other thread, where I had cleared the file just before the reboot command. Even though that looked erroneous, it is consistent with what just happened again. the download command seems to go as normal, gets everything, checks and runs the transaction. That output is all familiar to me. After reboot I don't see any obvious error msgs but it all flies by so quickly it's hard to note anything of import.
Well, I'm afraid I can't help. I'm the zchunk developer, not the DNF system-upgrade plugin developer. You've successfully downloaded the packages which means you're well past the step where zchunk is used, so it must be something in the system-upgrade plugin. It might be worth checking the journal to see if there's anything obvious there, but you might be best off attempting to upgrade using a different method.
OK, thanks for taking a look. Hopefully one of the relevant devs will pick this up before it gets swept under rug by an EOL robot. How do I get the journal output for that operation? For "a different method" do I have any choices apart from ditching what I have, doing a fresh installation and trying to piece together all that changes I've made since installing Fed23 ?!
Hello Federic, Ales Matej made the following update to dnf-plugin-extras in Fedora 29: https://bodhi.fedoraproject.org/updates/FEDORA-2019-df11921f9a It contains fix for a similar looking bug. Can you please try the system-upgrade again with version dnf-plugin-system-upgrade-4.0.7-1.fc29 (either from updates-testing repository or wait until it gets to stable)? > How do I get the journal output for that operation? Useful information for system-upgrade are in "/var/log/dnf.log" and the output of "dnf system-upgrade log --number=1".
thanks, I've added plugin update . I will test when I have some downtime. dnf system-upgrade log --number=1 Error: Cannot find logs with this index. ??
(In reply to Federic from comment #11) > thanks, I've added plugin update . I will test when I have some downtime. Great, thanks. > dnf system-upgrade log --number=1 > Error: Cannot find logs with this index. > > ?? It's probably because there are no logs (you can run "dnf system-upgrade log" to check if there are any). It can happen if the system-upgrade crashes at the very beginning and that's probably the case here.
dnf system-upgrade log The following boots appear to contain upgrade logs: -- no logs were found -- I probably cleared all old content in /var/log some time back, since disk space is a bit tight. I know I did that for 28/29 upgrade, and I think once since for general housecleaning duties. I need to install a new disk and clone the root fs , I do not have my usual confidence in the upgrade going without problems.
Just been around the loop again. dnf system-upgrade download --refresh --releasever=30 --allowerasing dnf system-upgrade reboot No update happened, just as before. dnf system-upgrade log --number=1 Error: Cannot find logs with this index.
I tried to upload dnf.log but keep getting server 500 error. File is now 3.8MB , is there a limit? How many lines are required. currently 32000 lines.
ls -ail /boot/grub2/grubenv 7318 lrwxrwxrwx. 1 root root 25 Oct 4 2018 /boot/grub2/grubenv -> ../efi/EFI/fedora/grubenv thus h/w boots BIOS ! Is it possible that some changes are written here which are not referenced due to it being a BIOS booting system? Are the multiple issues with GRUB_ENABLE_BLSCFG relevant here also?
I had three kernel versions installed. I just removed the oldest one yet it still appears in Grub menu. this may be part of what is preventing system-upgrade from workig.
I'm becoming more convinced this is a BLS issue I had previously set GRUB_ENABLE_BLSCFG=False because it seemed there were a lot of unresolved/ unexpected outcomes. I figured I'd wait until the bugs got resolved before trying it. It seems that grubby-deprecated has not disappeared from the Fed29 repos. Quietly pulling the rug from under anyone who held back on BLS. I ran grub2-switch-to-blscfg as suggested here. Not sure what if anything it did. /boot/grub2/grub.conf still has a datestamp from yesterday. https://fedoraproject.org/wiki/Changes/BootLoaderSpecByDefault I rebooted and still see the now removed kernel kernel options still showing in the grub menu. If grub2 is not getting correctly set I'm guessing that the new F30 kernel is not getting installed ( or not included in grub ) and there is little chance of the upgrade continuing if it does not boot to new kernel. I've added 'medium' severity since not being able to upgrade seems a fairly important problem, even though system remains functional at fed29. Can someone give some guidance on how to get this back into a functional state so that the upgrade can be completed? thx.
Created attachment 1627737 [details] new dnf.log dnf.log was cleared before doing dnf update and reboot commands. 11k lines and not errors I can see but still returns the following: dnf system-upgrade log --number=1 Error: Cannot find logs with this index. If the upgrade process is not even starting how can I determine what is preventing that? Obviously all the preamble of getting and checking pkgs is working and preparing the update "transaction" is fine. What triggers the update process on reboot and how can I find out what is blocking? It certainly does not seem to be much.
There were multiple bugs reported on system-upgrade from Fedora 29 to 30 that were reassigned to grub2 (bz 1686059, bz 1707199, bz 1720911) and based on the comment 18, this seems like a connected issue. I am reassigning to grub2 to investigate there.
(In reply to Federic from comment #18) > I'm becoming more convinced this is a BLS issue > > I had previously set GRUB_ENABLE_BLSCFG=False because it seemed there were a > lot of unresolved/ unexpected outcomes. I figured I'd wait until the bugs > got resolved before trying it. It seems that grubby-deprecated has not > disappeared from the Fed29 repos. Quietly pulling the rug from under anyone > who held back on BLS. > > I ran grub2-switch-to-blscfg as suggested here. Not sure what if anything it > did. /boot/grub2/grub.conf still has a datestamp from yesterday. > https://fedoraproject.org/wiki/Changes/BootLoaderSpecByDefault > > I rebooted and still see the now removed kernel kernel options still showing > in the grub menu. > > If grub2 is not getting correctly set I'm guessing that the new F30 kernel > is not getting installed ( or not included in grub ) and there is little > chance of the upgrade continuing if it does not boot to new kernel. > > I've added 'medium' severity since not being able to upgrade seems a fairly > important problem, even though system remains functional at fed29. > > Can someone give some guidance on how to get this back into a functional > state so that the upgrade can be completed? > > thx. Not sure I understood what's your current setup now. To what value do you have GRUB_ENABLE_BLSCFG set now and do you have the grubby-deprecated installed or not?
(In reply to Pavla Kratochvilova from comment #20) > There were multiple bugs reported on system-upgrade from Fedora 29 to 30 > that were reassigned to grub2 (bz 1686059, bz 1707199, bz 1720911) and based > on the comment 18, this seems like a connected issue. I am reassigning to > grub2 to investigate there. Are we sure they are not facing the known issue of the installed GRUB being too old and needed to be updated? https://fedoraproject.org/wiki/Common_F30_bugs#GRUB_boot_menu_is_not_populated_after_an_upgrade
First of all this box was installed at Fed23 and so the Fed20 or older condition does not apply. grubby-deprecated is no longer available !! Nice move pulling it from the repos when it is still needed because of this buggy behaviour. I did have GRUB_ENABLE_BLSCFG=false that worked last time I hit this issue , now seems I have no choice. dnf install grubby-deprecated Fedora Modular 29 - x86_64 73 kB/s | 24 kB 00:00 Fedora Modular 29 - x86_64 - Updates 38 kB/s | 23 kB 00:00 Fedora 29 - x86_64 - Updates 84 kB/s | 23 kB 00:00 Fedora 29 - x86_64 42 kB/s | 24 kB 00:00 RPM Fusion for Fedora 29 - Free tainted 36 kB/s | 9.6 kB 00:00 RPM Fusion for Fedora 29 - Free - Updates 37 kB/s | 9.6 kB 00:00 RPM Fusion for Fedora 29 - Free 38 kB/s | 10 kB 00:00 RPM Fusion for Fedora 29 - Nonfree - Updates 18 kB/s | 9.8 kB 00:00 RPM Fusion for Fedora 29 - Nonfree 37 kB/s | 11 kB 00:00 No match for argument: grubby-deprecated Error: Unable to find a match I ran grub2-switch-to-blscfg but the upgrade problem did not change. Maybe someone who understands the guts of this problem can suggest a way clear it. Thanks for any suggestions.
Currently I have grubby installed but not grubby-bls
(In reply to Federic from comment #23) > First of all this box was installed at Fed23 and so the Fed20 or older > condition does not apply. > Yes, but I would still update GRUB with grub2-install /dev/sda (or whatever is your block device) to the latest installed version just to be sure. > grubby-deprecated is no longer available !! Nice move pulling it from the > repos when it is still needed because of this buggy behaviour. > The grubby-deprecated package is available in F30 onwards. > I did have GRUB_ENABLE_BLSCFG=false that worked last time I hit this issue , > now seems I have no choice. > > dnf install grubby-deprecated > Fedora Modular 29 - x86_64 73 kB/s | 24 kB 00:00 > > Fedora Modular 29 - x86_64 - Updates 38 kB/s | 23 kB 00:00 > > Fedora 29 - x86_64 - Updates 84 kB/s | 23 kB 00:00 > > Fedora 29 - x86_64 42 kB/s | 24 kB 00:00 > > RPM Fusion for Fedora 29 - Free tainted 36 kB/s | 9.6 kB 00:00 > > RPM Fusion for Fedora 29 - Free - Updates 37 kB/s | 9.6 kB 00:00 > > RPM Fusion for Fedora 29 - Free 38 kB/s | 10 kB 00:00 > > RPM Fusion for Fedora 29 - Nonfree - Updates 18 kB/s | 9.8 kB 00:00 > > RPM Fusion for Fedora 29 - Nonfree 37 kB/s | 11 kB 00:00 > > No match for argument: grubby-deprecated > Error: Unable to find a match > Right for F29 the default was still not BLS, so the package that's now grubby-deprecated is still just called grubby. > I ran grub2-switch-to-blscfg but the upgrade problem did not change. > So you wanted to enable BLS for F29? I thought you mention that wanted to disable it for F30. > Maybe someone who understands the guts of this problem can suggest a way > clear it. > BLS was not the default for F29 so I wouldn't enable there. Just remove the GRUB_ENABLE_BLSCFG from /etc/default/grub) and do the upgrade. If you have any issues, there will be a grub.cfg.rpmsave that contains a copy of your old grub.cfg so you could just do from the GRUB prompt: grub> configfile /grub2/grub.cfg.rpmsave Boot your machine and disable BLS (set GRUB_ENABLE_BLSCFG=false in /etc/default/grub) and install the grubby-deprecated package. If BLS works correctly, then I would just leave BLS enabled and use the grubby package installed by F30 (that used to be grubby-bls in F29).
many thanks for the input. I commented out the GRUB_ENABLE_BLSCFG line. grub2-install /dev/sda grub2-mkconfig -o /boot/grub2/grub.cfg here hoping I still have a Fedora system ...
Well that was fun, now I've blown a hole in grub as well . I did manage to reboot from the grub shell using configfile command, so it looks like just grub2-install did not go to plan. So now how do I repair grub. I have /boot on /dev/sda3 /root on /dev/sda6 I can not recall whether grub was installed on /dev/sda before since it's probably 10 years since I needed to do that. Even if it was, it should be now. So why do I get dumped straight to grub> Thanks for suggestions.
just to clarify I was able to boot with the new grub.cfg. So it just seems the grub2-install went wrong. grub> configfile (hd0,3)/grub2/grub.cfg
checked the BIOS and it was set to boot from sdb . Having reinstalled grub to that device, it reboots. I will now try the upgrade again and report back.
OK, so after that little detour, I now have up to date grub2 and the bug is still identical: dnf system-upgrade log --number=1 Error: Cannot find logs with this index.
This message is a reminder that Fedora 29 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '29'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 29 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
yes, I know Fed29 is reaching EOL, THAT IS WHY I HAVE BEEN TRYING TO UPGRADE FOR THE LAST MONTH ! "BLS was not the default for F29 so I wouldn't enable there. Just remove the GRUB_ENABLE_BLSCFG from /etc/default/grub) and do the upgrade." no BLS did not make change this one iota. "Are we sure they are not facing the known issue of the installed GRUB being too old and needed to be updated?" as stated in #9 this system was installed at fed23 so it never was eligible for that 'old grub' bug. Reinstalling current made not difference. Reassigning this to dnf-system-upgrade since I see no evidence this is in fact grub related or tied to existing grub issues. no input from grub2 maintainers. dnf system-upgrade log --number=1 Error: Cannot find logs with this index. I have a system upgrade which is not even starting and only adds two lines to dnf.log , I see no evidence for passing this to grub unless anyone can point to something grub related or direct evidence grub is failing. I would like some help before this gets EOLed. if I have to scrap my current system and reinstall afresh because of this I doubt it will fedora which gets installed. I have remote systems which I maintain where fedora's rolling upgrades are an advantage. But that needs to work and I need to know there is support when it doesn't . getting ELDed is not an acceptable outcome.
Thanks for trying the grub2 fix. It saved me some time by not trying it. My problem is a failing f29->f30 upgrade, but on a UEFI system. The symptoms are slightly different, The download Errors out with a message saying that I am already at f30 sudo dnf system-upgrade download --refresh --releasever=30 [sudo] password for user1: Before you continue ensure that your system is fully upgraded by running "dnf --refresh upgrade". Do you want to continue [y/N]: y Error: Need a --releasever greater than the current system version. My boot menu only contains f29 entries. If I could just populate my boot menu with the f30 entries that are already loaded, maybe my system would boot to f30. See https://bugzilla.redhat.com/show_bug.cgi?id=1711710 Have fun
> Thanks for trying the grub2 fix. It saved me some time by not trying it. Actually I would still run grub2-install. There are clearly some ( erroneous ) assumptions being made by devs about the state installations are in and what changes need to be taken into account. Reducing the number of possible differences seems like a good idea. At least having the current grub installed in MBR will mean next time they assume what you have installed , it will be nearer the truth.
I finally got out of the deadlock on this by using : dnf --releasever=30 distro-sync This did install all the packages and installed kernel files in /boot but did not modify the grub menu and thus still booted to fed29 kernel ( running fed30 ). I manually ran grub2-mkconfig -o /boot/grub2/grub.cfg and was able to boot to fed30 kernel. I should note that I had GRUB_ENABLE_BLSCFG=false in /etc/default/grub This did not prevent grubby getting installed and did not cause grubby-deprecated to be installed. However: #ls /boot/loader/entries 87ebc272236b419da0c6cab75da6ddad-0-rescue.conf 87ebc272236b419da0c6cab75da6ddad-5.3.8-200.fc30.x86_64.conf #find /boot/ -iname "grub.cfg" /boot/grub2/grub.cfg apart from the file I created by hand, it does not seem to have changed or created a new grub.conf. It is not surprising that the update script does not proceed with the packages if it does not get the correct kernel on reboot. Is GRUB_ENABLE_BLSCFG=false in /etc/default/grub something that is not handled correctly when upgrading fed29
(In reply to Federic from comment #35) > I finally got out of the deadlock on this by using : > > dnf --releasever=30 distro-sync > > This did install all the packages and installed kernel files in /boot but > did not modify the grub menu and thus still booted to fed29 kernel ( running > fed30 ). > > I manually ran grub2-mkconfig -o /boot/grub2/grub.cfg and was able to boot > to fed30 kernel. > > I should note that I had GRUB_ENABLE_BLSCFG=false in /etc/default/grub > > This did not prevent grubby getting installed and did not cause > grubby-deprecated to be installed. > > However: > > #ls /boot/loader/entries > 87ebc272236b419da0c6cab75da6ddad-0-rescue.conf > 87ebc272236b419da0c6cab75da6ddad-5.3.8-200.fc30.x86_64.conf > > #find /boot/ -iname "grub.cfg" > /boot/grub2/grub.cfg > > apart from the file I created by hand, it does not seem to have changed or > created a new grub.conf. > > It is not surprising that the update script does not proceed with the > packages if it does not get the correct kernel on reboot. > > Is GRUB_ENABLE_BLSCFG=false in /etc/default/grub something that is not > handled correctly when upgrading fed29 The default is to switch to a BLS configuration for the reasons that Chris mentioned. If a user explicitly set GRUB_ENABLE_BLSCFG=false, then the grub2-switch-to-blscfg won't be executed when the grub2 updated but as you said the grubby-deprecated is needed to add new boot entries. As a workaround the grub.cfg can be re-generated using grub2-mkconfig and future entries will be added by the grubby-deprecated package. That needs to be improved, Chris suggested just calling grub2-mkconfig for each kernel install if GRUB_ENABLE_BLSCFG=false which probably makes sense. That way the old grubby tool could be deprecated and users that don't want to use a BLS configuration won't need any additional tool to add new boot entries. The drawback is that os-prober will be executed for each kernel install and that's fragile since it mounts all the available block devices to search for other OS. Avoiding calling os-prober on each kernel install was one of the reasons why the old grubby tool existed and why the BLS configuration was introduced so the configuration could be modified using drop-ins files in a directory instead of editing a monolithic configuration file.
OK, so it is a know issue ( feature? ) that setting GRUB_ENABLE_BLSCFG=false in Fed29 will prevent upgrade from going ahead? Why was no one able to point that out before I spent 2 months messing around and discovered it for myself? Maybe that should be in the upgrade notes. https://fedoraproject.org/wiki/Common_F30_bugs#GRUB_boot_menu_is_not_populated_after_an_upgrade "DO NOT set GRUB_ENABLE_BLSCFG=false in Fed29: this will prevent upgrade from working. " I had read there were issues and decided I definitely did not want BLS so I set this line /etc/default/grub , there was no indication that this should not be done or would break the update. If that was a "feature" why was it not clearly documented? Hopefully this can be fixed.
There is the added issue that there is apparently no output about this fault condition. The upgrade fails to even start so something has been detected which prevents this but there is no output to note that it bailed out or as what the error was. This is primarily a dnf issue since if it produced a message on console and in logs, it would have been starting point to finding the underlying cause. A lot of time was wasted just determining exactly what was ( or was not ) happening before even being able the investigate the cause. If grubby is know to fail when GRUB_ENABLE_BLSCFG=false is set in fed 29 then this also could trapped instead of silently failing and letting everyone pick out the bones. It would not be beyond the means of a script to temporarily remove it for the update then reinstated it later, though simply halting the upgrade and recommending the user change it would be simplest.
(In reply to Federic from comment #35) > I finally got out of the deadlock on this by using : > > dnf --releasever=30 distro-sync > > This did install all the packages and installed kernel files in /boot but > did not modify the grub menu and thus still booted to fed29 kernel ( running > fed30 ). I think I must have done something like this because I have both fc29 and fc30 files > > I manually ran grub2-mkconfig -o /boot/grub2/grub.cfg and was able to boot > to fed30 kernel. Ahh, but that seems like a BIOS boot. I am (and have been for years) running an EFI boot.
> Ahh, but that seems like a BIOS boot. I am (and have been for years) running an EFI boot. That is normal since this is my bug report and I'm running BIOS. I posted the EFI equivalent on your bug.
This bug seems to be persistent. I just tried to upgrade the same system to Fed31 by the usual system-upgrade plugin and it fails just as before and reboots to Fed30 with no Fed31 kernel. While I could use the same trick and use distro-sync but why is the update plugin broken and why is there no will to fix it?
(In reply to Federic from comment #41) > This bug seems to be persistent. > > I just tried to upgrade the same system to Fed31 by the usual system-upgrade > plugin and it fails just as before and reboots to Fed30 with no Fed31 > kernel. > > While I could use the same trick and use distro-sync but why is the update > plugin broken and why is there no will to fix it? Was the entry for F31 added? You mentioned that set GRUB_ENABLE_BLSCFG=false in /etc/default/grub. I assume that you also have the grubby-deprecated package installed?
GRUB_ENABLE_BLSCFG=false , yes, grubby-deprecated package installed? yes. Usual upgrade proceedure: dnf system-upgrade download --refresh --releasever=31 --allowerasing All downloads fine. dnf system-upgrade reboot as before no new kernel, in /boot/ no new grub entry, direct boot to existing version, no upgrade of pkgs.
(In reply to Federic from comment #43) > GRUB_ENABLE_BLSCFG=false , yes, grubby-deprecated package installed? yes. > Any reasons why you are not enabling the BLS configuration and not using grubby-deprecated? That's the default now and what gets more testing. Diverging from the default usually leads to issues. > Usual upgrade proceedure: > > dnf system-upgrade download --refresh --releasever=31 --allowerasing > > All downloads fine. > > dnf system-upgrade reboot > Having said that, I don't see why disabling BLS and using grubby-deprecated would cause any issues. I'll try to reproduce your problem. > as before no new kernel, in /boot/ no new grub entry, direct boot to > existing version, no upgrade of pkgs.
My own problem may have been due to my being unaware of the changes in boot and grub.cfg process. I was thinking that the menu was in grub.cfg, but I think it is constructed on-the-fly at boot time. The fact that the date on grub.cfg is old is misleading. You might increase the GRUB_TIMEOUT=10 in /etc/default/grub (from 5 to 10 for example) so you can better eyeball the boot menu. Check also the GRUB_DEFAULT=0 in the same file. In my file, the value was 'saved', which doesn't say much about the value at boot time (saved where?). I think the saved value was 3, which was selecting an old kernel. When you boot, try to see whether there really are fc30/fc31 kernels which you can boot. --- I was able to beat the boot into submission and I am now running fc31 on two of my systems.
Fedoric, Bob, you two really need to stick to your own bugs. The cross posting is confusing. Federic tells Bob to do a grub2-install which is really bad advice on UEFI and is certain to cause more confusion than it will solve because on UEFI Fedora supplies a premade grubx64.efi. The way to reinstall GRUB on UEFI is 'dnf reinstall grub2-efi shim' per: https://fedoraproject.org/wiki/GRUB_2#Install_the_bootloader_files The central part of this bug is this comment still dnf system-upgrade, not BLS and not GRUB. (In reply to Federic from comment #43) > GRUB_ENABLE_BLSCFG=false , yes, grubby-deprecated package installed? yes. > > Usual upgrade proceedure: > > dnf system-upgrade download --refresh --releasever=31 --allowerasing > > All downloads fine. > > dnf system-upgrade reboot > > as before no new kernel, in /boot/ no new grub entry, direct boot to > existing version, no upgrade of pkgs. What I can tell you Federic, is that there are thousands of dnf system-upgrade tests done per cycle. There is no general problem with it, so the fact you're having a problem is certainly some sort of local customization you have that's confusing the plug and results in failure to trigger the upgrade. The only possible way it gets fixed is to discover why it's failing to get triggered, there's no magical way it's just going to happen. The lack of dnf logs is a problem. 'dnf system-upgrade reboot' should cause a symlink to be created: /system-update -> /var/lib/system-update So it's worth checking that it exists or not, and that also /var/lib/system-update exists and is populated with all the downloaded files. And then also possibly useful is attaching the journal for the boot that should have been an upgrade, but instead booted normally without an upgrade. The way to get that journal is to just do 'dnf system-upgrade reboot' and at the next reboot do 'journalctl -o short-monotonic > bug1761251_journal.log' and attach that log to this bug. If at all possible, immediately after the reboot and you see the GRUB menu, if you can edit the boot entry, and add to the end of the kerneal parameters line, systemd.log_level=debug it will temporarily add a lot of debugging information that will then be included in the journal that you'll attach including possibly why the offline upgrade isn't being triggered.
Possible duplicates: https://bugzilla.redhat.com/show_bug.cgi?id=1764169 https://bugzilla.redhat.com/show_bug.cgi?id=1770703 Federic, check traceback for dnf crashing in those cases. And then search the journal you'll create from the steps in comment 46 for evidence of a similar problem. If it's a match, then you can optionally try the recommendation in https://bugzilla.redhat.com/show_bug.cgi?id=1770703#c1 and report back whether it fixes the problem or not. If it's not a match, then it might not be a dup of either of those.
thanks for the input Chris. Yes, I agree it may some corner case condition which is triggering this. >'dnf system-upgrade reboot' should cause a symlink to be created: >/system-update -> /var/lib/system-update Firstly, could you verify that is update and not upgrade ?? I have /var/lib/dnf/system-upgrade populated with all the files. Also I had to symlink the download location to another partition since /var is on rootfs and there was not enough space to run the upgrade. ls -ail /var/lib/dnf 1572903 lrwxrwxrwx. 1 root root 9 Oct 26 15:39 /var/lib/dnf -> /bigpart/dnf Can you explain the discrepancy in the download locations and comment on whether that symlink could be throwing something? I'll try to get debug output if that does not shed some light.
In the current state neither spelling exists as a symlink. I may have rebooted more that once since 'dnf system-upgrade reboot' . bash-5.0#ls -ail /system-update ls: cannot access '/system-update': No such file or directory bash-5.0#ls -ail /system-upgrade ls: cannot access '/system-upgrade': No such file or directory
(In reply to Federic from comment #48) > Firstly, could you verify that is update and not upgrade ?? > I have /var/lib/dnf/system-upgrade populated with all the files. I'm going off memory, which is also based on older dnf knowledge. I haven't gone to check the code, which is perhaps the best way to confirm/deny. Chances are it's since been changed from update to upgrade. > Also I had to symlink the download location to another partition since /var > is on rootfs and there was not enough space to run the upgrade. > > > ls -ail /var/lib/dnf > 1572903 lrwxrwxrwx. 1 root root 9 Oct 26 15:39 /var/lib/dnf -> /bigpart/dnf It seems decently likely this is related. How is this other partition being mounted by fstab? I'd expect that it's being mounted too late, and the upgrade process doesn't see any files available for updating so it fails to initiate. The proper way to handle a full system root is a) grow it, or b) change from /var as a directory on system root, to a separate /var partition, adding a /var mount entry to fstab. The systemd handling for /var is prioritized compared to /home and any non-standard mount points. In fact I'm not even absolutely certain that systemd will mount /home or non-standard mount points during the offline system upgrade because its intended to be a minimalist environment. But anyway, the systemd.log_level=debug journal will tell us where the confusion is.
Yes, the partition is in fstab on automount. If systemd does not mount anything beyond /var this could explain what we are seeing. This comes back to #11 and #13 where NO error is reported and the update process just craps out without leaving the slightest trace of why it failed. Even if this situation is a corner case, the lack of error reporting a fatal error is serious bug. I should have been able to find some error message the first time this happened and it would have been fixed the same day, instead of spending that last two months at blind man's bluff trying to guess what is or is not happening. Over the many times I've watched the console output of process, I seem to recall partition umounts happening in the initial partial boot where it installs the new kernel. IIRC this was more than / and /boot. Needs checking. I will try to reorganise the system so this this partition is mounted as /var , that seems like a good thing to eliminate.
(In reply to Federic from comment #51) > This comes back to #11 and #13 where NO error is reported and the update > process just craps out without leaving the slightest trace of why it failed. > Even if this situation is a corner case, the lack of error reporting a fatal > error is serious bug. That would definitely be revealed in the journal with systemd debug enabled; possibly even without it enabled. Literally the journal is the #1 go to for finding such error reporting, not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this upgrade method. If you were upgrading with GNOME Software (and also if you're specifically using Workstation edition) and run into a similar problem without error handling presented to the user at the next boot, i.e. the upgrade failed, you'd have a stronger case because it's a GUI upgrade mechanism. But even there, the upgrade happens offline in a special systemd upgrade target whether it's dnf or GNOME Software, so any problems are merely logged. > I should have been able to find some error message > the first time this happened and it would have been fixed the same day, > instead of spending that last two months at blind man's bluff trying to > guess what is or is not happening. The frustration is reasonable, but the complaint isn't convincing because the error message will be in the journal. And I know how incredibly tedious it is to go through the normal journal looking for problems, it's a metric F ton more verbose and tedious to wade through it when the debug option is set. But you can cheat by grepping it for things you know you're looking for like dnf. If you just filter that entire log for dnf entries, I really expect it to become obvious why it did not reach the offline upgrade target and it will say so, probably because the upgrade dir wasn't found or something to that effect; and then go looking at when /var was mounted in comparison to when the offline upgrade started (or tried to start and then presumably immediately failed). There's just no such thing as discrete error messages with a connect the dots explanation. That's not even the case on macOS or Windows.
firstly good news, it was indeed that late mounting of the partition which apparently caused the failure. Once I took the whole of /var on to that partition, it booted fine and went into the upgrade procedure as expected. I did the log with debug but it's pretty irrelevant since it does not follow the bug conditions. I suppose I'd need to break it again to get a log output. > not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this upgrade method. upgrade method. sorry , I don't agree with that claim. Firstly the user does initiate it by using software to set up the systemd target and rebooting. systemd outputs tons of stuff to console via the partial boot which it runs and all the package updates. I got about 10 thousand lines output to the boot console when it worked, so manageing one line when it fails to find the rpm cache is hardly beyond the realms of computer science. firstly good news, it was indeed that late mounting of the partition which apparently caused the failure. Once I took the whole of /var on to that partition, it booted fine and went into the upgrade procedure as expected. I did the log with debug but it's pretty irrelevant since it does not follow the bug conditions. I suppose I'd need to break it again to get a log output. > not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this upgrade method. upgrade method. sorry , I don't agree with that claim. Firstly the user does initiate it by using software to set up the systemd target and rebooting. systemd outputs tons of stuff to console via the partial boot which it runs and all the package updates. I got about 10 thousand lines output the boot console when it worked, so managing one line when it fails to find the rpm cache is hardly beyond the realms of computer science. firstly good news, it was indeed that late mounting of the partition which apparently caused the failure. Once I took the whole of /var on to that partition, it booted fine and went into the upgrade procedure as expected. I did the log with debug but it's pretty irrelevant since it does not follow the bug conditions. I suppose I'd need to break it again to get a log output. > not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this upgrade method. upgrade method. sorry , I don't agree with that claim. Firstly the user does initiate it by using software to set up the systemd target and rebooting. systemd outputs tons of stuff to console via the partial boot which it runs and all the package updates. I got about 10 thousand lines output the boot console when it worked, so managing one line when it fails to find the rpm cache is hardly beyond the realms of computer science. If I grep dmesg I get loads of systemd lines , I see them on booting. > There's just no such thing as discrete error messages with a connect the dots explanation. it seems fairly clear by now that it was failing pretty early because it found an empty rpm cache or simply the simlink you indicated was either broken or pointing to an empty dir. Something like the following could be output. [FAILURE] simlink /system-upgrade-> /var/lib/dnf/system-upgrade does not contain update rpms. That could be echoed to as many log files as you like. It would still be rather confusing in the context I was in but at least we know what it was expecting, what problem it finds and why it craps out. It would then be a case of understanding why it differed from what I saw before reboot. But at least you have a lead to follow instead of nothing more than finding your self back at a standard login on the non upgraded system. No clues at all. > That's not even the case on macOS or Windows. Not even windows ? LOL As if that OS was not already the global reference in obfuscation and non indication of internal errors. That is one of the main reasons I adopted Linux, at least you can look under the hood. My main criticism here is that it was treated in a very windozian fashion. Part of a worrying trend to make Linux ever more windows like.
#12 > It's probably because there are logs (you can run "dnf system-upgrade log" to check if there are any). It can happen if the system-upgrade crashes at the very beginning and that's probably the case here. If it is crashing that needs to be fixed. If it is simply failing and exiting silently then this needs changing so that it does at least leave some trace that it even started. Preferably with an indication of why it terminated early.
> > not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this upgrade method. upgrade method. > > sorry , I don't agree with that claim. Firstly the user does initiate it by > using software to set up the systemd target and rebooting. systemd outputs > tons of stuff to console via the partial boot which it runs and all the > package updates. I got about 10 thousand lines output the boot console when > it worked, so managing one line when it fails to find the rpm cache is > hardly beyond the realms of computer science. The user is activating a trigger. They are not running a command that immediately performs the upgrade. The upgrade starts via the trigger on a reboot. The user instance of dnf is long gone at this point, there isn't a way to get a message back to that program. Should the upgrade unit fail in this case and be marked as failed? Sure, absolutely. But by default, Workstation uses 'quiet rhgb' so such messages are invisible, and it's important that upgrades are failsafe. We can't just hang for 1 or 2 minutes, let alone indefinitely, waiting around doing nothing at all without informing the user, and informing the user of this means text has to be created for display by plymouth, needs translations into different languages, needs various a11y and i18n review, and so on. Every little thing is actually often not so little. So the proper thing for dnf offline upgrade to assume in this case, is that the user didn't actually run 'dnf system-upgrade download' by accident, report only in the journal that upgrade RPMs aren't available and therefore the upgrade was not initiated, and drop them right back to a working Fedora system that hasn't been touched. There isn't a way to bring up a GUI dialog in this instance, dnf is not a GUI program, they're not obligated to provide failure hints for GNOME or KDE or whatever DE, to pick up on and inform the user. That is something I'd sorta expect to see from GNOME Software but I haven't really attempted this particular act of sabotage (no offense) to see if it can handle it. The thing is, GNOME Software really can't get you into this kind of trouble by mistake - it's not possible to initiate the upgrade reboot until the download has happened and is verified behind the scenes by PackageKit. Which leads to another point of confusion is that there are two update/upgrade mechanisms that are not in conflict as much as they are competing implementations. > [FAILURE] simlink /system-upgrade-> /var/lib/dnf/system-upgrade does not > contain update rpms. > > That could be echoed to as many log files as you like. It may have. We don't have any attached journal entry, with or without debug logging enabled. > It would still be rather confusing in the context I was in but at least we > know what it was expecting, what problem it finds and why it craps out. It > would then be a case of understanding why it differed from what I saw before > reboot. But at least you have a lead to follow instead of nothing more than > finding your self back at a standard login on the non upgraded system. No > clues at all. I can't agree with no clues at all. Clues should be in the journal. There at least should be enough of a clue in the journal without debugging that our suspicion increases that there is in fact some kind of system confusion, which in turn causes us to enable systemd debugging options to get more information if necessary. > > > That's not even the case on macOS or Windows. > > Not even windows ? LOL As if that OS was not already the global reference in > obfuscation and non indication of internal errors. That is one of the main > reasons I adopted Linux, at least you can look under the hood. My main > criticism here is that it was treated in a very windozian fashion. Part of a > worrying trend to make Linux ever more windows like. Yeah I don't agree with the logic at all. Windows and macOS are so locked down they wouldn't have even allowed you to sabotage your layout by creating a symlink to a totally different volume containing the upgrade RPMs. In fact their enforcement would have prevented you from getting yourself into trouble in the first place. Their logging is just nowhere near as good by default, let alone with debugging enabled, as it is on linux/systemd systems. So even once you do get into trouble, it's much more straightforward to get the information needed from the system to understand where the failure is happening. Anyway, I'm glad the problem is fixed. I'm going to mark this closed notabug for now, because not initiating upgrades when upgrade RPMs aren't found isn't a bug, it's expected. If you can demonstrate that there's insufficient information logged in the journal in this case, then that's a reason to reopen and look into improving the logging.
Well I'm not out of the woods yet with this upgrade. on reboot it dumps me to recovery console, though I don't see any show stoppers in journalctl -xb If I cntl-D out it boots fine and seems to be fully functional. I noted these few lines when in rescue shell: reached target system initialisation reached target Sound card starting CUPS schedule started rescue shell. a little lower showed printer service startup completed. any ideas why it is dropping into a rescue shell? thx
Is the rescue shell one of the items in the on-the-fly-generated-boot-menu?
> any ideas why it is dropping into a rescue shell? Nope. Post the journal for this boot: 'journalctl -b -o short-monotonic > journal.log'
I found this in the journal. I had removed virtualbox because I was seeing this and had assumed that removing vbox would have fixed it, I should have checked. [ 2.229998] localhost.localdomain systemd-modules-load[208]: Failed to find module 'vboxdrv' [ 2.230402] localhost.localdomain systemd-modules-load[208]: Failed to find module 'vboxnetflt' [ 2.230687] localhost.localdomain systemd-modules-load[208]: Failed to find module 'vboxnetadp' [ 2.230943] localhost.localdomain systemd-modules-load[208]: Failed to find module 'vboxpci' [ 2.231181] localhost.localdomain systemd-modules-load[208]: Inserted module 'fuse' [ 2.231480] localhost.localdomain systemd[1]: Starting Create Volatile Files and Directories... [ 18.746836] localhost.localdomain kernel: vboxdrv: loading out-of-tree module taints kernel. [ 18.747785] localhost.localdomain kernel: vboxdrv: module verification failed: signature and/or required key missing - tainting kernel [ 18.752036] localhost.localdomain kernel: vboxdrv: Found 2 processor cores [ 18.753094] localhost.localdomain kernel: vboxdrv: fAsync=1 offMin=0xe2631 offMax=0xe2631 [ 18.790821] localhost.localdomain systemd[1]: systemd-journald.service: Succeeded. [ 18.837086] localhost.localdomain systemd-modules-load[456]: Inserted module 'vboxdrv' [ 18.853609] localhost.localdomain kernel: vboxdrv: TSC mode is Asynchronous, tentative frequency 3006270319 Hz [ 18.853963] localhost.localdomain kernel: vboxdrv: Successfully loaded version 6.0.14_RPMFusion (interface 0x00290008) [ 18.922154] localhost.localdomain kernel: VBoxNetFlt: Successfully started. [ 18.905465] localhost.localdomain systemd-modules-load[456]: Inserted module 'vboxnetflt' [ 18.975920] localhost.localdomain systemd-modules-load[456]: Inserted module 'vboxnetadp' [ 18.992585] localhost.localdomain kernel: VBoxNetAdp: Successfully started. [ 19.046633] localhost.localdomain kernel: VBoxPciLinuxInit [ 19.047005] localhost.localdomain kernel: vboxpci: IOMMU not found (not registered) [ 19.030620] localhost.localdomain systemd-modules-load[456]: Inserted module 'vboxpci' [ 19.092758] localhost.localdomain systemd[1]: Started Load Kernel Modules. [ 19.812164] localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe> [ 19.813924] localhost.localdomain systemd[1]: Starting udev Kernel Device Manager... [ 20.742296] localhost.localdomain systemd-udevd[482]: /etc/udev/rules.d/60-ipath.rules:4 Invalid value "kcopy/%02n" for NAME (char 7: invalid substitution type), ignoring, but please fix it. [ 21.007117] localhost.localdomain systemd-udevd[482]: /usr/lib/udev/rules.d/65-md-incremental.rules:28 Invalid value "/sbin/mdadm -I $env{DEVNAME} --export $devnode --offroot ${DEVLINKS}" f> [ 21.657117] localhost.localdomain systemd-udevd[482]: /usr/lib/udev/rules.d/99-vmware-scsi-udev.rules:5 Invalid value "/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'" for RUN (char 27: > [ 21.657344] localhost.localdomain systemd-udevd[482]: /usr/lib/udev/rules.d/99-vmware-scsi-udev.rules:6 Invalid value "/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'" for RUN (char 27: > [ 21.659554] localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-udevd comm="systemd" exe="/usr/lib/sy> [ 21.659667] localhost.localdomain systemd[1]: Started udev Kernel Device Manager. Why is it still trying to load stuff I have removed ?? dnf install kmod-VirtualBox VirtualBox-devel virtualbox-guest-additions akmod-VirtualBox Last metadata expiration check: 0:01:29 ago on Fri 15 Nov 2019 06:35:50 GMT. Dependencies resolved. ================================================================================================================================================================================================= Package Architecture Version Repository Size ================================================================================================================================================================================================= Installing: virtualbox-guest-additions x86_64 6.0.14-1.fc31 updates 1.2 M VirtualBox-devel x86_64 6.0.14-2.fc31 rpmfusion-free-updates 6.2 M akmod-VirtualBox x86_64 6.0.14-2.fc31 rpmfusion-free 47 k kmod-VirtualBox x86_64 6.0.14-2.fc31 rpmfusion-free 12 k Installing dependencies: akmods noarch 0.5.6-23.fc31 updates 23 k kmodtool noarch 1-37.fc31 updates 13 k python3-VirtualBox x86_64 6.0.14-2.fc31 rpmfusion-free-updates 158 k Transaction Summary ================================================================================================================================================================================================= Install 7 Packages Total download size: 7.6 M Installed size: 26 M Is this ok [y/N]: n Operation aborted. also this despite reaching target, cups is not working. [ 30.842080] localhost.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-update-utmp-runlevel comm="systemd" ex> [ 31.882787] localhost.localdomain cupsd[642]: Missing value on line 790 of /var/cache/cups/job.cache. [ 31.895075] localhost.localdomain cupsd[642]: Missing value on line 826 of /var/cache/cups/job.cache. [ 31.895644] localhost.localdomain cupsd[642]: Missing value on line 838 of /var/cache/cups/job.cache. [ 31.896156] localhost.localdomain cupsd[642]: Missing value on line 850 of /var/cache/cups/job.cache. [ 31.952869] localhost.localdomain systemd[1]: Started CUPS Scheduler. [ 31.954545] localhost.localdomain systemd[1]: Reached target Printer. [ 31.971751] localhost.localdomain kernel: kauditd_printk_skb: 9 callbacks suppressed offending lines had no entry after "Name " ; deleted the cache file thanks.
ps I'm still seeing vbox stuff in current kernel: ls /lib/modules/5.3.9-300.fc31.x86_64/extra/VirtualBox vboxdrv.ko vboxnetadp.ko vboxnetflt.ko vboxpci.ko vboxsf.ko
bash-5.0#rpmquery -f /lib/modules/5.3.9-300.fc31.x86_64/extra/VirtualBox/vboxdrv.ko kmod-VirtualBox-5.3.9-300.fc31.x86_64-6.0.14-2.fc31.x86_64 bash-5.0#dnf list kmod-VirtualBox Last metadata expiration check: 0:13:43 ago on Fri 15 Nov 2019 06:35:50 GMT. Available Packages kmod-VirtualBox.x86_64 6.0.14-2.fc31 rpmfusion-free bash-5.0#dnf remove kmod-VirtualBox No match for argument: kmod-VirtualBox No packages marked for removal. Dependencies resolved. Nothing to do. Complete! orphaned garbage?
Created attachment 1636356 [details] log file re. boot to recovery shell after manually removing the remaining modules in extra , it is still trying to load them. what do I need to do to GET RID of vbox? Do you think that is the reason it is booting to a shell? thanks.
"Post the journal for this boot" Federic, please attach the complete unedited journal file for the failed boot. I don't want to have a back and forth about every little thing which may or may not be related. And no, I don't think it's vbox related to this bug at all.
the attached file was the output from what you requested. journalctl -b -o short-monotonic What is "edited" ?
If you are still booting to the recovery shell.. A quicky experiment, totally benign. Change /etc/default/grub GRUB_DEFAULT= Set it to zero -> GRUB_DEFAULT=0 and then do a normal plain reboot.
No, I don't think vobx is related to the orginal bug here either, it's just part of the mess Fedora upgrade has become, after being totally reliable to 6 years. I'll open a new bug about vbox mess. It happened as a result of the upgrade but is not related to failure of the upgrade to run which is the OP subject here. So let's stick to the subject of the bug: > Workstation uses 'quiet rhgb' so such messages are invisible, Yes, that is another defect of windozification attitude. However, having had the upgrade reboot process fail persistently , it seems pretty sensible and obvious step to DISABLE this "feature" so that the console output during the failing boot process is at least VISIBLE. I don't see that as a reason make such output useless by no longer reporting systematic failure to the boot console. >So the proper thing for dnf offline upgrade to assume in this case, is that the user didn't actually run 'dnf system-upgrade download' by accident, report only in the >journal that upgrade RPMs aren't available and therefore the upgrade was not initiated, and drop them right back to a working Fedora system that hasn't been touched. No problem with rebooting to a working system. Since the upgrade may be done remotely, it is not desirable to block indefinately as a command prompt : this situation I currently do have !! I don't see where "only in the journal" comes from. If I saw something like: "[UPGRADE FAILURE] : no rpm downloaded rmps in /var/lib/dnf.... " even for 5 seconds to read it, it would be a boon. Or as with the rescue shell at least say where the errors have been logged and how to get them. > The thing is, GNOME Software really can't get you into this kind of trouble by mistake - it's not possible to initiate the upgrade reboot until the download has happened and is verified behind the scenes by PackageKit. Not sure why Gnome is the definitive reference for this , it is a low level package management issue. Any GUI has to deal with that. It is not an excuse. However, exactly the same thing could happen with a GUI tool since all the downloads did happen the transaction was "checked" ran and completed before the reboot and no errors were evident at that stage. > I haven't really attempted this particular act of sabotage (no offense) to see if it can handle it. I realise that you use the term sabotage light-heartedly and there is no offence taken but this is not sabotage. I temporarily needed some extra storage to deal with about 7GB requirement of the upgrade process. It seemed a logical and legitimate solution to simlink the download dir to somewhere which could handle that. I was not aware dnf upgrade-plugin booted to a crippled ( no offence ) system where half the partitions were not mounted. That is not mentioned in the upgrade notes. resizing a root partition means taking the system out of service and will probably be an all-night operation even where possible. Mounting /var elsewhere is a good alternative but more complicated than a temporary simlink. I the absence of any good reason not to simlink it seems the quickest and most attractive option. Since rootfs requirements are growing rapidly , maybe the upgrade notes need to contain some specific information about this issue and the crippled state of the systemd boot under which it occurs. Once I have finished wasting further time fixing the broken system this upgrade left me with, I will attempt to further sabotage my system to get the logs of what is ( or is not ) reported about the failure to execute the systemd upgrade target. In the meantime this is not a NOTABUG, it is a NEEDSINFO.
(In reply to Federic from comment #65) > the attached file was the output from what you requested. Yep, I missed the attachment. I see it now. And the reason it ends up in rescue.target is because "single" is on the command line in the journal you attached: [ 0.000000] localhost.localdomain kernel: Command line: BOOT_IMAGE=/vmlinuz-5.3.9-300.fc31.x86_64 root=UUID=0e7f65df-e28b-45ec-ab09-b5bb44122e65 ro single rhgb LANG=en_GB.UTF-8 > I'll open a new bug about vbox mess. Yes please. I personally find off topic conversation in bug reports to be extremely irritating. This is not a chat forum. Do that on desktop@ or devel@. Seriously, developers just have no time or patience for this, they are already spread too thin. Bug reports that are 67 comments long just don't get read. It's too f'n long.
Thanks, I had missed that detail. So how on earth did that get there. It certainly was not me doing a manual edit at boot time. I've booted this system a dozen times since and it does the same every time, so presumably this must have been added by grub2-mkconfig. Isn't this like what it would add for the recovery options in the boot menu. I have the following line in /etc/default/grub if is relevant. GRUB_DISABLE_RECOVERY="false" I quite agree about length. Not being able upgrade the EOL system and now having it not boot properly is a major problem. It should not have taken over a month and 69 comments to get this far. Too fing long as you say. vbox was a detail but since something was causing it to drop to a shell, problems failing to load kernel modules was something I needed to consider. So the remaining issue is why is single appearing on the default boot kernel line? Is there a way with grub2 to simulate the output that will displayed in the grub menu, to inspect it on a running system?
OK, rerunning grub2-mkconfig fixed it. Now booting normally. many thanks for you time and patience getting to the bottom of this issue. I will at some point try to repeat this particular act of "sabotage" and attempt an upgrade to fed32 to see whether a similar blockage occurs and to get the detailed log to determine whether any errors are reported their. Though I have seriously over run the amount of "admin" R and D time I have available right now. This has wasted far more time and far longer to resolve than it should have. I again thank you for picking this up otherwise it would never have been solved. I still maintain that there should be some output on the console. It may not be the definitive record of reference and may be so scary that it needs hiding behind a soothing image of a cloudy sky or puppy dogs under Gnome WS, but it is the traditional primary output to an operator in front of the machine. The boot console output does not exist for no reason. systemd normally does output there, failure to do so in this case seems atypical, there should be something. best regards.
Since the fix was in running grub2-mkconfig, assigning back to grub2 and closing.
Daniel , maybe you need to read a bug before closing and reassigning it ! The bug here was caused by the upgrade plugin failing when the rpm cache is symlinked elsewhere ( to gain the necessary storage ) since though /var is mounted the rest of the system is not mounted when systemd attempts to run the upgrade. The fact that it runs a reduced system should be explained in the notes and is not mentioned. The reason this was still open is because of the issue that it silently fails without reporting anything to the boot console ( one left why the boot console even exists if this sort of thing is not posted there. ). Chris wanted to know whether there was something in the logs. Anyone can determine that on a fresh installation by symlinking /var/dnf to another mount point and attempting to run the dnf sytem-upgrade plugin. Please to NOT assign this to grub ( for a third time ) . Grub was NOT the cause.
Federic there are all kinds of ways a user can customize Fedora and unwittingly sabotage it such that updates and upgrades will silently fail. Not every single permutation of user customization has to be considered or documented, there aren't enough resources for things as it is, let alone insisting people document really esoteric and arguably unsupportable configurations. What you have in effect is an incomplete /var and it's not reasonable to expect *any* error handling as a result of a user customized system that creates such a setup, because no developer would expect a partially populated /var. It's expected that /var should be completely present, or not, and if it's not present then there is some error handling for that, but that's mostly the domain of systemd since it's responsible for early startup mountpoint assembly.
> ... unwittingly sabotage it such that updates and upgrades will silently fail. Agreed, it is impossible to anticipate every eventuality, that is not a reason that the symptom which causes the failure should be silent. That is what error trapping and error reporting is for. That should be reported. That is the bug here. > Not every single permutation of user customization has to be considered or documented. No one is asking that every single permutation of user customization is documented. That is a straw man argument. I said that the fact the plugin depends up booting under a "crippled" ( no offence ) version of the system should be documented. I was totally unaware having installed a "plugin" for dnf that this involved a special "target" for systemd operating in a crippled OS. Neither the use of systemd nor the incomplete, partially mounted system is explained in the notes on its use. That is a documentation bug. In the absence of any contrary information, I thought this was dnf running this ( since it is a dnf plugin ) and assumed that this was done once the system partitions were mounted. > and if it's not present then there is some error handling for that, but that's mostly the domain of systemd since it's responsible for early startup mountpoint assembly. Then maybe this should be assigned to systemd, so that those working on that can ensure suitable output is posted on the boot console when there is a *total failure* to reach the assigned target. If the boot console is not used for that, what is the point of having it existing at all? Just block all output to boot console, give us a nice reassuring pic of a couple young puppies with a pink bow round their necks and some calming soft music. We would not want to scare anyone, now, would we?
I guarantee you if you ask about this configuration on systemd-devel@ they will tell you it's an unsupported configuration, no documentation warning the user about it is indicated. And so far I'm inclined to say the same thing about documenting the dnf offline updates/upgrades idiosyncratic behavior of using a special minimalist offline target. https://docs.fedoraproject.org/en-US/quick-docs/dnf-system-upgrade/ Item 2: The system reboots into a special environment (implemented as a systemd target) to install them If you're doing to start doing customizations at a low level like symlinking to a dir of RPMs, that strikes me as very much out of scope for error handling, how could that possibly ever occur in normal usage? I think it's very reasonable to say you're on your own. But I leave it up to the dnf developers to decide if there should be a runtime test: hey we see the trigger symlink that says we should do an update/upgrade, but there are no RPMs, that's weird. But I bet there's a hint of that in the systemd journal. Seeing as by default we hide the console output, showing a pretty plymouth upgrade screen anyway I'm not sure there's an advantage to an error on console rather than just recording the error in the systemd journal.
> that strikes me as very much out of scope for error handling, how could that possibly ever occur in normal usage? I think it's very reasonable to say you're on your own. Nothing is "out of scope for error handling", many unknown unknowns can occur in the near infinite variation of circumstances which can occur in a system. That WHY error trapping exists. It does not matter what the individual cause may or may not be, all errors need to be trapped and dealt with. That DOES happen in this case, so I don't know why you even discuss that. What is lacking is error REPORTING. Again that is not "out of scope". I do not expect an error message catering to the very specific conditions of this problem saying "you have symlinked /var/dnf , you have sabotaged your system... ". That is not the error which has been detected. I suspect it's more along the lines of "/var/dnf file or directory not found". If something like that was reported it , it would be possible to diagnose the problem. > Seeing as by default we hide the console output ... So following your logic, nothing ever prints anything to the boot console and it gets removed from linux and we go windoze full moron on this. Because it's hidden by default, it no longer exists and this justifies not sending output to it. It is there and systemd already sends plenty to output to it. It is INCONSISTENT that this error is trapped but not reported. > there aren't enough resources for things as it is That may be the root cause of many problems I've been seeing with Fedora recently. This maybe yet another symptom. That is unlikely to change in a positive direction in the future. I'm currently testing another distro, this has issue has been the straw which broke this camel's back.
Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.