Bug 1761251 - upgrade to Fed30 never boots to new kernel
Summary: upgrade to Fed30 never boots to new kernel
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: grub2
Version: 29
Hardware: Unspecified
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Peter Jones
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-10-13 19:09 UTC by Federic
Modified: 2019-11-27 18:06 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-27 18:06:07 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
dnf.librepo.log (1.88 MB, text/plain)
2019-10-13 19:09 UTC, Federic
no flags Details
dnf.log (1.37 MB, text/plain)
2019-10-13 19:10 UTC, Federic
no flags Details
new dnf.librepo.log (1.88 MB, text/plain)
2019-10-13 19:50 UTC, Federic
no flags Details
new dnf.log (1.15 MB, text/plain)
2019-10-21 17:09 UTC, Federic
no flags Details
log file re. boot to recovery shell (242.04 KB, text/plain)
2019-11-15 07:05 UTC, Federic
no flags Details

Description Federic 2019-10-13 19:09:40 UTC
Created attachment 1625359 [details]
dnf.librepo.log

Description of problem:
upgrade to Fed30 never boots to new kernel , does not install Fed30 pkgs, remains at Fed29

Version-Release number of selected component (if applicable):


How reproducible:
always


Steps to Reproduce:
1.install Fed29 from LXDE spin 
2.dnf update
3.dnf system-upgrade download --refresh --releasever=30 --allowerasing
4.dnf system-upgrade reboot


Actual results:
reboot to Fed29 kernel
before reaching user runlevel does shutdown and reboots
still no Fed30
no delay while installing new packages
boots to Fed 29 as usual

Expected results:
after initial reboot , reboot to Fed30 kernel , install new F30 pkgs with msg "this may take a while". 
User login to F30.

Additional info:

Comment 1 Federic 2019-10-13 19:10:38 UTC
Created attachment 1625360 [details]
dnf.log

adding dnf.log

Comment 2 Jonathan Dieter 2019-10-13 19:20:09 UTC
This may be a dumb question, but you did actually run dnf system-upgrade reboot, right?  I'm only asking because there's no indication of that in the logs.

Comment 3 Federic 2019-10-13 19:36:04 UTC
Yes, I've been round this loop at least 5 times now trying to see something I missed. 

here's a snip from root history: 

  996  dnf update
  997  dnf system-upgrade download --refresh --releasever=30 --allowerasing
  998  dnf --refresh upgrade
  999  dnf system-upgrade download --refresh --releasever=30 --allowerasing
 1000  dnf system-upgrade reboot


I reiterate , as reported, it does a partial reboot and then restarts itself, it is not just doing a normal reboot.

Comment 4 Federic 2019-10-13 19:40:53 UTC
Hang on, I'll system-upgrade reboot again. I think that log was since the last reboot since you said it was not sufficient. I'll reboot and post again.

Comment 5 Federic 2019-10-13 19:50:04 UTC
Created attachment 1625374 [details]
new dnf.librepo.log

Comment 6 Federic 2019-10-13 19:51:37 UTC
That log should now include output for :

dnf system-upgrade download --refresh --releasever=30 --allowerasing
dnf system-upgrade reboot

Comment 7 Federic 2019-10-13 20:33:34 UTC
It seems that the reboot only produced a couple of added lines. As with the log file I posted on the other thread, where I had cleared the file just before the reboot command. Even though that looked erroneous, it is consistent with what just happened again. 

the download command seems to go as normal, gets everything, checks and runs the transaction. That output is all familiar to me. 

After reboot I don't see any obvious error msgs but it all flies by so quickly it's hard to note anything of import.

Comment 8 Jonathan Dieter 2019-10-13 20:52:49 UTC
Well, I'm afraid I can't help.  I'm the zchunk developer, not the DNF system-upgrade plugin developer. You've successfully downloaded the packages which means you're well past the step where zchunk is used, so it must be something in the system-upgrade plugin.  It might be worth checking the journal to see if there's anything obvious there, but you might be best off attempting to upgrade using a different method.

Comment 9 Federic 2019-10-14 03:42:04 UTC
OK, thanks for taking a look. Hopefully one of the relevant devs will pick this up before it gets swept under rug by an EOL robot. 

How do I get the journal output for that operation? 

For "a different method" do I have any choices apart from ditching what I have, doing a fresh installation and trying to piece together all that changes I've made since installing Fed23 ?!

Comment 10 Pavla Kratochvilova 2019-10-16 07:12:08 UTC
Hello Federic,

Ales Matej made the following update to dnf-plugin-extras in Fedora 29: https://bodhi.fedoraproject.org/updates/FEDORA-2019-df11921f9a
It contains fix for a similar looking bug. Can you please try the system-upgrade again with version dnf-plugin-system-upgrade-4.0.7-1.fc29 (either from updates-testing repository or wait until it gets to stable)?
 
> How do I get the journal output for that operation? 

Useful information for system-upgrade are in "/var/log/dnf.log" and the output of "dnf system-upgrade log --number=1".

Comment 11 Federic 2019-10-16 07:30:45 UTC
thanks, I've added plugin update . I will test when I have some downtime. 

dnf system-upgrade log --number=1
Error: Cannot find logs with this index.

??

Comment 12 Pavla Kratochvilova 2019-10-17 11:32:19 UTC
(In reply to Federic from comment #11)
> thanks, I've added plugin update . I will test when I have some downtime. 

Great, thanks.

> dnf system-upgrade log --number=1
> Error: Cannot find logs with this index.
> 
> ??

It's probably because there are no logs (you can run "dnf system-upgrade log" to check if there are any). It can happen if the system-upgrade crashes at the very beginning and that's probably the case here.

Comment 13 Federic 2019-10-17 13:18:58 UTC
dnf system-upgrade log
The following boots appear to contain upgrade logs:
-- no logs were found --



I probably cleared all old content in /var/log some time back, since disk space is a bit tight. I know I did that for 28/29 upgrade, and I think once since for general housecleaning duties. 


I need to install a new disk and clone the root fs , I do not have my usual confidence in the upgrade going without problems.

Comment 14 Federic 2019-10-19 04:49:01 UTC
Just been around the loop again. 

dnf system-upgrade download --refresh --releasever=30 --allowerasing
dnf system-upgrade reboot

No update happened, just as before. 

 dnf system-upgrade log --number=1
Error: Cannot find logs with this index.

Comment 15 Federic 2019-10-19 05:02:01 UTC
I tried to upload dnf.log but keep getting server 500 error. File is now 3.8MB , is there a limit?  How many lines are required. currently 32000 lines.

Comment 16 Federic 2019-10-19 08:31:27 UTC

ls -ail /boot/grub2/grubenv
7318 lrwxrwxrwx. 1 root root 25 Oct  4  2018 /boot/grub2/grubenv -> ../efi/EFI/fedora/grubenv

thus h/w boots BIOS !

Is it possible that some changes are written here which are not referenced due to it being a BIOS booting system? 


Are the multiple issues with  GRUB_ENABLE_BLSCFG relevant here also?

Comment 17 Federic 2019-10-19 12:03:51 UTC
I had three kernel versions installed. I just removed the oldest one yet it still appears in Grub menu. 

this may be part of what is preventing system-upgrade from workig.

Comment 18 Federic 2019-10-20 08:51:19 UTC
I'm becoming more convinced this is a BLS issue

I had previously set GRUB_ENABLE_BLSCFG=False because it seemed there were a lot of unresolved/ unexpected outcomes. I figured I'd wait until the bugs got resolved before trying it. It seems that grubby-deprecated has not disappeared from the Fed29 repos. Quietly pulling the rug from under anyone who held back on BLS. 

I ran grub2-switch-to-blscfg as suggested here. Not sure what if anything it did. /boot/grub2/grub.conf still has a datestamp from yesterday.
https://fedoraproject.org/wiki/Changes/BootLoaderSpecByDefault

I rebooted and still see the now removed kernel kernel options still showing in the grub menu. 

If grub2 is not getting correctly set I'm guessing that the new F30 kernel is not getting installed ( or not included in grub ) and there is little chance of the upgrade continuing if it does not boot to new kernel.

I've added 'medium' severity since not being able to upgrade seems a fairly important problem, even though system remains functional at fed29.

Can someone give some guidance on how to get this back into a functional state so that the upgrade can be completed? 

thx.

Comment 19 Federic 2019-10-21 17:09:14 UTC
Created attachment 1627737 [details]
new dnf.log

dnf.log was cleared before doing dnf update and reboot commands.

11k lines and not errors I can see but still returns the following: 

dnf system-upgrade log --number=1
Error: Cannot find logs with this index.

If the upgrade process is not even starting how can I determine what is preventing that?

Obviously all the preamble of getting and checking pkgs is working and preparing the update "transaction" is fine.

What triggers the update process on reboot and how can I find out what is blocking? It certainly does not seem to be much.

Comment 20 Pavla Kratochvilova 2019-10-24 07:56:58 UTC
There were multiple bugs reported on system-upgrade from Fedora 29 to 30 that were reassigned to grub2 (bz 1686059, bz 1707199, bz 1720911) and based on the comment 18, this seems like a connected issue. I am reassigning to grub2 to investigate there.

Comment 21 Javier Martinez Canillas 2019-10-24 14:45:17 UTC
(In reply to Federic from comment #18)
> I'm becoming more convinced this is a BLS issue
> 
> I had previously set GRUB_ENABLE_BLSCFG=False because it seemed there were a
> lot of unresolved/ unexpected outcomes. I figured I'd wait until the bugs
> got resolved before trying it. It seems that grubby-deprecated has not
> disappeared from the Fed29 repos. Quietly pulling the rug from under anyone
> who held back on BLS. 
> 
> I ran grub2-switch-to-blscfg as suggested here. Not sure what if anything it
> did. /boot/grub2/grub.conf still has a datestamp from yesterday.
> https://fedoraproject.org/wiki/Changes/BootLoaderSpecByDefault
> 
> I rebooted and still see the now removed kernel kernel options still showing
> in the grub menu. 
>
> If grub2 is not getting correctly set I'm guessing that the new F30 kernel
> is not getting installed ( or not included in grub ) and there is little
> chance of the upgrade continuing if it does not boot to new kernel.
>
> I've added 'medium' severity since not being able to upgrade seems a fairly
> important problem, even though system remains functional at fed29.
> 
> Can someone give some guidance on how to get this back into a functional
> state so that the upgrade can be completed? 
> 
> thx.

Not sure I understood what's your current setup now. To what value do you have GRUB_ENABLE_BLSCFG set now and do you have the grubby-deprecated installed or not?

Comment 22 Javier Martinez Canillas 2019-10-24 14:46:30 UTC
(In reply to Pavla Kratochvilova from comment #20)
> There were multiple bugs reported on system-upgrade from Fedora 29 to 30
> that were reassigned to grub2 (bz 1686059, bz 1707199, bz 1720911) and based
> on the comment 18, this seems like a connected issue. I am reassigning to
> grub2 to investigate there.

Are we sure they are not facing the known issue of the installed GRUB being too old and needed to be updated?

https://fedoraproject.org/wiki/Common_F30_bugs#GRUB_boot_menu_is_not_populated_after_an_upgrade

Comment 23 Federic 2019-10-24 16:03:03 UTC
First of all this box was installed at Fed23 and so the Fed20 or older condition does not apply. 

grubby-deprecated is no longer available !!  Nice move pulling it from the repos when it is still needed because of this buggy behaviour. 

I did have GRUB_ENABLE_BLSCFG=false that worked last time I hit this issue , now seems I have no choice. 

dnf install grubby-deprecated 
Fedora Modular 29 - x86_64                       73 kB/s |  24 kB     00:00    
Fedora Modular 29 - x86_64 - Updates             38 kB/s |  23 kB     00:00    
Fedora 29 - x86_64 - Updates                     84 kB/s |  23 kB     00:00    
Fedora 29 - x86_64                               42 kB/s |  24 kB     00:00    
RPM Fusion for Fedora 29 - Free tainted          36 kB/s | 9.6 kB     00:00    
RPM Fusion for Fedora 29 - Free - Updates        37 kB/s | 9.6 kB     00:00    
RPM Fusion for Fedora 29 - Free                  38 kB/s |  10 kB     00:00    
RPM Fusion for Fedora 29 - Nonfree - Updates     18 kB/s | 9.8 kB     00:00    
RPM Fusion for Fedora 29 - Nonfree               37 kB/s |  11 kB     00:00    
No match for argument: grubby-deprecated
Error: Unable to find a match

I ran grub2-switch-to-blscfg but the upgrade problem did not change. 

Maybe someone who understands the guts of this problem can suggest a way clear it. 

Thanks for any suggestions.

Comment 24 Federic 2019-10-25 07:06:58 UTC
Currently I have grubby installed but not grubby-bls

Comment 25 Javier Martinez Canillas 2019-10-25 08:14:57 UTC
(In reply to Federic from comment #23)
> First of all this box was installed at Fed23 and so the Fed20 or older
> condition does not apply. 
> 

Yes, but I would still update GRUB with grub2-install /dev/sda (or whatever is your block device) to the latest installed version just to be sure.

> grubby-deprecated is no longer available !!  Nice move pulling it from the
> repos when it is still needed because of this buggy behaviour. 
>

The grubby-deprecated package is available in F30 onwards.

> I did have GRUB_ENABLE_BLSCFG=false that worked last time I hit this issue ,
> now seems I have no choice. 
> 
> dnf install grubby-deprecated 
> Fedora Modular 29 - x86_64                       73 kB/s |  24 kB     00:00 
> 
> Fedora Modular 29 - x86_64 - Updates             38 kB/s |  23 kB     00:00 
> 
> Fedora 29 - x86_64 - Updates                     84 kB/s |  23 kB     00:00 
> 
> Fedora 29 - x86_64                               42 kB/s |  24 kB     00:00 
> 
> RPM Fusion for Fedora 29 - Free tainted          36 kB/s | 9.6 kB     00:00 
> 
> RPM Fusion for Fedora 29 - Free - Updates        37 kB/s | 9.6 kB     00:00 
> 
> RPM Fusion for Fedora 29 - Free                  38 kB/s |  10 kB     00:00 
> 
> RPM Fusion for Fedora 29 - Nonfree - Updates     18 kB/s | 9.8 kB     00:00 
> 
> RPM Fusion for Fedora 29 - Nonfree               37 kB/s |  11 kB     00:00 
> 
> No match for argument: grubby-deprecated
> Error: Unable to find a match
>

Right for F29 the default was still not BLS, so the package that's now grubby-deprecated is still just called grubby.
 
> I ran grub2-switch-to-blscfg but the upgrade problem did not change. 
>

So you wanted to enable BLS for F29? I thought you mention that wanted to disable it for F30.
 
> Maybe someone who understands the guts of this problem can suggest a way
> clear it. 
> 

BLS was not the default for F29 so I wouldn't enable there. Just remove the GRUB_ENABLE_BLSCFG from /etc/default/grub) and do the upgrade.

If you have any issues, there will be a grub.cfg.rpmsave that contains a copy of your old grub.cfg so you could just do from the GRUB prompt:

grub> configfile /grub2/grub.cfg.rpmsave

Boot your machine and disable BLS (set GRUB_ENABLE_BLSCFG=false in /etc/default/grub) and install the grubby-deprecated package.

If BLS works correctly, then I would just leave BLS enabled and use the grubby package installed by F30 (that used to be grubby-bls in F29).

Comment 26 Federic 2019-10-25 09:10:45 UTC
many thanks for the input.  

I commented out the GRUB_ENABLE_BLSCFG line. 

grub2-install /dev/sda
grub2-mkconfig -o /boot/grub2/grub.cfg


here hoping I still have a Fedora system ...

Comment 27 Federic 2019-10-25 17:07:04 UTC
Well that was fun, now I've blown a hole in grub as well .

I did manage to reboot from the grub shell using configfile command, so it looks like just grub2-install did not go to plan. 

So now how do I repair grub. 

I have /boot on /dev/sda3  /root on /dev/sda6 

I can not recall whether grub was installed on /dev/sda before since it's probably 10 years since I needed to do that. 

Even if it was, it should be now. So why do I get dumped straight to 

grub>


Thanks for suggestions.

Comment 28 Federic 2019-10-25 19:59:21 UTC
just to clarify I was able to boot with the new grub.cfg. So it just seems the grub2-install went wrong. 

grub>  configfile (hd0,3)/grub2/grub.cfg

Comment 29 Federic 2019-10-26 12:04:40 UTC
checked the BIOS and it was set to boot from sdb . Having reinstalled grub to that device, it reboots. 

I will now try the upgrade again and report back.

Comment 30 Federic 2019-10-26 15:29:05 UTC
OK, so after that little detour, I now have up to date grub2 and the bug is still identical: 


dnf system-upgrade log --number=1
Error: Cannot find logs with this index.

Comment 31 Ben Cotton 2019-10-31 18:44:05 UTC
This message is a reminder that Fedora 29 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '29'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 29 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 32 Federic 2019-11-01 06:58:10 UTC
yes, I know Fed29 is reaching EOL, THAT IS WHY I HAVE BEEN TRYING TO UPGRADE FOR THE LAST MONTH !

"BLS was not the default for F29 so I wouldn't enable there. Just remove the GRUB_ENABLE_BLSCFG from /etc/default/grub) and do the upgrade."

no BLS did not make change this one iota. 

"Are we sure they are not facing the known issue of the installed GRUB being too old and needed to be updated?"

as stated in #9 this system was installed at fed23 so it never was eligible for that 'old grub' bug. Reinstalling current made not difference. 


Reassigning this to dnf-system-upgrade since I see no evidence this is in fact grub related or tied to existing grub issues. no input from grub2 maintainers. 

dnf system-upgrade log --number=1
Error: Cannot find logs with this index.


I have a system upgrade which is not even starting and only adds two lines to dnf.log , I see no evidence for passing this to grub unless anyone can point to something grub related or direct evidence grub is failing. I would like some help before this gets EOLed. 

if I have to scrap my current system and reinstall afresh because of this I doubt it will fedora which gets installed. I have remote systems which I maintain where fedora's rolling upgrades are an advantage. But that needs to work and I need to know there is support when it doesn't . getting ELDed is not an acceptable outcome.

Comment 33 Bob Gustafson 2019-11-07 20:05:36 UTC
Thanks for trying the grub2 fix. It saved me some time by not trying it.

My problem is a failing f29->f30 upgrade, but on a UEFI system.

The symptoms are slightly different,

The download Errors out with a message saying that I am already at f30

    sudo dnf system-upgrade download --refresh --releasever=30

    [sudo] password for user1: 

    Before you continue ensure that your system is fully upgraded by running "dnf --refresh upgrade". Do you want to continue       [y/N]: y
    Error: Need a --releasever greater than the current system version.

My boot menu only contains f29 entries. If I could just populate my boot menu with the f30 entries that are already loaded, maybe my system would boot to f30.

See https://bugzilla.redhat.com/show_bug.cgi?id=1711710
Have fun

Comment 34 Federic 2019-11-08 04:07:52 UTC
> Thanks for trying the grub2 fix. It saved me some time by not trying it.

Actually I would still run grub2-install. There are clearly some ( erroneous ) assumptions being made by devs  about the state installations are in and what changes need to be taken into account. Reducing the number of possible differences seems like a good idea. At least having the current grub installed in MBR will mean next time they assume what you have installed , it will be nearer the truth.

Comment 35 Federic 2019-11-08 12:19:25 UTC
I finally got out of the deadlock on this by using :

dnf --releasever=30 distro-sync

This did install all the packages and installed kernel files in /boot  but did not modify the grub menu and thus still booted to fed29 kernel ( running fed30 ). 

I manually ran grub2-mkconfig -o /boot/grub2/grub.cfg  and was able to boot to fed30 kernel. 

I should note that I had GRUB_ENABLE_BLSCFG=false in /etc/default/grub 

This did not prevent grubby getting installed and did not cause grubby-deprecated to be installed. 

However: 

#ls /boot/loader/entries
87ebc272236b419da0c6cab75da6ddad-0-rescue.conf	87ebc272236b419da0c6cab75da6ddad-5.3.8-200.fc30.x86_64.conf

#find /boot/ -iname "grub.cfg"
/boot/grub2/grub.cfg

apart from the file I created by hand, it does not seem to have changed or created a new grub.conf. 

It is not surprising that the update script does not proceed with the packages if it does not get the correct kernel on reboot. 

Is GRUB_ENABLE_BLSCFG=false in /etc/default/grub something that is not handled correctly when upgrading fed29

Comment 36 Javier Martinez Canillas 2019-11-08 13:08:46 UTC
(In reply to Federic from comment #35)
> I finally got out of the deadlock on this by using :
> 
> dnf --releasever=30 distro-sync
> 
> This did install all the packages and installed kernel files in /boot  but
> did not modify the grub menu and thus still booted to fed29 kernel ( running
> fed30 ). 
> 
> I manually ran grub2-mkconfig -o /boot/grub2/grub.cfg  and was able to boot
> to fed30 kernel. 
> 
> I should note that I had GRUB_ENABLE_BLSCFG=false in /etc/default/grub 
> 
> This did not prevent grubby getting installed and did not cause
> grubby-deprecated to be installed. 
> 
> However: 
> 
> #ls /boot/loader/entries
> 87ebc272236b419da0c6cab75da6ddad-0-rescue.conf
> 87ebc272236b419da0c6cab75da6ddad-5.3.8-200.fc30.x86_64.conf
> 
> #find /boot/ -iname "grub.cfg"
> /boot/grub2/grub.cfg
> 
> apart from the file I created by hand, it does not seem to have changed or
> created a new grub.conf. 
> 
> It is not surprising that the update script does not proceed with the
> packages if it does not get the correct kernel on reboot. 
> 
> Is GRUB_ENABLE_BLSCFG=false in /etc/default/grub something that is not
> handled correctly when upgrading fed29

The default is to switch to a BLS configuration for the reasons that Chris mentioned.

If a user explicitly set GRUB_ENABLE_BLSCFG=false, then the grub2-switch-to-blscfg won't be executed when the grub2 updated but as you said the grubby-deprecated is needed to add new boot entries.

As a workaround the grub.cfg can be re-generated using grub2-mkconfig and future entries will be added by the grubby-deprecated package.

That needs to be improved, Chris suggested just calling grub2-mkconfig for each kernel install if GRUB_ENABLE_BLSCFG=false which probably makes sense. That way the old grubby tool could be deprecated and users that don't want to use a BLS configuration won't need any additional tool to add new boot entries.

The drawback is that os-prober will be executed for each kernel install and that's fragile since it mounts all the available block devices to search for other OS. Avoiding calling os-prober on each kernel install was one of the reasons why the old grubby tool existed and why the BLS configuration was introduced so the configuration could be modified using drop-ins files in a directory instead of editing a monolithic configuration file.

Comment 37 Federic 2019-11-08 15:15:52 UTC
OK, so it is a know issue ( feature? ) that setting GRUB_ENABLE_BLSCFG=false in Fed29 will prevent upgrade from going ahead?  Why was no one able to point that out before I spent 2 months messing around and discovered it for myself? 

Maybe that should be in the upgrade notes. 
https://fedoraproject.org/wiki/Common_F30_bugs#GRUB_boot_menu_is_not_populated_after_an_upgrade

"DO NOT set GRUB_ENABLE_BLSCFG=false  in Fed29: this will prevent upgrade from working. "


I had read there were issues and decided I definitely did not want BLS so I set this line /etc/default/grub  , there was no indication that this should not be done or would break the update. If that was a "feature" why was it not clearly documented? 

Hopefully this can be fixed.

Comment 38 Federic 2019-11-08 15:30:49 UTC
There is the added issue that there is apparently no output about this fault condition. The upgrade fails to even start so something has been detected which prevents this but there is no output to note that it bailed out or as what the error was. 

This is primarily a dnf issue since if it produced a message on console and in logs, it would have been starting point to finding the underlying cause. A lot of time was wasted just determining exactly what was ( or was not ) happening before even being able the investigate the cause. 

If grubby is know to fail when GRUB_ENABLE_BLSCFG=false is set in fed 29 then this also could trapped instead of silently failing and letting everyone pick out the bones. 

It would not be beyond the means of a script to temporarily remove it for the update then reinstated it later, though simply halting the upgrade and recommending the user change it would be simplest.

Comment 39 Bob Gustafson 2019-11-08 17:12:17 UTC
(In reply to Federic from comment #35)

> I finally got out of the deadlock on this by using :
> 
> dnf --releasever=30 distro-sync
> 
> This did install all the packages and installed kernel files in /boot  but
> did not modify the grub menu and thus still booted to fed29 kernel ( running
> fed30 ).

I think I must have done something like this because I have both fc29 and fc30 files

> 
> I manually ran grub2-mkconfig -o /boot/grub2/grub.cfg  and was able to boot
> to fed30 kernel. 

Ahh, but that seems like a BIOS boot. I am (and have been for years) running an EFI boot.

Comment 40 Federic 2019-11-08 17:33:03 UTC
> Ahh, but that seems like a BIOS boot. I am (and have been for years) running an EFI boot.

That is normal since this is my bug report and I'm running BIOS. I posted the EFI equivalent on your bug.

Comment 41 Federic 2019-11-11 14:40:56 UTC
This bug seems to be persistent.  

I just tried to upgrade the same system to Fed31 by the usual system-upgrade plugin and it fails just as before and reboots to Fed30 with no Fed31 kernel. 

While I could use the same trick and use distro-sync  but why is the update plugin broken and why is there no will to fix it?

Comment 42 Javier Martinez Canillas 2019-11-11 16:01:46 UTC
(In reply to Federic from comment #41)
> This bug seems to be persistent.  
> 
> I just tried to upgrade the same system to Fed31 by the usual system-upgrade
> plugin and it fails just as before and reboots to Fed30 with no Fed31
> kernel. 
> 
> While I could use the same trick and use distro-sync  but why is the update
> plugin broken and why is there no will to fix it?

Was the entry for F31 added? You mentioned that set GRUB_ENABLE_BLSCFG=false in /etc/default/grub. I assume that you also have the grubby-deprecated package installed?

Comment 43 Federic 2019-11-11 16:38:21 UTC
 GRUB_ENABLE_BLSCFG=false , yes,  grubby-deprecated package installed? yes. 

Usual upgrade proceedure: 

dnf system-upgrade download --refresh --releasever=31 --allowerasing

All downloads fine. 

dnf system-upgrade reboot

as before no new kernel, in /boot/ no new grub entry, direct boot to existing version, no upgrade of pkgs.

Comment 44 Javier Martinez Canillas 2019-11-11 16:55:18 UTC
(In reply to Federic from comment #43)
>  GRUB_ENABLE_BLSCFG=false , yes,  grubby-deprecated package installed? yes. 
> 

Any reasons why you are not enabling the BLS configuration and not using grubby-deprecated? That's the default now and what gets more testing. Diverging from the default usually leads to issues.

> Usual upgrade proceedure: 
> 
> dnf system-upgrade download --refresh --releasever=31 --allowerasing
> 
> All downloads fine. 
> 
> dnf system-upgrade reboot
> 

Having said that, I don't see why disabling BLS and using grubby-deprecated would cause any issues. I'll try to reproduce your problem.

> as before no new kernel, in /boot/ no new grub entry, direct boot to
> existing version, no upgrade of pkgs.

Comment 45 Bob Gustafson 2019-11-11 17:28:43 UTC
My own problem may have been due to my being unaware of the changes in boot and grub.cfg process.

I was thinking that the menu was in grub.cfg, but I think it is constructed on-the-fly at boot time.

The fact that the date on grub.cfg is old is misleading.

You might increase the GRUB_TIMEOUT=10 in /etc/default/grub  (from 5 to 10 for example) so you can better eyeball the boot menu.

Check also the GRUB_DEFAULT=0 in the same file. In my file, the value was 'saved', which doesn't say much about the value at boot time (saved where?). I think the saved value was 3, which was selecting an old kernel.

When you boot, try to see whether there really are fc30/fc31 kernels which you can boot.

---

I was able to beat the boot into submission and I am now running fc31 on two of my systems.

Comment 46 Chris Murphy 2019-11-11 20:23:21 UTC
Fedoric, Bob, you two really need to stick to your own bugs. The cross posting is confusing. Federic tells Bob to do a grub2-install which is really bad advice on UEFI and is certain to cause more confusion than it will solve because on UEFI Fedora supplies a premade grubx64.efi. The way to reinstall GRUB on UEFI is 'dnf reinstall grub2-efi shim' per:
https://fedoraproject.org/wiki/GRUB_2#Install_the_bootloader_files

The central part of this bug is this comment still dnf system-upgrade, not BLS and not GRUB.

(In reply to Federic from comment #43)
>  GRUB_ENABLE_BLSCFG=false , yes,  grubby-deprecated package installed? yes. 
> 
> Usual upgrade proceedure: 
> 
> dnf system-upgrade download --refresh --releasever=31 --allowerasing
> 
> All downloads fine. 
> 
> dnf system-upgrade reboot
> 
> as before no new kernel, in /boot/ no new grub entry, direct boot to
> existing version, no upgrade of pkgs.

What I can tell you Federic, is that there are thousands of dnf system-upgrade tests done per cycle. There is no general problem with it, so the fact you're having a problem is certainly some sort of local customization you have that's confusing the plug and results in failure to trigger the upgrade. The only possible way it gets fixed is to discover why it's failing to get triggered, there's no magical way it's just going to happen.

The lack of dnf logs is a problem.

'dnf system-upgrade reboot' should cause a symlink to be created:


/system-update -> /var/lib/system-update

So it's worth checking that it exists or not, and that also /var/lib/system-update exists and is populated with all the downloaded files. And then also possibly useful is attaching the journal for the boot that should have been an upgrade, but instead booted normally without an upgrade. The way to get that journal is to just do 'dnf system-upgrade reboot' and at the next reboot do 'journalctl -o short-monotonic > bug1761251_journal.log' and attach that log to this bug.

If at all possible, immediately after the reboot and you see the GRUB menu, if you can edit the boot entry, and add to the end of the kerneal parameters line, systemd.log_level=debug it will temporarily add a lot of debugging information that will then be included in the journal that you'll attach including possibly why the offline upgrade isn't being triggered.

Comment 47 Chris Murphy 2019-11-11 20:34:14 UTC
Possible duplicates:
https://bugzilla.redhat.com/show_bug.cgi?id=1764169
https://bugzilla.redhat.com/show_bug.cgi?id=1770703

Federic, check traceback for dnf crashing in those cases. And then search the journal you'll create from the steps in comment 46 for evidence of a similar problem. If it's a match, then you can optionally try the recommendation in https://bugzilla.redhat.com/show_bug.cgi?id=1770703#c1 and report back whether it fixes the problem or not. If it's not a match, then it might not be a dup of either of those.

Comment 48 Federic 2019-11-11 20:56:34 UTC
thanks for the input Chris. Yes, I agree it may some corner case condition which is triggering this. 


>'dnf system-upgrade reboot' should cause a symlink to be created:
>/system-update -> /var/lib/system-update

Firstly, could you verify that is update and not upgrade ?? 
I have /var/lib/dnf/system-upgrade populated with all the files. 

Also I had to symlink the download location to another partition since /var is on rootfs and there was not enough space to run the upgrade. 


ls -ail /var/lib/dnf
1572903 lrwxrwxrwx. 1 root root 9 Oct 26 15:39 /var/lib/dnf -> /bigpart/dnf

Can you explain the discrepancy in the download locations and comment on whether that symlink could be throwing something? 

I'll try to get debug output if that does not shed some light.

Comment 49 Federic 2019-11-11 21:00:28 UTC
In the current state neither spelling exists as a symlink.  I may have rebooted more that once since 'dnf system-upgrade reboot' .

bash-5.0#ls -ail /system-update
ls: cannot access '/system-update': No such file or directory
bash-5.0#ls -ail /system-upgrade
ls: cannot access '/system-upgrade': No such file or directory

Comment 50 Chris Murphy 2019-11-11 21:50:57 UTC
(In reply to Federic from comment #48)
> Firstly, could you verify that is update and not upgrade ?? 
> I have /var/lib/dnf/system-upgrade populated with all the files.

I'm going off memory, which is also based on older dnf knowledge. I haven't gone to check the code, which is perhaps the best way to confirm/deny. Chances are it's since been changed from update to upgrade.

> Also I had to symlink the download location to another partition since /var
> is on rootfs and there was not enough space to run the upgrade. 
> 
> 
> ls -ail /var/lib/dnf
> 1572903 lrwxrwxrwx. 1 root root 9 Oct 26 15:39 /var/lib/dnf -> /bigpart/dnf

It seems decently likely this is related. How is this other partition being mounted by fstab? I'd expect that it's being mounted too late, and the upgrade process doesn't see any files available for updating so it fails to initiate.

The proper way to handle a full system root is a) grow it, or b) change from /var as a directory on system root, to a separate /var partition, adding a /var mount entry to fstab. The systemd handling for /var is prioritized compared to /home and any non-standard mount points. In fact I'm not even absolutely certain that systemd will mount /home or non-standard mount points during the offline system upgrade because its intended to be a minimalist environment.

But anyway, the systemd.log_level=debug journal will tell us where the confusion is.

Comment 51 Federic 2019-11-12 08:41:15 UTC
Yes, the partition is in fstab on automount. If systemd does not mount anything beyond /var this could explain what we are seeing. 

This comes back to #11 and #13 where NO error is reported and the update process just craps out without leaving the slightest trace of why it failed. Even if this situation is a corner case, the lack of error reporting a fatal error is serious bug.  I should have been able to find some error message the first time this happened and it would have been fixed the same day, instead of spending that last two months at blind man's bluff trying to guess what is or is not happening.

Over the many times I've watched the console output of process, I seem to recall partition umounts happening in the initial partial boot where it installs the new kernel. IIRC this was more than / and /boot. Needs checking. 


I will try to reorganise the system so this this partition is mounted as /var , that seems like a good thing to eliminate.

Comment 52 Chris Murphy 2019-11-12 11:07:40 UTC
(In reply to Federic from comment #51)
> This comes back to #11 and #13 where NO error is reported and the update
> process just craps out without leaving the slightest trace of why it failed.
> Even if this situation is a corner case, the lack of error reporting a fatal
> error is serious bug.

That would definitely be revealed in the journal with systemd debug enabled; possibly even without it enabled. Literally the journal is the #1 go to for finding such error reporting, not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this upgrade method.

If you were upgrading with GNOME Software (and also if you're specifically using Workstation edition) and run into a similar problem without error handling presented to the user at the next boot, i.e. the upgrade failed, you'd have a stronger case because it's a GUI upgrade mechanism. But even there, the upgrade happens offline in a special systemd upgrade target whether it's dnf or GNOME Software, so any problems are merely logged.

>  I should have been able to find some error message
> the first time this happened and it would have been fixed the same day,
> instead of spending that last two months at blind man's bluff trying to
> guess what is or is not happening.

The frustration is reasonable, but the complaint isn't convincing because the error message will be in the journal. And I know how incredibly tedious it is to go through the normal journal looking for problems, it's a metric F ton more verbose and tedious to wade through it when the debug option is set. But you can cheat by grepping it for things you know you're looking for like dnf. If you just filter that entire log for dnf entries, I really expect it to become obvious why it did not reach the offline upgrade target and it will say so, probably because the upgrade dir wasn't found or something to that effect; and then go looking at when /var was mounted in comparison to when the offline upgrade started (or tried to start and then presumably immediately failed).

There's just no such thing as discrete error messages with a connect the dots explanation. That's not even the case on macOS or Windows.

Comment 53 feddy 2019-11-12 22:13:05 UTC
firstly good news, it was indeed that late mounting of the partition which apparently caused the failure.
Once I took the whole of /var on to that partition, it booted fine and went into the upgrade procedure as expected.

I did the log with debug but it's pretty irrelevant since it does not follow the bug conditions. I suppose I'd need to break it again to get a log output. 

> not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this upgrade method. upgrade method.

sorry , I don't agree with that claim. Firstly the user does initiate it by using software to set up the systemd target and rebooting. systemd outputs tons of stuff to console via the partial boot which it runs and all the package updates. I got about 10 thousand lines output to the boot console when it worked, so manageing one line when it fails to find the rpm cache is hardly beyond the realms of computer science.

firstly good news, it was indeed that late mounting of the partition which apparently caused the failure.
Once I took the whole of /var on to that partition, it booted fine and went into the upgrade procedure as expected.

I did the log with debug but it's pretty irrelevant since it does not follow the bug conditions. I suppose I'd need to break it again to get a log output. 

> not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this upgrade method. upgrade method.

sorry , I don't agree with that claim. Firstly the user does initiate it by using software to set up the systemd target and rebooting. systemd outputs tons of stuff to console via the partial boot which it runs and all the package updates. I got about 10 thousand lines output the boot console when it worked, so managing one line when it fails to find the rpm cache is hardly beyond the realms of computer science.

firstly good news, it was indeed that late mounting of the partition which apparently caused the failure.
Once I took the whole of /var on to that partition, it booted fine and went into the upgrade procedure as expected.

I did the log with debug but it's pretty irrelevant since it does not follow the bug conditions. I suppose I'd need to break it again to get a log output. 

> not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this upgrade method. upgrade method.

sorry , I don't agree with that claim. Firstly the user does initiate it by using software to set up the systemd target and rebooting. systemd outputs tons of stuff to console via the partial boot which it runs and all the package updates. I got about 10 thousand lines output the boot console when it worked, so managing one line when it fails to find the rpm cache is hardly beyond the realms of computer science.

 
If I grep dmesg I get loads of systemd lines , I see them on booting.

> There's just no such thing as discrete error messages with a connect the dots explanation. 

it seems fairly clear by now that it was failing pretty early because it found an empty rpm cache or simply the simlink you indicated was either broken or pointing to an empty dir. Something like the following could be output.

[FAILURE] simlink /system-upgrade-> /var/lib/dnf/system-upgrade does not contain update rpms.

That could be echoed to as many log files as you like.

It would still be rather confusing in the context I was in but at least we know what it was expecting, what problem it finds and why it craps out. It would then be a case of understanding why it differed from what I saw before reboot. But at least you have a lead to follow instead of nothing more than finding your self back at a standard login on the non upgraded system. No clues at all. 

> That's not even the case on macOS or Windows.

Not even windows ? LOL As if that OS was not already the global reference in obfuscation and non indication of internal errors. That is one of the main reasons I adopted Linux, at least you can look under the hood. My main criticism here is that it was treated in a very windozian fashion. Part of a worrying trend to make Linux ever more windows like.

Comment 54 feddy 2019-11-12 22:14:54 UTC
firstly good news, it was indeed that late mounting of the partition which apparently caused the failure.
Once I took the whole of /var on to that partition, it booted fine and went into the upgrade procedure as expected.

I did the log with debug but it's pretty irrelevant since it does not follow the bug conditions. I suppose I'd need to break it again to get a log output. 

> not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this upgrade method. upgrade method.

sorry , I don't agree with that claim. Firstly the user does initiate it by using software to set up the systemd target and rebooting. systemd outputs tons of stuff to console via the partial boot which it runs and all the package updates. I got about 10 thousand lines output to the boot console when it worked, so manageing one line when it fails to find the rpm cache is hardly beyond the realms of computer science.

firstly good news, it was indeed that late mounting of the partition which apparently caused the failure.
Once I took the whole of /var on to that partition, it booted fine and went into the upgrade procedure as expected.

I did the log with debug but it's pretty irrelevant since it does not follow the bug conditions. I suppose I'd need to break it again to get a log output. 

> not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this upgrade method. upgrade method.

sorry , I don't agree with that claim. Firstly the user does initiate it by using software to set up the systemd target and rebooting. systemd outputs tons of stuff to console via the partial boot which it runs and all the package updates. I got about 10 thousand lines output the boot console when it worked, so managing one line when it fails to find the rpm cache is hardly beyond the realms of computer science.

firstly good news, it was indeed that late mounting of the partition which apparently caused the failure.
Once I took the whole of /var on to that partition, it booted fine and went into the upgrade procedure as expected.

I did the log with debug but it's pretty irrelevant since it does not follow the bug conditions. I suppose I'd need to break it again to get a log output. 

> not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this upgrade method. upgrade method.

sorry , I don't agree with that claim. Firstly the user does initiate it by using software to set up the systemd target and rebooting. systemd outputs tons of stuff to console via the partial boot which it runs and all the package updates. I got about 10 thousand lines output the boot console when it worked, so managing one line when it fails to find the rpm cache is hardly beyond the realms of computer science.

 
If I grep dmesg I get loads of systemd lines , I see them on booting.

> There's just no such thing as discrete error messages with a connect the dots explanation. 

it seems fairly clear by now that it was failing pretty early because it found an empty rpm cache or simply the simlink you indicated was either broken or pointing to an empty dir. Something like the following could be output.

[FAILURE] simlink /system-upgrade-> /var/lib/dnf/system-upgrade does not contain update rpms.

That could be echoed to as many log files as you like.

It would still be rather confusing in the context I was in but at least we know what it was expecting, what problem it finds and why it craps out. It would then be a case of understanding why it differed from what I saw before reboot. But at least you have a lead to follow instead of nothing more than finding your self back at a standard login on the non upgraded system. No clues at all. 

> That's not even the case on macOS or Windows.

Not even windows ? LOL As if that OS was not already the global reference in obfuscation and non indication of internal errors. That is one of the main reasons I adopted Linux, at least you can look under the hood. My main criticism here is that it was treated in a very windozian fashion. Part of a worrying trend to make Linux ever more windows like.

Comment 55 feddy 2019-11-13 10:46:06 UTC
#12
> It's probably because there are  logs (you can run "dnf system-upgrade log" to check if there are any). It can happen if the system-upgrade crashes at the 
 very beginning and that's probably the case here.

If it is crashing that needs to be fixed. If it is simply failing and exiting silently then this needs changing so that it does at least leave some trace that it even started. Preferably with an indication of why it terminated early.

Comment 56 Chris Murphy 2019-11-13 16:06:43 UTC
> > not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this not the CLI program itself because the update doesn't start by user initiation, there is no possible way to report an error to tty or shell, errors can only be logged with this upgrade method. upgrade method.
> 
> sorry , I don't agree with that claim. Firstly the user does initiate it by
> using software to set up the systemd target and rebooting. systemd outputs
> tons of stuff to console via the partial boot which it runs and all the
> package updates. I got about 10 thousand lines output the boot console when
> it worked, so managing one line when it fails to find the rpm cache is
> hardly beyond the realms of computer science.

The user is activating a trigger. They are not running a command that immediately performs the upgrade. The upgrade starts via the trigger on a reboot. The user instance of dnf is long gone at this point, there isn't a way to get a message back to that program. Should the upgrade unit fail in this case and be marked as failed? Sure, absolutely. But by default, Workstation uses 'quiet rhgb' so such messages are invisible, and it's important that upgrades are failsafe. We can't just hang for 1 or 2 minutes, let alone indefinitely, waiting around doing nothing at all without informing the user, and informing the user of this means text has to be created for display by plymouth, needs translations into different languages, needs various a11y and i18n review, and so on. Every little thing is actually often not so little.

So the proper thing for dnf offline upgrade to assume in this case, is that the user didn't actually run 'dnf system-upgrade download' by accident, report only in the journal that upgrade RPMs aren't available and therefore the upgrade was not initiated, and drop them right back to a working Fedora system that hasn't been touched. There isn't a way to bring up a GUI dialog in this instance, dnf is not a GUI program, they're not obligated to provide failure hints for GNOME or KDE or whatever DE, to pick up on and inform the user. That is something I'd sorta expect to see from GNOME Software but I haven't really attempted this particular act of sabotage (no offense) to see if it can handle it. The thing is, GNOME Software really can't get you into this kind of trouble by mistake - it's not possible to initiate the upgrade reboot until the download has happened and is verified behind the scenes by PackageKit.

Which leads to another point of confusion is that there are two update/upgrade mechanisms that are not in conflict as much as they are competing implementations.

> [FAILURE] simlink /system-upgrade-> /var/lib/dnf/system-upgrade does not
> contain update rpms.
> 
> That could be echoed to as many log files as you like.

It may have. We don't have any attached journal entry, with or without debug logging enabled.

 
> It would still be rather confusing in the context I was in but at least we
> know what it was expecting, what problem it finds and why it craps out. It
> would then be a case of understanding why it differed from what I saw before
> reboot. But at least you have a lead to follow instead of nothing more than
> finding your self back at a standard login on the non upgraded system. No
> clues at all. 

I can't agree with no clues at all. Clues should be in the journal. There at least should be enough of a clue in the journal without debugging that our suspicion increases that there is in fact some kind of system confusion, which in turn causes us to enable systemd debugging options to get more information if necessary.



> 
> > That's not even the case on macOS or Windows.
> 
> Not even windows ? LOL As if that OS was not already the global reference in
> obfuscation and non indication of internal errors. That is one of the main
> reasons I adopted Linux, at least you can look under the hood. My main
> criticism here is that it was treated in a very windozian fashion. Part of a
> worrying trend to make Linux ever more windows like.

Yeah I don't agree with the logic at all. Windows and macOS are so locked down they wouldn't have even allowed you to sabotage your layout by creating a symlink to a totally different volume containing the upgrade RPMs. In fact their enforcement would have prevented you from getting yourself into trouble in the first place. Their logging is just nowhere near as good by default, let alone with debugging enabled, as it is on linux/systemd systems. So even once you do get into trouble, it's much more straightforward to get the information needed from the system to understand where the failure is happening.

Anyway, I'm glad the problem is fixed. I'm going to mark this closed notabug for now, because not initiating upgrades when upgrade RPMs aren't found isn't a bug, it's expected. If you can demonstrate that there's insufficient information logged in the journal in this case, then that's a reason to reopen and look into improving the logging.

Comment 57 Federic 2019-11-14 12:32:52 UTC
Well I'm not out of the woods yet with this upgrade. 

on reboot it dumps me to recovery console, though I don't see any show stoppers in journalctl -xb 

If I cntl-D out it boots fine and seems to be fully functional. 

I noted these few lines when in rescue shell:

reached target system initialisation
reached target Sound card
starting CUPS schedule
started rescue shell. 


a little lower showed printer service startup completed. 

any ideas why it is dropping into a rescue shell? 

thx

Comment 58 Bob Gustafson 2019-11-14 19:51:30 UTC
Is the rescue shell one of the items in the on-the-fly-generated-boot-menu?

Comment 59 Chris Murphy 2019-11-14 23:59:48 UTC
> any ideas why it is dropping into a rescue shell? 

Nope. Post the journal for this boot: 'journalctl -b -o short-monotonic > journal.log'

Comment 60 Federic 2019-11-15 06:39:07 UTC
I found this in the journal.  I had removed virtualbox because I was seeing this and had assumed that removing vbox would have fixed it, I should have checked. 

[    2.229998] localhost.localdomain systemd-modules-load[208]: Failed to find module 'vboxdrv'
[    2.230402] localhost.localdomain systemd-modules-load[208]: Failed to find module 'vboxnetflt'
[    2.230687] localhost.localdomain systemd-modules-load[208]: Failed to find module 'vboxnetadp'
[    2.230943] localhost.localdomain systemd-modules-load[208]: Failed to find module 'vboxpci'
[    2.231181] localhost.localdomain systemd-modules-load[208]: Inserted module 'fuse'
[    2.231480] localhost.localdomain systemd[1]: Starting Create Volatile Files and Directories...


[   18.746836] localhost.localdomain kernel: vboxdrv: loading out-of-tree module taints kernel.
[   18.747785] localhost.localdomain kernel: vboxdrv: module verification failed: signature and/or required key missing - tainting kernel
[   18.752036] localhost.localdomain kernel: vboxdrv: Found 2 processor cores
[   18.753094] localhost.localdomain kernel: vboxdrv: fAsync=1 offMin=0xe2631 offMax=0xe2631


[   18.790821] localhost.localdomain systemd[1]: systemd-journald.service: Succeeded.
[   18.837086] localhost.localdomain systemd-modules-load[456]: Inserted module 'vboxdrv'
[   18.853609] localhost.localdomain kernel: vboxdrv: TSC mode is Asynchronous, tentative frequency 3006270319 Hz
[   18.853963] localhost.localdomain kernel: vboxdrv: Successfully loaded version 6.0.14_RPMFusion (interface 0x00290008)
[   18.922154] localhost.localdomain kernel: VBoxNetFlt: Successfully started.
[   18.905465] localhost.localdomain systemd-modules-load[456]: Inserted module 'vboxnetflt'
[   18.975920] localhost.localdomain systemd-modules-load[456]: Inserted module 'vboxnetadp'
[   18.992585] localhost.localdomain kernel: VBoxNetAdp: Successfully started.
[   19.046633] localhost.localdomain kernel: VBoxPciLinuxInit
[   19.047005] localhost.localdomain kernel: vboxpci: IOMMU not found (not registered)
[   19.030620] localhost.localdomain systemd-modules-load[456]: Inserted module 'vboxpci'
[   19.092758] localhost.localdomain systemd[1]: Started Load Kernel Modules.


[   19.812164] localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe>
[   19.813924] localhost.localdomain systemd[1]: Starting udev Kernel Device Manager...
[   20.742296] localhost.localdomain systemd-udevd[482]: /etc/udev/rules.d/60-ipath.rules:4 Invalid value "kcopy/%02n" for NAME (char 7: invalid substitution type), ignoring, but please fix it.
[   21.007117] localhost.localdomain systemd-udevd[482]: /usr/lib/udev/rules.d/65-md-incremental.rules:28 Invalid value "/sbin/mdadm -I $env{DEVNAME} --export $devnode --offroot ${DEVLINKS}" f>
[   21.657117] localhost.localdomain systemd-udevd[482]: /usr/lib/udev/rules.d/99-vmware-scsi-udev.rules:5 Invalid value "/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'" for RUN (char 27: >
[   21.657344] localhost.localdomain systemd-udevd[482]: /usr/lib/udev/rules.d/99-vmware-scsi-udev.rules:6 Invalid value "/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'" for RUN (char 27: >
[   21.659554] localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-udevd comm="systemd" exe="/usr/lib/sy>
[   21.659667] localhost.localdomain systemd[1]: Started udev Kernel Device Manager.


Why is it still trying to load stuff I have removed ?? 


dnf install  kmod-VirtualBox VirtualBox-devel virtualbox-guest-additions akmod-VirtualBox
Last metadata expiration check: 0:01:29 ago on Fri 15 Nov 2019 06:35:50 GMT.
Dependencies resolved.
=================================================================================================================================================================================================
 Package                                                Architecture                       Version                                      Repository                                          Size
=================================================================================================================================================================================================
Installing:
 virtualbox-guest-additions                             x86_64                             6.0.14-1.fc31                                updates                                            1.2 M
 VirtualBox-devel                                       x86_64                             6.0.14-2.fc31                                rpmfusion-free-updates                             6.2 M
 akmod-VirtualBox                                       x86_64                             6.0.14-2.fc31                                rpmfusion-free                                      47 k
 kmod-VirtualBox                                        x86_64                             6.0.14-2.fc31                                rpmfusion-free                                      12 k
Installing dependencies:
 akmods                                                 noarch                             0.5.6-23.fc31                                updates                                             23 k
 kmodtool                                               noarch                             1-37.fc31                                    updates                                             13 k
 python3-VirtualBox                                     x86_64                             6.0.14-2.fc31                                rpmfusion-free-updates                             158 k

Transaction Summary
=================================================================================================================================================================================================
Install  7 Packages

Total download size: 7.6 M
Installed size: 26 M
Is this ok [y/N]: n
Operation aborted.




also this despite reaching target, cups is not working. 

[   30.842080] localhost.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-update-utmp-runlevel comm="systemd" ex>
[   31.882787] localhost.localdomain cupsd[642]: Missing value on line 790 of /var/cache/cups/job.cache.
[   31.895075] localhost.localdomain cupsd[642]: Missing value on line 826 of /var/cache/cups/job.cache.
[   31.895644] localhost.localdomain cupsd[642]: Missing value on line 838 of /var/cache/cups/job.cache.
[   31.896156] localhost.localdomain cupsd[642]: Missing value on line 850 of /var/cache/cups/job.cache.
[   31.952869] localhost.localdomain systemd[1]: Started CUPS Scheduler.
[   31.954545] localhost.localdomain systemd[1]: Reached target Printer.
[   31.971751] localhost.localdomain kernel: kauditd_printk_skb: 9 callbacks suppressed

offending lines had no entry after "Name " ; deleted the cache file


thanks.

Comment 61 Federic 2019-11-15 06:47:00 UTC
ps I'm still seeing vbox stuff in current kernel:

ls /lib/modules/5.3.9-300.fc31.x86_64/extra/VirtualBox
vboxdrv.ko  vboxnetadp.ko  vboxnetflt.ko  vboxpci.ko  vboxsf.ko

Comment 62 Federic 2019-11-15 06:50:48 UTC
bash-5.0#rpmquery -f /lib/modules/5.3.9-300.fc31.x86_64/extra/VirtualBox/vboxdrv.ko
kmod-VirtualBox-5.3.9-300.fc31.x86_64-6.0.14-2.fc31.x86_64
bash-5.0#dnf list kmod-VirtualBox
Last metadata expiration check: 0:13:43 ago on Fri 15 Nov 2019 06:35:50 GMT.
Available Packages
kmod-VirtualBox.x86_64                                                                        6.0.14-2.fc31                                                                        rpmfusion-free
bash-5.0#dnf remove kmod-VirtualBox
No match for argument: kmod-VirtualBox
No packages marked for removal.
Dependencies resolved.
Nothing to do.
Complete!


orphaned garbage?

Comment 63 Federic 2019-11-15 07:05:23 UTC
Created attachment 1636356 [details]
log file re. boot to recovery shell

after manually removing the remaining modules in extra , it is still trying to load them.  what do I need to do to GET RID of vbox?

Do you think that is the reason it is booting to a shell?

thanks.

Comment 64 Chris Murphy 2019-11-15 13:09:59 UTC
"Post the journal for this boot" 

Federic, please attach the complete unedited journal file for the failed boot. I don't want to have a back and forth about every little thing which may or may not be related. And no, I don't think it's vbox related to this bug at all.

Comment 65 Federic 2019-11-15 13:22:28 UTC
the attached file was the output from what you requested. 

journalctl -b -o short-monotonic

What is "edited" ?

Comment 66 Bob Gustafson 2019-11-15 14:02:26 UTC
If you are still booting to the recovery shell..

A quicky experiment, totally benign.

Change /etc/default/grub GRUB_DEFAULT=  Set it to zero ->  GRUB_DEFAULT=0

and then do a normal plain reboot.

Comment 67 Federic 2019-11-15 14:14:16 UTC
No, I don't think vobx is related to the orginal bug here either, it's just part of the mess Fedora upgrade has become, after being totally reliable to 6 years. 

I'll open a new bug about vbox mess. It happened as a result of the upgrade but is not related to failure of the upgrade to run which is the OP subject here. So let's stick to the subject of the bug:

>  Workstation uses 'quiet rhgb' so such messages are invisible,

Yes, that is another defect of windozification attitude. However, having had the upgrade reboot process fail persistently , it seems pretty sensible and obvious step to DISABLE this "feature" so that the console output during the failing boot process is at least VISIBLE. I don't see that as a reason make such output useless by no longer reporting systematic failure to the boot console.


>So the proper thing for dnf offline upgrade to assume in this case, is that the user didn't actually run 'dnf system-upgrade download' by accident, report only in the 

>journal that upgrade RPMs aren't available and therefore the upgrade was not initiated, and drop them right back to a working Fedora system that hasn't been touched.

No problem with rebooting to a working system. Since the upgrade may be done remotely, it is not desirable to block indefinately as a command prompt : this situation I currently do have !!

 I don't see where "only in the journal" comes from. If I saw something like: "[UPGRADE FAILURE] : no rpm downloaded rmps in /var/lib/dnf.... " even for 5 seconds to read it, it would be a boon. Or as with the rescue shell at least say where the errors have been logged and how to get them.


> The thing is, GNOME Software really can't get you into this kind of trouble by mistake - it's not possible to initiate the upgrade reboot until the download has happened and is verified behind the scenes by PackageKit.

Not sure why Gnome is the definitive reference for this , it is a low level package management issue. Any GUI has to deal with that. It is not an excuse. However, exactly the same thing could happen with a GUI tool since all the downloads did happen the transaction was "checked" ran and completed before the reboot and no errors were evident at that stage. 

> I haven't really attempted this particular act of sabotage (no offense) to see if it can handle it.

I realise that you use the term sabotage light-heartedly and there is no offence taken but this is not sabotage. I temporarily needed some extra storage to deal with about 7GB requirement of the upgrade process. It seemed a logical and legitimate solution to simlink the download dir to somewhere which could handle that. I was not aware dnf upgrade-plugin booted to a crippled ( no offence ) system where half the partitions were not mounted.  That is not mentioned in the upgrade notes. 

resizing a root partition means taking the system out of service and will probably be an all-night operation even where possible. Mounting /var elsewhere is a good alternative but more complicated than a temporary simlink. I the absence of any good reason not to simlink it seems the quickest and most attractive option. 


Since rootfs requirements are growing rapidly , maybe the upgrade notes need to contain some specific information about this issue and the crippled state of the systemd boot under which it occurs.

Once I have finished wasting further time fixing the broken system this upgrade left me with, I will attempt to further sabotage my system to get the logs of what is ( or is not ) reported about the failure to execute the systemd upgrade target.

In the meantime this is not a NOTABUG, it is a NEEDSINFO.

Comment 68 Chris Murphy 2019-11-16 22:41:34 UTC
(In reply to Federic from comment #65)
> the attached file was the output from what you requested. 

Yep, I missed the attachment. I see it now. And the reason it ends up in rescue.target is because "single" is on the command line in the journal you attached:
[    0.000000] localhost.localdomain kernel: Command line: BOOT_IMAGE=/vmlinuz-5.3.9-300.fc31.x86_64 root=UUID=0e7f65df-e28b-45ec-ab09-b5bb44122e65 ro single rhgb LANG=en_GB.UTF-8

> I'll open a new bug about vbox mess.

Yes please. I personally find off topic conversation in bug reports to be extremely irritating. This is not a chat forum. Do that on desktop@ or devel@. Seriously, developers just have no time or patience for this, they are already spread too thin. Bug reports that are 67 comments long just don't get read. It's too f'n long.

Comment 69 Federic 2019-11-17 08:18:18 UTC
Thanks, I had missed that detail.  So how on earth did that get there. It certainly was not me doing a manual edit at boot time. I've booted this system a dozen times since and it does the same every time, so presumably this must have been added by grub2-mkconfig.

Isn't this like what it would add for the recovery options in the boot menu.  

I have the following line in /etc/default/grub if is relevant. 

GRUB_DISABLE_RECOVERY="false"


I quite agree about length. Not being able upgrade the EOL system and now having it not boot properly is a major problem. It should not have taken over a month and 69 comments to get this far. Too fing long as you say.  vbox was a detail but since something was causing it to drop to a shell, problems failing to load kernel modules was something I needed to consider. 

So the remaining issue is why is single appearing on the default boot kernel line? Is there a way with grub2 to simulate the output that will displayed in the grub menu, to inspect it on a running system?

Comment 70 Federic 2019-11-17 08:53:15 UTC
OK, rerunning grub2-mkconfig fixed it. Now booting normally. 

many thanks for you time and patience getting to the bottom of this issue. 

I will at some point try to repeat this particular act of "sabotage" and attempt an upgrade to fed32 to see whether a similar blockage occurs and to get the detailed log to determine whether any errors are reported their. Though I have seriously over run the amount of "admin" R and D time I have available right now. This has wasted far more time and far longer to resolve than it should have.  I again thank you for picking this up otherwise it would never have been solved. 

I still maintain that there should be some output on the console. It may not be the definitive record of reference and may be so scary that it needs hiding behind a soothing image of a cloudy sky or puppy dogs under Gnome WS, but it is the traditional primary output to an operator in front of the machine. 

The boot console output does not exist for no reason. systemd normally does output there, failure to do so in this case seems atypical, there should be something.

best regards.

Comment 71 Daniel Mach 2019-11-25 09:56:56 UTC
Since the fix was in running grub2-mkconfig, assigning back to grub2 and closing.

Comment 72 Federic 2019-11-25 16:13:19 UTC
Daniel , maybe you need to read a bug before closing and reassigning it !

The bug here was caused by the upgrade plugin failing when the rpm cache is symlinked elsewhere ( to gain the necessary storage ) since though /var is mounted the rest of the system is not mounted when systemd attempts to run the upgrade. 

The fact that it runs a reduced system should be explained in the notes and is not mentioned. 

The reason this was still open is because of the issue that it silently fails without reporting anything to the boot console ( one left why the boot console even exists if this sort of thing is not posted there. ). 

Chris wanted to know whether there was something in the logs. Anyone can determine that on a fresh installation by symlinking /var/dnf to another mount point and attempting to run the dnf sytem-upgrade plugin. 

Please to NOT assign this to grub ( for a third time ) . Grub was NOT the cause.

Comment 73 Chris Murphy 2019-11-25 20:13:31 UTC
Federic there are all kinds of ways a user can customize Fedora and unwittingly sabotage it such that updates and upgrades will silently fail. Not every single permutation of user customization has to be considered or documented, there aren't enough resources for things as it is, let alone insisting people document really esoteric and arguably unsupportable configurations. What you have in effect is an incomplete /var and it's not reasonable to expect *any* error handling as a result of a user customized system that creates such a setup, because no developer would expect a partially populated /var. It's expected that /var should be completely present, or not, and if it's not present then there is some error handling for that, but that's mostly the domain of systemd since it's responsible for early startup mountpoint assembly.

Comment 74 Federic 2019-11-25 20:56:53 UTC
> ... unwittingly sabotage it such that updates and upgrades will silently fail.

Agreed, it is impossible to anticipate every eventuality, that is not a reason that the symptom which causes the failure should be silent. That is what error trapping and error reporting is for.  That should be reported. That is the bug here. 

> Not every single permutation of user customization has to be considered or documented.

No one is asking that every single permutation of user customization is documented. That is a straw man argument. I said that the fact the plugin depends up booting under a "crippled" ( no offence ) version of the system should be documented. I was totally unaware having installed a "plugin" for dnf that this involved a special "target" for systemd operating in a crippled OS. Neither the use of systemd nor the incomplete, partially mounted system is explained in the notes on its use. That is a documentation bug. 

In the absence of any contrary information, I thought this was dnf running this ( since it is a dnf plugin ) and assumed that this was done once the system partitions were mounted. 

> and if it's not present then there is some error handling for that, but that's mostly the domain of systemd since it's responsible for early startup mountpoint assembly.

Then maybe this should be assigned to systemd, so that those working on that can ensure suitable output is posted on the boot console when there is a *total failure* to reach the assigned target.  If the boot console is not used for that, what is the point of having it existing at all? Just block all output to boot console, give us a nice reassuring pic of a couple young puppies with a pink bow round their necks and some calming soft music. We would not want to scare anyone, now, would we?

Comment 75 Chris Murphy 2019-11-25 21:47:53 UTC
I guarantee you if you ask about this configuration on systemd-devel@ they will tell you it's an unsupported configuration, no documentation warning the user about it is indicated. And so far I'm inclined to say the same thing about documenting the dnf offline updates/upgrades idiosyncratic behavior of using a special minimalist offline target.

https://docs.fedoraproject.org/en-US/quick-docs/dnf-system-upgrade/
Item 2:
The system reboots into a special environment (implemented as a systemd target) to install them

If you're doing to start doing customizations at a low level like symlinking to a dir of RPMs, that strikes me as very much out of scope for error handling, how could that possibly ever occur in normal usage? I think it's very reasonable to say you're on your own.

But I leave it up to the dnf developers to decide if there should be a runtime test: hey we see the trigger symlink that says we should do an update/upgrade, but there are no RPMs, that's weird. But I bet there's a hint of that in the systemd journal. Seeing as by default we hide the console output, showing a pretty plymouth upgrade screen anyway I'm not sure there's an advantage to an error on console rather than just recording the error in the systemd journal.

Comment 76 Federic 2019-11-26 02:31:58 UTC
> that strikes me as very much out of scope for error handling, how could that possibly ever occur in normal usage? I think it's very reasonable to say you're on your own.

Nothing is "out of scope for error handling", many unknown unknowns can occur in the near infinite variation of circumstances which can occur in a system. That WHY error trapping exists. It does not matter what the individual cause may or may not be, all errors need to be trapped and dealt with. That DOES happen in this case, so I don't know why you even discuss that. 

What is lacking is error REPORTING.  Again that is not "out of scope". I do not expect an error message catering to the very specific conditions of this problem saying "you have symlinked /var/dnf , you have sabotaged your system... ". That is not the error which has been detected. I suspect it's more along the lines of "/var/dnf file or directory not found". 

If something like that was reported it , it would be possible to diagnose the problem. 


> Seeing as by default we hide the console output ... 

So following your logic, nothing ever prints anything to the boot console and it gets removed from linux and we go windoze full moron on this. Because it's hidden by default, it no longer exists and this justifies not sending output to it. 

It is there and systemd already sends plenty to output to it. It is INCONSISTENT that this error is trapped but not reported. 

> there aren't enough resources for things as it is

That may be the root cause of many problems I've been seeing with Fedora recently. This maybe yet another symptom. That is unlikely to change in a positive direction in the future.  I'm currently testing another distro, this has issue has been the straw which broke this camel's back.

Comment 77 Ben Cotton 2019-11-27 18:06:07 UTC
Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.