Bug 1723057 - boot loop after upgrading F29 -> 30; grub error in diskfilter.c for a brief moment before menu loads
Keywords:
Status: CLOSED DUPLICATE of bug 1699761
Alias: None
Product: Fedora
Classification: Fedora
Component: grub2
Version: 30
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Peter Jones
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-06-22 15:08 UTC by Cody
Modified: 2019-10-15 11:23 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-15 06:58:03 UTC
Type: Bug
Embargoed:



Description Cody 2019-06-22 15:08:42 UTC
Description of problem:

After a system upgrade from F29 -> F30 following the procedure (system-upgrade plugin of dnf? Whatever it was called) the system repeatedly reboots. There is an error for a split second before the menu loads which I will include (what I could capture of it anyway) below.

Version-Release number of selected component (if applicable):

I’m not actually sure but I did an upgrade yesterday. I’m not sure because it's tedious to boot off the install medium and wait for it to find the install and then load it before I can chroot into it. And it's time consuming; having to do this to try different things including that which sounds like my same issue (reboot loop) and then the same problem - only to have to start all over is extremely annoying and I’ve too much going on and too little patience to keep at it for any real length at any given time.

How reproducible:

I don't know but certainly 99% of the time. I say 99% rather than 100% because one time the rescue kernel did boot; every other time it rebooted just like the normal kernel modes (which I might add is extremely ironic that the 'rescue' kernel does the same thing). At one point sometimes it would boot but usually it would reboot in a loop. And at this point I decided (perhaps foolishly) to update again. How wrong I was to try that.


Steps to Reproduce:

Well the thing is that I have so much going on and mostly I have not been at this computer but rather a laptop (as I’m typing on now). But when I did upgrades and updates I went to the console. But I mostly was logged in over ssh including reboots (though that was seldom). Either way :

1. Upgrade from F29 to F30.
2. Reboot after the upgrade finished.
3. Have 'fun' watching the system get dizzy and faint until it passes out (as in I turn it off from utter frustration) from repeated short runs.

Actual results:

There's a grub error that's hard to see (and if it wasn't for a phone camera I wouldn't have any idea; as it is I didn't even see that there was an error the first time). And then it starts to boot, seems to be going fine but I do see some error (I cannot recall and I’m not at the computer at this time and I’m not even sure if it's an error as such; I will have to check if necessary but I don't think it is the kernel but maybe it is; either way it proceeds for a short bit after this). After many additional lines that I'd expect to see, the computer reboots.


Expected results:

It boots completely and doesn't reboot?


Additional info:

It does begin to load the system but before I see a login prompt of any kind the system reboots. Repeatedly. I have noticed that sometimes the rescue kernel works but usually not (I don't know if this is the same for the normal kernel but I’m beyond flustered with this and have much to concern myself with too). At first the boot menu did not update but after looking at several bug reports I do now have it showing F30 kernels. The error that I have is:

> error: ../../grub-core/disk/diskfilter.c:916: disk filter writes are [I don't know what's beyond this]

Yes I've run grub2-install. Yes I've regenerated the config file. Yes I have grubby-deprecated installed (because in hope I disabled the BLS setting - hope because of a bug entry here). The last time I've had a boot problem was when grub had the major release to grub2. It caused me a world of frustrations and I do not remember what I did to get it to boot properly. I probably reinstalled. And yet in those days at least I didn't have the 'live images' that are limited and not at all what I need. Then again I don't need to deal with this because I have far too much going on. Yet it will bother me whilst it's having this problem. It also means I cannot access files that are on the volumes easily. Backups you see aren't very useful when you can't even boot. 
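For anyone retracing the steps above, a minimal sketch for confirming whether the BLS setting is actually disabled. It assumes the setting lives in /etc/default/grub under the key GRUB_ENABLE_BLSCFG, as on a stock F30 install; the function name is made up for illustration and the parsing is deliberately simple.

```python
def bls_enabled(default_grub_text):
    """Return True/False for GRUB_ENABLE_BLSCFG, or None if it is not set.

    Assumption: the file uses plain KEY=value lines, as /etc/default/grub
    normally does. Commented-out lines are ignored.
    """
    value = None
    for raw in default_grub_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "GRUB_ENABLE_BLSCFG":
            # The last assignment wins, as it would when the file is sourced.
            value = val.strip().strip("\"'").lower() == "true"
    return value


if __name__ == "__main__":
    # Path is an assumption; run from the chroot of the installed system.
    with open("/etc/default/grub") as handle:
        print("BLS enabled:", bls_enabled(handle.read()))
```

A result of None means the key is absent from the file. In every case the config file still has to be regenerated after a change for it to take effect.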

I can as I said boot up in the rescue mode of the installer but then what? I've done what is supposedly the solution but then I've not seen in any bug reports the above error. I have however read about boot loops and what fixed it for the users. Oh and the .rpmsave file did not work. It had the same problem. Nothing works.

I’m going to say that as a programmer I understand that major changes sometimes break things but this one is something I’m having a hard time sympathising with. Still the point there is that if any of the above is sardonic - and I sense some of it - or sarcastic (which is how I am in general) a part of it is that I have so much going on and I’m fed up with things going wrong left right and centre near every day. I can provide additional information but I must add that I’m unwell and so I might be a bit slower - and I’m dead tired from horrible sleep so I could easily misread something.

Do I need to include the grub.cfg file? If so I'll have to gather it later on hopefully today. I don't know what else to include but I can at least try capturing files to a USB drive but even that might prove difficult. And no I do not know when I first installed this other than it's been years. Probably the previous major change to grub though I have this vague memory there was another time after that. I do not know but it's been several releases anyway. I have both LVM and MD raids incidentally.

Comment 1 Cody 2019-06-22 18:27:13 UTC
I just extracted the following directories (all recursively) :

/var/log
/tmp (in the rescue mode of the installer - so logs there too)
/etc
/boot
/usr/lib/grub 

Please let me know what will be helpful. I've not (yet at least) looked at the grub2 source code but I have seen elsewhere what looks like the error is to do with me having LVM and/or MD raids but I do not know for certain one way or another. Maybe it's not even related to the boot loop but it's certainly new along with that rebooting loop. And I apologise for the tone of part of the original report; I really am quite stressed and have had a lot of trauma in the family and many other things going on so I’m really not in the best state of mind. Ideally this can be fixed but I do fear I’m going to have to reinstall and that itself is not helping me here either.

I will attach any files that might be relevant but only those necessary to save time and to not add more information than is necessary.

Comment 2 Cody 2019-06-23 16:14:04 UTC
Well I’m blest. I decided I couldn't deal with this so I installed anew. And I really am appalled.

The documentation for F30 says this:

>Verifying checksums on Linux and OSX systems
> Download the Fedora image of your choice from https://fedoraproject.org/get-fedora and the corresponding checksum file from https://fedoraproject.org/verify

And what do I see but a 404 for the checksum file... That's just amazing.

Then I had a hell of a time getting the installer to recognise my MD arrays. I had to search it out and only in another bug report did I find I had to set an option. This of course took more time because I had to reboot again. And this has been going on for years it seems; it's all fine for the installer to create new arrays but not use existing arrays by default? Of course I wouldn't have had to reinstall if there wasn't this bug (or bugs it appears) in the first place which would have meant this is a non-issue. But! Did it get better after a reinstall? No. It actually got worse if a reboot loop can actually get worse. That it can is shocking.

Not only does it still have that error I reported but it also reboots in a loop - only that sometimes it successfully boots okay. Wonderful? No. Because the GUI won't log in. Input password and it shows the console momentarily and then the log in prompt for the GUI again. Repeatedly. Waste of time reconfiguring things (since I decided I would configure at the console from backups first) and waste of DVD too.

Maybe it's time to find another distribution. I'll give it a few days but this is just shameful. I try very very hard to be understanding and I aim for only constructive criticism (which this comment actually has some) but the F30 release is truly appalling. No checksum file that's linked in the documentation. Reboot loop. Doesn't want to log on GUI. Upgraded the official way and errors reboot loop. Try a new install and same thing. More time lost and absolutely no progress made.

I do truly hope that this bug entry will be addressed and I can only say that maybe it's not been because it's a weekend but if that's not it... Really rather sad to see Fedora start out poor in the early days, become stable and then get worse and worse to the point of the above. And I know others have had the reboot loop. I also imagine many more have it only they didn't bother reporting it and went to a different distribution. And who could blame them?

Comment 3 Cody 2019-07-15 14:47:27 UTC
The reason I am bothering here is sadly not because I expect a reaction from the devs but rather for anyone who might run into this or a similar problem; indeed I find it rather disconcerting and tragic that Fedora has it seems to me in recent years only regressed but be that as it may I have reasons to stay with RH and for regular non server use the best is Fedora even if it’s a case of more regression than progression. 

Whatever. It *appears* that at least the reboot loop is what I suspected all along: a video driver issue. Of course that would be nouveau (which unfortunately is not in the least bit surprising). Probably in part my fault: for I typically will make sure to update the third party repos but with all going on I seem to recall that I forgot to do it prior to upgrade. Then it was a matter of getting the right NVIDIA drivers for my older video card.

After downgrading to F29 I had the same bloody problem! This is when I wanted to scream but reason held out. I went back to my original thought and so did what I should have done all along: verify if I had the wrong driver. Well clearly I did as it wasn’t loading and I seem to recall it had tainted the kernel. So I checked lspci to find out which NVIDIA driver I needed for akmods. After three boots in a row without a reboot before the log in prompt I think - though I remain sceptical until I have many more - that I am good. But why nouveau would do this and sometimes reboot repeatedly until by chance it doesn’t is quite beyond me. Of course it’s rather difficult to further debug this when it’s not even completely booting so I didn’t even try.
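The lspci check described above can be sketched roughly as follows. This is only an illustration: it assumes the usual layout of `lspci -k` output (one unindented line per device, then indented detail lines including "Kernel driver in use:"), and the function name is made up.

```python
import re
import subprocess

# Device classes that lspci uses for graphics hardware.
GPU_CLASSES = re.compile(
    r"VGA compatible controller|3D controller|Display controller"
)


def parse_gpu_drivers(lspci_output):
    """Return one dict per display controller found in `lspci -k` text."""
    gpus = []
    current = None
    for line in lspci_output.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new PCI device record.
            current = None
            if GPU_CLASSES.search(line):
                slot, _, rest = line.partition(" ")
                current = {
                    "slot": slot,
                    "device": rest.split(": ", 1)[-1],
                    "driver": None,  # stays None if no driver is bound
                }
                gpus.append(current)
        elif current is not None and "Kernel driver in use:" in line:
            current["driver"] = line.split(":", 1)[1].strip()
    return gpus


if __name__ == "__main__":
    output = subprocess.run(
        ["lspci", "-k"], capture_output=True, text=True, check=True
    ).stdout
    for gpu in parse_gpu_drivers(output):
        print(gpu["slot"], gpu["device"], "->", gpu["driver"])
```

The device string it prints is what to look up when working out which NVIDIA driver series still supports an older card. A driver of "nouveau" or None means the proprietary module is not the one in use.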

But one could easily be forgiven for associating a grub error with this problem especially when at one point the system wouldn’t even find the grub conf file! Yes yes it’s in the common errors but really it’s not something that a user should have to do; fine I am more than capable with and in fact much prefer the command line but what about people who use GUIs for system administration? How do you expect them to deal with this? 

As a programmer I am a longtime proponent of devs working with users but sadly - and I have called both out on this on BZ here in fact - many can’t be bothered and only like to bicker at each other, each blaming the other. And yes devs are very guilty of this. Neither helps the other resolve the problem which is simply ridiculous and an utter waste of time. I am equally very aware that programming involves risk even if it wasn’t for humans being imperfect (esp those blessed souls who in their arrogance believe themselves superior). But for a bug where the system doesn’t even boot not to count as a blocker is something I cannot understand. But if arguing with each other is non productive so is no reaction at all. Even sadder when the reporter provides information that would potentially be useful - and would be willing to provide more - despite the fact it requires some chance in capturing it.

I am not marking this resolved or anything else because frankly I only consider the reboot loop resolved or potentially resolved. If any dev wishes to do so so be it but at least I can report one possible cause of a reboot loop. Arguably a bug report for nouveau should be submitted but then what is the point if there’s no reaction here ? It’s rather a shame and it’s unsurprising if some would try a different distro. 

Anyway I will worry about F30 when I have no choice but for anyone who has the reboot loop problem make sure to check if nouveau is causing a problem and if you can find a working driver. A sign that it might be this is if it sometimes boots but only after several tries. Or so it seems that’s part of it. One thing that clued me into it is I disable the graphical boot so I see the progress; from that I saw the oddities with the driver. I just hope that Fedora devs eventually work on making Fedora better because at this rate it seems that it’s only going to get worse. I really hope I am wrong though. 
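One way to look for the driver oddities mentioned above, once a boot does get through, is to filter the kernel log of the previous (failed) boot for nouveau lines. A rough sketch, with loud assumptions: the keyword list is a guess since the exact nouveau messages vary by kernel and card, the function name is made up, and `journalctl -b -1` only has something to show if the journal is kept across reboots.

```python
import subprocess

# Assumed keywords; real nouveau messages differ between kernels and cards.
SUSPECT_WORDS = ("error", "fail", "timeout", "fault")


def suspicious_nouveau_lines(log_lines):
    """Return log lines that mention nouveau together with a suspect keyword."""
    hits = []
    for line in log_lines:
        lower = line.lower()
        if "nouveau" in lower and any(word in lower for word in SUSPECT_WORDS):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # -k: kernel messages only; -b -1: the boot before the current one.
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
    )
    for hit in suspicious_nouveau_lines(result.stdout.splitlines()):
        print(hit)
```

An empty result proves nothing either way, since a machine that resets hard may never get the last messages to disk. Hits do at least give something concrete to attach to a report.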

Cheers.

Comment 4 Javier Martinez Canillas 2019-10-15 06:58:03 UTC

*** This bug has been marked as a duplicate of bug 1699761 ***

Comment 5 Cody 2019-10-15 11:23:57 UTC
Four months later almost ... and it's a duplicate of a bug only then? Yet it's not a duplicate. I pointed out the real problem in a comment. The grub2 error wasn't related.

It was a video driver issue. It's just at first (very first) I only had the error to go by even though it went past that.

