Bug 1739796 - Rescue Kernel Recreated After FC30 Kernel Update Times Out on initiatorname.iscsi
Summary: Rescue Kernel Recreated After FC30 Kernel Update Times Out on initiatorname.iscsi
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 30
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-10 14:53 UTC by jumanji
Modified: 2020-06-01 18:06 UTC
CC List: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-25 22:32:41 UTC
Type: Bug
Embargoed:


Attachments
report provided by dracut after failed boot (111.00 KB, text/plain)
2019-08-10 14:53 UTC, jumanji

Description jumanji 2019-08-10 14:53:37 UTC
Created attachment 1602482 [details]
report provided by dracut after failed boot

1. Please describe the problem:
I wanted to update my rescue kernel based on the latest kernel update (5.2.6-200). I deleted the previous rescue kernel so that the kernel update via yum would create a new one. After doing that, booting the new rescue kernel times out, complaining about /etc/iscsi/initiatorname.iscsi. The file exists and contains the following line:

InitiatorName=iqn.1994-05.com.redhat:cdc987e5473

The updated kernel itself works fine, and grub.cfg and grubenv are updated correctly. While the cause of this bug may turn out to be minor, its impact is serious: Fedora cannot boot into rescue mode with the regenerated rescue kernel...or something else is out of sync.
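
A quick way to check whether that file is actually packed into the regenerated rescue initramfs (the image names below are assumed from a default F30 /boot layout and may differ on other systems) is to compare it against the normal initramfs with lsinitrd:

  # Sketch: does the rescue image carry the initiator name file iscsid wants?
  # (adjust the image names to match what is actually in /boot)
  lsinitrd /boot/initramfs-5.2.6-200.fc30.x86_64.img | grep initiatorname
  lsinitrd /boot/initramfs-0-rescue-*.img | grep initiatorname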

2. What is the Version-Release number of the kernel:
5.2.6-200.fc30.x86_64 / Latest FC30 as of today

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
I think the rescue kernel from the original FC30 install worked, but rescue kernels recreated by the post-install script have not worked for at least several months (as long as I have been trying).

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Reproducibility: 100% (on my system, at least). A command-level sketch of these steps appears after the error output below.
1. Delete the current rescue kernel
2. Update the kernel via yum so that the post-install scripts run and recreate the rescue kernel for the newly updated version
3. Boot into rescue mode 
--> Boot times out and fails with:
[    3.722544] localhost iscsid[677]: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi
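
In command form, the procedure is roughly the sketch below; the image and entry names are assumptions based on my system's default BLS /boot layout and may differ elsewhere:

  # Sketch of the reproduction steps (file and entry names assumed, not copied
  # from any script)
  sudo rm -f /boot/vmlinuz-0-rescue-* /boot/initramfs-0-rescue-* \
             /boot/loader/entries/*-0-rescue.conf
  sudo yum update kernel   # post-install scripts recreate the rescue entry
  reboot                   # then choose the rescue entry from the GRUB menu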



5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

I don't know.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
I use a module for the HighPoint RocketRaid 640L, which is working fine. 

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Attached rdsosreport.txt. I have also noticed some other bugs relating to initiatorname, but they don't seem to be like mine. However, they may indicate that there is a lurking gremlin somewhere relating to this functionality. Or, it could just be a post-install script issue. I don't know. I am hoping that someone much more in-the-know than I am will immediately spot why this failure is happening.
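
For completeness, once back in the working kernel, the kernel log of the failed rescue boot could presumably also be pulled with something like the line below (the -1 offset assumes the rescue attempt was the immediately previous boot; the output file name is arbitrary):

  # -b -1 = previous boot, -k = kernel messages only
  journalctl --no-hostname -k -b -1 > dmesg-rescue-boot.txt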

Comment 1 jumanji 2019-08-10 15:06:40 UTC
If it helps, privileges on /etc/iscsi/initiatorname.iscsi are: root:root 644
Also, /etc/iscsi/iscsid.conf exists (root:root 600) and should be unmodified from the default install.
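
For reference, those attributes can be confirmed with plain stat (nothing assumed beyond the file paths already given):

  # Prints owner:group, octal mode, and path for each file
  stat -c '%U:%G %a %n' /etc/iscsi/initiatorname.iscsi /etc/iscsi/iscsid.conf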

Comment 2 jumanji 2019-08-13 17:00:22 UTC
I tried to reproduce this issue again with the new 5.2.7-200.fc30.x86_64 update, repeating the steps above. In this case, I tried running rescue without installing the RR640L driver. The problem still occurs, so it is not related to the RR640L module. The rdsosreport.txt is the same, timing out on initiatorname.iscsi.

After the timeout, I noticed that there were dracut warnings at the top of the screen about not being able to find /dev/fedora/root and /dev/mapper/fedora-root. It's running the boot image, so why wouldn't it find fedora-root? Also, I don't see that error in the rdsosreport.txt file.
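
Since the warnings name the LVM root devices, one comparison that might help (assuming the F30 BLS layout under /boot/loader/entries; grub.cfg may embed the options instead) is whether the regenerated rescue entry still carries the same root= and rd.lvm.lv= options as the working entries:

  # Sketch: compare kernel command lines of the working and rescue boot entries
  grep ^options /boot/loader/entries/*.conf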

Any suggestions?

Comment 3 jumanji 2019-08-22 17:00:30 UTC
Update: This problem is still occurring with the kernel 5.2.9 update. Is there any insight from my attached dracut log? I would like to try to resolve this issue as soon as I can and am willing to do more contextual debugging based on my original report, as needed. 

Currently, kernel updates are not able to regenerate a working rescue kernel. That is a serious issue for system recovery and reliability, and it should probably be a blocker for any new kernel release if it is occurring beyond my system. There is nothing special about my install.
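
For whoever picks this up: my understanding is that the rescue entry is normally (re)created by dracut's kernel-install hook, so a manual regeneration attempt, outside of a full kernel update, might look like the sketch below (the hook behavior and the file names are my assumptions and may be off):

  # Sketch: dracut's 51-dracut-rescue.install hook only builds a rescue image
  # when none is present, so remove the old files first (names assumed).
  sudo rm -f /boot/vmlinuz-0-rescue-* /boot/initramfs-0-rescue-* \
             /boot/loader/entries/*-0-rescue.conf
  sudo kernel-install add "$(uname -r)" "/usr/lib/modules/$(uname -r)/vmlinuz"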

Comment 4 Justin M. Forbes 2020-03-03 16:37:37 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 30 kernel bugs.

Fedora 30 has now been rebased to 5.5.7-100.fc30. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 31, and are still experiencing this issue, please change the version to Fedora 31.

If you experience different issues, please open a new bug report for those.

Comment 5 Justin M. Forbes 2020-03-25 22:32:41 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 6 jumanji 2020-06-01 18:06:00 UTC
How not to manage Quality:

Steps to Reproduce:
1. Never assign bugs in a timely manner.
2. Never respond to the user for 7 months after they spend hours trying to characterize a serious problem for you.
3. Never attempt to reproduce the problem with the exact set of steps the user reports for the target/current environment.
4. Never review or comment on any of the materials the user submits.
5. Indicate that you have too many bugs, that the user's bug is stale (because you never did anything about it), and that you have a new release now.
6. Close the bug in 3 weeks to clean up your bug counts (WooHoo! No more bugs! We are SO GMC!), still without any meaningful review or comment (vs. your 8 months of inaction).

So - too many bugs? Why not consider addressing user reports instead of repeatedly and globally sweeping them under the carpet?

It should be a matter of serious concern that the OS cannot create its own rescue kernel, and even that level of seriousness and an attempt at a detailed report wasn't enough to warrant any kind of review. When I reported this bug, I thought it might be a P1 issue and receive immediate attention. I guess I must have been hit on the head too hard with a linux server as a baby. If only I had been wearing a soft, fluffy red hat to cushion the blow.

RedHat's poor quality process management is probably the biggest current bug in the "OS." And, yes, my bug should be closed for INSUFFICIENT_DATA, because RedHat never assigned the bug, never did any review, and never provided any meaningful input, and those are the data that have been correctly labeled as "insufficient."

I sincerely hope that, going forward, RedHat will work to modify its quality/delegation process to address these serious issues. I also don't understand what has stopped you from verifying this (maybe serious) bug, as reported, even retrospectively (albeit with possible delusion on my part, owing to the above-mentioned childhood injury and perhaps a burgeoning case of sphenisciphobia).

