2151500 – kdump to ssh fails to build initrd: dracut[7215]: Failed to get the driver of lo

Summary: kdump to ssh fails to build initrd: dracut[7215]: Failed to get the driver of lo

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 9
Classification:	Red Hat
Component:	kexec-tools
Sub Component:
Version:	9.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	9.2
Assignee:	Lichen Liu
QA Contact:	xiaoying yan
Docs Contact:
URL:
Whiteboard:	CockpitTest
Depends On:
Blocks:

Reported:	2022-12-07 11:24 UTC by Martin Pitt
Modified:	2023-05-16 03:20 UTC
CC List:	5 users
Fixed In Version:	kexec-tools-2.0.25-10.el9
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-05-09 08:14:43 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHELPLAN-141523	0	None	None	None	2022-12-07 11:34:35 UTC
Red Hat Product Errata	RHBA-2023:2463	0	None	None	None	2023-05-09 08:14:54 UTC

Description Martin Pitt 2022-12-07 11:24:26 UTC

Description of problem: Our most recent RHEL 9.2 image refresh in Cockpit's CI shows that configuring kdump to ssh is now broken.

Version-Release number of selected component (if applicable):

kexec-tools-2.0.25-7.el9.x86_64
dracut-057-13.git20220816.el9.x86_64
NetworkManager-1.41.6-1.el9.x86_64
kernel-5.14.0-205.el9.x86_64

How reproducible: Always


Steps to Reproduce:

First, set up an SSH target; this can be the same host in a test VM:
1. Ensure you have an SSH key, run `ssh-keygen` if necessary
2. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. Ensure you can log in with "ssh root@localhost" (accept the host key fingerprint)

Now configure kdump:
1. printf 'ssh root@localhost\nsshkey /root/.ssh/id_rsa\n' >> /etc/kdump.conf 

2. SSH target needs -F option:
   sed -i '/core_collector/ s/$/ -F/' /etc/kdump.conf 

3. systemctl restart kdump
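
The kdump configuration steps above can be sketched as a small shell helper. This is a minimal sketch, not part of kexec-tools: the function name `configure_kdump_ssh` is hypothetical, and unlike the `sed` command in step 2 it anchors the match to lines that start with `core_collector`, so that commented-out examples in the file are left alone. It is not idempotent; running it twice appends the settings twice.

```shell
#!/bin/bash
# Sketch: apply the kdump-over-SSH settings from the steps above to a config file.
# configure_kdump_ssh is a hypothetical helper name, not a kexec-tools command.
configure_kdump_ssh() {
    local conf="$1" target="$2" key="$3"

    # Append the SSH dump target and the private key used to reach it.
    printf 'ssh %s\nsshkey %s\n' "$target" "$key" >> "$conf"

    # SSH targets need the -F option on the active core_collector line.
    sed -i '/^core_collector/ s/$/ -F/' "$conf"
}

# Usage on a test VM (as root), followed by a service restart:
#   configure_kdump_ssh /etc/kdump.conf root@localhost /root/.ssh/id_rsa
#   systemctl restart kdump
```

Passing the config path as an argument makes it possible to try the edit on a copy of /etc/kdump.conf before touching the real file.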

Actual results: Failed:

# systemctl status kdump
× kdump.service - Crash recovery kernel arming
     Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2022-12-07 06:23:58 EST; 1s ago
   Duration: 1min 4.065s
    Process: 1773 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
   Main PID: 1773 (code=exited, status=1/FAILURE)
        CPU: 1.710s

Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 dracut[2020]: Connection 'lo' (e2602913-4e87-401b-a544-f683dc479071) successfully deleted.
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 kdumpctl[3331]: Cannot get driver information: Operation not supported
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 kdumpctl[1991]: dracut: Failed to get the driver of lo
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 dracut[2020]: Failed to get the driver of lo
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 kdumpctl[1775]: kdump: mkdumprd: failed to make kdump initrd
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 kdumpctl[1775]: kdump: Starting kdump: [FAILED]
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 systemd[1]: kdump.service: Failed with result 'exit-code'.
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 systemd[1]: Failed to start Crash recovery kernel arming.
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 systemd[1]: kdump.service: Consumed 1.710s CPU time.


Expected results: kdump works, as before.


Additional info:

Comment 1 Lichen Liu 2022-12-08 07:54:41 UTC

Hi Martin,

I see your test used localhost as the ssh dump target. kexec-tools uses ethtool to check which driver the NIC uses.
As far as I know, "ethtool -i lo" always fails with the following error message:
```
Cannot get driver information: Operation not supported
```
In my opinion, localhost is not a valid target when using ssh. How about re-testing with a remote host as the dump target?

Thanks,
Lichen
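
To illustrate why the loopback device has no driver to report, the following sketch looks up an interface's driver through sysfs. This is a different technique from the ethtool call mentioned in comment 1, shown only for illustration. The helper name `iface_driver` and its optional second argument (an alternative sysfs root, useful for trying it against a fake tree) are hypothetical.

```shell
#!/bin/bash
# Sketch: print the kernel driver bound to a network interface, using sysfs.
# iface_driver is a hypothetical helper; kexec-tools itself uses ethtool.
iface_driver() {
    local iface="$1" sysroot="${2:-/sys}"
    local link="$sysroot/class/net/$iface/device/driver"

    # Virtual interfaces such as lo have no device/driver link at all.
    [ -e "$link" ] || return 1

    # The link points at the driver directory; its basename is the driver name.
    basename "$(readlink -f "$link")"
}

# Usage:
#   if drv=$(iface_driver lo); then
#       echo "lo uses driver $drv"
#   else
#       echo "lo has no driver to add to the initrd"
#   fi
```

A caller can treat the non-zero return as "nothing to install" rather than as a fatal error.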

Comment 2 Martin Pitt 2022-12-08 08:10:56 UTC

Well, *shrug* it has worked with localhost for years. E.g. on the Testing Farm (Fedora/RHEL gating) it's not possible to spawn a second VM to talk to, and even in our upstream tests it's fairly expensive to do that. 
What we could do is to not literally talk to "localhost", but to the IP address of any "proper" eth iface:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    inet 172.27.0.15/24 brd 172.27.0.255 scope global dynamic noprefixroute eth0
       valid_lft 86232sec preferred_lft 86232sec

and use

   ssh root@172.27.0.15

in kdump.conf instead. But that doesn't work either, same error message.

How do you test kdump ssh upstream/in RHEL? That could be interesting for us to look at.

Note that talking to a remote machine does not currently work either, see bug 2151504 (although that was for NFS, but it fails very early on during network configuration in initrd)

Comment 3 Rich Megginson 2022-12-13 15:25:14 UTC

The system role kdump test https://github.com/linux-system-roles/kdump/blob/master/tests/tests_ssh.yml is also affected by this issue

Comment 4 Coiby 2022-12-14 07:01:37 UTC

(In reply to Martin Pitt from comment #2)
> Well, *shrug* it has worked with localhost for years. E.g. on the Testing
> Farm (Fedora/RHEL gating) it's not possible to spawn a second VM to talk to,
> and even in our upstream tests it's fairly expensive to do that. 

You use tmt to run tests, right? Yeah, supporting multihost tests is still an ongoing effort for tmt:
https://github.com/teemtee/tmt/issues/726

> What we could do is to not literally talk to "localhost", but to the IP
> address of any "proper" eth iface:

> 
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP
> group default qlen 1000
>     inet 172.27.0.15/24 brd 172.27.0.255 scope global dynamic noprefixroute
> eth0
>        valid_lft 86232sec preferred_lft 86232sec
> 
> and use
> 
>    ssh root@172.27.0.15
> 
> in kdump.conf instead. But that doesn't work either, same error message.

Yes, this is expected: it is still dumping to the local host and thus still using the loopback device.
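
One way to see this on the test VM is to ask the kernel which device it would use to reach the address. The sketch below assumes the iproute2 `ip` tool is available; `route_dev` is a hypothetical helper that extracts the interface named after `dev` in the output of `ip route get`.

```shell
#!/bin/bash
# Sketch: print the interface that follows "dev" in `ip route get` output.
# route_dev is a hypothetical helper; it reads the route text on stdin.
route_dev() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Usage on the VM from comment 2:
#   ip route get 172.27.0.15 | route_dev
```

For an address that belongs to the host itself, the reported device is expected to be lo even though the address is configured on eth0.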

> 
> How do you test kdump ssh upstream/in RHEL? That could be interesting for us
> to look at.


We use our own framework for the tests. You can check https://src.fedoraproject.org/rpms/kexec-tools/blob/rawhide/f/tests.
For an example of triggering it in Github, you can also check out the experimental https://github.com/coiby/kexec-tools/tree/github_action/tests

> 
> Note that talking to a remote machine does not currently work either, see
> bug 2151504 (although that was for NFS, but it fails very early on during
> network configuration in initrd)

I've asked for some debugging logs as I failed to reproduce that bug.

Comment 5 Martin Pitt 2022-12-14 10:54:27 UTC

Thanks Coiby! I replied to your proposed patch with a (successful) test result and a proposal for how to make it more robust and generic. I probably can't post to the kexec-@ list, but you were in CC:.

> For an example of triggering it in Github, you can also check out the experimental https://github.com/coiby/kexec-tools/tree/github_action/tests

Interesting -- do you use self-hosted runners, or do you just suffer through the glacial speed of emulation? (As GitHub's runners don't have /dev/kvm)

> bug 2151504

Thanks for your debugging there, this seems well understood now.

Comment 13 errata-xmlrpc 2023-05-09 08:14:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (kexec-tools bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2463
