Bug 2151500
| Summary: | kdump to ssh fails to build initrd: dracut[7215]: Failed to get the driver of lo | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Martin Pitt <mpitt> |
| Component: | kexec-tools | Assignee: | Lichen Liu <lichliu> |
| Status: | CLOSED ERRATA | QA Contact: | xiaoying yan <yiyan> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 9.2 | CC: | coxu, jieli, lichliu, rmeggins, xiawu |
| Target Milestone: | rc | Keywords: | Regression, Triaged |
| Target Release: | 9.2 | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | CockpitTest | ||
| Fixed In Version: | kexec-tools-2.0.25-10.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-05-09 08:14:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Hi Martin,

I see your test used localhost as the ssh dump target. kexec-tools uses ethtool to check which driver the NIC uses. As far as I know, "ethtool -i lo" always fails with the following error message:

```
Cannot get driver information: Operation not supported
```

In my opinion, localhost is not a valid target when using ssh. How about re-testing with a remote host as the dump target?

Thanks,
Lichen

Well, *shrug* it has worked with localhost for years. E.g. on the Testing Farm (Fedora/RHEL gating) it's not possible to spawn a second VM to talk to, and even in our upstream tests it's fairly expensive to do that.
What we could do is to not literally talk to "localhost", but to the IP address of any "proper" eth iface:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
inet 172.27.0.15/24 brd 172.27.0.255 scope global dynamic noprefixroute eth0
valid_lft 86232sec preferred_lft 86232sec
and use
ssh root@172.27.0.15
in kdump.conf instead. But that doesn't work either, same error message.
How do you test kdump ssh upstream/in RHEL? That could be interesting for us to look at.
Note that talking to a remote machine does not currently work either, see bug 2151504 (although that was for NFS, but it fails very early on during network configuration in initrd)
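One way to see why pointing kdump.conf at the eth0 address changes nothing: the kernel delivers traffic for any address owned by the local host through the loopback device, so the route to 172.27.0.15 still resolves to lo. The sketch below is illustrative only. `route_dev_from_output` and `target_uses_loopback` are hypothetical helper names, not part of kexec-tools, and the sketch assumes the iproute2 `ip` command is installed.

```shell
#!/bin/sh
# Print the interface name that follows the "dev" keyword in
# `ip route get` output read from stdin.
route_dev_from_output() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Hypothetical helper: succeed if the kernel would reach the IP address $1
# through the loopback device. $1 must be an IP address, not a hostname.
target_uses_loopback() {
    dev=$(ip route get "$1" 2>/dev/null | route_dev_from_output)
    [ "$dev" = "lo" ]
}
```

For example, `target_uses_loopback 172.27.0.15 && echo "dump target is local"` could flag such a configuration before kdump tries to build the initrd.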
The system role kdump test https://github.com/linux-system-roles/kdump/blob/master/tests/tests_ssh.yml is also affected by this issue.

(In reply to Martin Pitt from comment #2)
> Well, *shrug* it has worked with localhost for years. E.g. on the Testing
> Farm (Fedora/RHEL gating) it's not possible to spawn a second VM to talk to,
> and even in our upstream tests it's fairly expensive to do that.

You use tmt to run tests, right? Yeah, supporting multihost tests is still an ongoing effort for tmt: https://github.com/teemtee/tmt/issues/726

> What we could do is to not literally talk to "localhost", but to the IP
> address of any "proper" eth iface:
>
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP
> group default qlen 1000
> inet 172.27.0.15/24 brd 172.27.0.255 scope global dynamic noprefixroute
> eth0
> valid_lft 86232sec preferred_lft 86232sec
>
> and use
>
> ssh root@172.27.0.15
>
> in kdump.conf instead. But that doesn't work either, same error message.

Yes, this is expected because it's still dumping to localhost and thus still using the loopback device.

> How do you test kdump ssh upstream/in RHEL? That could be interesting for us
> to look at.

We use our own test framework. You can check https://src.fedoraproject.org/rpms/kexec-tools/blob/rawhide/f/tests. For an example of triggering it in Github, you can also check out the experimental https://github.com/coiby/kexec-tools/tree/github_action/tests

> Note that talking to a remote machine does not currently work either, see
> bug 2151504 (although that was for NFS, but it fails very early on during
> network configuration in initrd)

I've asked for some debugging logs as I failed to reproduce that bug.

Thanks Coiby! I replied to your proposed patch with a (successful) test result and a proposal for how to make it more robust and generic. I probably can't post to the kexec-@ list, but you were in CC:.

> For an example of triggering it in Github, you can also check out the experimental https://github.com/coiby/kexec-tools/tree/github_action/tests

Interesting -- do you use self-hosted runners, or do you just suffer through the glacial speed of emulation? (As GitHub's runners don't have /dev/kvm)

> bug 2151504

Thanks for your debugging there, this seems well understood now.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (kexec-tools bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2463

Description of problem:
Our most recent RHEL 9.2 image refresh in Cockpit's CI shows that configuring kdump to ssh is now broken.

Version-Release number of selected component (if applicable):
kexec-tools-2.0.25-7.el9.x86_64
dracut-057-13.git20220816.el9.x86_64
NetworkManager-1.41.6-1.el9.x86_64
kernel-5.14.0-205.el9.x86_64

How reproducible:
Always

Steps to Reproduce:
First, set up an SSH target; this can be the same host in a test VM:
1. Ensure you have an SSH key, run `ssh-keygen` if necessary
2. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. Ensure you can log in with "ssh root@localhost" (accept the host key fingerprint)

Now configure kdump:
1. printf 'ssh root@localhost\nsshkey /root/.ssh/id_rsa\n' >> /etc/kdump.conf
2. SSH target needs -F option: sed -i '/core_collector/ s/$/ -F/' /etc/kdump.conf
3. systemctl restart kdump

Actual results:
Failed:

# systemctl status kdump
× kdump.service - Crash recovery kernel arming
     Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2022-12-07 06:23:58 EST; 1s ago
   Duration: 1min 4.065s
    Process: 1773 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
   Main PID: 1773 (code=exited, status=1/FAILURE)
        CPU: 1.710s

Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 dracut[2020]: Connection 'lo' (e2602913-4e87-401b-a544-f683dc479071) successfully deleted.
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 kdumpctl[3331]: Cannot get driver information: Operation not supported
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 kdumpctl[1991]: dracut: Failed to get the driver of lo
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 dracut[2020]: Failed to get the driver of lo
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 kdumpctl[1775]: kdump: mkdumprd: failed to make kdump initrd
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 kdumpctl[1775]: kdump: Starting kdump: [FAILED]
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 systemd[1]: kdump.service: Failed with result 'exit-code'.
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 systemd[1]: Failed to start Crash recovery kernel arming.
Dec 07 06:23:58 rhel-9-2-127-0-0-2-2201 systemd[1]: kdump.service: Consumed 1.710s CPU time.

Expected results:
kdump works, as before.

Additional info:
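
The "Failed to get the driver of lo" lines in the log above can be understood outside of dracut: lo is a purely virtual interface with no backing device, so there is no driver to report. The following sketch looks the driver up through sysfs rather than ethtool and skips loopback interfaces. It is an illustration under stated assumptions, not the kexec-tools code and not the fix shipped in the erratum. `iface_driver`, `is_loopback`, `driver_or_skip`, and the optional sysfs-root argument are hypothetical names.

```shell
#!/bin/sh
# Print the kernel driver bound to network interface $1, looked up under
# the sysfs root $2 (default: /sys). Returns non-zero when the interface
# has no backing device, which is the case for virtual interfaces like lo.
iface_driver() {
    link="${2:-/sys}/class/net/$1/device/driver"
    [ -e "$link" ] || return 1
    basename "$(readlink -f "$link")"
}

# Succeed if interface $1 is a loopback device. The sysfs "type" file holds
# the ARPHRD_* hardware type; 772 is ARPHRD_LOOPBACK.
is_loopback() {
    [ "$(cat "${2:-/sys}/class/net/$1/type" 2>/dev/null)" = "772" ]
}

# Hypothetical guard: skip the driver lookup for loopback, and report an
# error for any other interface whose driver cannot be determined.
driver_or_skip() {
    if is_loopback "$@"; then
        return 0
    fi
    iface_driver "$@" || {
        echo "Failed to get the driver of $1" >&2
        return 1
    }
}
```

With such a guard, `driver_or_skip lo` returns success without printing anything, while `driver_or_skip eth0` prints the driver name bound to eth0.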