Bug 2124455 - crash: Kernel handling of CPU and memory hot un/plug
Summary: crash: Kernel handling of CPU and memory hot un/plug
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kexec-tools
Version: 9.2
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: rc
: ---
Assignee: Baoquan He
QA Contact: Jie Li
URL:
Whiteboard:
Depends On: 2118897
Blocks:
Reported: 2022-09-06 08:13 UTC by Baoquan He
Modified: 2023-08-15 07:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-133284 0 None None None 2022-09-06 08:26:02 UTC

Description Baoquan He 2022-09-06 08:13:58 UTC
This bug was initially created as a copy of Bug #2118897

I am copying this bug because: 
To go with the kernel change, user space needs adjusting too for the feature to take effect. For kexec_file_load, the following needs to be done:


 - Prevent udev from updating kdump crash kernel on hot un/plug changes.
   Add the following as the first lines to the udev rule file
   /usr/lib/udev/rules.d/98-kexec.rules:

   # The kernel handles updates to crash elfcorehdr for cpu and memory changes
   SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
   SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

   These lines will cause cpu and memory hot un/plug events to be
   skipped within this rule file, if the kernel has these changes
   enabled.
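
To check by hand whether a running kernel advertises this support, the attribute matched by the rules above can be read from sysfs. This is a minimal sketch, assuming the attribute is exposed as /sys/devices/system/cpu/crash_hotplug and /sys/devices/system/memory/crash_hotplug (paths as described in upstream kernel documentation, not stated in this bug). The function name and the sysfs_root parameter are hypothetical; the parameter exists only so the check can be run against a fake tree.

```python
from pathlib import Path


def kernel_handles_crash_hotplug(subsystem, sysfs_root="/sys"):
    """Return True if <sysfs_root>/devices/system/<subsystem>/crash_hotplug reads "1".

    A missing or unreadable attribute is treated as "not supported", i.e. the
    case in which the udev rule would still reload the kdump image.
    """
    # Assumed location of the attribute; adjust if the kernel exposes it elsewhere.
    attr = Path(sysfs_root) / "devices" / "system" / subsystem / "crash_hotplug"
    try:
        return attr.read_text().strip() == "1"
    except OSError:
        return False


if __name__ == "__main__":
    for subsystem in ("cpu", "memory"):
        handled_by = "kernel" if kernel_handles_crash_hotplug(subsystem) else "udev/kexec"
        print(f"{subsystem}: elfcorehdr updates handled by {handled_by}")
```

If both subsystems report "kernel", the two GOTO lines above should cause the rest of the rule file to be skipped for CPU and memory events.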

Description of problem:
When the kdump service is loaded and a CPU or memory is hot un/plugged, the crash elfcorehdr, which describes the CPUs and memory in the system, must also be updated; otherwise the resulting vmcore is inaccurate (e.g. missing either CPU context or memory regions).

The current solution uses udev to trigger an unload-then-reload of the whole kdump image (i.e. kernel, initrd, boot_params, purgatory and elfcorehdr) by the userspace kexec utility. Offloading this work to userspace causes significant performance problems.

Upstream, a patchset introduces a generic crash hot un/plug handler that registers with the CPU and memory notifiers. On a CPU or memory change, this generic handler is invoked and in turn calls an architecture-specific handler to make the appropriate updates.

In the case of x86_64, the arch-specific handler generates a new
elfcorehdr and overwrites the old one in memory. No userspace
involvement is needed.
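
The flow described above (notifier, then generic handler, then arch-specific regeneration of the elfcorehdr in place) can be pictured with a toy model. This is only a conceptual sketch in Python, not kernel code and not the patchset's implementation; every class and function name in it is hypothetical, and the "elfcorehdr" is reduced to a dict listing CPUs and memory ranges.

```python
def build_elfcorehdr(cpus, memory_ranges):
    """Toy elfcorehdr: just the CPUs and memory ranges it describes."""
    return {"cpus": sorted(cpus), "memory": sorted(memory_ranges)}


class ToySystem:
    """Hypothetical stand-in for the kernel's CPU/memory state and notifiers."""

    def __init__(self, cpus, memory_ranges):
        self.cpus = set(cpus)
        self.memory_ranges = set(memory_ranges)
        # Header as built when the crash image was first loaded.
        self.elfcorehdr = build_elfcorehdr(self.cpus, self.memory_ranges)
        self._notifiers = []

    def register_notifier(self, callback):
        self._notifiers.append(callback)

    def _notify(self):
        for callback in self._notifiers:
            callback(self)

    def hot_add_cpu(self, cpu):
        self.cpus.add(cpu)
        self._notify()

    def hot_remove_memory(self, mem_range):
        self.memory_ranges.discard(mem_range)
        self._notify()


def arch_update_elfcorehdr(system):
    """Arch-specific step (x86_64 in the text): regenerate and overwrite."""
    system.elfcorehdr = build_elfcorehdr(system.cpus, system.memory_ranges)


def crash_hotplug_handler(system):
    """Generic handler: runs on every change and defers to the arch handler."""
    arch_update_elfcorehdr(system)


if __name__ == "__main__":
    system = ToySystem(cpus={0, 1}, memory_ranges={(0x0, 0x7FFFFFFF)})
    system.register_notifier(crash_hotplug_handler)
    system.hot_add_cpu(2)
    print(system.elfcorehdr["cpus"])
```

The point of the model is that the header is refreshed inside the same context that observed the change, so no reload of the other image segments is involved.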

[PATCH v10 0/8] crash: Kernel handling of CPU and memory hot un/plug
https://lore.kernel.org/all/20220721181747.1640-1-eric.devolder@oracle.com/T/#u

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 4 Baoquan He 2023-03-14 04:57:08 UTC
Defer this to rhel9.4 since the upstream patches are still under discussion.

