Bug 2057391

Summary: kexec-tools built with gcc 12 will fail kexec/kdump jumping to 2nd kernel with kexec_load interface
Product: [Fedora] Fedora Reporter: Baoquan He <bhe>
Component: kexec-tools Assignee: Baoquan He <bhe>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 36 CC: bcotton, bhe, coxu, mattdm, ruyang, ryncsn, xiawu
Target Milestone: --- Keywords: Triaged
Target Release: --- Flags: bcotton: fedora_prioritized_bug-
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: kexec-tools-2.0.23-6.fc36 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-05-07 04:18:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Baoquan He 2022-02-23 10:21:05 UTC
Description of problem:
kdump fails to jump to the 2nd kernel with the kexec_load interface, while it works well with kexec_file_load. Still debugging to see what's going on. kexec reboot also fails to jump and just goes to firmware to reboot.

[root@dell-pet610-01 ~]# 
[root@dell-pet610-01 ~]# [ 1284.865822] sysrq: Trigger a crash
[ 1284.869383] Kernel panic - not syncing: sysrq triggered crash
[ 1284.875163] CPU: 9 PID: 1348 Comm: bash Kdump: loaded Tainted: G          I      --------- ---  5.17.0-0.rc5.038101e6b2cd.103.test.fc36.x86_61
[ 1284.888182] Hardware name: Dell Inc. PowerEdge T610/0N028H, BIOS 6.4.0 07/23/2013
[ 1284.895651] Call Trace:
[ 1284.898093]  <TASK>
[ 1284.900191]  dump_stack_lvl+0x5d/0x78
[ 1284.903853]  panic+0x111/0x32d
[ 1284.906930]  sysrq_handle_crash+0x18/0x20
[ 1284.910936]  __handle_sysrq+0x17d/0x1e0
[ 1284.914775]  write_sysrq_trigger+0x44/0x50
[ 1284.918870]  proc_reg_write+0x47/0xa0
[ 1284.922534]  vfs_write+0x108/0x360
[ 1284.925936]  ? lock_release+0x2eb/0x410
[ 1284.929779]  ? syscall_enter_from_user_mode+0x2e/0x1c0
[ 1284.934924]  ksys_write+0x5b/0xb0
[ 1284.938246]  do_syscall_64+0x43/0x90
[ 1284.941823]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1284.946869] RIP: 0033:0x7f617ab63027
[ 1284.950451] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 4
[ 1284.969190] RSP: 002b:00007ffc3060d998 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1284.976752] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f617ab63027
[ 1284.983875] RDX: 0000000000000002 RSI: 0000563a892fbda0 RDI: 0000000000000001
[ 1284.990998] RBP: 0000563a892fbda0 R08: 0000000000000000 R09: 0000000000000073
[ 1284.998120] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[ 1285.005242] R13: 00007f617ac555a0 R14: 0000000000000002 R15: 00007f617ac55780
[ 1285.012397]  </TASK>




Comment 1 Baoquan He 2022-03-08 02:14:01 UTC
After investigation, kdump's failure to switch into the 2nd kernel is caused by the gcc 12 upgrade. In Fedora Rawhide, gcc has been upgraded from 11 to 12, and kexec-tools built with gcc 12 fails the switch when using the kexec_load interface. With the kexec_file_load interface, kdump and kexec reboot both work well. The kexec reboot failure with kexec_load has the same cause.

The root cause of why kexec-tools built with gcc 12 fails to jump into the 2nd kernel has not been determined yet.

Comment 2 Baoquan He 2022-03-08 02:14:55 UTC
By the way, whether the kernel itself is built with gcc 12 makes no difference to the kdump/kexec jump.

Comment 3 Baoquan He 2022-03-08 02:39:44 UTC
*** Bug 2056876 has been marked as a duplicate of this bug. ***

Comment 4 Dave Young 2022-03-30 07:43:22 UTC
Note: kexec reboot also fails with a reset to BIOS when using the test steps below:


kexec -l /boot/vmlinuz-`uname -r` --initrd /boot/initramfs-`uname -r`.img --reuse-cmdline
reboot

Expected result: reboot into the new kernel without going through BIOS.
Actual result: reset to BIOS and a hard reboot happens.
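
For comparing the two interfaces discussed in this bug, a minimal dry-run sketch may help. It only prints the load commands and does not load or boot anything. It assumes the `-c` (`--kexec-syscall`, kexec_load) and `-s` (`--kexec-file-syscall`, kexec_file_load) options of kexec-tools; `build_kexec_cmd` is a hypothetical helper name, not part of kexec-tools, and the file paths follow the test steps above.

```shell
#!/bin/sh
# Dry-run helper: print the kexec load command for a given syscall interface.
# build_kexec_cmd is a hypothetical helper for illustration only.
build_kexec_cmd() {
    iface=$1   # "load" (kexec_load) or "file" (kexec_file_load)
    kver=$2    # kernel version string, e.g. the output of `uname -r`
    case "$iface" in
        load) flag="-c" ;;   # --kexec-syscall: use kexec_load
        file) flag="-s" ;;   # --kexec-file-syscall: use kexec_file_load
        *) echo "unknown interface: $iface" >&2; return 1 ;;
    esac
    echo "kexec $flag -l /boot/vmlinuz-$kver --initrd /boot/initramfs-$kver.img --reuse-cmdline"
}

# Print both variants for the running kernel; nothing is loaded or executed.
build_kexec_cmd load "$(uname -r)"
build_kexec_cmd file "$(uname -r)"
```

Running the printed `-c` command followed by `reboot` should correspond to the failing case described here, while the `-s` variant corresponds to the kexec_file_load path that is reported to work.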

An upstream kexec-tools patch was posted this week; it should be merged soon.

Comment 5 Dave Young 2022-03-30 07:52:09 UTC
Patch link: http://lists.infradead.org/pipermail/kexec/2022-March/024408.html

Comment 6 Matthew Miller 2022-04-06 14:26:38 UTC
Hi Dave! Can you clarify what action would be helpful as a prioritized bug? Is it concern with getting that patch into the Fedora package, or with testing it? Does this affect F36 in addition to Rawhide (and if so, would a freeze exception be a good idea?)?

Comment 7 Ben Cotton 2022-04-06 15:07:01 UTC
In today's Prioritized Bugs meeting, we agreed to defer a decision on this bug pending the input requested in comment 6.
https://meetbot.fedoraproject.org/fedora-meeting-1/2022-04-06/fedora_prioritized_bugs_and_issues.2022-04-06-14.01.log.html#l-72

Comment 8 Dave Young 2022-04-07 08:42:39 UTC
Hi Matthew,

I do not know the exact Fedora process; I just added the flag so that people are aware of this bug, and I hope we can fix it in F36 :)

Coiby, the kexec-tools Fedora maintainer, said he has merged the fixes into the Fedora 36 branch and made a build today.

Yes, it affects both F36 and Rawhide, so I moved the bz to F36. For Rawhide we can do a rebase later to pick up the fixes automatically.

An exception would be good if the process requires one for the fix to be added to F36.

Thanks
Dave

Comment 9 Coiby 2022-04-07 09:00:05 UTC
Yes, kexec-tools-2.0.23-6.fc36 [1] has been released to fix this bug.

[1] https://koji.fedoraproject.org/koji/buildinfo?buildID=1941936

Comment 10 Fedora Update System 2022-04-07 09:15:20 UTC
FEDORA-2022-c7080eb130 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-c7080eb130

Comment 11 Fedora Update System 2022-04-07 18:01:27 UTC
FEDORA-2022-c7080eb130 has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-c7080eb130`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-c7080eb130

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
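
To check whether an installed kexec-tools already includes the fix, a rough sketch along the following lines could be used. It compares the installed version-release against the "Fixed In Version" of this bug using GNU `sort -V`, which only approximates RPM's own version comparison, so treat the result as a hint, especially across Fedora releases. `version_ge` is a hypothetical helper name.

```shell
#!/bin/sh
# FIXED is the "Fixed In Version" of this bug, without the package name.
FIXED="2.0.23-6.fc36"

# version_ge A B: succeed if A sorts at or after B under GNU `sort -V`.
# This approximates, but is not identical to, RPM version comparison.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query rpm if it exists and the package is installed.
if command -v rpm >/dev/null 2>&1 && rpm -q kexec-tools >/dev/null 2>&1; then
    installed=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' kexec-tools)
    if version_ge "$installed" "$FIXED"; then
        echo "kexec-tools $installed is at or after $FIXED"
    else
        echo "kexec-tools $installed predates $FIXED"
    fi
else
    echo "kexec-tools is not installed via rpm on this system"
fi
```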

Comment 12 Ben Cotton 2022-04-20 14:37:32 UTC
In today's Prioritized Bugs meeting, we agreed to reject this as a Prioritized Bug as a fix is already in the updates-testing repo and it does not seem to affect a large number of Fedora Linux users. 

https://meetbot.fedoraproject.org/fedora-meeting-1/2022-04-20/fedora_prioritized_bugs_and_issues.2022-04-20-14.01.log.html#l-52

Comment 13 Fedora Update System 2022-05-07 04:18:42 UTC
FEDORA-2022-c7080eb130 has been pushed to the Fedora 36 stable repository.
If the problem still persists, please make a note of it in this bug report.