2222526 – kdump failed with the error "Cannot open /proc/vmcore: No such file or directory"

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	ppc64le
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Pingfan Liu
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	kdump:Fedora
Depends On:
Blocks:

Reported:	2023-07-13 04:43 UTC by Coiby
Modified:	2023-09-29 14:51 UTC (History)
CC List:	23 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-07-25 17:06:33 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	FC-893	0	None	None	None	2023-07-13 05:13:36 UTC

Description Coiby 2023-07-13 04:43:34 UTC

Kdump failed for kernel 6.5.0-0.rc1.11.fc39.ppc64le.

    [   20.681017] systemd[1]: Starting kdump-capture.service - Kdump Vmcore Save Service... 
    [   20.875696] kdump.sh[429]: kdump: saving to /sysroot/var/crash/127.0.0.1-2023-07-12-03:35:02/ 
    [   20.935902] kdump.sh[429]: kdump: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2023-07-12-03:35:02/ 
    [   20.938392] kdump.sh[474]: Cannot open /proc/vmcore: No such file or directory 
    [   20.940384] kdump.sh[429]: kdump: saving vmcore-dmesg.txt failed 
    [   20.940709] kdump.sh[429]: kdump: saving vmcore 
    [   20.989322] kdump.sh[476]: open_dump_memory: Can't open the dump memory(/proc/vmcore). No such file or directory 
    [   20.996785] kdump.sh[476]: makedumpfile Failed. 
    [   20.997576] kdump.sh[429]: kdump: saving vmcore failed, exitcode:1 
    [   20.997868] kdump.sh[429]: kdump: saving vmcore failed 
    [   21.038209] kdump.sh[429]: kdump: saving the /run/initramfs/kexec-dmesg.log to /sysroot/var/crash/127.0.0.1-2023-07-12-03:35:02/// 
    [   21.046453] systemd[1]: kdump-capture.service: Main process exited, code=exited, status=1/FAILURE 
    [   21.046806] systemd[1]: kdump-capture.service: Failed with result 'exit-code'. 
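
When checking other capture-kernel journals for the same failure, the stages that kdump.sh reports as failed can be pulled out mechanically. The following is a minimal sketch, assuming the journal is available as plain text; the function name `failed_kdump_stages` and the abbreviated sample are illustrative and not part of kexec-tools.

```python
import re

def failed_kdump_stages(journal):
    """Return the items kdump.sh reported as 'saving ... failed', in order, without repeats."""
    stages = []
    for match in re.finditer(r"kdump: saving (\S+) failed", journal):
        if match.group(1) not in stages:
            stages.append(match.group(1))
    return stages

# Abbreviated copy of the kdump.sh lines quoted in this report.
sample = """\
[   20.935902] kdump.sh[429]: kdump: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2023-07-12-03:35:02/
[   20.940384] kdump.sh[429]: kdump: saving vmcore-dmesg.txt failed
[   20.940709] kdump.sh[429]: kdump: saving vmcore
[   20.997576] kdump.sh[429]: kdump: saving vmcore failed, exitcode:1
[   20.997868] kdump.sh[429]: kdump: saving vmcore failed
"""
print(failed_kdump_stages(sample))
```

For the log in this report, both the vmcore-dmesg.txt stage and the vmcore stage are listed, which is consistent with /proc/vmcore being absent rather than with a problem in one particular tool.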


Reproducible: Always

Steps to Reproduce:
1. dnf install kexec-tools kernel-modules -y
2. reboot
3. systemctl start kdump
4. trigger a kernel crash

Actual Results:
kdump failed to save the kernel coredump.

Expected Results:  
kdump successfully saves the kernel coredump.

Originally reported by CoreOS team https://github.com/coreos/fedora-coreos-tracker/issues/1523

Comment 1 Baoquan He 2023-07-17 06:40:31 UTC

I saw a similar problem in several CKI failure reports. The logs also suggest it may be caused by a corrupted elfcorehdr, as shown below:

[    0.148565] Warning: Core image elf header is not sane 
[    0.148570] Kdump: vmcore not initialized 

Please see the test_console.log from one failed CKI case:
https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/930627185/4650966145/redhat%3A930627185/build_ppc64le_redhat%3A930627185-ppc64le-kernel/tests/2/results_0001/job.01/recipes/14221455/tasks/7/logs/test_console.log

Thanks
Baoquan
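
To triage similar console logs, the two messages above can be searched for directly. This is a minimal sketch, assuming the console log is available as plain text; the function name `find_elfcorehdr_warnings` is illustrative, and the sample reuses lines quoted in this report.

```python
import re

# Kernel messages quoted in this report that indicate the capture kernel
# rejected the ELF core header and did not set up /proc/vmcore.
ELFCOREHDR_PATTERNS = (
    "Warning: Core image elf header is not sane",
    "Kdump: vmcore not initialized",
)

def find_elfcorehdr_warnings(console_log):
    """Return the matching messages (timestamps stripped) in order of appearance."""
    hits = []
    for line in console_log.splitlines():
        # Drop a leading dmesg timestamp such as "[    0.148565] ".
        message = re.sub(r"^\s*\[\s*\d+\.\d+\]\s*", "", line).strip()
        if any(message.startswith(p) for p in ELFCOREHDR_PATTERNS):
            hits.append(message)
    return hits

sample = """\
[    0.148565] Warning: Core image elf header is not sane
[    0.148570] Kdump: vmcore not initialized
[   20.938392] kdump.sh[474]: Cannot open /proc/vmcore: No such file or directory
"""
print(find_elfcorehdr_warnings(sample))
```

An empty result only means these two messages were not found; it does not rule out other causes of a missing /proc/vmcore.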

Comment 2 Coiby 2023-07-17 08:20:16 UTC

I did a git bisection using kernel-auto-bisect [1] and the first bad commit is 606787fed7268feb256957872586370b56af697a "powerpc/64s: Remove support for ELFv1 little endian userspace".

[1] https://gitlab.com/redhat/centos-stream/src/kernel/utils/tools/-/tree/main/kernel-auto-bisect
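
For readers unfamiliar with bisection, the idea is a binary search over the commit range for the first commit where kdump fails. The sketch below illustrates that idea on a simplified linear history; it is not how kernel-auto-bisect or git is implemented, and the commit labels and the `kdump_fails` predicate are hypothetical placeholders.

```python
def first_bad_commit(commits, is_bad):
    """Binary-search an oldest-to-newest commit list for the first bad commit.

    Assumes the newest commit is known bad and that the failure persists in
    every commit after the one that introduced it.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo]

# Hypothetical history: labels are placeholders, not real kernel commits.
history = ["c%d" % i for i in range(8)]
tested = []

def kdump_fails(commit):
    # Stand-in for "build this kernel, crash it, check whether vmcore was saved".
    tested.append(commit)
    return history.index(commit) >= 5  # pretend "c5" introduced the failure

print(first_bad_commit(history, kdump_fails))
```

Each call to the predicate corresponds to one build-and-crash cycle, so the number of cycles grows with the logarithm of the range size rather than with the range size itself.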

Comment 3 Dave Young 2023-07-18 02:29:36 UTC

Created attachment 1976289 [details]
untested patch

Thanks, Coiby, for bisecting. If any of you have access to the machine, could you try the untested patch and see if it works?

Comment 4 Pingfan Liu 2023-07-21 15:12:27 UTC

I tried Fedora 38, but it has a significant bug that panics the kernel while the kernel is being compiled.

I then reinstalled the bare-metal machine with RHEL-9 and tested the latest upstream kernel, but the kdump kernel hits a different type of panic:

[   21.619230] usb 2-4: new SuperSpeed USB device number 2 using xhci_hcd
[   21.671227] usb 2-4: New USB device found, idVendor=0451, idProduct=8140, bcdDevice= 1.00
[   21.671248] usb 2-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[   37.349138] watchdog: CPU 68 detected hard LOCKUP on other CPUs 0
[   37.349156] watchdog: CPU 68 TB:10136783097847, last SMP heartbeat TB:10128591098849 (15999ms ago)
[   37.349307] watchdog: CPU 0 Hard LOCKUP
[   37.349310] watchdog: CPU 0 TB:10136783184586, last heartbeat TB:10126477817591 (20127ms ago)
[   37.349313] Modules linked in:
[   37.349317] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.5.0-rc2+ #1
[   37.349322] Hardware name: 9006-22P POWER9 (raw) 0x4e1202 opal:skiboot-v6.0.23 PowerNV
[   37.349324] NIP:  0000000030005104 LR: c0000000080cea00 CTR: c0000000080d7360
[   37.349327] REGS: c000000107be3d60 TRAP: 0100   Not tainted  (6.5.0-rc2+)
[   37.349330] MSR:  9000000000081002 <SF,HV,ME,RI>  CR: 22004484  XER: 0000005b
[   37.349340] CFAR: 000000003000510c IRQMASK: 3 
[   37.349340] GPR00: 0000000000000009 c000000107e63cd0 0000000030000000 00000000000ffff6 
[   37.349340] GPR04: c000000107e63e20 0000000000000040 3ffffffff1ae9700 000000000000000e 
[   37.349340] GPR08: c00000000e516950 0000000000000000 0000000000000000 0000000000000001 
[   37.349340] GPR12: 0000000031ee0000 c000000107fef480 0000000000000000 0000000000000000 
[   37.349340] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[   37.349340] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[   37.349340] GPR24: 0000000000000009 0000000000000003 0000000000000000 0000000000000000 
[   37.349340] GPR28: c00000000e516950 000000000000000e 3ffffffff1ae9700 c000000107e63e20 
[   37.349394] NIP [0000000030005104] 0x30005104
[   37.349399] LR [c0000000080cea00] opal_return+0x0/0x30
[   37.349406] Call Trace:
[   37.349407] [c000000107e63cd0] [c0000000080cc024] opal_call+0xe4/0x1c0 (unreliable)
[   37.349416] [c000000107e63d90] [c0000000080cc468] opal_handle_interrupt+0x28/0x40
[   37.349423] [c000000107e63e00] [c0000000080d739c] opal_interrupt+0x3c/0xa0
[   37.349430] [c000000107e63e30] [c0000000082029f8] __handle_irq_event_percpu+0x88/0x230
[   37.349437] [c000000107e63ed0] [c000000008202cb4] handle_irq_event+0x74/0x130
[   37.349444] [c000000107e63f00] [c00000000820a86c] handle_fasteoi_irq+0xbc/0x350
[   37.349450] [c000000107e63f40] [c000000008200910] generic_handle_irq+0x50/0x80
[   37.349456] [c000000107e63f60] [c000000008017318] __do_irq+0xb8/0x230
[   37.349462] [c000000107e63fe0] [c000000008017c68] __do_IRQ+0x88/0xe0
[   37.349468] [c00000000e733b10] [0000000000000000] 0x0
[   37.349472] [c00000000e733b50] [c000000008017d10] do_IRQ+0x50/0xb0
[   37.349478] [c00000000e733b80] [c00000000800b63c] h_virt_irq_common_virt+0x28c/0x290
[   37.349486] --- interrupt: ea0 at arch_local_irq_restore.part.0+0x188/0x190
[   37.349492] NIP:  c000000008038098 LR: c000000008fc380c CTR: c0000000080d26e0
[   37.349494] REGS: c00000000e733bb0 TRAP: 0ea0   Not tainted  (6.5.0-rc2+)
[   37.349497] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28004484  XER: 0000005b
[   37.349514] CFAR: 0000000000000000 IRQMASK: 0 
[   37.349514] GPR00: c000000008fc380c c00000000e733e50 c000000009572c00 0000000000000000 
[   37.349514] GPR04: 0000000000000000 0000000000000000 c00000000ac42a80 c000000107fef480 
[   37.349514] GPR08: 00000000f8c90000 0000000000000000 0000000000008002 0000000028002822 
[   37.349514] GPR12: c0000000080d26e0 c000000107fef480 c000001ff4663f90 0000000000000000 
[   37.349514] GPR16: 0000000000000000 c00000000002d18c c00000000002d164 c0000000020100e4 
[   37.349514] GPR20: 0000000000000006 c000001ff4660000 0000000000000000 0000000000000001 
[   37.349514] GPR24: 0000000000000000 0000000030942298 0000000031ee00b0 c00000000ac56760 
[   37.349514] GPR28: 0000000000000002 0000000000000003 0000000000000000 fcffffffffffffff 
[   37.349568] NIP [c000000008038098] arch_local_irq_restore.part.0+0x188/0x190
[   37.349574] LR [c000000008fc380c] default_idle_call+0x6c/0x140
[   37.349579] --- interrupt: ea0
[   37.349580] [c00000000e733e50] [c00000000e733e90] 0xc00000000e733e90 (unreliable)
[   37.349585] [c00000000e733e90] [c000000008fc380c] default_idle_call+0x6c/0x140
[   37.349591] [c00000000e733eb0] [c0000000081ce0bc] cpuidle_idle_call+0x1bc/0x260
[   37.349596] [c00000000e733f10] [c0000000081ce268] do_idle+0x108/0x1c0
[   37.349601] [c00000000e733f60] [c0000000081ce558] cpu_startup_entry+0x38/0x40
[   37.349606] [c00000000e733f90] [c00000000805f88c] start_secondary+0x24c/0x250
[   37.349613] [c00000000e733fe0] [c00000000800e058] start_secondary_prolog+0x10/0x14
[   37.349619] Code: 4c006c81 00000b2c 3c00e241 02000b2c 0c008240 feff6038 00010048 48006c81 ffff6b39 48006c91 780b217c 78fbff7f <4c006c81> 01000b2c f8ff8241 7813427c

Comment 5 Pingfan Liu 2023-07-22 01:50:29 UTC

On this bare-metal machine (ibm-p9b-26.ibm2.lab.eng.bos.redhat.com), I tried to reproduce this bug by checking out the first bad commit, 606787fed7268feb256957872586370b56af697a "powerpc/64s: Remove support for ELFv1 little endian userspace".

However, the compiled kernel panics during boot:

[  OK  ] Finished Load/Save Random Seed.
[  OK  ] Finished Create Static Device Nodes in /dev.
         Starting Rule-based Manage…for Device Events and Files...
[  OK  ] Finished Monitoring of LVM… dmeventd or progress polling.
[  OK  ] Started Rule-based Manager for Device Events and Files.
         Starting Load Kernel Module configfs...
[  OK  ] Finished Load Kernel Module configfs.
         Starting Load Kernel Module fuse...
[  OK  ] Finished Load Kernel Module fuse.
[   10.694120] IPMI message handler: version 39.2
[   10.771425] ipmi device interface
[   10.830605] ipmi-powernv ibm,opal:ipmi: IPMI message handler: Found new BMC (man_id: 0x002a7c, prod_id: 0x0985, dev_id: 0x20)
[   10.870743] at24 0-0050: 16384 byte 24c128 EEPROM, writable, 1 bytes/write
[   10.917802] at24 2-0050: 32768 byte 24c256 EEPROM, writable, 1 bytes/write
[   24.024955] watchdog: CPU 4 detected hard LOCKUP on other CPUs 6
[   24.024980] watchdog: CPU 4 TB:19395148891436, last SMP heartbeat TB:19386956899153 (15999ms ago)
[   24.025121] watchdog: CPU 6 Hard LOCKUP
[   24.025123] watchdog: CPU 6 TB:19395148977304, last heartbeat TB:19386956898656 (16000ms ago)
[   24.025126] Modules linked in: at24 ipmi_powernv ofpart regmap_i2c ipmi_devintf powernv_flash opal_prd ibmpowernv ipmi_msghandler mtd xfs libcrc32c sd_mod t10_pi sg ast drm_kms_helper syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_shmem_helper drm i40e vmx_crypto aacraid drm_panel_orientation_quirks fuse
[   24.025157] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 6.4.0-rc2+ #2
[   24.025161] Hardware name: 9006-22P POWER9 0x4e1202 opal:skiboot-v6.0.23 PowerNV
[   24.025162] NIP:  0000000030005104 LR: c0000000000cf300 CTR: c0000000000d7ad0
[   24.025165] REGS: c000001fff3bbd60 TRAP: 0100   Not tainted  (6.4.0-rc2+)
[   24.025168] MSR:  9000000000081002 <SF,HV,ME,RI>  CR: 22004822  XER: 20040000
[   24.025175] CFAR: 000000003000510c IRQMASK: 3 
[   24.025175] GPR00: 0000000000000009 c000001fff657850 0000000030000000 00000000000ffff6 
[   24.025175] GPR04: c000001fff6579a0 0000000000000000 0000000000000000 c00000000400cbe8 
[   24.025175] GPR08: c00000000400cb08 0000000000000000 c00000000400cbe0 0000000000000001 
[   24.025175] GPR12: 0000000031c30000 c000001fff6bc880 0000000000000000 0000000000000000 
[   24.025175] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[   24.025175] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[   24.025175] GPR24: 0000000000000009 0000000000000003 c00000000400cbe0 0000000000000000 
[   24.025175] GPR28: c00000000400cb08 c00000000400cbe8 0000000000000000 c000001fff6579a0 
[   24.025216] NIP [0000000030005104] 0x30005104
[   24.025221] LR [c0000000000cf300] opal_return+0x0/0x30
[   24.025227] Call Trace:
[   24.025228] [c000001fff657850] [c0000000000cc8a4] opal_call+0xe4/0x1c0 (unreliable)
[   24.025235] [c000001fff657910] [c0000000000ccce8] opal_handle_interrupt+0x28/0x40
[   24.025240] [c000001fff657980] [c0000000000d7b0c] opal_interrupt+0x3c/0xa0
[   24.025246] [c000001fff6579b0] [c000000000203808] __handle_irq_event_percpu+0x88/0x230
[   24.025251] [c000001fff657a50] [c000000000203ac4] handle_irq_event+0x74/0x130
[   24.025256] [c000001fff657a80] [c00000000020b3ac] handle_fasteoi_irq+0xbc/0x300
[   24.025261] [c000001fff657ac0] [c0000000002018d0] generic_handle_irq+0x50/0x80
[   24.025266] [c000001fff657ae0] [c000000000017f98] __do_irq+0xb8/0x230
[   24.025271] [c000001fff657b60] [c000000000018918] __do_IRQ+0xb8/0xe0
[   24.025275] [c000001fff657ba0] [c000000000018990] do_IRQ+0x50/0xb0
[   24.025280] [c000001fff657bd0] [c00000000000b63c] h_virt_irq_common_virt+0x28c/0x290
[   24.025286] --- interrupt: ea0 at arch_local_irq_restore.part.0+0x188/0x190
[   24.025291] NIP:  c0000000000386f8 LR: c000000000fbfb98 CTR: c000000000029310
[   24.025293] REGS: c000001fff657c00 TRAP: 0ea0   Not tainted  (6.4.0-rc2+)
[   24.025295] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 22004822  XER: 20040000
[   24.025305] CFAR: 0000000000000000 IRQMASK: 0 
[   24.025305] GPR00: c000000000fbfb98 c000001fff657ea0 c000000001552900 0000000000000000 
[   24.025305] GPR04: c000001ffa050400 ffffffffffffffff 0005f5e100000000 000000000083126f 
[   24.025305] GPR08: 0000001ff7ef0000 0000000000000000 0000000000008002 0000000000004000 
[   24.025305] GPR12: c000000000029310 c000001fff6bc880 0000000000000000 0000000000000000 
[   24.025305] GPR16: 0000000000000001 c000000002ba2a80 0000000000000000 00000000ffff8f15 
[   24.025305] GPR20: c000000002167888 000000000000000a c0000000021f2000 0000000000000000 
[   24.025305] GPR24: 0000000000000000 0000001ff7ef0000 c000000003831680 c000000002bb61e0 
[   24.025305] GPR28: 0000000000000002 0000000000000003 c000000002160400 fcffffffffffffff 
[   24.025346] NIP [c0000000000386f8] arch_local_irq_restore.part.0+0x188/0x190
[   24.025350] LR [c000000000fbfb98] __do_softirq+0xe8/0x3dc
[   24.025355] --- interrupt: ea0
[   24.025356] [c000001fff657ea0] [c000000003831680] 0xc000000003831680 (unreliable)
[   24.025360] [c000001fff657ee0] [c000000000fbfb98] __do_softirq+0xe8/0x3dc
[   24.025365] [c000001fff657fe0] [c000000000018a30] do_softirq_own_stack+0x40/0x60
[   24.025370] [c0000000038b39f0] [c00000000015a268] __irq_exit_rcu+0x158/0x190
[   24.025376] [c0000000038b3a20] [c00000000015adc0] irq_exit+0x20/0x40
[   24.025381] [c0000000038b3a40] [c0000000000297c4] timer_interrupt+0x174/0x320
[   24.025386] [c0000000038b3aa0] [c000000000009f8c] decrementer_common_virt+0x28c/0x290
[   24.025391] --- interrupt: 900 at arch_local_irq_restore.part.0+0x110/0x190
[   24.025396] NIP:  c000000000038680 LR: c000000000038658 CTR: c0000000000291d0
[   24.025398] REGS: c0000000038b3ad0 TRAP: 0900   Not tainted  (6.4.0-rc2+)
[   24.025400] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004822  XER: 00000000
[   24.025410] CFAR: 0000000000000000 IRQMASK: 0 
[   24.025410] GPR00: c000000000038658 c0000000038b3d70 c000000001552900 000000028a2a36f3 
[   24.025410] GPR04: 0000000000000001 ffffffffffffffff 0000000000000004 0000001ff7ef0000 
[   24.025410] GPR08: c000001ffa0decf8 0000000000000000 0000000000008002 0000000000004000 
[   24.025410] GPR12: c0000000000291d0 c000001fff6bc880 c000001ff44cff90 0000000000000000 
[   24.025410] GPR16: 0000000000000000 c00000000002d18c c00000000002d164 c0000000020100e4 
[   24.025410] GPR20: 0000000000000006 c000001ff44cc000 c000000002010030 0000000000000001 
[   24.025410] GPR24: 0000000000000000 0000000000000004 000000028aff879e 0000000000000004 
[   24.025410] GPR28: 0000000000000002 0000000000000003 0000000000000004 fcffffffffffffff 
[   24.025450] NIP [c000000000038680] arch_local_irq_restore.part.0+0x110/0x190
[   24.025454] LR [c000000000038658] arch_local_irq_restore.part.0+0xe8/0x190
[   24.025458] --- interrupt: 900
[   24.025459] [c0000000038b3db0] [c000000000fb3bf8] cpuidle_enter_state+0xf8/0x5d8
[   24.025463] [c0000000038b3e50] [c000000000bd951c] cpuidle_enter+0x4c/0x70
[   24.025468] [c0000000038b3e90] [c0000000001c778c] call_cpuidle+0x4c/0xa0
[   24.025473] [c0000000038b3eb0] [c0000000001ceda8] cpuidle_idle_call+0x168/0x260
[   24.025478] [c0000000038b3f10] [c0000000001cefa8] do_idle+0x108/0x1c0
[   24.025483] [c0000000038b3f60] [c0000000001cf29c] cpu_startup_entry+0x3c/0x40
[   24.025489] [c0000000038b3f90] [c00000000005feec] start_secondary+0x24c/0x250
[   24.025494] [c0000000038b3fe0] [c00000000000e058] start_secondary_prolog+0x10/0x14
[   24.025498] Code: 4c006c81 00000b2c 3c00e241 02000b2c 0c008240 feff6038 00010048 48006c81 ffff6b39 48006c91 780b217c 78fbff7f <4c006c81> 01000b2c f8ff8241 7813427c 
[   70.924966] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   70.924990] rcu:     6-...0: (1 GPs behind) idle=a024/1/0x4000000000000002 softirq=177/179 fqs=2994
[   70.925018] rcu:     (detected by 16, t=6002 jiffies, g=-67, q=10478 ncpus=80)
[   70.925031] Sending NMI from CPU 16 to CPUs 6:
[   76.514210] CPU 6 didn't respond to backtrace IPI, inspecting paca.
[   76.514228] irq_soft_mask: 0x03 in_mce: 0 in_nmi: 0 current: 0 (swapper/6)
[   76.514249] Back trace of paca->saved_r1 (0xc0000000038b3c50) (possibly stale):
[   76.514262] Call Trace:
[   76.514270] rcu: rcu_sched kthread starved for 558 jiffies! g-67 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=5
[   76.514295] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[   76.514327] rcu: RCU grace-period kthread stack dump:
[   76.514344] task:rcu_sched       state:I stack:0     pid:15    ppid:2      flags:0x00000000
[   76.514369] Call Trace:
[   76.514375] [c0000000038f7a40] [c000001ff9fa2900] 0xc000001ff9fa2900 (unreliable)
[   76.514409] [c0000000038f7bf0] [c00000000001fcd0] __switch_to+0x130/0x220
[   76.514443] [c0000000038f7c50] [c000000000fb4d58] __schedule+0x258/0x6d0
[   76.514475] [c0000000038f7d20] [c000000000fb5244] schedule+0x74/0x140
[   76.514506] [c0000000038f7d90] [c000000000fbdb34] schedule_timeout+0xa4/0x1d0
[   76.514540] [c0000000038f7e60] [c000000000224eac] rcu_gp_fqs_loop+0x40c/0x540
[   76.514574] [c0000000038f7f00] [c000000000229bd0] rcu_gp_kthread+0x190/0x200
[   76.514608] [c0000000038f7f90] [c00000000018b018] kthread+0x138/0x140
[   76.514640] [c0000000038f7fe0] [c00000000000dd58] start_kernel_thread+0x14/0x18
[   76.514673] rcu: Stack dump where RCU GP kthread last ran:
[   76.514691] Sending NMI from CPU 16 to CPUs 5:
[   76.514712] NMI backtrace for cpu 5
[   76.514733] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.4.0-rc2+ #2
[   76.514772] Hardware name: 9006-22P POWER9 0x4e1202 opal:skiboot-v6.0.23 PowerNV
[   76.514821] NIP:  c0000000000383bc LR: c0000000000386c8 CTR: c0000000000291d0
[   76.514870] REGS: c0000000038f3be8 TRAP: 0a00   Not tainted  (6.4.0-rc2+)
[   76.514906] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004424  XER: 00000000
[   76.514962] CFAR: 0000000000000000 IRQMASK: 0 
[   76.514962] GPR00: c0000000000386c8 c0000000038f3d70 c000000001552900 c0000000038f3bb8 
[   76.514962] GPR04: 00000011cfbdb9e1 ffffffffffffffff 002887fa00000000 0000000000000018 
[   76.514962] GPR08: 0000000000003b08 0000000000000043 0000001ff7e50000 00000000000026f9 
[   76.514962] GPR12: c0000000000291d0 c000001fff7fc680 c000001ff44cbf90 0000000000000000 
[   76.514962] GPR16: 0000000000000000 c00000000002d18c c00000000002d164 c0000000020100e4 
[   76.514962] GPR20: 0000000000000006 c000001ff44c8000 c000000002010030 0000000000000001 
[   76.514962] GPR24: 0000000000000000 0000000000000004 00000011d09b71c4 0000000000000004 
[   76.514962] GPR28: 0000000000000002 0000000000000003 0000000000000004 fcffffffffffffff 
[   76.515291] NIP [c0000000000383bc] __replay_soft_interrupts+0x3c/0x160
[   76.515332] LR [c0000000000386c8] arch_local_irq_restore.part.0+0x158/0x190
[   76.515371] Call Trace:
[   76.515390] [c0000000038f3d70] [c0000000000386c8] arch_local_irq_restore.part.0+0x158/0x190 (unreliable)
[   76.515441] [c0000000038f3db0] [c000000000fb3bf8] cpuidle_enter_state+0xf8/0x5d8
[   76.515482] [c0000000038f3e50] [c000000000bd951c] cpuidle_enter+0x4c/0x70
[   76.515520] [c0000000038f3e90] [c0000000001c778c] call_cpuidle+0x4c/0xa0
[   76.515556] [c0000000038f3eb0] [c0000000001ceda8] cpuidle_idle_call+0x168/0x260
[   76.515604] [c0000000038f3f10] [c0000000001cefa8] do_idle+0x108/0x1c0
[   76.515645] [c0000000038f3f60] [c0000000001cf29c] cpu_startup_entry+0x3c/0x40
[   76.515684] [c0000000038f3f90] [c00000000005feec] start_secondary+0x24c/0x250
[   76.515734] [c0000000038f3fe0] [c00000000000e058] start_secondary_prolog+0x10/0x14
[   76.515787] Code: 60000000 7c0802a6 f8010010 f821fe51 e92d0af8 f92101a8 39200000 38610028 892d0933 61290040 992d0933 48044359 <60000000> 39200000 e9410130 f9210160

Comment 6 Pingfan Liu 2023-07-23 13:33:21 UTC

(In reply to Dave Young from comment #3)
> Created attachment 1976289 [details]
> untested patch
> 
> Thanks Coiby for bisecting.  If any of you have the machine, could you try
> the untested patch see if it works?

Tested it on ibm-p9z-06-lp9.khw3.lab.eng.bos.redhat.com. Before this patch, kdump does not work with the bad commit 606787fed7268feb256957872586370b56af697a "powerpc/64s: Remove support for ELFv1 little endian userspace".

After this patch, the vmcore can be saved.

Comment 7 Pingfan Liu 2023-07-25 02:12:06 UTC

I have opened an upstream bug: https://bugzilla.kernel.org/show_bug.cgi?id=217702

Comment 8 Pingfan Liu 2023-07-25 09:14:19 UTC

This issue has been fixed upstream by

106ea7ffd56b ("Revert "powerpc/64s: Remove support for ELFv1 little endian userspace"")

Comment 9 Scott Weaver 2023-07-25 17:06:33 UTC

Thanks Pingfan. Feel free to reopen this if needed.

Comment 10 Gursewak Singh 2023-09-28 23:05:57 UTC

We are seeing this issue again. We weren't running the `kdump.crash` test in FCOS Rawhide because of another SELinux-policy-related issue, so we didn't catch this early. Apparently, the kernel version transition that seems to have caused this is `kernel-6.6.0-0.rc0.20230829git1c59d383390f.59.fc40` -> `kernel-doc-6.6.0-0.rc0.20230830git6c1b980a7e79.1.fc40`.

In short, the `kdump.crash` test in Rawhide:
Passes with `kernel-6.6.0-0.rc0.20230829git1c59d383390f.59.fc40`
Fails with `kernel-doc-6.6.0-0.rc0.20230830git6c1b980a7e79.1.fc40`

Comment 11 Dusty Mabe 2023-09-29 14:51:11 UTC

Let's open a new BZ. I think this is probably a new regression.
