Bug 2222526 - kdump failed with the error "Cannot open /proc/vmcore: No such file or directory"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Pingfan Liu
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: kdump:Fedora
Reported: 2023-07-13 04:43 UTC by Coiby
Modified: 2023-09-29 14:51 UTC (History)

Last Closed: 2023-07-25 17:06:33 UTC

Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FC-893 0 None None None 2023-07-13 05:13:36 UTC

Description Coiby 2023-07-13 04:43:34 UTC
Kdump failed for kernel 6.5.0-0.rc1.11.fc39.ppc64le.

    [   20.681017] systemd[1]: Starting kdump-capture.service - Kdump Vmcore Save Service... 
    [   20.875696] kdump.sh[429]: kdump: saving to /sysroot/var/crash/127.0.0.1-2023-07-12-03:35:02/ 
    [   20.935902] kdump.sh[429]: kdump: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2023-07-12-03:35:02/ 
    [   20.938392] kdump.sh[474]: Cannot open /proc/vmcore: No such file or directory 
    [   20.940384] kdump.sh[429]: kdump: saving vmcore-dmesg.txt failed 
    [   20.940709] kdump.sh[429]: kdump: saving vmcore 
    [   20.989322] kdump.sh[476]: open_dump_memory: Can't open the dump memory(/proc/vmcore). No such file or directory 
    [   20.996785] kdump.sh[476]: makedumpfile Failed. 
    [   20.997576] kdump.sh[429]: kdump: saving vmcore failed, exitcode:1 
    [   20.997868] kdump.sh[429]: kdump: saving vmcore failed 
    [   21.038209] kdump.sh[429]: kdump: saving the /run/initramfs/kexec-dmesg.log to /sysroot/var/crash/127.0.0.1-2023-07-12-03:35:02/// 
    [   21.046453] systemd[1]: kdump-capture.service: Main process exited, code=exited, status=1/FAILURE 
    [   21.046806] systemd[1]: kdump-capture.service: Failed with result 'exit-code'. 
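
When triaging many console logs for this failure, the signatures in the log above can be matched mechanically. The following is a minimal sketch, not part of kexec-tools or any CI tooling; the function name and signature labels are made up for illustration, and the patterns are copied from the messages shown above.

```python
import re

# Failure signatures copied from the console log in this report.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "vmcore_missing": re.compile(r"Cannot open /proc/vmcore: No such file or directory"),
    "makedumpfile_failed": re.compile(r"makedumpfile Failed"),
    "service_failed": re.compile(r"kdump-capture\.service: Failed with result"),
}


def scan_console_log(text):
    """Return the sorted labels of all failure signatures found in text."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern.search(text))
```

Calling `scan_console_log()` on the contents of a saved console log returns an empty list when none of the messages appear.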


Reproducible: Always

Steps to Reproduce:
1. dnf install kexec-tools kernel-modules -y
2. reboot
3. systemctl start kdump
4. trigger kernel crash
Actual Results:  
kdump failed to save the kernel coredump.

Expected Results:  
kdump successfully saves the kernel coredump.
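
Between steps 3 and 4 it can help to confirm that a capture kernel is actually loaded before crashing the machine. The sketch below is an assumption about how one might check that, not something taken from the report: it reads `/proc/cmdline` for a `crashkernel=` parameter and `/sys/kernel/kexec_crash_loaded` for the loaded flag. The helper names are hypothetical. The report does not say how the crash in step 4 was triggered; a common method is writing `c` to `/proc/sysrq-trigger` as root.

```python
from pathlib import Path


def crashkernel_reserved(cmdline):
    """True if the kernel command line carries a crashkernel= parameter."""
    return any(arg.startswith("crashkernel=") for arg in cmdline.split())


def capture_kernel_loaded(flag_text):
    """True if the content of /sys/kernel/kexec_crash_loaded is '1'."""
    return flag_text.strip() == "1"


def kdump_ready(cmdline_path="/proc/cmdline",
                flag_path="/sys/kernel/kexec_crash_loaded"):
    """Best-effort preflight check; returns False if either file is unreadable."""
    try:
        cmdline = Path(cmdline_path).read_text()
        flag = Path(flag_path).read_text()
    except OSError:
        return False
    return crashkernel_reserved(cmdline) and capture_kernel_loaded(flag)
```

Note that in this bug the preflight check would pass: the capture kernel loads and boots, and the failure only appears inside it when `/proc/vmcore` is missing.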

Originally reported by CoreOS team https://github.com/coreos/fedora-coreos-tracker/issues/1523

Comment 1 Baoquan He 2023-07-17 06:40:31 UTC
I saw a similar problem in several CKI failure reports. The logs there suggest it may be caused by a corrupted elfcorehdr, as shown below:

[    0.148565] Warning: Core image elf header is not sane 
[    0.148570] Kdump: vmcore not initialized 

Please see the test_console.log from one failed CKI case:
https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/930627185/4650966145/redhat%3A930627185/build_ppc64le_redhat%3A930627185-ppc64le-kernel/tests/2/results_0001/job.01/recipes/14221455/tasks/7/logs/test_console.log

Thanks
Baoquan
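
The warning above only says the kernel rejected the ELF core header; it does not say which field failed the check. Since the commit bisected in Comment 2 concerns the ELFv1 ABI, one field worth inspecting, purely as an assumption, is the ABI version stored in the low two bits of `e_flags`. The sketch below reads it from the first bytes of an ELF64 file using the standard header layout (`e_machine` at offset 18, `e_flags` at offset 48); the function name is hypothetical.

```python
import struct

EM_PPC64 = 21        # e_machine value for 64-bit PowerPC
EF_PPC64_ABI = 0x3   # low two bits of e_flags carry the ABI version


def ppc64_abi_version(header):
    """Return the ABI version from an ELF64 ppc64 header.

    0 means unspecified, 1 means ELFv1, 2 means ELFv2.
    """
    if len(header) < 52 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    if header[4] != 2:  # EI_CLASS: 2 is ELFCLASS64
        raise ValueError("not a 64-bit ELF")
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 is little endian
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    if machine != EM_PPC64:
        raise ValueError("not a ppc64 ELF")
    (flags,) = struct.unpack_from(endian + "I", header, 48)
    return flags & EF_PPC64_ABI
```

Passing the first 64 bytes of a file opened in binary mode is enough, for example `ppc64_abi_version(open(path, "rb").read(64))`.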

Comment 2 Coiby 2023-07-17 08:20:16 UTC
I did a git bisection using kernel-auto-bisect [1] and the first bad commit is 606787fed7268feb256957872586370b56af697a "powerpc/64s: Remove support for ELFv1 little endian userspace".

[1] https://gitlab.com/redhat/centos-stream/src/kernel/utils/tools/-/tree/main/kernel-auto-bisect
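
For readers unfamiliar with bisection, the idea is a binary search over an ordered commit range. The sketch below illustrates only that idea; it is not how kernel-auto-bisect or `git bisect` is implemented, and the `is_bad` callback (in practice: build the kernel, boot it, trigger kdump, check for a vmcore) is a hypothetical stand-in.

```python
def first_bad(commits, is_bad):
    """Binary-search an ordered commit list for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Each iteration halves the range, so a range of N commits needs roughly log2(N) kernel builds and test boots.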

Comment 3 Dave Young 2023-07-18 02:29:36 UTC
Created attachment 1976289 [details]
untested patch

Thanks, Coiby, for bisecting. If any of you have access to the machine, could you try the untested patch and see if it works?

Comment 4 Pingfan Liu 2023-07-21 15:12:27 UTC
I tried Fedora 38, but it has a significant bug that panics the kernel while compiling the kernel.

I then re-installed the bare-metal machine with RHEL-9 and tested the latest upstream kernel, but the kdump kernel hits a different type of panic:

[   21.619230] usb 2-4: new SuperSpeed USB device number 2 using xhci_hcd
[   21.671227] usb 2-4: New USB device found, idVendor=0451, idProduct=8140, bcdDevice= 1.00
[   21.671248] usb 2-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[   37.349138] watchdog: CPU 68 detected hard LOCKUP on other CPUs 0
[   37.349156] watchdog: CPU 68 TB:10136783097847, last SMP heartbeat TB:10128591098849 (15999ms ago)
[   37.349307] watchdog: CPU 0 Hard LOCKUP
[   37.349310] watchdog: CPU 0 TB:10136783184586, last heartbeat TB:10126477817591 (20127ms ago)
[   37.349313] Modules linked in:
[   37.349317] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.5.0-rc2+ #1
[   37.349322] Hardware name: 9006-22P POWER9 (raw) 0x4e1202 opal:skiboot-v6.0.23 PowerNV
[   37.349324] NIP:  0000000030005104 LR: c0000000080cea00 CTR: c0000000080d7360
[   37.349327] REGS: c000000107be3d60 TRAP: 0100   Not tainted  (6.5.0-rc2+)
[   37.349330] MSR:  9000000000081002 <SF,HV,ME,RI>  CR: 22004484  XER: 0000005b
[   37.349340] CFAR: 000000003000510c IRQMASK: 3 
[   37.349340] GPR00: 0000000000000009 c000000107e63cd0 0000000030000000 00000000000ffff6 
[   37.349340] GPR04: c000000107e63e20 0000000000000040 3ffffffff1ae9700 000000000000000e 
[   37.349340] GPR08: c00000000e516950 0000000000000000 0000000000000000 0000000000000001 
[   37.349340] GPR12: 0000000031ee0000 c000000107fef480 0000000000000000 0000000000000000 
[   37.349340] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[   37.349340] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[   37.349340] GPR24: 0000000000000009 0000000000000003 0000000000000000 0000000000000000 
[   37.349340] GPR28: c00000000e516950 000000000000000e 3ffffffff1ae9700 c000000107e63e20 
[   37.349394] NIP [0000000030005104] 0x30005104
[   37.349399] LR [c0000000080cea00] opal_return+0x0/0x30
[   37.349406] Call Trace:
[   37.349407] [c000000107e63cd0] [c0000000080cc024] opal_call+0xe4/0x1c0 (unreliable)
[   37.349416] [c000000107e63d90] [c0000000080cc468] opal_handle_interrupt+0x28/0x40
[   37.349423] [c000000107e63e00] [c0000000080d739c] opal_interrupt+0x3c/0xa0
[   37.349430] [c000000107e63e30] [c0000000082029f8] __handle_irq_event_percpu+0x88/0x230
[   37.349437] [c000000107e63ed0] [c000000008202cb4] handle_irq_event+0x74/0x130
[   37.349444] [c000000107e63f00] [c00000000820a86c] handle_fasteoi_irq+0xbc/0x350
[   37.349450] [c000000107e63f40] [c000000008200910] generic_handle_irq+0x50/0x80
[   37.349456] [c000000107e63f60] [c000000008017318] __do_irq+0xb8/0x230
[   37.349462] [c000000107e63fe0] [c000000008017c68] __do_IRQ+0x88/0xe0
[   37.349468] [c00000000e733b10] [0000000000000000] 0x0
[   37.349472] [c00000000e733b50] [c000000008017d10] do_IRQ+0x50/0xb0
[   37.349478] [c00000000e733b80] [c00000000800b63c] h_virt_irq_common_virt+0x28c/0x290
[   37.349486] --- interrupt: ea0 at arch_local_irq_restore.part.0+0x188/0x190
[   37.349492] NIP:  c000000008038098 LR: c000000008fc380c CTR: c0000000080d26e0
[   37.349494] REGS: c00000000e733bb0 TRAP: 0ea0   Not tainted  (6.5.0-rc2+)
[   37.349497] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28004484  XER: 0000005b
[   37.349514] CFAR: 0000000000000000 IRQMASK: 0 
[   37.349514] GPR00: c000000008fc380c c00000000e733e50 c000000009572c00 0000000000000000 
[   37.349514] GPR04: 0000000000000000 0000000000000000 c00000000ac42a80 c000000107fef480 
[   37.349514] GPR08: 00000000f8c90000 0000000000000000 0000000000008002 0000000028002822 
[   37.349514] GPR12: c0000000080d26e0 c000000107fef480 c000001ff4663f90 0000000000000000 
[   37.349514] GPR16: 0000000000000000 c00000000002d18c c00000000002d164 c0000000020100e4 
[   37.349514] GPR20: 0000000000000006 c000001ff4660000 0000000000000000 0000000000000001 
[   37.349514] GPR24: 0000000000000000 0000000030942298 0000000031ee00b0 c00000000ac56760 
[   37.349514] GPR28: 0000000000000002 0000000000000003 0000000000000000 fcffffffffffffff 
[   37.349568] NIP [c000000008038098] arch_local_irq_restore.part.0+0x188/0x190
[   37.349574] LR [c000000008fc380c] default_idle_call+0x6c/0x140
[   37.349579] --- interrupt: ea0
[   37.349580] [c00000000e733e50] [c00000000e733e90] 0xc00000000e733e90 (unreliable)
[   37.349585] [c00000000e733e90] [c000000008fc380c] default_idle_call+0x6c/0x140
[   37.349591] [c00000000e733eb0] [c0000000081ce0bc] cpuidle_idle_call+0x1bc/0x260
[   37.349596] [c00000000e733f10] [c0000000081ce268] do_idle+0x108/0x1c0
[   37.349601] [c00000000e733f60] [c0000000081ce558] cpu_startup_entry+0x38/0x40
[   37.349606] [c00000000e733f90] [c00000000805f88c] start_secondary+0x24c/0x250
[   37.349613] [c00000000e733fe0] [c00000000800e058] start_secondary_prolog+0x10/0x14
[   37.349619] Code: 4c006c81 00000b2c 3c00e241 02000b2c 0c008240 feff6038 00010048 48006c81 ffff6b39 48006c91 780b217c 78fbff7f <4c006c81> 01000b2c f8ff8241 7813427c

Comment 5 Pingfan Liu 2023-07-22 01:50:29 UTC
On this bare-metal machine (ibm-p9b-26.ibm2.lab.eng.bos.redhat.com), I tried to reproduce this bug by checking out the first bad commit,
606787fed7268feb256957872586370b56af697a "powerpc/64s: Remove support for ELFv1 little endian userspace".

However, the compiled kernel panics during boot:



[  OK  ] Finished Load/Save Random Seed.
[  OK  ] Finished Create Static Device Nodes in /dev.
         Starting Rule-based Manage…for Device Events and Files...
[  OK  ] Finished Monitoring of LVM… dmeventd or progress polling.
[  OK  ] Started Rule-based Manager for Device Events and Files.
         Starting Load Kernel Module configfs...
[  OK  ] Finished Load Kernel Module configfs.
         Starting Load Kernel Module fuse...
[  OK  ] Finished Load Kernel Module fuse.
[   10.694120] IPMI message handler: version 39.2
[   10.771425] ipmi device interface
[   10.830605] ipmi-powernv ibm,opal:ipmi: IPMI message handler: Found new BMC (man_id: 0x002a7c, prod_id: 0x0985, dev_id: 0x20)
[   10.870743] at24 0-0050: 16384 byte 24c128 EEPROM, writable, 1 bytes/write
[   10.917802] at24 2-0050: 32768 byte 24c256 EEPROM, writable, 1 bytes/write
[   24.024955] watchdog: CPU 4 detected hard LOCKUP on other CPUs 6
[   24.024980] watchdog: CPU 4 TB:19395148891436, last SMP heartbeat TB:19386956899153 (15999ms ago)
[   24.025121] watchdog: CPU 6 Hard LOCKUP
[   24.025123] watchdog: CPU 6 TB:19395148977304, last heartbeat TB:19386956898656 (16000ms ago)
[   24.025126] Modules linked in: at24 ipmi_powernv ofpart regmap_i2c ipmi_devintf powernv_flash opal_prd ibmpowernv ipmi_msghandler mtd xfs libcrc32c sd_mod t10_pi sg ast drm_kms_helper syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_shmem_helper drm i40e vmx_crypto aacraid drm_panel_orientation_quirks fuse
[   24.025157] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 6.4.0-rc2+ #2
[   24.025161] Hardware name: 9006-22P POWER9 0x4e1202 opal:skiboot-v6.0.23 PowerNV
[   24.025162] NIP:  0000000030005104 LR: c0000000000cf300 CTR: c0000000000d7ad0
[   24.025165] REGS: c000001fff3bbd60 TRAP: 0100   Not tainted  (6.4.0-rc2+)
[   24.025168] MSR:  9000000000081002 <SF,HV,ME,RI>  CR: 22004822  XER: 20040000
[   24.025175] CFAR: 000000003000510c IRQMASK: 3 
[   24.025175] GPR00: 0000000000000009 c000001fff657850 0000000030000000 00000000000ffff6 
[   24.025175] GPR04: c000001fff6579a0 0000000000000000 0000000000000000 c00000000400cbe8 
[   24.025175] GPR08: c00000000400cb08 0000000000000000 c00000000400cbe0 0000000000000001 
[   24.025175] GPR12: 0000000031c30000 c000001fff6bc880 0000000000000000 0000000000000000 
[   24.025175] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[   24.025175] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[   24.025175] GPR24: 0000000000000009 0000000000000003 c00000000400cbe0 0000000000000000 
[   24.025175] GPR28: c00000000400cb08 c00000000400cbe8 0000000000000000 c000001fff6579a0 
[   24.025216] NIP [0000000030005104] 0x30005104
[   24.025221] LR [c0000000000cf300] opal_return+0x0/0x30
[   24.025227] Call Trace:
[   24.025228] [c000001fff657850] [c0000000000cc8a4] opal_call+0xe4/0x1c0 (unreliable)
[   24.025235] [c000001fff657910] [c0000000000ccce8] opal_handle_interrupt+0x28/0x40
[   24.025240] [c000001fff657980] [c0000000000d7b0c] opal_interrupt+0x3c/0xa0
[   24.025246] [c000001fff6579b0] [c000000000203808] __handle_irq_event_percpu+0x88/0x230
[   24.025251] [c000001fff657a50] [c000000000203ac4] handle_irq_event+0x74/0x130
[   24.025256] [c000001fff657a80] [c00000000020b3ac] handle_fasteoi_irq+0xbc/0x300
[   24.025261] [c000001fff657ac0] [c0000000002018d0] generic_handle_irq+0x50/0x80
[   24.025266] [c000001fff657ae0] [c000000000017f98] __do_irq+0xb8/0x230
[   24.025271] [c000001fff657b60] [c000000000018918] __do_IRQ+0xb8/0xe0
[   24.025275] [c000001fff657ba0] [c000000000018990] do_IRQ+0x50/0xb0
[   24.025280] [c000001fff657bd0] [c00000000000b63c] h_virt_irq_common_virt+0x28c/0x290
[   24.025286] --- interrupt: ea0 at arch_local_irq_restore.part.0+0x188/0x190
[   24.025291] NIP:  c0000000000386f8 LR: c000000000fbfb98 CTR: c000000000029310
[   24.025293] REGS: c000001fff657c00 TRAP: 0ea0   Not tainted  (6.4.0-rc2+)
[   24.025295] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 22004822  XER: 20040000
[   24.025305] CFAR: 0000000000000000 IRQMASK: 0 
[   24.025305] GPR00: c000000000fbfb98 c000001fff657ea0 c000000001552900 0000000000000000 
[   24.025305] GPR04: c000001ffa050400 ffffffffffffffff 0005f5e100000000 000000000083126f 
[   24.025305] GPR08: 0000001ff7ef0000 0000000000000000 0000000000008002 0000000000004000 
[   24.025305] GPR12: c000000000029310 c000001fff6bc880 0000000000000000 0000000000000000 
[   24.025305] GPR16: 0000000000000001 c000000002ba2a80 0000000000000000 00000000ffff8f15 
[   24.025305] GPR20: c000000002167888 000000000000000a c0000000021f2000 0000000000000000 
[   24.025305] GPR24: 0000000000000000 0000001ff7ef0000 c000000003831680 c000000002bb61e0 
[   24.025305] GPR28: 0000000000000002 0000000000000003 c000000002160400 fcffffffffffffff 
[   24.025346] NIP [c0000000000386f8] arch_local_irq_restore.part.0+0x188/0x190
[   24.025350] LR [c000000000fbfb98] __do_softirq+0xe8/0x3dc
[   24.025355] --- interrupt: ea0
[   24.025356] [c000001fff657ea0] [c000000003831680] 0xc000000003831680 (unreliable)
[   24.025360] [c000001fff657ee0] [c000000000fbfb98] __do_softirq+0xe8/0x3dc
[   24.025365] [c000001fff657fe0] [c000000000018a30] do_softirq_own_stack+0x40/0x60
[   24.025370] [c0000000038b39f0] [c00000000015a268] __irq_exit_rcu+0x158/0x190
[   24.025376] [c0000000038b3a20] [c00000000015adc0] irq_exit+0x20/0x40
[   24.025381] [c0000000038b3a40] [c0000000000297c4] timer_interrupt+0x174/0x320
[   24.025386] [c0000000038b3aa0] [c000000000009f8c] decrementer_common_virt+0x28c/0x290
[   24.025391] --- interrupt: 900 at arch_local_irq_restore.part.0+0x110/0x190
[   24.025396] NIP:  c000000000038680 LR: c000000000038658 CTR: c0000000000291d0
[   24.025398] REGS: c0000000038b3ad0 TRAP: 0900   Not tainted  (6.4.0-rc2+)
[   24.025400] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004822  XER: 00000000
[   24.025410] CFAR: 0000000000000000 IRQMASK: 0 
[   24.025410] GPR00: c000000000038658 c0000000038b3d70 c000000001552900 000000028a2a36f3 
[   24.025410] GPR04: 0000000000000001 ffffffffffffffff 0000000000000004 0000001ff7ef0000 
[   24.025410] GPR08: c000001ffa0decf8 0000000000000000 0000000000008002 0000000000004000 
[   24.025410] GPR12: c0000000000291d0 c000001fff6bc880 c000001ff44cff90 0000000000000000 
[   24.025410] GPR16: 0000000000000000 c00000000002d18c c00000000002d164 c0000000020100e4 
[   24.025410] GPR20: 0000000000000006 c000001ff44cc000 c000000002010030 0000000000000001 
[   24.025410] GPR24: 0000000000000000 0000000000000004 000000028aff879e 0000000000000004 
[   24.025410] GPR28: 0000000000000002 0000000000000003 0000000000000004 fcffffffffffffff 
[   24.025450] NIP [c000000000038680] arch_local_irq_restore.part.0+0x110/0x190
[   24.025454] LR [c000000000038658] arch_local_irq_restore.part.0+0xe8/0x190
[   24.025458] --- interrupt: 900
[   24.025459] [c0000000038b3db0] [c000000000fb3bf8] cpuidle_enter_state+0xf8/0x5d8
[   24.025463] [c0000000038b3e50] [c000000000bd951c] cpuidle_enter+0x4c/0x70
[   24.025468] [c0000000038b3e90] [c0000000001c778c] call_cpuidle+0x4c/0xa0
[   24.025473] [c0000000038b3eb0] [c0000000001ceda8] cpuidle_idle_call+0x168/0x260
[   24.025478] [c0000000038b3f10] [c0000000001cefa8] do_idle+0x108/0x1c0
[   24.025483] [c0000000038b3f60] [c0000000001cf29c] cpu_startup_entry+0x3c/0x40
[   24.025489] [c0000000038b3f90] [c00000000005feec] start_secondary+0x24c/0x250
[   24.025494] [c0000000038b3fe0] [c00000000000e058] start_secondary_prolog+0x10/0x14
[   24.025498] Code: 4c006c81 00000b2c 3c00e241 02000b2c 0c008240 feff6038 00010048 48006c81 ffff6b39 48006c91 780b217c 78fbff7f <4c006c81> 01000b2c f8ff8241 7813427c 
[   70.924966] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   70.924990] rcu:     6-...0: (1 GPs behind) idle=a024/1/0x4000000000000002 softirq=177/179 fqs=2994
[   70.925018] rcu:     (detected by 16, t=6002 jiffies, g=-67, q=10478 ncpus=80)
[   70.925031] Sending NMI from CPU 16 to CPUs 6:
[   76.514210] CPU 6 didn't respond to backtrace IPI, inspecting paca.
[   76.514228] irq_soft_mask: 0x03 in_mce: 0 in_nmi: 0 current: 0 (swapper/6)
[   76.514249] Back trace of paca->saved_r1 (0xc0000000038b3c50) (possibly stale):
[   76.514262] Call Trace:
[   76.514270] rcu: rcu_sched kthread starved for 558 jiffies! g-67 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=5
[   76.514295] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[   76.514327] rcu: RCU grace-period kthread stack dump:
[   76.514344] task:rcu_sched       state:I stack:0     pid:15    ppid:2      flags:0x00000000
[   76.514369] Call Trace:
[   76.514375] [c0000000038f7a40] [c000001ff9fa2900] 0xc000001ff9fa2900 (unreliable)
[   76.514409] [c0000000038f7bf0] [c00000000001fcd0] __switch_to+0x130/0x220
[   76.514443] [c0000000038f7c50] [c000000000fb4d58] __schedule+0x258/0x6d0
[   76.514475] [c0000000038f7d20] [c000000000fb5244] schedule+0x74/0x140
[   76.514506] [c0000000038f7d90] [c000000000fbdb34] schedule_timeout+0xa4/0x1d0
[   76.514540] [c0000000038f7e60] [c000000000224eac] rcu_gp_fqs_loop+0x40c/0x540
[   76.514574] [c0000000038f7f00] [c000000000229bd0] rcu_gp_kthread+0x190/0x200
[   76.514608] [c0000000038f7f90] [c00000000018b018] kthread+0x138/0x140
[   76.514640] [c0000000038f7fe0] [c00000000000dd58] start_kernel_thread+0x14/0x18
[   76.514673] rcu: Stack dump where RCU GP kthread last ran:
[   76.514691] Sending NMI from CPU 16 to CPUs 5:
[   76.514712] NMI backtrace for cpu 5
[   76.514733] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.4.0-rc2+ #2
[   76.514772] Hardware name: 9006-22P POWER9 0x4e1202 opal:skiboot-v6.0.23 PowerNV
[   76.514821] NIP:  c0000000000383bc LR: c0000000000386c8 CTR: c0000000000291d0
[   76.514870] REGS: c0000000038f3be8 TRAP: 0a00   Not tainted  (6.4.0-rc2+)
[   76.514906] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004424  XER: 00000000
[   76.514962] CFAR: 0000000000000000 IRQMASK: 0 
[   76.514962] GPR00: c0000000000386c8 c0000000038f3d70 c000000001552900 c0000000038f3bb8 
[   76.514962] GPR04: 00000011cfbdb9e1 ffffffffffffffff 002887fa00000000 0000000000000018 
[   76.514962] GPR08: 0000000000003b08 0000000000000043 0000001ff7e50000 00000000000026f9 
[   76.514962] GPR12: c0000000000291d0 c000001fff7fc680 c000001ff44cbf90 0000000000000000 
[   76.514962] GPR16: 0000000000000000 c00000000002d18c c00000000002d164 c0000000020100e4 
[   76.514962] GPR20: 0000000000000006 c000001ff44c8000 c000000002010030 0000000000000001 
[   76.514962] GPR24: 0000000000000000 0000000000000004 00000011d09b71c4 0000000000000004 
[   76.514962] GPR28: 0000000000000002 0000000000000003 0000000000000004 fcffffffffffffff 
[   76.515291] NIP [c0000000000383bc] __replay_soft_interrupts+0x3c/0x160
[   76.515332] LR [c0000000000386c8] arch_local_irq_restore.part.0+0x158/0x190
[   76.515371] Call Trace:
[   76.515390] [c0000000038f3d70] [c0000000000386c8] arch_local_irq_restore.part.0+0x158/0x190 (unreliable)
[   76.515441] [c0000000038f3db0] [c000000000fb3bf8] cpuidle_enter_state+0xf8/0x5d8
[   76.515482] [c0000000038f3e50] [c000000000bd951c] cpuidle_enter+0x4c/0x70
[   76.515520] [c0000000038f3e90] [c0000000001c778c] call_cpuidle+0x4c/0xa0
[   76.515556] [c0000000038f3eb0] [c0000000001ceda8] cpuidle_idle_call+0x168/0x260
[   76.515604] [c0000000038f3f10] [c0000000001cefa8] do_idle+0x108/0x1c0
[   76.515645] [c0000000038f3f60] [c0000000001cf29c] cpu_startup_entry+0x3c/0x40
[   76.515684] [c0000000038f3f90] [c00000000005feec] start_secondary+0x24c/0x250
[   76.515734] [c0000000038f3fe0] [c00000000000e058] start_secondary_prolog+0x10/0x14
[   76.515787] Code: 60000000 7c0802a6 f8010010 f821fe51 e92d0af8 f92101a8 39200000 38610028 892d0933 61290040 992d0933 48044359 <60000000> 39200000 e9410130 f9210160

Comment 6 Pingfan Liu 2023-07-23 13:33:21 UTC
(In reply to Dave Young from comment #3)
> Created attachment 1976289 [details]
> untested patch
> 
> Thanks Coiby for bisecting.  If any of you have the machine, could you try
> the untested patch see if it works?

I tested it on ibm-p9z-06-lp9.khw3.lab.eng.bos.redhat.com. Before this patch, kdump does not work with the bad commit 606787fed7268feb256957872586370b56af697a "powerpc/64s: Remove support for ELFv1 little endian userspace".

After this patch, the vmcore can be saved.

Comment 7 Pingfan Liu 2023-07-25 02:12:06 UTC
I have opened an upstream bug: https://bugzilla.kernel.org/show_bug.cgi?id=217702

Comment 8 Pingfan Liu 2023-07-25 09:14:19 UTC
This issue has been fixed upstream by

106ea7ffd56b ("Revert "powerpc/64s: Remove support for ELFv1 little endian userspace"")

Comment 9 Scott Weaver 2023-07-25 17:06:33 UTC
Thanks Pingfan. Feel free to reopen this if needed.

Comment 10 Gursewak Singh 2023-09-28 23:05:57 UTC
We are seeing this issue again. We weren't running the kdump.crash test in FCOS Rawhide because of another SELinux-policy related issue, so we didn't catch this early. The kernel version transition that appears to have caused this is `kernel-6.6.0-0.rc0.20230829git1c59d383390f.59.fc40` -> `kernel-doc-6.6.0-0.rc0.20230830git6c1b980a7e79.1.fc40`.

In short, the `kdump.crash` test in Rawhide:
Passes with `kernel-6.6.0-0.rc0.20230829git1c59d383390f.59.fc40`
Fails with `kernel-doc-6.6.0-0.rc0.20230830git6c1b980a7e79.1.fc40`
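
The two package versions above embed a snapshot date and an abbreviated upstream commit hash, which together give a starting range for a new bisection. The sketch below extracts them; it assumes the `<YYYYMMDD>git<hash>` pattern visible in these two strings, and the function names are hypothetical.

```python
import re

# Matches the ".<YYYYMMDD>git<hash>." fragment seen in the versions above.
SNAPSHOT = re.compile(r"\.(\d{8})git([0-9a-f]{7,40})\.")


def snapshot_info(nvr):
    """Return (date, commit) parsed from a Rawhide kernel package version."""
    match = SNAPSHOT.search(nvr)
    if match is None:
        raise ValueError("no git snapshot found in %r" % nvr)
    return match.group(1), match.group(2)


def bisect_range(good_nvr, bad_nvr):
    """Return a 'good..bad' revision range usable with git log or git bisect."""
    return "%s..%s" % (snapshot_info(good_nvr)[1], snapshot_info(bad_nvr)[1])
```

For the two versions quoted in this comment, `bisect_range()` yields `1c59d383390f..6c1b980a7e79`.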

Comment 11 Dusty Mabe 2023-09-29 14:51:11 UTC
Let's open a new BZ. I think this is probably a new regression.

