Bug 2214351 - Fail to provision system: Unable to handle write to read-only memory in EFI runtime service [NEEDINFO]
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-06-12 18:00 UTC by Scott Weaver
Modified: 2023-07-21 20:35 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:
scweaver: needinfo? (msalter)


Description Scott Weaver 2023-06-12 18:00:53 UTC
While testing BZ2159239, an EFI issue was reported during an attempt to provision a system with Fedora rawhide, kernel version 6.4.0-0.rc4.20230529gite338142b39cf.35.fc39.aarch64.

https://beaker.engineering.redhat.com/recipes/14017452#installation




Reproducible: Didn't try

Steps to Reproduce:
1. Provision an Ampere (Lenovo HR350A) with Fedora rawhide
Actual Results:  
Provisioning fails.


[   27.383436] CPU: 28 PID: 202 Comm: kworker/u64:2 Tainted: G          I       -------  ---  6.4.0-0.rc4.20230529gite338142b39cf.35.fc39.aarch64 #1 
m - D-Bus System Message Bus.  
[   27.399235] Hardware name: Lenovo HR350A            7X35CTO1WW    /HR350A     , BIOS hve104r-1.15 02/26/2021 
  
[   27.411824] Workqueue: efi_rts_wq efi_call_rts 
[   27.416512] pstate: 00000085 (nzcv daIf -PAN -UAO -TCO -DIT -SSBS BTYPE=--) 
[   27.423467] pc : efi_call_virt_check_flags+0x48/0xb8 
[   27.428424] lr : efi_call_rts+0x3a8/0x4c8 
[   27.432422] sp : ffff80001275bd20 
[   27.435724] x29: ffff80001275bd20 x28: 0000000000000000 x27: 0000000000000000 
[   27.442848] x26: 0000000000000000 x25: ffff80000af9ca28 x24: ffff80001280bd88 
[   27.449973] x23: ffff80001280bd40 x22: ffff800009a6c530 x21: 0000000000000080 
[   27.457097] x20: ffff800009a6c530 x19: 0000000000000000 x18: ffffffffffffffff 
[   27.464221] x17: 0000000000000000 x16: ffff80000c49c000 x15: 0000000000000000 
[   27.471345] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 
[   27.478469] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800008edf2b8 
[   27.485594] x8 : 0000000000000000 x7 : 4a6851c6cc745c00 x6 : ffff80000c49bad0 
[   27.492718] x5 : ffff80000a3a5008 x4 : 000000fff6dc0018 x3 : 0000000000000001 
[   27.499842] x2 : ffff800009a6a627 x1 : ffff800009a6c530 x0 : 0000000000000080 
[   27.506966] Call trace: 
[   27.509401]  efi_call_virt_check_flags+0x48/0xb8 
[   27.514007]  efi_call_rts+0x3a8/0x4c8 
[   27.517658]  process_one_work+0x1e4/0x488 
[   27.521657]  worker_thread+0x74/0x418 
[   27.525306]  kthread+0xf4/0x108 
[   27.528438]  ret_from_fork+0x10/0x20 
[   27.532002] ---[ end trace 0000000000000000 ]--- 
[   27.536606] Disabling lock debugging due to kernel taint 
[   27.541905] efi: [Firmware Bug]: IRQ flags corrupted (0x00000000=>0x00000080) by EFI set_variable 
[   27.550947] ------------[ cut here ]------------ 
[   27.555556] WARNING: CPU: 26 PID: 224 at drivers/firmware/efi/runtime-wrappers.c:341 virt_efi_set_variable+0x194/0x1b0 
[   27.566244] Modules linked in: uas usb_storage nvme dwc3 igb nvme_core crct10dif_ce udc_core ast polyval_ce ulpi polyval_generic ghash_ce mlx4_core(+) sbsa_gwdt nvme_common i2c_algo_bit ahci_platform i2c_xgene_slimpro libahci_platform xgene_hwmon gpio_dwapb xhci_plat_hcd sunrpc lrw dm_crypt dm_round_robin linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 scsi_dh_hp_sw squashfs be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath fuse 
[   27.621611] CPU: 26 PID: 224 Comm: kworker/26:1 Tainted: G        W I       -------  ---  6.4.0-0.rc4.20230529gite338142b39cf.35.fc39.aarch64 #1 
[   27.634550] Hardware name: Lenovo HR350A            7X35CTO1WW    /HR350A     , BIOS hve104r-1.15 02/26/2021 
[   27.644362] Workqueue: events refresh_nv_rng_seed 
[   27.649056] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) 
[   27.656005] pc : virt_efi_set_variable+0x194/0x1b0 
[   27.660785] lr : virt_efi_set_variable+0x178/0x1b0 
[   27.665564] sp : ffff80001280bcf0 
[   27.668866] x29: ffff80001280bcf0 x28: 0000000000000000 x27: 0000000000000000 
[   27.675991] x26: ffff000807442674 x25: ffff80000b9a0848 x24: ffff80000b9a0000 
[   27.683116] x23: ffff800009a6a628 x22: ffff80001280bd78 x21: 8000000000000015 
[   27.690240] x20: ffff80000ae7db20 x19: ffff80000b9a07d0 x18: 0000000000000014 
[   27.697365] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 
[   27.704489] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 
[   27.711613] x11: 0000000000000000 x10: 0000000000001da0 x9 : ffff800009263ae8 
[   27.718737] x8 : ffff000807559e00 x7 : 0000000000000000 x6 : 00000000000000b0 
[   27.725861] x5 : 00000000500f0000 x4 : 0000000000000000 x3 : 0000000000000001 
[   27.732985] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 8000000000000015 
[   27.740110] Call trace: 
[   27.742544]  virt_efi_set_variable+0x194/0x1b0 
[   27.746977]  refresh_nv_rng_seed+0x88/0xc8 
[   27.751061]  process_one_work+0x1e4/0x488 
[   27.755059]  worker_thread+0x74/0x418 
[   27.758709]  kthread+0xf4/0x108 
[   27.761839]  ret_from_fork+0x10/0x20 
[   27.765403] ---[ end trace 0000000000000000 ]---



Expected Results:  
We should be able to run Fedora rawhide on a Lenovo HR350A system.

The complete log: 
https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/06/79230/7923076/14017452/console.log
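As a reading aid for the log above, here is a minimal sketch that decodes the "IRQ flags corrupted (0x00000000=>0x00000080)" message and the pstate values in the two traces. It assumes the arm64 PSTATE bit positions for the D, A, I and F exception mask bits; the helper names `changed_daif_bits` and `daif_string` are hypothetical and are not kernel APIs.

```python
# DAIF mask bits in the arm64 PSTATE/SPSR layout (assumed bit positions).
PSR_F_BIT = 0x040  # FIQ masked
PSR_I_BIT = 0x080  # IRQ masked
PSR_A_BIT = 0x100  # SError masked
PSR_D_BIT = 0x200  # debug exceptions masked

DAIF_BITS = {"D": PSR_D_BIT, "A": PSR_A_BIT, "I": PSR_I_BIT, "F": PSR_F_BIT}


def changed_daif_bits(before, after):
    """Return the DAIF bits that differ between two flag values.

    Each changed bit maps to 'set' or 'cleared', describing its state
    in the `after` value.
    """
    diff = before ^ after
    return {
        name: ("set" if after & bit else "cleared")
        for name, bit in DAIF_BITS.items()
        if diff & bit
    }


def daif_string(pstate):
    """Render DAIF in the style of the kernel log: uppercase means masked."""
    return "".join(
        name if pstate & bit else name.lower()
        for name, bit in DAIF_BITS.items()
    )


if __name__ == "__main__":
    # Values copied from the log lines in this report.
    print(changed_daif_bits(0x00000000, 0x00000080))  # flags before/after set_variable
    print(daif_string(0x00000085))  # pstate in the efi_call_rts trace
    print(daif_string(0x60400005))  # pstate in the virt_efi_set_variable trace
```

Under these assumptions, the 0x80 difference corresponds to the IRQ mask bit, which would mean the EFI set_variable call returned with IRQs masked when they were unmasked on entry. That is consistent with the "daIf" shown in the first trace.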

Comment 1 Eirik Fuller 2023-06-12 20:18:47 UTC
Adding efi=novamap to the 5.14.0-319.el9 kernel command line on ampere-hr350a-04 does not disrupt the boot process, which further suggests that the firmware issue on Lenovo HR350A systems is different from the firmware issue in bug 2159239. Rebooting a kernel with that command line option was problematic, however, and showed different symptoms:


[  807.808415] CPU: 6 PID: 239 Comm: kworker/6:1 Kdump: loaded Tainted: G             L X  -------  ---  5.14.0-319.el9.aarch64 #1
[  807.819877] Hardware name: Lenovo HR350A            7X35CTO1WW    /HR350A     , BIOS hve104r-1.15 02/26/2021
[  807.829689] Workqueue: rcu_par_gp sync_rcu_exp_select_node_cpus
[  807.835599] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  807.842547] pc : smp_call_function_single+0xf4/0x1e0
[  807.847500] lr : __sync_rcu_exp_select_node_cpus+0x280/0x420
[  807.853147] sp : ffff80000ebe3cc0
[  807.856448] x29: ffff80000ebe3cc0 x28: 0000000000000080 x27: 000000000000ff7f
[  807.863572] x26: ffff800009de41d0 x25: ffff80000a19c340 x24: ffff00be5af0e040
[  807.870696] x23: ffff0008095f9500 x22: ffff8000099a5ca8 x21: 0000000000000080
[  807.877820] x20: ffff8000099aa040 x19: ffff80000ebe3ce0 x18: ffffffffffffffff
[  807.884945] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  807.892068] x14: 0000000000000000 x13: 0000000000000010 x12: 0101010101010101
[  807.899192] x11: ffff8000099a5ca8 x10: 0000000000000001 x9 : ffff8000082465b0
[  807.906316] x8 : ffff80000a194530 x7 : ffff800009995008 x6 : ffff800008248f00
[  807.913440] x5 : 0000000000000000 x4 : ffff00be5aeef408 x3 : 0000000000000001
[  807.920564] x2 : 0000000000000000 x1 : ffff00be5aeef400 x0 : 0000000000000007
[  807.927689] Call trace:
[  807.930122]  smp_call_function_single+0xf4/0x1e0
[  807.934728]  __sync_rcu_exp_select_node_cpus+0x280/0x420
[  807.940027]  sync_rcu_exp_select_node_cpus+0x18/0x20
[  807.944980]  process_one_work+0x1e4/0x4c0
[  807.948976]  worker_thread+0x220/0x450
[  807.952713]  kthread+0xe8/0xf4
[  807.955756]  ret_from_fork+0x10/0x20


Lenovo HR350A systems are susceptible to bug 2062958 (yet another firmware issue) but it's not yet clear whether that's related to the problem described here. Offhand that seems doubtful, since the recipe linked in bug 2159239 comment 16 did not show cma=1024M in the kernel command line (indeed it does not typically show up in the PXE boot, which was the only boot in that recipe).

That recipe does show acpi=force in the kernel command line (specified in the job XML). Adding acpi=force to the 5.14.0-319.el9 kernel command line on ampere-hr350a-04 (without efi=novamap) does not seem to disrupt the boot process, so that command line option presumably does not explain the failure in the bug 2159239 comment 16 recipe.
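When comparing recipes, it can help to pull the relevant options (efi=, acpi=, cma=) out of each kernel command line. The following is a minimal sketch of that comparison; the function names and the example command line are hypothetical, and the simple whitespace split does not handle quoted values.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {option: value}.

    Bare flags without '=' map to None. Quoted values containing
    spaces are not handled by this simple split.
    """
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options


def options_of_interest(cmdline, keys=("efi", "acpi", "cma")):
    """Return only the options relevant to this bug, if present."""
    opts = parse_cmdline(cmdline)
    return {k: opts[k] for k in keys if k in opts}


if __name__ == "__main__":
    # Hypothetical command line, for illustration only.
    example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro acpi=force efi=novamap"
    print(options_of_interest(example))
```

On a running system the command line can be read from /proc/cmdline and passed to `options_of_interest`.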

Comment 2 Scott Weaver 2023-07-21 20:30:46 UTC
@

Comment 3 Scott Weaver 2023-07-21 20:35:16 UTC
Hi Mark,

Could you take another look at these types of errors for us?
If this is a firmware issue, do we need to reach out to the vendor, or what do you think the next steps are?

Thanks for the help!
Scott

