Bug 1949334 - Fedora installation fails on an amberwing with btrfs
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 34
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: fedora-kernel-btrfs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-14 03:00 UTC by Paul Whalen
Modified: 2022-06-07 21:08 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-07 21:08:08 UTC
Type: Bug
Embargoed:


Attachments
journalctl (2.11 MB, text/plain)
2021-04-14 03:02 UTC, Paul Whalen

Description Paul Whalen 2021-04-14 03:00:35 UTC
1. Please describe the problem:

Installing Fedora-34-20210413.n.0 on an Amberwing (aarch64) crashes the kernel during package installation:

Apr 14 00:53:20 fedora anaconda[2342]: packaging: Installed: coreutils-8.32-21.fc34.aarch64 1616766124 8985ff12c7941d321ae9e5a1dde92da5c4d86eb38a24771cc153d0b82f868e0f
Apr 14 00:53:20 fedora kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Apr 14 00:53:20 fedora kernel: Mem abort info:
Apr 14 00:53:20 fedora kernel:   ESR = 0x96000004
Apr 14 00:53:20 fedora kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Apr 14 00:53:20 fedora kernel:   SET = 0, FnV = 0
Apr 14 00:53:20 fedora kernel:   EA = 0, S1PTW = 0
Apr 14 00:53:20 fedora kernel: Data abort info:
Apr 14 00:53:20 fedora kernel:   ISV = 0, ISS = 0x00000004
Apr 14 00:53:20 fedora kernel:   CM = 0, WnR = 0
Apr 14 00:53:20 fedora kernel: user pgtable: 4k pages, 48-bit VAs, pgdp=0000000102f12000
Apr 14 00:53:20 fedora kernel: [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
Apr 14 00:53:20 fedora kernel: Internal error: Oops: 96000004 [#1] SMP
Apr 14 00:53:20 fedora kernel: Modules linked in: vfat fat rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver libfc scsi_transport_fc scsi_dh_rdac scsi_dh_emc scsi_dh_alua acpi_ipmi ipmi_ssif ipmi_devintf ipmi_msghandler cppc_cpufreq drm fuse zram overlay loop nfsv3 nfs_acl nfs lockd grace nfs_ssc fscache rfkill rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core crct10dif_ce ghash_ce mlx5_core mlxfw qcom_rng at803x sdhci_acpi ahci_platform qcom_emac sdhci hdma hdma_mgmt xhci_plat_hcd i2c_qup sunrpc lrw dm_crypt dm_round_robin dm_multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 squashfs cramfs be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi aes_neon_bs
Apr 14 00:53:20 fedora kernel: CPU: 23 PID: 3449 Comm: kworker/u92:8 Not tainted 5.11.12-300.fc34.aarch64 #1
Apr 14 00:53:20 fedora kernel: Hardware name: WIWYNN Qualcomm Centriq 2400 Reference Evaluation Platform CV90-LA115-P11/Qualcomm Centriq 2400 Customer Reference Board, BIOS 
Apr 14 00:53:20 fedora kernel: Workqueue: btrfs-delalloc btrfs_work_helper
Apr 14 00:53:20 fedora kernel: pstate: 00400005 (nzcv daif +PAN -UAO -TCO BTYPE=--)
Apr 14 00:53:20 fedora kernel: pc : __list_del_entry_valid+0x30/0xb0
Apr 14 00:53:20 fedora kernel: lr : submit_compressed_extents+0x70/0x3d0
Apr 14 00:53:20 fedora kernel: sp : ffff80001ccf3c00
Apr 14 00:53:20 fedora kernel: x29: ffff80001ccf3c00 x28: ffff604f503f6800 
Apr 14 00:53:20 fedora kernel: x27: ffffd2c4c6523000 x26: ffff604fe82fb4e0 
Apr 14 00:53:20 fedora kernel: x25: 0000000000000001 x24: ffff60500c949908 
Apr 14 00:53:20 fedora kernel: x23: ffff604fe82fb6e8 x22: ffff604f96b3e000 
Apr 14 00:53:20 fedora kernel: x21: 0000000000000000 x20: ffff604fe82fb528 
Apr 14 00:53:20 fedora kernel: x19: ffff604f503f6830 x18: 000000000000003e 
Apr 14 00:53:20 fedora kernel: x17: 000000000000001f x16: 0000000000000021 
Apr 14 00:53:20 fedora kernel: x15: 0000000000000000 x14: 0000438017b30100 
Apr 14 00:53:20 fedora kernel: x13: 0000000000000006 x12: ffff606633f99740 
Apr 14 00:53:20 fedora kernel: x11: 0000000000000000 x10: ffffd2c4c653bfce 
Apr 14 00:53:20 fedora kernel: x9 : ffffd2c4c4f4ee2c x8 : 05eb06fcb90d133e 
Apr 14 00:53:20 fedora kernel: x7 : 000000000000002a x6 : 000000000000003f 
Apr 14 00:53:20 fedora kernel: x5 : 0000000000000001 x4 : ffff604f45ed5518 
Apr 14 00:53:20 fedora kernel: x3 : dead000000000122 x2 : 0000000000000000 
Apr 14 00:53:20 fedora kernel: x1 : 0000000000000000 x0 : ffff604f503f6830 
Apr 14 00:53:20 fedora kernel: Call trace:
Apr 14 00:53:20 fedora kernel:  __list_del_entry_valid+0x30/0xb0
Apr 14 00:53:20 fedora kernel:  submit_compressed_extents+0x70/0x3d0
Apr 14 00:53:20 fedora kernel:  async_cow_submit+0x6c/0xc0
Apr 14 00:53:20 fedora kernel:  run_ordered_work+0xc4/0x2b0
Apr 14 00:53:20 fedora kernel:  btrfs_work_helper+0x98/0x270
Apr 14 00:53:20 fedora kernel:  process_one_work+0x1f0/0x4cc
Apr 14 00:53:20 fedora kernel:  worker_thread+0x184/0x500
Apr 14 00:53:20 fedora kernel:  kthread+0x120/0x124
Apr 14 00:53:20 fedora kernel:  ret_from_fork+0x10/0x18
Apr 14 00:53:20 fedora kernel: Code: d2802443 f2fbd5a3 eb03003f 54000340 (f9400021) 
Apr 14 00:53:20 fedora kernel: ---[ end trace 4824d837c25d31e8 ]---


2. What is the Version-Release number of the kernel:

kernel-5.11.12-300.fc34

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Working as expected in Fedora 34 Beta - kernel-5.11.3-300.fc34

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Attempt to install the latest Fedora nightly with defaults (btrfs). The installation 
fails to complete. Server installation with xfs works as expected. 

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
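
The "Mem abort info" lines in the oops under item 1 can be cross-checked by decoding the ESR value by hand. The following is a minimal sketch, assuming the ESR_ELx data-abort field layout from the Arm architecture reference (EC in bits 31:26, IL in bit 25, ISS in bits 24:0); the function name `decode_esr` and the `DFSC_NAMES` table are illustrative only and are not part of any kernel tooling.

```python
# Sketch: decode an AArch64 ESR value for a data abort.
# Bit positions are assumed from the Arm ARM description of ESR_ELx.

# Only the translation-fault codes are listed; other DFSC values exist.
DFSC_NAMES = {
    0b000100: "translation fault, level 0",
    0b000101: "translation fault, level 1",
    0b000110: "translation fault, level 2",
    0b000111: "translation fault, level 3",
}


def decode_esr(esr):
    """Split an ESR value into the fields the kernel prints in an oops."""
    iss = esr & 0x1FFFFFF  # instruction specific syndrome, bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,   # exception class
        "IL": (esr >> 25) & 1,      # 1 = 32-bit instruction
        "ISS": iss,
        "ISV": (iss >> 24) & 1,
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 1,
        "EA": (iss >> 9) & 1,
        "CM": (iss >> 8) & 1,
        "S1PTW": (iss >> 7) & 1,
        "WnR": (iss >> 6) & 1,      # 0 = read, 1 = write
        "DFSC": iss & 0x3F,         # data fault status code
    }


fields = decode_esr(0x96000004)  # value from "Internal error: Oops: 96000004"
for name, value in fields.items():
    print(f"{name:6} = {value:#x}")
print("fault:", DFSC_NAMES.get(fields["DFSC"], "other (see Arm ARM)"))
```

Under these assumptions the decoded fields agree with the log: EC 0x25 (data abort at the current EL), WnR 0 (a read), and a level 0 translation fault, which is consistent with the reported pgd=0000000000000000 for virtual address 0.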

Comment 1 Paul Whalen 2021-04-14 03:02:52 UTC
Created attachment 1771731
journalctl

Comment 2 Peter Robinson 2021-04-14 10:44:56 UTC
Adding the BTRFS maintainers

Comment 3 Josef Bacik 2021-04-14 15:44:17 UTC
I'm digging into this, gonna take me a minute to get an arm virt guest and cross compiling working.

Comment 4 Paul Whalen 2021-04-14 18:27:19 UTC
(In reply to Josef Bacik from comment #3)
> I'm digging into this, gonna take me a minute to get an arm virt guest and
> cross compiling working.

I've only seen this while testing on the Amberwing; other hardware and virt work as expected.

Comment 5 Mark Salter 2021-06-10 17:12:55 UTC
Just a random observation: as old as it is, Amberwing still has a knack for turning up mm/barrier/tlb related issues.

Comment 6 Paul Whalen 2021-07-19 20:15:05 UTC
Not seeing this during Rawhide testing with 5.14 RCs.

Comment 7 Paul Whalen 2021-08-27 15:24:41 UTC
Unfortunately, hit this again today while testing Fedora 35. 

[  440.089768] Internal error: Oops: 96000004 [#1] SMP 
[  440.093685] Modules linked in: vfat fat rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver xfs libfc scsi_transport_fc scsi_dh_rdac scsi_dh_emc scsi_dh_alua acpi_ipmi ipmi_ssif ipmi_devintf ipmi_msghandler cppc_cpufreq drm fuse zram overlay loop nfsv3 nfs_acl nfs lockd grace fscache netfs rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core rfkill crct10dif_ce mlx5_core ghash_ce sbsa_gwdt at803x mlxfw psample qcom_rng ahci_platform qcom_emac sdhci_acpi sdhci hdma hdma_mgmt xhci_plat_hcd i2c_qup sunrpc lrw dm_crypt trusted asn1_encoder tee dm_round_robin dm_multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 squashfs cramfs be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi aes_neon_bs 
[  440.174674] CPU: 40 PID: 1527 Comm: kworker/u92:7 Not tainted 5.14.0-0.rc7.54.fc35.aarch64 #1 
[  440.183180] Hardware name: WIWYNN Qualcomm Centriq 2400 Reference Evaluation Platform CV90-LA115-P11/Qualcomm Centriq 2400 Customer Reference Board, BIOS  
[  440.196982] Workqueue: btrfs-delalloc btrfs_work_helper 
[  440.202191] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--) 
[  440.208179] pc : submit_compressed_extents+0x38/0x3d0 
[  440.213214] lr : async_cow_submit+0x6c/0xc0 
[  440.217381] sp : ffff80001665bc20 
[  440.220680] x29: ffff80001665bc30 x28: 0000000000000000 x27: ffffc2db26391000 
[  440.227797] x26: fffffffffffffdd0 x25: dead000000000100 x24: ffff5e514a086408 
[  440.234915] x23: 0000000000000000 x22: 0000000000000000 x21: ffff5e5149dcf200 
[  440.242034] x20: ffff5e514a086408 x19: ffff5e514a086448 x18: 0000000000000001 
[  440.249152] x17: ffff5e5212a06abd x16: 0000000000000006 x15: 58b7bdf0f78aee0a 
[  440.256270] x14: bb5977f05d2587b6 x13: 0000000000000020 x12: ffff5e6833f998c0 
[  440.263388] x11: ffffc2db263ab500 x10: 0000000000000000 x9 : ffffc2db24b8216c 
[  440.270506] x8 : 0000000000000001 x7 : ffffc2db263ab521 x6 : 0000000000000000 
[  440.277624] x5 : 0000000000000000 x4 : ffff5e5173cb3518 x3 : 0000000000000000 
[  440.284742] x2 : ffff5e5173cb32e8 x1 : ffff5e514a086430 x0 : ffff5e514a086430 
[  440.291861] Call trace: 
[  440.294292]  submit_compressed_extents+0x38/0x3d0 
[  440.298978]  async_cow_submit+0x6c/0xc0 
[  440.302798]  run_ordered_work+0xc8/0x280 
[  440.306704]  btrfs_work_helper+0x98/0x250 
[  440.310697]  process_one_work+0x1f0/0x4ac 
[  440.314690]  worker_thread+0x188/0x504 
[  440.318423]  kthread+0x110/0x114 
[  440.321634]  ret_from_fork+0x10/0x18 
[  440.325195] Code: a9056bf9 f8428437 f9401400 d108c2fa (f9400356)  
[  440.331271] ---[ end trace 7d10bc4e24ab0123 ]---
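
To compare this oops with the one in the original description, the call traces can be pulled out of both log formats (journal prefix and dmesg timestamp) and lined up. This is only a sketch: the helper `call_trace` and its regular expression are hypothetical, and the sample strings are abbreviated copies of the traces in this report.

```python
import re

# A frame looks like "submit_compressed_extents+0x70/0x3d0" at the end of a line.
FRAME_RE = re.compile(r"\s([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")


def call_trace(oops_text):
    """Return the symbol names listed after 'Call trace:' in an oops."""
    frames = []
    in_trace = False
    for line in oops_text.splitlines():
        if "Call trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.search(line)
            if not match:
                break  # first non-frame line (e.g. "Code: ...") ends the trace
            frames.append(match.group(1))
    return frames


# Abbreviated from the description (5.11.12-300.fc34).
F34_OOPS = """
Apr 14 00:53:20 fedora kernel: Call trace:
Apr 14 00:53:20 fedora kernel:  __list_del_entry_valid+0x30/0xb0
Apr 14 00:53:20 fedora kernel:  submit_compressed_extents+0x70/0x3d0
Apr 14 00:53:20 fedora kernel:  async_cow_submit+0x6c/0xc0
Apr 14 00:53:20 fedora kernel:  run_ordered_work+0xc4/0x2b0
Apr 14 00:53:20 fedora kernel:  btrfs_work_helper+0x98/0x270
Apr 14 00:53:20 fedora kernel: Code: d2802443 f2fbd5a3 eb03003f 54000340 (f9400021)
"""

# Abbreviated from comment 7 (5.14.0-0.rc7.54.fc35).
F35_OOPS = """
[  440.291861] Call trace:
[  440.294292]  submit_compressed_extents+0x38/0x3d0
[  440.298978]  async_cow_submit+0x6c/0xc0
[  440.302798]  run_ordered_work+0xc8/0x280
[  440.306704]  btrfs_work_helper+0x98/0x250
[  440.325195] Code: a9056bf9 f8428437 f9401400 d108c2fa (f9400356)
"""

f34 = call_trace(F34_OOPS)
f35 = call_trace(F35_OOPS)
print("F34:", " <- ".join(f34))
print("F35:", " <- ".join(f35))
print("same path below the faulting frame:", f34[1:] == f35)
```

Both traces run through the btrfs-delalloc workqueue into submit_compressed_extents; the 5.11 oops faults one frame deeper, in __list_del_entry_valid, while the 5.14 oops faults in submit_compressed_extents itself.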

Comment 8 Paul Whalen 2021-10-19 16:34:07 UTC
Still seeing this while testing Fedora 35 (5.14.10-300.fc35.aarch64); moving to btrfs-kernel.

Comment 9 Chris Murphy 2021-10-20 01:41:50 UTC
I've been compiling the kernel for about 6 hours, with about 1 hour of package installation running concurrently, and there has been no crash. This is in a Vexxhost aarch64 OpenStack VM (bug 2011928). I have kdump set up there, so I could get a kernel core dump to developers, but without a crash I can't do that. And in your case it's happening so early that you can't finish the install to have a chance of setting up kdump :\ Any chance you can use a ready-to-go raw.xz image? The catch, though, is that kdump needs kernel debuginfo, which is almost a 700M download and altogether might be 3.5G to install, which will almost certainly trigger the crash before you get it all set up. Maybe the thing to do is go back to Fedora 34 with an older kernel, and set up a 5.14.x kernel+debuginfo with all the kdump tools and the kernel.org git source. Then reboot into the new kernel and hopefully capture a vmcore.

Comment 10 Ben Cotton 2022-05-12 15:11:16 UTC
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 11 Ben Cotton 2022-06-07 21:08:08 UTC
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07.

Fedora Linux 34 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release.

Thank you for reporting this bug and we are sorry it could not be fixed.

