Bug 2022819 - Multiple kernel panics related to BFQ entity handling (bfq_deactivate_entity)
Summary: Multiple kernel panics related to BFQ entity handling (bfq_deactivate_entity)
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 35
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-11-12 16:59 UTC by Luca BRUNO
Modified: 2022-12-13 15:51 UTC (History)
CC List: 21 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-13 15:51:32 UTC
Type: Bug
Embargoed:


Links
System: Github
ID: coreos fedora-coreos-tracker issues 1021
Private: 0
Priority: None
Status: open
Summary: kernel: NULL pointer dereference in rb_insert_color, coming from bfq_deactivate_entity
Last Updated: 2021-11-12 16:59:26 UTC

Description Luca BRUNO 2021-11-12 16:59:26 UTC
Our Fedora CoreOS CI on Google Cloud recently started hitting kernel panics on the current F35 kernel (version 5.14.16-301.fc35.x86_64).

We are still adding new details and stacktraces at https://github.com/coreos/fedora-coreos-tracker/issues/1021 as we observe them, but so far we have encountered:
 * a NULL pointer dereference
 * a General Protection Fault on a seemingly corrupted pointer

In both cases, the stacktraces show that the faulty accesses are happening through `bfq_deactivate_entity`.

The ticket on GitHub contains all relevant details, including full console dumps and a testcase/reproducer (though it seems non-deterministic, with fewer than 1% of test runs panicking).
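
To give a feel for what a sub-1% panic rate means for reproduction, here is a rough sketch. It assumes test runs are independent and uses 1% as the per-run panic rate; since the observed rate is below 1%, the results are lower bounds on the number of runs. The function name `runs_needed` is made up for illustration.

```python
import math


def runs_needed(panic_rate: float, confidence: float) -> int:
    """Smallest n such that 1 - (1 - panic_rate)**n >= confidence.

    Assumes every run panics independently with the same probability.
    """
    if not (0.0 < panic_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("panic_rate and confidence must be in (0, 1)")
    # Solve (1 - panic_rate)**n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - panic_rate))


if __name__ == "__main__":
    for conf in (0.5, 0.95):
        print(f"{conf:.0%} chance of at least one panic: "
              f"{runs_needed(0.01, conf)} runs")
```

Under these assumptions, a reproducer needs to loop a few hundred times before a clean result says much about whether a candidate kernel is fixed.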

Comment 1 Micah Abbott 2021-11-17 14:04:31 UTC
The openSUSE folks have some more details here - https://bugzilla.opensuse.org/show_bug.cgi?id=1192714

Comment 2 Bruno Goncalves 2022-03-21 07:33:09 UTC
Looks like I hit the same issue on Rawhide:

[  379.607022] general protection fault, probably for non-canonical address 0x28841e0200000028: 0000 [#1] PREEMPT SMP NOPTI 
[  379.617897] CPU: 6 PID: 79 Comm: kworker/u33:2 Not tainted 5.17.0-0.rc8.123.fc37.x86_64 #1 
[  379.626157] Hardware name: Dell Inc. PowerEdge R805/0GX122, BIOS 4.2.1 04/14/2010 
[  379.633636] Workqueue: xfs-cil/sdb9 xlog_cil_push_work [xfs] 
[  379.639642] RIP: 0010:__bfq_deactivate_entity+0x16b/0x240 
[  379.645040] Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d 74 24 08 48 85 c0 0f 84 d3 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6 
[  379.663789] RSP: 0018:ffffb8f240c0b7a8 EFLAGS: 00010006 
[  379.669014] RAX: 28841e0200000000 RBX: ffff972583ba6908 RCX: 28841e0200000000 
[  379.676145] RDX: ffff9725863f60a8 RSI: ffff9725863f60a8 RDI: 00000065e7cd127a 
[  379.683274] RBP: ffff972583ba6880 R08: ffff972582e25160 R09: 0000000000000000 
[  379.690404] R10: ffff972583c49288 R11: 0000000000000006 R12: ffff972582e25160 
[  379.697543] R13: 0000000000000001 R14: ffff972582e25168 R15: ffff972583c48748 
[  379.704675] FS:  0000000000000000(0000) GS:ffff9725fdcc0000(0000) knlGS:0000000000000000 
[  379.712750] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[  379.718495] CR2: 000000000209a398 CR3: 00000000801ba000 CR4: 00000000000006e0 
[  379.725637] Call Trace: 
[  379.728090]  <TASK> 
[  379.730198]  bfq_deactivate_entity+0x50/0xc0 
[  379.734475]  bfq_del_bfqq_busy+0x91/0x1a0 
[  379.738493]  bfq_remove_request+0x125/0x350 
[  379.742682]  ? bfq_may_expire_for_budg_timeout+0xa5/0x1c0 
[  379.748081]  ? bfq_bfqq_served+0x132/0x1a0 
[  379.752174]  bfq_dispatch_request+0x443/0x1280 
[  379.756621]  ? sbitmap_get+0x90/0x1a0 
[  379.760289]  __blk_mq_do_dispatch_sched+0x1d1/0x320 
[  379.765164]  __blk_mq_sched_dispatch_requests+0x101/0x140 
[  379.770570]  blk_mq_sched_dispatch_requests+0x30/0x60 
[  379.775625]  __blk_mq_run_hw_queue+0x34/0x90 
[  379.779903]  __blk_mq_delay_run_hw_queue+0x17d/0x1b0 
[  379.784874]  blk_mq_get_tag+0x1c7/0x290 
[  379.788716]  ? do_wait_intr_irq+0xa0/0xa0 
[  379.792732]  __blk_mq_alloc_requests+0x165/0x2a0 
[  379.797357]  blk_mq_submit_bio+0x3d3/0x620 
[  379.801460]  submit_bio_noacct+0x1f3/0x2a0 
[  379.805562]  xlog_state_release_iclog+0x9e/0x210 [xfs] 
[  379.810955]  xlog_write+0x54f/0x670 [xfs] 
[  379.815219]  xlog_cil_push_work+0x458/0x7b0 [xfs] 
[  379.820161]  ? update_load_avg+0x7e/0x730 
[  379.824169]  ? cpuacct_charge+0x2e/0x50 
[  379.828009]  ? wake_up_q+0x90/0x90 
[  379.831419]  ? xfs_swap_extents+0x850/0x850 [xfs] 
[  379.836366]  process_one_work+0x1c7/0x380 
[  379.840374]  worker_thread+0x4d/0x380 
[  379.844042]  ? process_one_work+0x380/0x380 
[  379.848216]  kthread+0xe9/0x110 
[  379.851374]  ? kthread_complete_and_exit+0x20/0x20 
[  379.856157]  ret_from_fork+0x22/0x30 
[  379.859746]  </TASK> 
[  379.861936] Modules linked in: tls rfkill amd64_edac edac_mce_amd kvm_amd sunrpc snd_pcsp dcdbas snd_pcm ipmi_ssif ccp snd_timer kvm snd irqbypass soundcore bnx2 k10temp ipmi_si joydev i2c_nforce2 ipmi_devintf ipmi_msghandler acpi_cpufreq fuse zram xfs amdgpu iommu_v2 gpu_sched radeon mptsas scsi_transport_sas ata_generic pata_acpi mptscsih drm_ttm_helper uas mptbase serio_raw usb_storage sata_nv ttm nv_tco 
[  379.898006] ---[ end trace 0000000000000000 ]---
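
A quick sanity check on the register dump above, as a sketch rather than a root-cause analysis. The bytes marked `<48> 8b 48 28` in the Code: line appear to decode to `mov 0x28(%rax),%rcx`, so the faulting address should be RAX plus 0x28. The check below assumes 48-bit virtual addresses (4-level paging); the helper `is_canonical` is made up for illustration.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) of a 64-bit address are all equal.

    va_bits=48 assumes 4-level paging; this is an assumption, not
    something the trace states.
    """
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Values copied from the trace above.
RAX = 0x28841E0200000000
DISP = 0x28  # displacement of the faulting load
FAULT = 0x28841E0200000028  # address reported in the GPF line
RBX = 0xFFFF972583BA6908  # looks like an ordinary kernel pointer

if __name__ == "__main__":
    print("RAX + 0x28 matches fault address:", RAX + DISP == FAULT)
    print("fault address canonical:", is_canonical(FAULT))
    print("RBX canonical:", is_canonical(RBX))
```

RAX is neither NULL nor a plausible kernel address, which fits the "seemingly corrupted pointer" case from the description rather than the NULL dereference case.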

Comment 3 Dusty Mabe 2022-03-21 13:19:48 UTC
We see this periodically on GCP when running Fedora CoreOS tests. I try to note it down in https://github.com/coreos/fedora-coreos-tracker/issues/1021 when I see it.
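
For spotting these in saved console dumps, something like the following might help. It is only a sketch: it assumes the log uses the usual oops layout seen in this bug (a "general protection fault" or "BUG: kernel NULL pointer dereference" header, frames of the form `symbol+0xoff/0xlen`, and a closing "---[ end trace" line). The function name `find_bfq_panics` is made up for illustration.

```python
import re

# Matches bfq_deactivate_entity and __bfq_deactivate_entity frames.
BFQ_FRAME = re.compile(r"\b_*bfq_deactivate_entity\+0x[0-9a-f]+/0x[0-9a-f]+")
# Header lines that open the two kinds of oops reported in this bug.
OOPS_START = re.compile(
    r"general protection fault|BUG: kernel NULL pointer dereference"
)


def find_bfq_panics(console_log: str) -> list[str]:
    """Return the header line of each oops whose trace hits bfq_deactivate_entity."""
    hits = []
    current = None   # header of the oops currently being read, if any
    matched = False  # whether that oops was already recorded
    for line in console_log.splitlines():
        if OOPS_START.search(line):
            current, matched = line.strip(), False
        elif "---[ end trace" in line:
            current = None
        elif current is not None and not matched and BFQ_FRAME.search(line):
            hits.append(current)
            matched = True
    return hits
```

Calling `find_bfq_panics(open(path).read())` on a console dump returns one entry per matching oops, which can then be pasted into the tracker issue.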

Comment 4 Ben Cotton 2022-11-29 17:18:27 UTC
This message is a reminder that Fedora Linux 35 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 35 reaches end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 5 Ben Cotton 2022-12-13 15:51:32 UTC
Fedora Linux 35 entered end-of-life (EOL) status on 2022-12-13.

Fedora Linux 35 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

