Bug 2049004 - [ark-eln] WARNING: CPU: 1 PID: 1525118 at block/bfq-iosched.c:602 bfqq_request_over_limit+0x132/0x370
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-01 10:27 UTC by Bruno Goncalves
Modified: 2022-06-08 14:02 UTC (History)
CC: 20 users

Fixed In Version: 5.17.5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-08 14:02:27 UTC
Type: Bug
Embargoed:



Description Bruno Goncalves 2022-02-01 10:27:42 UTC
1. Please describe the problem:
During CKI storage software RAID testing [1], we noticed the following messages in dmesg:

[10522.961891] raid6: using avx512x2 recovery algorithm
[10523.037777] async_tx: api initialized (async)
[10523.412117] loop0: detected capacity change from 0 to 1024000
[10523.575212] loop1: detected capacity change from 0 to 1024000
[10523.743184] loop2: detected capacity change from 0 to 1024000
[10523.929786] loop3: detected capacity change from 0 to 1024000
[10524.102577] loop4: detected capacity change from 0 to 1024000
[10524.273543] loop5: detected capacity change from 0 to 1024000
[10533.182602] md1: detected capacity change from 0 to 5099520
[10535.846268] XFS (md1): Mounting V5 Filesystem
[10535.852948] XFS (md1): Ending clean mount
[10535.857421] xfs filesystem being mounted at /mnt/md_test supports timestamps until 2038 (0x7fffffff)
[10536.754467] XFS (md1): Unmounting Filesystem
[10536.789673] md1: detected capacity change from 5099520 to 0
[10536.795284] md: md1 stopped.
[10562.274570] md/raid1:md1: not clean -- starting background reconstruction
[10562.281407] md/raid1:md1: active with 5 out of 5 mirrors
[10562.292068] md1: detected capacity change from 0 to 1021952
[10562.297747] md: resync of RAID array md1
[10565.104456] md: md1: resync done.
[10567.539491] XFS (md1): Mounting V5 Filesystem
[10567.566166] XFS (md1): Ending clean mount
[10567.570505] xfs filesystem being mounted at /mnt/md_test supports timestamps until 2038 (0x7fffffff)
[10567.635319] XFS (md1): Unmounting Filesystem
[10567.659134] md1: detected capacity change from 1021952 to 0
[10567.664745] md: md1 stopped.
[10593.121480] md/raid:md1: device loop3 operational as raid disk 3
[10593.127535] md/raid:md1: device loop2 operational as raid disk 2
[10593.133583] md/raid:md1: device loop1 operational as raid disk 1
[10593.139624] md/raid:md1: device loop0 operational as raid disk 0
[10593.146050] md/raid:md1: raid level 4 active with 4 out of 5 devices, algorithm 0
[10593.157999] md1: detected capacity change from 0 to 4079616
[10593.163694] md: recovery of RAID array md1
[10594.203677] ------------[ cut here ]------------
[10594.208322] WARNING: CPU: 1 PID: 1525118 at block/bfq-iosched.c:602 bfqq_request_over_limit+0x132/0x370
[10594.217774] Modules linked in: raid1 raid0 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid6_pq dm_cache_smq dm_cache dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio nvme nvme_core sr_mod cdrom ipmi_watchdog ipmi_poweroff ipmi_ssif ipmi_devintf nls_utf8 cifs cifs_arc4 cifs_md4 tun loop nf_tables nfnetlink veth bridge stp llc overlay sctp ip6_udp_tunnel udp_tunnel rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace dm_log_writes dm_mod tls rfkill intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp sunrpc coretemp kvm_intel kvm acpi_ipmi vfat fat irqbypass ipmi_si rapl intel_cstate intel_uncore ioatdma ses mei_me enclosure tg3 pcspkr mei acpi_power_meter hpilo ipmi_msghandler acpi_tad ext4 lpc_ich dca mbcache jbd2 fuse zram xfs libcrc32c sd_mod t10_pi mgag200 i2c_algo_bit drm_shmem_helper drm_kms_helper syscopyarea sysfillrect sysimgblt
[10594.217845]  fb_sys_fops cec crct10dif_pclmul crc32_pclmul crc32c_intel smartpqi uas drm ghash_clmulni_intel scsi_transport_sas usb_storage hpwdt wmi [last unloaded: scsi_debug]
[10594.321469] CPU: 1 PID: 1525118 Comm: kworker/u16:2 Tainted: G          I    X --------- ---  5.17.0-0.rc2.83.test.eln.x86_64 #1
[10594.333100] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 05/21/2019
[10594.341671] Workqueue: loop0 loop_rootcg_workfn [loop]
[10594.346839] RIP: 0010:bfqq_request_over_limit+0x132/0x370
[10594.352269] Code: 89 1c c7 48 8b 5b 60 48 8d 70 01 48 85 db 0f 84 e7 01 00 00 48 89 f0 44 0f b6 6b 18 45 84 ed 0f 84 73 01 00 00 41 39 c4 7f d5 <0f> 0b 41 39 c4 0f 85 f7 01 00 00 41 83 f8 ff 0f 84 b7 01 00 00 44
[10594.371149] RSP: 0018:ffffc1b2855bf938 EFLAGS: 00010046
[10594.376402] RAX: 0000000000000001 RBX: ffff9c2729ec9098 RCX: 0000000000000000
[10594.383576] RDX: ffff9c26e1ec4430 RSI: 0000000000000001 RDI: ffff9c26e1ec4430
[10594.390751] RBP: ffff9c27319b46c0 R08: 0000000000000000 R09: 000000000002e078
[10594.397925] R10: 0000000000000030 R11: ffff9c29fffd3d80 R12: 0000000000000001
[10594.405100] R13: 0000000000000001 R14: 0000000000000060 R15: ffffc1b2855bf958
[10594.412274] FS:  0000000000000000(0000) GS:ffff9c29efe40000(0000) knlGS:0000000000000000
[10594.420411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10594.426186] CR2: 00007f6a5db52580 CR3: 0000000032210001 CR4: 00000000007706e0
[10594.433361] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10594.440535] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[10594.447708] PKRU: 55555554
[10594.450428] Call Trace:
[10594.452884]  <TASK>
[10594.454993]  ? release_pages+0x150/0x480
[10594.458936]  bfq_limit_depth+0xc3/0x220
[10594.462794]  __blk_mq_alloc_requests+0x63/0x130
[10594.467350]  blk_mq_get_new_requests+0xbc/0x150
[10594.471906]  blk_mq_submit_bio+0x317/0x480
[10594.476025]  submit_bio_noacct+0xcb/0x120
[10594.480055]  iomap_submit_ioend+0x4b/0x70
[10594.484085]  xfs_vm_writepages+0x79/0xb0 [xfs]
[10594.488656]  do_writepages+0xbc/0x1b0
[10594.492338]  ? update_load_avg+0x7e/0x730
[10594.496368]  ? update_load_avg+0x7e/0x730
[10594.500397]  filemap_fdatawrite_wbc+0x5b/0x80
[10594.504778]  file_write_and_wait_range+0x9b/0xf0
[10594.509420]  xfs_file_fsync+0x5a/0x210 [xfs]
[10594.513790]  loop_process_work+0x2f9/0x5f0 [loop]
[10594.518521]  ? check_preempt_curr+0x38/0x70
[10594.522727]  ? ttwu_do_wakeup+0x17/0x160
[10594.526671]  ? try_to_wake_up+0x1e4/0x4f0
[10594.530701]  ? finish_task_switch.isra.0+0xb6/0x290
[10594.535605]  process_one_work+0x1c7/0x380
[10594.539636]  worker_thread+0x4d/0x380
[10594.543318]  ? process_one_work+0x380/0x380
[10594.547523]  kthread+0xca/0xf0
[10594.550593]  ? kthread_complete_and_exit+0x20/0x20
[10594.555409]  ret_from_fork+0x22/0x30
[10594.559005]  </TASK>
[10594.561200] ---[ end trace 0000000000000000 ]---
[10596.090879] md: md1: recovery done.
[10600.289167] XFS (md1): Mounting V5 Filesystem
[10600.297754] XFS (md1): Ending clean mount

2. What is the Version-Release number of the kernel:
kernel-5.17.0-0.rc2.83.test.eln.x86_64


So far we've hit this only once.

CKI issue tracker: https://datawarehouse.cki-project.org/issue/942

[1] https://gitlab.com/cki-project/kernel-tests/-/tree/main/storage/swraid/trim

Comment 1 Bruno Goncalves 2022-02-01 12:31:22 UTC
Probably the same issue:

[10905.179799] XFS (md1): Mounting V5 Filesystem
[10905.206746] XFS (md1): Ending clean mount
[10905.231009] xfs filesystem being mounted at /mnt/md_test supports timestamps until 2038 (0x7fffffff)
[10905.276163] XFS (md1): Unmounting Filesystem
[10905.282957] ------------[ cut here ]------------
[10905.287577] WARNING: CPU: 18 PID: 1269796 at block/bfq-iosched.c:606 bfq_limit_depth+0x4a9/0x500
[10905.296364] Modules linked in: raid1 raid0 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx dm_cache_smq dm_cache dm_thin_pool dm_persistent_data dm_bio_prison nvme nvme_core ipmi_watchdog ipmi_poweroff ipmi_ssif ipmi_devintf nls_utf8 cifs cifs_arc4 cifs_md4 tun loop xt_multiport xt_nat xt_addrtype xt_mark xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment nft_compat nf_tables nfnetlink veth bridge stp llc overlay vfat fat jfs sctp ip6_udp_tunnel udp_tunnel rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs rpcrdma rdma_cm iw_cm ib_cm nfsd auth_rpcgss nfs_acl lockd grace dm_log_writes mlx5_ib ib_uverbs ib_core intel_rapl_msr intel_rapl_common isst_if_common rfkill skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp sunrpc kvm_intel kvm irqbypass rapl mlx5_core iTCO_wdt intel_pmc_bxt mlxfw iTCO_vendor_support dcdbas psample acpi_ipmi intel_cstate tls mei_me ipmi_si intel_uncore pci_hyperv_intf igb
[10905.296430]  i2c_i801 i40e pcspkr mei lpc_ich intel_pch_thermal i2c_smbus dca ipmi_msghandler acpi_power_meter fuse zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel megaraid_sas mgag200 ghash_clmulni_intel [last unloaded: scsi_debug]
[10905.404390] CPU: 18 PID: 1269796 Comm: kworker/u193:3 Tainted: G          I      --------- ---  5.17.0-0.rc2.83.test.fc36.x86_64 #1
[10905.416201] Hardware name: Dell Inc. PowerEdge R740xd/00WGD1, BIOS 1.4.9 06/29/2018
[10905.423853] Workqueue: loop4 loop_rootcg_workfn [loop]
[10905.428992] RIP: 0010:bfq_limit_depth+0x4a9/0x500
[10905.433699] Code: fd ff ff 45 8b 46 40 44 0f b6 cb 31 f6 48 c7 c2 5f aa 64 ab 48 c7 c1 84 aa 64 ab 31 c0 41 55 e8 bd 8b b5 ff 48 83 c4 08 eb 9d <0f> 0b e9 7b fd ff ff 48 8b 7c 24 10 e8 c6 80 00 00 48 89 c7 e8 9e
[10905.452444] RSP: 0018:ffffa22747cff900 EFLAGS: 00010097
[10905.457671] RAX: 0000000000000001 RBX: 0000000000000060 RCX: ffff96f09835c120
[10905.464803] RDX: ffff96ef00732400 RSI: ffffffffab6e6244 RDI: ffffffffab64de30
[10905.471936] RBP: ffffa22747cffa38 R08: ffff96918e8d3878 R09: 0000000000037748
[10905.479068] R10: 000000000000007d R11: ffffffffaa6d9c70 R12: ffffa22747cffaa0
[10905.486199] R13: ffffa22747cff970 R14: 0000000000000001 R15: 0000000000000002
[10905.493332] FS:  0000000000000000(0000) GS:ffff96ef00840000(0000) knlGS:0000000000000000
[10905.501417] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10905.507162] CR2: 00007f695066d855 CR3: 0000006998e10003 CR4: 00000000007706e0
[10905.514297] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10905.521427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[10905.528560] PKRU: 55555554
[10905.531272] Call Trace:
[10905.533728]  <TASK>
[10905.535835]  ? bfq_setup_cooperator+0x16f/0x270
[10905.540365]  ? bfq_allow_bio_merge+0x67/0xb0
[10905.544638]  ? elv_merge+0xad/0x150
[10905.548128]  ? bfq_request_merge+0x3c/0x90
[10905.552231]  __blk_mq_alloc_requests+0x83/0x240
[10905.556761]  blk_mq_submit_bio+0x204/0x560
[10905.560862]  submit_bio_noacct+0x22c/0x270
[10905.564961]  ? filemap_dirty_folio+0x55/0x70
[10905.569232]  iomap_writepages+0x82/0x90
[10905.573072]  xfs_vm_writepages+0xc4/0xe0 [xfs]
[10905.577617]  do_writepages+0xcb/0x1f0
[10905.581278]  ? check_preempt_wakeup+0x3ae/0x500
[10905.585810]  filemap_fdatawrite_wbc+0x61/0x80
[10905.590171]  file_write_and_wait_range+0xa8/0x100
[10905.594878]  xfs_file_fsync+0x4f/0x1d0 [xfs]
[10905.599201]  loop_process_work+0x1b4/0x860 [loop]
[10905.603906]  ? resched_curr+0x22/0xc0
[10905.607573]  ? try_to_wake_up+0x21d/0x3f0
[10905.611584]  process_one_work+0x1a1/0x350
[10905.615597]  worker_thread+0x270/0x430
[10905.619349]  ? worker_clr_flags+0x50/0x50
[10905.623364]  kthread+0xe7/0x100
[10905.626509]  ? kthread_blkcg+0x30/0x30
[10905.630262]  ret_from_fork+0x22/0x30
[10905.633842]  </TASK>
[10905.636033] ---[ end trace 0000000000000000 ]---
[10905.664632] md1: detected capacity change from 1021952 to 0
[10905.670219] md: md1 stopped.

Comment 2 Chris Murphy 2022-04-07 16:21:15 UTC
Hopefully this will fix it; the fix should appear in 5.17.3:
https://lore.kernel.org/all/20220407140738.9723-1-jack@suse.cz/

