Bug 2049004 - [ark-eln] WARNING: CPU: 1 PID: 1525118 at block/bfq-iosched.c:602 bfqq_request_over_limit+0x132/0x370
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-01 10:27 UTC by Bruno Goncalves
Modified: 2022-06-08 14:02 UTC (History)
CC: 20 users

Fixed In Version: 5.17.5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-08 14:02:27 UTC
Type: Bug
Embargoed:



Description Bruno Goncalves 2022-02-01 10:27:42 UTC
1. Please describe the problem:
During CKI storage software RAID testing [1], we noticed the following messages in dmesg:

[10522.961891] raid6: using avx512x2 recovery algorithm
[10523.037777] async_tx: api initialized (async)
[10523.412117] loop0: detected capacity change from 0 to 1024000
[10523.575212] loop1: detected capacity change from 0 to 1024000
[10523.743184] loop2: detected capacity change from 0 to 1024000
[10523.929786] loop3: detected capacity change from 0 to 1024000
[10524.102577] loop4: detected capacity change from 0 to 1024000
[10524.273543] loop5: detected capacity change from 0 to 1024000
[10533.182602] md1: detected capacity change from 0 to 5099520
[10535.846268] XFS (md1): Mounting V5 Filesystem
[10535.852948] XFS (md1): Ending clean mount
[10535.857421] xfs filesystem being mounted at /mnt/md_test supports timestamps until 2038 (0x7fffffff)
[10536.754467] XFS (md1): Unmounting Filesystem
[10536.789673] md1: detected capacity change from 5099520 to 0
[10536.795284] md: md1 stopped.
[10562.274570] md/raid1:md1: not clean -- starting background reconstruction
[10562.281407] md/raid1:md1: active with 5 out of 5 mirrors
[10562.292068] md1: detected capacity change from 0 to 1021952
[10562.297747] md: resync of RAID array md1
[10565.104456] md: md1: resync done.
[10567.539491] XFS (md1): Mounting V5 Filesystem
[10567.566166] XFS (md1): Ending clean mount
[10567.570505] xfs filesystem being mounted at /mnt/md_test supports timestamps until 2038 (0x7fffffff)
[10567.635319] XFS (md1): Unmounting Filesystem
[10567.659134] md1: detected capacity change from 1021952 to 0
[10567.664745] md: md1 stopped.
[10593.121480] md/raid:md1: device loop3 operational as raid disk 3
[10593.127535] md/raid:md1: device loop2 operational as raid disk 2
[10593.133583] md/raid:md1: device loop1 operational as raid disk 1
[10593.139624] md/raid:md1: device loop0 operational as raid disk 0
[10593.146050] md/raid:md1: raid level 4 active with 4 out of 5 devices, algorithm 0
[10593.157999] md1: detected capacity change from 0 to 4079616
[10593.163694] md: recovery of RAID array md1
[10594.203677] ------------[ cut here ]------------
[10594.208322] WARNING: CPU: 1 PID: 1525118 at block/bfq-iosched.c:602 bfqq_request_over_limit+0x132/0x370
[10594.217774] Modules linked in: raid1 raid0 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid6_pq dm_cache_smq dm_cache dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio nvme nvme_core sr_mod cdrom ipmi_watchdog ipmi_poweroff ipmi_ssif ipmi_devintf nls_utf8 cifs cifs_arc4 cifs_md4 tun loop nf_tables nfnetlink veth bridge stp llc overlay sctp ip6_udp_tunnel udp_tunnel rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace dm_log_writes dm_mod tls rfkill intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp sunrpc coretemp kvm_intel kvm acpi_ipmi vfat fat irqbypass ipmi_si rapl intel_cstate intel_uncore ioatdma ses mei_me enclosure tg3 pcspkr mei acpi_power_meter hpilo ipmi_msghandler acpi_tad ext4 lpc_ich dca mbcache jbd2 fuse zram xfs libcrc32c sd_mod t10_pi mgag200 i2c_algo_bit drm_shmem_helper drm_kms_helper syscopyarea sysfillrect sysimgblt
[10594.217845]  fb_sys_fops cec crct10dif_pclmul crc32_pclmul crc32c_intel smartpqi uas drm ghash_clmulni_intel scsi_transport_sas usb_storage hpwdt wmi [last unloaded: scsi_debug]
[10594.321469] CPU: 1 PID: 1525118 Comm: kworker/u16:2 Tainted: G          I    X --------- ---  5.17.0-0.rc2.83.test.eln.x86_64 #1
[10594.333100] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 05/21/2019
[10594.341671] Workqueue: loop0 loop_rootcg_workfn [loop]
[10594.346839] RIP: 0010:bfqq_request_over_limit+0x132/0x370
[10594.352269] Code: 89 1c c7 48 8b 5b 60 48 8d 70 01 48 85 db 0f 84 e7 01 00 00 48 89 f0 44 0f b6 6b 18 45 84 ed 0f 84 73 01 00 00 41 39 c4 7f d5 <0f> 0b 41 39 c4 0f 85 f7 01 00 00 41 83 f8 ff 0f 84 b7 01 00 00 44
[10594.371149] RSP: 0018:ffffc1b2855bf938 EFLAGS: 00010046
[10594.376402] RAX: 0000000000000001 RBX: ffff9c2729ec9098 RCX: 0000000000000000
[10594.383576] RDX: ffff9c26e1ec4430 RSI: 0000000000000001 RDI: ffff9c26e1ec4430
[10594.390751] RBP: ffff9c27319b46c0 R08: 0000000000000000 R09: 000000000002e078
[10594.397925] R10: 0000000000000030 R11: ffff9c29fffd3d80 R12: 0000000000000001
[10594.405100] R13: 0000000000000001 R14: 0000000000000060 R15: ffffc1b2855bf958
[10594.412274] FS:  0000000000000000(0000) GS:ffff9c29efe40000(0000) knlGS:0000000000000000
[10594.420411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10594.426186] CR2: 00007f6a5db52580 CR3: 0000000032210001 CR4: 00000000007706e0
[10594.433361] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10594.440535] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[10594.447708] PKRU: 55555554
[10594.450428] Call Trace:
[10594.452884]  <TASK>
[10594.454993]  ? release_pages+0x150/0x480
[10594.458936]  bfq_limit_depth+0xc3/0x220
[10594.462794]  __blk_mq_alloc_requests+0x63/0x130
[10594.467350]  blk_mq_get_new_requests+0xbc/0x150
[10594.471906]  blk_mq_submit_bio+0x317/0x480
[10594.476025]  submit_bio_noacct+0xcb/0x120
[10594.480055]  iomap_submit_ioend+0x4b/0x70
[10594.484085]  xfs_vm_writepages+0x79/0xb0 [xfs]
[10594.488656]  do_writepages+0xbc/0x1b0
[10594.492338]  ? update_load_avg+0x7e/0x730
[10594.496368]  ? update_load_avg+0x7e/0x730
[10594.500397]  filemap_fdatawrite_wbc+0x5b/0x80
[10594.504778]  file_write_and_wait_range+0x9b/0xf0
[10594.509420]  xfs_file_fsync+0x5a/0x210 [xfs]
[10594.513790]  loop_process_work+0x2f9/0x5f0 [loop]
[10594.518521]  ? check_preempt_curr+0x38/0x70
[10594.522727]  ? ttwu_do_wakeup+0x17/0x160
[10594.526671]  ? try_to_wake_up+0x1e4/0x4f0
[10594.530701]  ? finish_task_switch.isra.0+0xb6/0x290
[10594.535605]  process_one_work+0x1c7/0x380
[10594.539636]  worker_thread+0x4d/0x380
[10594.543318]  ? process_one_work+0x380/0x380
[10594.547523]  kthread+0xca/0xf0
[10594.550593]  ? kthread_complete_and_exit+0x20/0x20
[10594.555409]  ret_from_fork+0x22/0x30
[10594.559005]  </TASK>
[10594.561200] ---[ end trace 0000000000000000 ]---
[10596.090879] md: md1: recovery done.
[10600.289167] XFS (md1): Mounting V5 Filesystem
[10600.297754] XFS (md1): Ending clean mount

2. What is the Version-Release number of the kernel:
kernel-5.17.0-0.rc2.83.test.eln.x86_64


So far we've hit this only once.

CKI issue tracker: https://datawarehouse.cki-project.org/issue/942

[1] https://gitlab.com/cki-project/kernel-tests/-/tree/main/storage/swraid/trim

Comment 1 Bruno Goncalves 2022-02-01 12:31:22 UTC
Probably the same issue:

[10905.179799] XFS (md1): Mounting V5 Filesystem
[10905.206746] XFS (md1): Ending clean mount
[10905.231009] xfs filesystem being mounted at /mnt/md_test supports timestamps until 2038 (0x7fffffff)
[10905.276163] XFS (md1): Unmounting Filesystem
[10905.282957] ------------[ cut here ]------------
[10905.287577] WARNING: CPU: 18 PID: 1269796 at block/bfq-iosched.c:606 bfq_limit_depth+0x4a9/0x500
[10905.296364] Modules linked in: raid1 raid0 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx dm_cache_smq dm_cache dm_thin_pool dm_persistent_data dm_bio_prison nvme nvme_core ipmi_watchdog ipmi_poweroff ipmi_ssif ipmi_devintf nls_utf8 cifs cifs_arc4 cifs_md4 tun loop xt_multiport xt_nat xt_addrtype xt_mark xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment nft_compat nf_tables nfnetlink veth bridge stp llc overlay vfat fat jfs sctp ip6_udp_tunnel udp_tunnel rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs rpcrdma rdma_cm iw_cm ib_cm nfsd auth_rpcgss nfs_acl lockd grace dm_log_writes mlx5_ib ib_uverbs ib_core intel_rapl_msr intel_rapl_common isst_if_common rfkill skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp sunrpc kvm_intel kvm irqbypass rapl mlx5_core iTCO_wdt intel_pmc_bxt mlxfw iTCO_vendor_support dcdbas psample acpi_ipmi intel_cstate tls mei_me ipmi_si intel_uncore pci_hyperv_intf igb
[10905.296430]  i2c_i801 i40e pcspkr mei lpc_ich intel_pch_thermal i2c_smbus dca ipmi_msghandler acpi_power_meter fuse zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel megaraid_sas mgag200 ghash_clmulni_intel [last unloaded: scsi_debug]
[10905.404390] CPU: 18 PID: 1269796 Comm: kworker/u193:3 Tainted: G          I      --------- ---  5.17.0-0.rc2.83.test.fc36.x86_64 #1
[10905.416201] Hardware name: Dell Inc. PowerEdge R740xd/00WGD1, BIOS 1.4.9 06/29/2018
[10905.423853] Workqueue: loop4 loop_rootcg_workfn [loop]
[10905.428992] RIP: 0010:bfq_limit_depth+0x4a9/0x500
[10905.433699] Code: fd ff ff 45 8b 46 40 44 0f b6 cb 31 f6 48 c7 c2 5f aa 64 ab 48 c7 c1 84 aa 64 ab 31 c0 41 55 e8 bd 8b b5 ff 48 83 c4 08 eb 9d <0f> 0b e9 7b fd ff ff 48 8b 7c 24 10 e8 c6 80 00 00 48 89 c7 e8 9e
[10905.452444] RSP: 0018:ffffa22747cff900 EFLAGS: 00010097
[10905.457671] RAX: 0000000000000001 RBX: 0000000000000060 RCX: ffff96f09835c120
[10905.464803] RDX: ffff96ef00732400 RSI: ffffffffab6e6244 RDI: ffffffffab64de30
[10905.471936] RBP: ffffa22747cffa38 R08: ffff96918e8d3878 R09: 0000000000037748
[10905.479068] R10: 000000000000007d R11: ffffffffaa6d9c70 R12: ffffa22747cffaa0
[10905.486199] R13: ffffa22747cff970 R14: 0000000000000001 R15: 0000000000000002
[10905.493332] FS:  0000000000000000(0000) GS:ffff96ef00840000(0000) knlGS:0000000000000000
[10905.501417] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10905.507162] CR2: 00007f695066d855 CR3: 0000006998e10003 CR4: 00000000007706e0
[10905.514297] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10905.521427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[10905.528560] PKRU: 55555554
[10905.531272] Call Trace:
[10905.533728]  <TASK>
[10905.535835]  ? bfq_setup_cooperator+0x16f/0x270
[10905.540365]  ? bfq_allow_bio_merge+0x67/0xb0
[10905.544638]  ? elv_merge+0xad/0x150
[10905.548128]  ? bfq_request_merge+0x3c/0x90
[10905.552231]  __blk_mq_alloc_requests+0x83/0x240
[10905.556761]  blk_mq_submit_bio+0x204/0x560
[10905.560862]  submit_bio_noacct+0x22c/0x270
[10905.564961]  ? filemap_dirty_folio+0x55/0x70
[10905.569232]  iomap_writepages+0x82/0x90
[10905.573072]  xfs_vm_writepages+0xc4/0xe0 [xfs]
[10905.577617]  do_writepages+0xcb/0x1f0
[10905.581278]  ? check_preempt_wakeup+0x3ae/0x500
[10905.585810]  filemap_fdatawrite_wbc+0x61/0x80
[10905.590171]  file_write_and_wait_range+0xa8/0x100
[10905.594878]  xfs_file_fsync+0x4f/0x1d0 [xfs]
[10905.599201]  loop_process_work+0x1b4/0x860 [loop]
[10905.603906]  ? resched_curr+0x22/0xc0
[10905.607573]  ? try_to_wake_up+0x21d/0x3f0
[10905.611584]  process_one_work+0x1a1/0x350
[10905.615597]  worker_thread+0x270/0x430
[10905.619349]  ? worker_clr_flags+0x50/0x50
[10905.623364]  kthread+0xe7/0x100
[10905.626509]  ? kthread_blkcg+0x30/0x30
[10905.630262]  ret_from_fork+0x22/0x30
[10905.633842]  </TASK>
[10905.636033] ---[ end trace 0000000000000000 ]---
[10905.664632] md1: detected capacity change from 1021952 to 0
[10905.670219] md: md1 stopped.

Comment 2 Chris Murphy 2022-04-07 16:21:15 UTC
Hopefully this will fix it; the fix should appear in 5.17.3:
https://lore.kernel.org/all/20220407140738.9723-1-jack@suse.cz/

