Bug 2033457
| Summary: | "BUG: scheduling while atomic" sometimes when suspending VDO while writes are active | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | sclafani |
| Component: | kmod-kvdo | Assignee: | sclafani |
| Status: | CLOSED ERRATA | QA Contact: | Filip Suba <fsuba> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 9.0 | CC: | awalsh, cwei, fsuba |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kmod-kvdo-8.1.1.360-12.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-05-17 15:49:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Available in the most recent build.

Verified with kmod-kvdo-8.1.1.360-14.el9.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (new packages: kmod-kvdo), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:3919
Created attachment 1846645 [details]
test case

Description of problem:
Suspending VDO does a synchronous flush, calling submit_bio_wait(). This is being called while a spin lock in the batch processor is held. Under some circumstances (so far only growing the logical volume when writes are happening) that flush can actually have to wait.

Version-Release number of selected component (if applicable):

How reproducible:
Intermittently, running our basic test of growing a logical volume, which creates a 5GB VDO volume and then grows the logical size by 40GB while concurrently writing 5000 blocks to VDO.

Steps to Reproduce:
See attached test case outline.

Actual results:
"BUG: scheduling while atomic" and a backtrace (see below) are logged.

Expected results:
The message and backtrace do not appear.

Additional info:
Dec 15 02:31:33 pfarm-069 kernel: kvdo1:lvresize: preparing to modify device '253:4'
Dec 15 02:31:33 pfarm-069 kernel: kvdo1:lvresize: Preparing to resize logical to 11796736
Dec 15 02:31:33 pfarm-069 kernel: kvdo1:lvresize: Done preparing to resize logical
Dec 15 02:31:33 pfarm-069 kernel: kvdo1:lvresize: suspending device '253:4'
Dec 15 02:31:33 pfarm-069 kernel: BUG: scheduling while atomic: kvdo1:cpuQ1/1725297/0x00000002
Dec 15 02:31:33 pfarm-069 kernel: Modules linked in: kvdo(OE) uds(OE) lz4_compress ext4 mbcache jbd2 dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio iscsi_target_mod target_core_mod nfsv3 nfs_acl nfs lockd grace fscache netfs rfkill intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm kvm_intel kvm cirrus drm_kms_helper irqbypass syscopyarea sysfillrect sysimgblt fb_sys_fops i2c_piix4 virtio_balloon cec joydev pcspkr vfat fat drm fuse permatest(POE) xfs libcrc32c ata_generic crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ata_piix virtio_net net_failover libata failover serio_raw virtio_blk sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi [last unloaded: uds]
Dec 15 02:31:33 pfarm-069 kernel: Preemption disabled at:
Dec 15 02:31:33 pfarm-069 kernel: [<0000000000000000>] 0x0
Dec 15 02:31:33 pfarm-069 kernel: CPU: 1 PID: 1725297 Comm: kvdo1:cpuQ1 Kdump: loaded Tainted: P OE --------- --- 5.14.0-29.el9.x86_64 #1
Dec 15 02:31:33 pfarm-069 kernel: Hardware name: Red Hat OpenStack Compute, BIOS 1.13.0-2.module+el8.2.1+7284+aa32a2c4 04/01/2014
Dec 15 02:31:33 pfarm-069 kernel: Call Trace:
Dec 15 02:31:33 pfarm-069 kernel: dump_stack_lvl+0x34/0x44
Dec 15 02:31:33 pfarm-069 kernel: __schedule_bug.cold+0x7d/0x8b
Dec 15 02:31:33 pfarm-069 kernel: __schedule+0x400/0x560
Dec 15 02:31:33 pfarm-069 kernel: schedule+0x43/0xd0
Dec 15 02:31:33 pfarm-069 kernel: schedule_timeout+0x88/0x150
Dec 15 02:31:33 pfarm-069 kernel: ? __bpf_trace_tick_stop+0x10/0x10
Dec 15 02:31:33 pfarm-069 kernel: io_schedule_timeout+0x4c/0x70
Dec 15 02:31:33 pfarm-069 kernel: wait_for_completion_io_timeout+0x84/0x100
Dec 15 02:31:33 pfarm-069 kernel: submit_bio_wait+0x72/0xb0
Dec 15 02:31:33 pfarm-069 kernel: vdo_synchronous_flush+0x76/0xd0 [kvdo]
Dec 15 02:31:33 pfarm-069 kernel: ? submit_bio_wait+0xb0/0xb0
Dec 15 02:31:33 pfarm-069 kernel: suspend_callback+0x265/0x2e0 [kvdo]
Dec 15 02:31:33 pfarm-069 kernel: invoke_vdo_completion_callback+0x60/0x70 [kvdo]
Dec 15 02:31:33 pfarm-069 kernel: complete_vdo_completion+0x2a/0x60 [kvdo]
Dec 15 02:31:33 pfarm-069 kernel: limiter_release_many+0x9c/0xb0 [kvdo]
Dec 15 02:31:33 pfarm-069 kernel: complete_many_requests+0x90/0xd0 [kvdo]
Dec 15 02:31:33 pfarm-069 kernel: return_data_vio_batch_to_pool+0xc3/0x170 [kvdo]
Dec 15 02:31:33 pfarm-069 kernel: batch_processor_work+0x31/0xa0 [kvdo]
Dec 15 02:31:33 pfarm-069 kernel: service_work_queue+0xf0/0x410 [kvdo]
Dec 15 02:31:33 pfarm-069 kernel: ? do_wait_intr_irq+0xa0/0xa0
Dec 15 02:31:33 pfarm-069 kernel: work_queue_runner+0x78/0x90 [kvdo]
Dec 15 02:31:33 pfarm-069 kernel: ? service_work_queue+0x410/0x410 [kvdo]
Dec 15 02:31:33 pfarm-069 kernel: kthread+0x135/0x160
Dec 15 02:31:33 pfarm-069 kernel: ? set_kthread_struct+0x40/0x40
Dec 15 02:31:33 pfarm-069 kernel: ret_from_fork+0x22/0x30
Dec 15 02:31:33 pfarm-069 kernel: uds: kvdo1:journalQ: beginning save (vcn 4294967295)
Dec 15 02:31:33 pfarm-069 kernel: uds: kvdo1:journalQ: finished save (vcn 4294967295)
Dec 15 02:31:33 pfarm-069 kernel: kvdo1:lvresize: device '253:4' suspended
Dec 15 02:31:33 pfarm-069 kernel: dm-4: detected capacity change from 10487808 to 94373888
Dec 15 02:31:33 pfarm-069 kernel: kvdo1:lvresize: resuming device '253:4'
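
Note on reading the "BUG: scheduling while atomic" line: the kernel prints it as comm/pid/preempt_count, so the trailing hex value can be split into its nesting counters. The following is a minimal sketch of such a decoder, not part of kvdo or of the attached test case. The helper name `decode_sched_bug` is hypothetical, and the bit-field widths (8/8/4/4) are an assumption taken from include/linux/preempt.h in 5.x kernels; check the header for the kernel actually in use.

```python
import re

# Assumed preempt_count layout (include/linux/preempt.h, 5.x kernels):
# bits 0-7 preemption depth, 8-15 softirq, 16-19 hardirq, 20-23 NMI.
FIELDS = (("preempt", 8), ("softirq", 8), ("hardirq", 4), ("nmi", 4))

# The kernel formats the message as "<comm>/<pid>/0x<preempt_count>".
BUG_RE = re.compile(
    r"BUG: scheduling while atomic: "
    r"(?P<comm>.+)/(?P<pid>\d+)/0x(?P<count>[0-9a-fA-F]+)"
)


def decode_sched_bug(line):
    """Return comm, pid and raw preempt_count fields, or None if no match."""
    match = BUG_RE.search(line)
    if match is None:
        return None
    count = int(match.group("count"), 16)
    result = {"comm": match.group("comm"), "pid": int(match.group("pid"))}
    shift = 0
    for name, bits in FIELDS:
        # Extract each field's raw bits from the combined counter.
        result[name] = (count >> shift) & ((1 << bits) - 1)
        shift += bits
    return result


if __name__ == "__main__":
    sample = ("Dec 15 02:31:33 pfarm-069 kernel: BUG: scheduling while "
              "atomic: kvdo1:cpuQ1/1725297/0x00000002")
    print(decode_sched_bug(sample))
```

For the line logged in this report, the preemption field is 2 and the softirq, hardirq, and NMI fields are all 0. That is consistent with the description above: the kvdo1:cpuQ1 thread was in process context with preemption disabled (as happens while a spin lock is held) when submit_bio_wait() tried to sleep.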