Bug 2033457

Summary: "BUG: scheduling while atomic" sometimes when suspending VDO while writes are active
Product: Red Hat Enterprise Linux 9
Component: kmod-kvdo
Version: 9.0
Reporter: sclafani
Assignee: sclafani
QA Contact: Filip Suba <fsuba>
CC: awalsh, cwei, fsuba
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Target Release: ---
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kmod-kvdo-8.1.1.360-12.el9
Last Closed: 2022-05-17 15:49:27 UTC
Type: Bug
Attachments: test case

Description sclafani 2021-12-16 21:38:15 UTC
Created attachment 1846645 [details]
test case

Description of problem:
Suspending VDO performs a synchronous flush by calling submit_bio_wait(). That call is made while a spin lock in the batch processor is held. Under some circumstances (so far observed only when growing the logical volume while writes are in flight) the flush actually has to wait, so the thread sleeps in atomic context.

Version-Release number of selected component (if applicable):


How reproducible:
Intermittently, when running our basic logical-grow test, which creates a 5 GB VDO volume and then grows the logical size by 40 GB while concurrently writing 5000 blocks to VDO.

Steps to Reproduce:
See attached test case outline.

Actual results:
"BUG: scheduling while atomic" and a backtrace (see below) are logged.

Expected results:
The message and backtrace do not appear.

Additional info:

Dec 15 02:31:33 pfarm-069 kernel: kvdo1:lvresize: preparing to modify device '253:4' 
Dec 15 02:31:33 pfarm-069 kernel: kvdo1:lvresize: Preparing to resize logical to 11796736 
Dec 15 02:31:33 pfarm-069 kernel: kvdo1:lvresize: Done preparing to resize logical 
Dec 15 02:31:33 pfarm-069 kernel: kvdo1:lvresize: suspending device '253:4' 
Dec 15 02:31:33 pfarm-069 kernel: BUG: scheduling while atomic: kvdo1:cpuQ1/1725297/0x00000002 
Dec 15 02:31:33 pfarm-069 kernel: Modules linked in: kvdo(OE) uds(OE) lz4_compress ext4 mbcache jbd2 dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio iscsi_target_mod target_core_mod nfsv3 nfs_acl nfs lockd grace fscache netfs rfkill intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm kvm_intel kvm cirrus drm_kms_helper irqbypass syscopyarea sysfillrect sysimgblt fb_sys_fops i2c_piix4 virtio_balloon cec joydev pcspkr vfat fat drm fuse permatest(POE) xfs libcrc32c ata_generic crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ata_piix virtio_net net_failover libata failover serio_raw virtio_blk sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi [last unloaded: uds]
Dec 15 02:31:33 pfarm-069 kernel: Preemption disabled at: 
Dec 15 02:31:33 pfarm-069 kernel: [<0000000000000000>] 0x0 
Dec 15 02:31:33 pfarm-069 kernel: CPU: 1 PID: 1725297 Comm: kvdo1:cpuQ1 Kdump: loaded Tainted: P           OE    --------- ---  5.14.0-29.el9.x86_64 #1 
Dec 15 02:31:33 pfarm-069 kernel: Hardware name: Red Hat OpenStack Compute, BIOS 1.13.0-2.module+el8.2.1+7284+aa32a2c4 04/01/2014 
Dec 15 02:31:33 pfarm-069 kernel: Call Trace: 
Dec 15 02:31:33 pfarm-069 kernel: dump_stack_lvl+0x34/0x44 
Dec 15 02:31:33 pfarm-069 kernel: __schedule_bug.cold+0x7d/0x8b 
Dec 15 02:31:33 pfarm-069 kernel: __schedule+0x400/0x560 
Dec 15 02:31:33 pfarm-069 kernel: schedule+0x43/0xd0 
Dec 15 02:31:33 pfarm-069 kernel: schedule_timeout+0x88/0x150 
Dec 15 02:31:33 pfarm-069 kernel: ? __bpf_trace_tick_stop+0x10/0x10 
Dec 15 02:31:33 pfarm-069 kernel: io_schedule_timeout+0x4c/0x70 
Dec 15 02:31:33 pfarm-069 kernel: wait_for_completion_io_timeout+0x84/0x100 
Dec 15 02:31:33 pfarm-069 kernel: submit_bio_wait+0x72/0xb0 
Dec 15 02:31:33 pfarm-069 kernel: vdo_synchronous_flush+0x76/0xd0 [kvdo] 
Dec 15 02:31:33 pfarm-069 kernel: ? submit_bio_wait+0xb0/0xb0 
Dec 15 02:31:33 pfarm-069 kernel: suspend_callback+0x265/0x2e0 [kvdo] 
Dec 15 02:31:33 pfarm-069 kernel: invoke_vdo_completion_callback+0x60/0x70 [kvdo] 
Dec 15 02:31:33 pfarm-069 kernel: complete_vdo_completion+0x2a/0x60 [kvdo] 
Dec 15 02:31:33 pfarm-069 kernel: limiter_release_many+0x9c/0xb0 [kvdo] 
Dec 15 02:31:33 pfarm-069 kernel: complete_many_requests+0x90/0xd0 [kvdo] 
Dec 15 02:31:33 pfarm-069 kernel: return_data_vio_batch_to_pool+0xc3/0x170 [kvdo] 
Dec 15 02:31:33 pfarm-069 kernel: batch_processor_work+0x31/0xa0 [kvdo] 
Dec 15 02:31:33 pfarm-069 kernel: service_work_queue+0xf0/0x410 [kvdo] 
Dec 15 02:31:33 pfarm-069 kernel: ? do_wait_intr_irq+0xa0/0xa0 
Dec 15 02:31:33 pfarm-069 kernel: work_queue_runner+0x78/0x90 [kvdo] 
Dec 15 02:31:33 pfarm-069 kernel: ? service_work_queue+0x410/0x410 [kvdo] 
Dec 15 02:31:33 pfarm-069 kernel: kthread+0x135/0x160 
Dec 15 02:31:33 pfarm-069 kernel: ? set_kthread_struct+0x40/0x40 
Dec 15 02:31:33 pfarm-069 kernel: ret_from_fork+0x22/0x30 
Dec 15 02:31:33 pfarm-069 kernel: uds: kvdo1:journalQ: beginning save (vcn 4294967295) 
Dec 15 02:31:33 pfarm-069 kernel: uds: kvdo1:journalQ: finished save (vcn 4294967295) 
Dec 15 02:31:33 pfarm-069 kernel: kvdo1:lvresize: device '253:4' suspended 
Dec 15 02:31:33 pfarm-069 kernel: dm-4: detected capacity change from 10487808 to 94373888 
Dec 15 02:31:33 pfarm-069 kernel: kvdo1:lvresize: resuming device '253:4'

Comment 2 Andy Walsh 2022-02-15 14:04:01 UTC
Available in the most recent build.

Comment 5 Filip Suba 2022-02-28 11:34:23 UTC
Verified with kmod-kvdo-8.1.1.360-14.el9.

Comment 7 errata-xmlrpc 2022-05-17 15:49:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (new packages: kmod-kvdo), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:3919