Bug 1765255

Summary: Crash if IO in progress during VDO suspend/stop
Product: Red Hat Enterprise Linux 7 Reporter: Sweet Tea Dorminy <sweettea>
Component: kmod-kvdo Assignee: Sweet Tea Dorminy <sweettea>
Status: CLOSED NEXTRELEASE QA Contact: vdo-qe
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.8 CC: awalsh, corwin, vdo-internal, vdo-qe
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1765253 Environment:
Last Closed: 2020-05-12 19:26:56 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1765253    
Bug Blocks:    

Description Sweet Tea Dorminy 2019-10-24 16:04:23 UTC
+++ This bug was initially created as a clone of Bug #1765253 +++

Description of problem:
If IO is in progress when VDO receives a suspend request (for instance, as part of a grow logical, grow physical, or shutdown operation), some IO can run while or after VDO internally suspends. This can cause crashes, infinite loops, or assorted other chaos.

Version-Release number of selected component (if applicable):
6.1

How reproducible:
Very difficult to reproduce. It relies on an unfenced read of a variable returning an old value.
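
To illustrate the kind of race involved, here is a minimal sketch of an IO admission gate that uses sequentially consistent C11 atomics, so that a stale read of the suspend flag cannot let new IO slip in after the suspend has been requested. This is only an assumption-laden illustration: the names (io_gate, gate_enter, gate_exit, gate_begin_suspend, gate_is_drained) are hypothetical, and it is not the kvdo code or the actual fix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical admission gate; NOT the kvdo data structure. */
struct io_gate {
    atomic_bool suspending; /* set once a suspend has been requested */
    atomic_int  in_flight;  /* IOs admitted and not yet completed */
};

static void gate_init(struct io_gate *g)
{
    atomic_init(&g->suspending, false);
    atomic_init(&g->in_flight, 0);
}

/* Try to admit one IO. Returns false if a suspend has been requested. */
static bool gate_enter(struct io_gate *g)
{
    /*
     * Announce the IO first, then check the flag. Both operations are
     * seq_cst, so either the suspender observes our count, or we
     * observe its flag. A plain (unfenced) read of the flag could
     * return an old value and admit the IO anyway.
     */
    atomic_fetch_add(&g->in_flight, 1);
    if (atomic_load(&g->suspending)) {
        atomic_fetch_sub(&g->in_flight, 1);
        return false;
    }
    return true;
}

/* Mark one admitted IO as complete. */
static void gate_exit(struct io_gate *g)
{
    atomic_fetch_sub(&g->in_flight, 1);
}

/* Request a suspend. Returns true if no admitted IO remains. */
static bool gate_begin_suspend(struct io_gate *g)
{
    atomic_store(&g->suspending, true);
    return atomic_load(&g->in_flight) == 0;
}

/* True once a suspend was requested and all admitted IO has finished. */
static bool gate_is_drained(struct io_gate *g)
{
    return atomic_load(&g->suspending) && atomic_load(&g->in_flight) == 0;
}
```

In this sketch, a suspender would call gate_begin_suspend() and then wait until gate_is_drained() returns true before tearing anything down, while every IO path brackets its work with gate_enter() and gate_exit().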

Steps to Reproduce:
1. Write 15G of unique data as 4k random writes to a VDO with 15T logical space and a 15G block map cache size, in sync mode.
2. dmsetup suspend vdo0 & dd if=/dev/urandom of=/dev/mapper/vdo0 oflag=direct bs=4k


Actual results:
Chaos, potentially a crashdump. One manifestation could be:

[1388725.540680] uds: kvdo0:journalQ: assertion "count to be initialized not in use" ((*journalValue == atomicLoad32(decrementCount))) failed at /builddir/build/BUILD/kvdo-a50744b1ca2aa461a761076d051a21612fd45aba/obj/./vdo/base/lockCounter.c:259
[1388725.540681] uds: kvdo0:journalQ: [backtrace]
[1388725.540683] CPU: 4 PID: 768 Comm: kvdo0:journalQ Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-145.el8.x86_64 #1
[1388725.540684] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.2 01/16/2015
[1388725.540685] Call Trace:
[1388725.540692]  dump_stack+0x5c/0x80
[1388725.540703]  assertionFailedLogOnly+0x49/0x70 [uds]
[1388725.540717]  ? enqueueWorkQueue+0x3e/0x80 [kvdo]
[1388725.540725]  ? noDefaultAction+0x10/0x10 [kvdo]
[1388725.540734]  ? scheduleOperationWithContext+0xee/0x130 [kvdo]
[1388725.540743]  ? kvdoGetCurrentThreadID+0xa/0x20 [kvdo]
[1388725.540752]  initializeLockCount+0x6f/0x80 [kvdo]
[1388725.540759]  ? prepareToAssignEntry+0x1d0/0x1d0 [kvdo]
[1388725.540765]  prepareToAssignEntry+0x17a/0x1d0 [kvdo]
[1388725.540772]  assignEntries.part.6+0x47/0xb0 [kvdo]
[1388725.540782]  workQueueRunner+0x1b9/0x660 [kvdo]
[1388725.540784]  ? finish_wait+0x80/0x80
[1388725.540793]  ? kvdoCompareDataVIOs+0x90/0x90 [kvdo]
[1388725.540795]  kthread+0x112/0x130
[1388725.540796]  ? kthread_flush_work_fn+0x10/0x10
[1388725.540798]  ret_from_fork+0x35/0x40

Expected results:
No crashdumps, no hangs.

Additional info:

Comment 2 corwin 2020-05-12 19:26:56 UTC
This is fixed in RHEL 8, but will not be fixed on RHEL 7.