Bug 1737639

Summary: Suspending VDO while UDS is rebuilding can hang until the rebuild has finished
Product: Red Hat Enterprise Linux 8 Reporter: Andy Walsh <awalsh>
Component: kmod-kvdo Assignee: Matthew Sakai <msakai>
Status: CLOSED ERRATA QA Contact: Filip Suba <fsuba>
Severity: unspecified Docs Contact: Marek Suchánek <msuchane>
Priority: unspecified    
Version: 8.1 CC: awalsh, corwin, msakai, msuchane, rhandlin
Target Milestone: rc Flags: msakai: needinfo-
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 6.2.2.67 Doc Type: Bug Fix
Doc Text:
.VDO can now suspend before UDS has finished rebuilding

Previously, the `dmsetup suspend` command became unresponsive if you attempted to suspend a VDO volume while the UDS index was rebuilding. The command finished only after the rebuild. With this update, the problem has been fixed. The `dmsetup suspend` command can finish before the UDS rebuild is done without becoming unresponsive.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-28 16:43:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Andy Walsh 2019-08-05 23:06:24 UTC
Description of problem:
With the fix implemented for BZ1659303, it was discovered that if a UDS index is in the middle of being rebuilt, it will hang the suspend request until the rebuild has completed.

Version-Release number of selected component (if applicable):
kmod-kvdo-6.2.1.134

How reproducible:
100% (assuming you try to suspend while UDS is rebuilding)

Steps to Reproduce:
1. Create a VDO volume with an index large enough that the index replay takes some time (to allow the user to issue a suspend call).
2. Make sure the VDO device does not auto-start (systemctl disable vdo.service)
3. Write some data to the VDO device (500 unique 1M blocks, dd if=/dev/urandom bs=1M count=1000 oflag=direct)
4. Crash the system to make sure the index rebuilds when it is starting up (echo b > /proc/sysrq-trigger)
5. Once the machine is back online, clear dmesg (dmesg -c)
6. Start the VDO and try to suspend it immediately (vdo start --name vdo0 && time dmsetup suspend vdo0)
7. Check dmesg

Actual results:
The "device 'vdo0' suspended" message will not appear until after the save is complete, which won't happen until the rebuild has completed.
[  124.539074] kvdo0:dmsetup: suspending device 'vdo0'
[  125.055634] uds: kvdo0:dedupeQ: index could not be loaded: UDS Error: Index not saved cleanly (1069)
[  125.135880] uds: kvdo0:dedupeQ: Replaying volume from chapter 0 through chapter 1
[  125.380601] uds: kvdo0:dedupeQ: beginning save (vcn 0)
[  125.586142] uds: kvdo0:dedupeQ: finished save (vcn 0)
[  125.586210] kvdo0:dmsetup: device 'vdo0' suspended
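
To put a number on the delay, the gap between the two kvdo messages in a log like the one above can be computed from the dmesg timestamps. The following is a minimal sketch and not part of the original report: the function name `suspend_latency` is hypothetical, and it assumes the default bracketed dmesg timestamp prefix and the exact message wording shown above.

```shell
# Hypothetical helper: read dmesg-style output on stdin and print the number
# of seconds between "suspending device '<name>'" and "device '<name>' suspended".
# Assumes the default "[  seconds.micros]" timestamp prefix.
suspend_latency() {
    awk -v dev="$1" '
        {
            # Extract the timestamp between the leading "[" and the first "]".
            ts = $0
            sub(/^\[ */, "", ts)
            sub(/\].*/, "", ts)
        }
        # \047 is a single quote; it cannot be written literally inside
        # the single-quoted awk program.
        index($0, "suspending device \047" dev "\047") { start = ts }
        index($0, "device \047" dev "\047 suspended") {
            if (start != "") printf "%.6f\n", ts - start
        }
    '
}
```

Typical use would be `dmesg | suspend_latency vdo0`. For the log above this prints 1.047136, that is, roughly one second between the suspend request and the "suspended" message.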


Expected results:
The VDO device gets suspended without hanging.

Additional info:
This is hard to reproduce on some machines, and the effect is undesirable when trying to do something like a growLogical or growPhysical.  Otherwise, the effect isn't typically noticeable based on what I've seen in my attempts.

Comment 2 Andy Walsh 2019-08-06 00:30:59 UTC
A clarification that msakai pointed out: I should note that even with my changes, it may still take a few milliseconds to suspend the index. If the total rebuild time is short, it may not be easy to tell the difference between an index that suspended during the rebuild and one that simply completed the rebuild first.

Comment 5 Jakub Krysl 2019-10-15 14:38:54 UTC
Mass migration to Filip.

Comment 9 Filip Suba 2020-03-23 14:58:42 UTC
Verified with kmod-kvdo-6.2.2.117-65.el8.

Comment 16 errata-xmlrpc 2020-04-28 16:43:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1782