Bug 1737639 - Suspending VDO while UDS is rebuilding can hang until the rebuild has finished
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kmod-kvdo
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Matthew Sakai
QA Contact: Filip Suba
Docs Contact: Marek Suchánek
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-05 23:06 UTC by Andy Walsh
Modified: 2020-04-28 16:43 UTC (History)

Fixed In Version: 6.2.2.67
Doc Type: Bug Fix
Doc Text:
.VDO can now suspend before UDS has finished rebuilding

Previously, the `dmsetup suspend` command became unresponsive if you attempted to suspend a VDO volume while the UDS index was rebuilding. The command finished only after the rebuild. With this update, the problem has been fixed. The `dmsetup suspend` command can finish before the UDS rebuild is done without becoming unresponsive.
Clone Of:
Environment:
Last Closed: 2020-04-28 16:43:10 UTC
Type: Bug
Target Upstream Version:
msakai: needinfo-


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2020:1782 | None | None | None | 2020-04-28 16:43:21 UTC

Description Andy Walsh 2019-08-05 23:06:24 UTC
Description of problem:
With the fix implemented for BZ1659303, it was discovered that a suspend request issued while a UDS index is in the middle of being rebuilt hangs until the rebuild has completed.

Version-Release number of selected component (if applicable):
kmod-kvdo-6.2.1.134

How reproducible:
100% (assuming you try to suspend while UDS is rebuilding)

Steps to Reproduce:
1. Create a VDO volume with an index large enough that the index replay takes some time (so that the user has a chance to issue a suspend call while it is running).
2. Make sure the VDO device does not auto-start (systemctl disable vdo.service)
3. Write some data to the VDO device (500 unique 1M blocks, dd if=/dev/urandom bs=1M count=1000 oflag=direct)
4. Crash the system to make sure the index rebuilds when it is starting up (echo b > /proc/sysrq-trigger)
5. Once the machine is back online, clear dmesg (dmesg -c)
6. Start the VDO and try to suspend it immediately (vdo start --name vdo0 && time dmsetup suspend vdo0)
7. Check dmesg

Actual results:
The "device 'vdo0' suspended" message will not appear until after the save is complete, which won't happen until the rebuild has completed.
[  124.539074] kvdo0:dmsetup: suspending device 'vdo0'
[  125.055634] uds: kvdo0:dedupeQ: index could not be loaded: UDS Error: Index not saved cleanly (1069)
[  125.135880] uds: kvdo0:dedupeQ: Replaying volume from chapter 0 through chapter 1
[  125.380601] uds: kvdo0:dedupeQ: beginning save (vcn 0)
[  125.586142] uds: kvdo0:dedupeQ: finished save (vcn 0)
[  125.586210] kvdo0:dmsetup: device 'vdo0' suspended
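
To put a number on the delay from a captured log, here is a minimal sketch. It assumes the default dmesg timestamp format shown above (seconds since boot in square brackets). The function name suspend_latency is hypothetical and is not part of VDO or dmsetup; it only subtracts the timestamp of the "suspending device" message from that of the matching "suspended" message.

```shell
# Hypothetical helper (not part of VDO or dmsetup): read dmesg text on stdin
# and print the seconds between "suspending device '<name>'" and
# "device '<name>' suspended" for the given device-mapper name.
suspend_latency() {
    dev="$1"
    awk -v dev="$dev" -v q="'" '
        {
            # Skip lines without a leading "[  124.539074]" style timestamp.
            if (match($0, /^\[ *[0-9]+\.[0-9]+\]/) == 0) next
            # Strip the brackets and convert the timestamp to a number.
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
        }
        # Remember when the suspend request was logged.
        index($0, "suspending device " q dev q) { start = ts; have_start = 1 }
        # Report the elapsed time at the first completion message after it.
        index($0, "device " q dev q " suspended") && have_start {
            printf "%.6f\n", ts - start
            exit
        }
    '
}
```

Usage: dmesg | suspend_latency vdo0. On the log above this prints 1.047136, so the suspend request was held for about one second, covering the whole replay and save.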


Expected results:
The VDO device gets suspended without hanging.

Additional info:
This is hard to reproduce on some machines, and the effect is undesirable when trying to do something like a growLogical or growPhysical.  Otherwise, the effect isn't typically noticeable based on what I've seen in my attempts.

Comment 2 Andy Walsh 2019-08-06 00:30:59 UTC
A clarification that msakai pointed out: even with my changes, it may still take a few milliseconds to suspend the index, so if the total rebuild time is short, it may not be easy to tell the difference between an index that suspended during the rebuild and one that simply completed the rebuild first.

Comment 5 Jakub Krysl 2019-10-15 14:38:54 UTC
Mass migration to Filip.

Comment 9 Filip Suba 2020-03-23 14:58:42 UTC
Verified with kmod-kvdo-6.2.2.117-65.el8.

Comment 16 errata-xmlrpc 2020-04-28 16:43:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1782

