Description of problem:
With the fix for BZ1659303 in place, it was discovered that if a UDS index is in the middle of being rebuilt, a suspend request hangs until the rebuild has completed.
Version-Release number of selected component (if applicable):
How reproducible:
100% (assuming you try to suspend while UDS is rebuilding)
Steps to Reproduce:
1. Create a VDO volume with an index large enough that the index replay takes some time (giving the user a window in which to issue a suspend call).
2. Make sure the VDO device does not auto-start (systemctl disable vdo.service)
3. Write some data to the VDO device (500 unique 1M blocks, dd if=/dev/urandom bs=1M count=1000 oflag=direct)
4. Crash the system to make sure the index rebuilds when it is starting up (echo b > /proc/sysrq-trigger)
5. Once the machine is back online, clear dmesg (dmesg -c)
6. Start the VDO and try to suspend it immediately (vdo start --name vdo0 && time dmsetup suspend vdo0)
7. Check dmesg
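Steps 5 through 7 can be collected into a small script. This is only a sketch: it assumes the volume is named vdo0 as in step 6, that it is run as root after the crash and reboot, and that one-second timing resolution is enough (the dmesg timestamps are finer). VDO_NAME, DRY_RUN, run and reproduce_suspend_hang are names made up for this example; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of steps 5-7. VDO_NAME and DRY_RUN are illustrative variables
# introduced for this example; they are not part of the vdo tooling.
VDO_NAME=${VDO_NAME:-vdo0}
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce_suspend_hang() {
    run dmesg -c                                   # step 5: clear the kernel log
    run vdo start --name "$VDO_NAME" || return 1   # step 6: start the volume...
    start=$(date +%s)
    run dmsetup suspend "$VDO_NAME"                # ...and suspend it immediately
    end=$(date +%s)
    echo "suspend took about $((end - start))s"
    run dmesg                                      # step 7: check message ordering
}

reproduce_suspend_hang
```

Run it with DRY_RUN=0 to actually execute the commands.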
Actual results:
The "device 'vdo0' suspended" message does not appear until the index save is complete, which does not happen until the rebuild has finished.
[ 124.539074] kvdo0:dmsetup: suspending device 'vdo0'
[ 125.055634] uds: kvdo0:dedupeQ: index could not be loaded: UDS Error: Index not saved cleanly (1069)
[ 125.135880] uds: kvdo0:dedupeQ: Replaying volume from chapter 0 through chapter 1
[ 125.380601] uds: kvdo0:dedupeQ: beginning save (vcn 0)
[ 125.586142] uds: kvdo0:dedupeQ: finished save (vcn 0)
[ 125.586210] kvdo0:dmsetup: device 'vdo0' suspended
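The delay in the excerpt above can be read off the timestamps of the "suspending device" and "suspended" messages. A minimal sketch of doing that with awk follows; suspend_latency is a hypothetical helper name, and the script assumes dmesg lines in the format shown here, with the timestamp in leading square brackets.

```shell
#!/bin/sh
# Hypothetical helper: read dmesg-style lines on stdin and print the number
# of seconds between the "suspending device" message and the following
# "device ... suspended" message.
suspend_latency() {
    awk '
        {
            # Extract the timestamp from the leading "[ 123.456789]" field.
            ts = $0
            sub(/^\[ */, "", ts)
            sub(/\].*/, "", ts)
        }
        /suspending device/   { start = ts }
        /device .* suspended/ { if (start != "") printf "%.6f\n", ts - start }
    '
}

# Usage with the first and last lines of the excerpt; prints 1.047136
suspend_latency <<'EOF'
[ 124.539074] kvdo0:dmsetup: suspending device 'vdo0'
[ 125.586210] kvdo0:dmsetup: device 'vdo0' suspended
EOF
```

On a live system the same function could be fed from dmesg directly (dmesg | suspend_latency). In this excerpt the suspend took roughly one second, all of it spent waiting on the replay and save.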
Expected results:
The VDO device is suspended without hanging.
Additional info:
This is hard to reproduce on some machines, and the delay is undesirable during operations such as growLogical or growPhysical. Otherwise, the effect isn't typically noticeable, based on what I've seen in my attempts.
A clarification that msakai pointed out: even with my changes, it may still take a few milliseconds to suspend the index, so if the total rebuild time is short, it may not be easy to tell the difference between an index that was suspended during the rebuild and one that simply completed the rebuild first.
Mass migration to Filip.
Verified with kmod-kvdo-22.214.171.124-65.el8.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.