Bug 1791056

Summary: "vdo stop" hangs in uds on systems with many cores
Product: Red Hat Enterprise Linux 8 Reporter: John Wiele <jwiele>
Component: kmod-kvdo Assignee: Matthew Sakai <msakai>
Status: CLOSED ERRATA QA Contact: Filip Suba <fsuba>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.4 CC: awalsh, fsuba, raeburn
Target Milestone: rc   
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 6.2.3.108 Doc Type: If docs needed, set a value
Last Closed: 2020-11-04 02:01:16 UTC Type: Bug

Description John Wiele 2020-01-14 18:34:43 UTC
Description of problem:

On PPC and AArch systems with more than 16 cores, "vdo stop" hangs.

Version-Release number of selected component (if applicable):


How reproducible:

Very

Steps to Reproduce:
1. Create and start a VDO volume on a PPC or AArch system with more than 16 cores.
2. Run "vdo disableDeduplication".
3. Run "vdo stop".
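
The steps above can be sketched as a small script. This is only an illustration, not the test that was actually used: the volume name, the device path, and the timeout are hypothetical placeholders, and it assumes that "vdo create" also starts the volume, as it does by default. The timeout is there so that a hang is reported rather than blocking the caller forever.

```python
import subprocess


def repro_commands(name, device):
    """Return the argv lists for the three reproduction steps, in order.

    `name` and `device` are placeholders chosen by the caller.
    """
    return [
        ["vdo", "create", f"--name={name}", f"--device={device}"],
        ["vdo", "disableDeduplication", f"--name={name}"],
        ["vdo", "stop", f"--name={name}"],
    ]


def run_steps(name, device, timeout_s=600):
    """Run each step as root and report which one, if any, timed out.

    The timeout value is an arbitrary assumption. On timeout only the
    direct child (the vdo script) is killed; a stuck "dmsetup remove"
    started by it may remain behind.
    """
    for argv in repro_commands(name, device):
        try:
            subprocess.run(argv, check=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return "hang in: " + " ".join(argv)
    return "ok"
```

Calling run_steps("vdo0", "/dev/sdX") on an affected system would be expected to report the hang in the "vdo stop" step; both arguments are made-up values.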

Actual results:

 The vdo command hangs.

Expected results:

 "vdo stop" returns after the VDO volume is stopped.

Additional info:

Comment 1 John Wiele 2020-01-14 18:37:47 UTC
*** Bug 1791046 has been marked as a duplicate of this bug. ***

Comment 2 Ken Raeburn 2020-05-27 11:05:49 UTC
It appears the core count may be a distraction. In every failure for which we have kernel logs so far, the sequence is the same: disabling deduplication starts a UDS index save; before that save finishes, the main part of the test concludes; and cleanup then shuts down VDO, which triggers a second save. Various assertions fire because the first save hasn't finished. (Sometimes it does manage to finish just after the second save has been initiated; sometimes it never finishes, at least not before the test saves away the logs.) As a result, the "dmsetup remove" thread never completes its action, and the test hangs permanently.
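
To make the ordering concrete, here is a toy model of the overlap described above. It is not UDS code: the class, its methods, and the log messages are all invented for illustration, and the only behavior it assumes is that a second save request arriving while one is still pending trips an assertion-style check.

```python
class ToyIndex:
    """Hypothetical stand-in for an index that expects one save at a time."""

    def __init__(self):
        self.saving = False
        self.log = []

    def start_save(self, reason):
        # A save requested while another is pending is flagged, not queued.
        if self.saving:
            self.log.append(f"ASSERT: save already in progress ({reason})")
            return False
        self.saving = True
        self.log.append(f"save started ({reason})")
        return True

    def finish_save(self):
        self.saving = False
        self.log.append("save finished")


def overlapping_sequence():
    """Replay the order of events seen in the failing runs."""
    index = ToyIndex()
    index.start_save("disableDeduplication")
    # The stop request arrives before the first save has finished.
    second_ok = index.start_save("vdo stop")
    index.finish_save()
    return second_ok, index.log
```

In this model the second request is rejected and the log records the assertion between "save started" and "save finished". The model only shows the ordering of events; it does not attempt to reproduce the resulting hang.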

It is uncertain whether the core count contributes to the time required for the save or whether this is just coincidence. More cores mean more UDS zones in the default configuration (default zone count = cores/2, up to MAX_ZONES=16), and the zones' data are saved sequentially, though with the buffering I'm not sure that makes much difference.
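
For reference, the default zone count rule quoted above works out as follows. This is a sketch of the arithmetic only, not the UDS implementation; the integer rounding and the minimum of one zone are assumptions, and the function name is made up.

```python
MAX_ZONES = 16  # cap quoted in the comment above


def default_zone_count(cores):
    """Half the core count, capped at MAX_ZONES.

    Assumptions: integer division, and at least one zone on a
    single-core machine.
    """
    return max(1, min(cores // 2, MAX_ZONES))


def zone_table(core_counts):
    """Map each core count to its default zone count."""
    return {cores: default_zone_count(cores) for cores in core_counts}
```

Under these assumptions a 16-core machine gets 8 zones, and the cap of 16 zones is reached at 32 cores, so machines with more than 16 cores fall in the range of 8 to 16 zones.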

Comment 7 Filip Suba 2020-09-04 14:06:32 UTC
Verified with kmod-kvdo-6.2.3.114-74.el8.

Comment 10 errata-xmlrpc 2020-11-04 02:01:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (kmod-kvdo bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4551