Bug 1540722

Summary: VDO statistics do not account for concurrent dedupe
Product: Red Hat Enterprise Linux 7 Reporter: sclafani
Component: kmod-kvdo Assignee: sclafani
Status: CLOSED ERRATA QA Contact: Jakub Krysl <jkrysl>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.5 CC: awalsh, jkrysl, limershe
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 6.1.1.117 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-10-30 09:38:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description sclafani 2018-01-31 21:13:05 UTC
Description of problem:

The new concurrent dedupe handling (VDOSTORY-190) has no separate VDO statistics for deduplication of concurrent writes, so it is no longer possible to correctly determine the total dedupe VDO has found in non-zero data.

Version-Release number of selected component (if applicable):

kmod-kvdo-6.1.0.124-11.el7.x86_64

How reproducible:

Somewhat timing-dependent, since it relies on concurrent writes of the same data, but with the right dataset it's easy to reproduce.


Steps to Reproduce:
1. Create a VDO device (/dev/dm-1 below)
2. Write many copies of the same block to the device. This is the incantation used
   in one of the VDO automated tests:
     fio --minimal --bs=4096 --rw=write \
       --name=generic_job_name --filename=/dev/dm-1 --numjobs=4 --size=10737418240 \
       --thread --norandommap --randrepeat=1 --group_reporting \
       --buffer_pattern=0xDeadBeef --unlink=0 --direct=1 --iodepth=128 \
       --ioengine=libaio --offset=0 --offset_increment=10737418240
3. vdostats /dev/dm-1 --all
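As a sanity check on the counts reported later in this bug, the size of the fio workload can be worked out from its parameters. This is a minimal sketch, assuming each 4096-byte direct write reaches VDO as one bio and that every job writes its full --size into its own region (--offset_increment equals --size, so the regions do not overlap):

```python
# Parameters copied from the fio command in step 2.
BLOCK_SIZE = 4096          # --bs=4096
JOB_SIZE = 10737418240     # --size=10737418240 (10 GiB per job)
NUM_JOBS = 4               # --numjobs=4


def expected_blocks(block_size: int, job_size: int, num_jobs: int) -> int:
    """Total blocks written by all jobs, assuming each job writes
    job_size bytes in block_size chunks to its own region."""
    return num_jobs * (job_size // block_size)


total = expected_blocks(BLOCK_SIZE, JOB_SIZE, NUM_JOBS)
print(total)  # 10485760
```

This agrees with the 10485760 'bios in/out' reported in Comment 5.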

Actual results:

"dedupe advice valid" in the stats is much, much smaller than the number of blocks written, even though (on a new volume) "data blocks used" is also much smaller than the dataset size. In other words, the data is deduplicating, but most of that dedupe is not being counted.

Expected results:

We need to add a new stat (probably "concurrent hash matches" along with "concurrent hash collisions" for complete coverage of all the cases) that will count the bios that deduped against others in memory. 
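The proposed counters can be illustrated with a small model. This is a hypothetical sketch, not kvdo's implementation: it assumes in-flight writes are tracked in a dictionary keyed by a hash of the block, and it uses an invented, deliberately weak helper (`toy_hash`) so that a collision is easy to produce. Counter names follow the ones documented in the manpage later in this bug ('concurrent data matches', 'concurrent hash collisions'). A write whose hash and data both match an in-flight write is a concurrent data match, a write whose hash matches but whose data differs is a concurrent hash collision, and anything else would have to consult the index.

```python
from dataclasses import dataclass, field
from typing import Dict


def toy_hash(data: bytes) -> int:
    # Hypothetical, deliberately weak hash (byte sum mod 256) so that two
    # different blocks can collide; a real dedupe hash is far stronger.
    return sum(data) % 256


@dataclass
class ConcurrentStats:
    # Hypothetical counters modeled on the proposed statistics.
    concurrent_data_matches: int = 0
    concurrent_hash_collisions: int = 0
    index_lookups: int = 0
    # Hash -> data of the write currently tracked as in flight for that hash.
    in_flight: Dict[int, bytes] = field(default_factory=dict)

    def start_write(self, data: bytes) -> str:
        h = toy_hash(data)
        pending = self.in_flight.get(h)
        if pending is None:
            # Nothing in flight with this hash: track this write and
            # consult the dedupe index for advice.
            self.in_flight[h] = data
            self.index_lookups += 1
            return "index"
        if pending == data:
            # Same data as an in-flight write: dedupe against it in memory.
            self.concurrent_data_matches += 1
            return "match"
        # Same hash, different data: the write cannot share the in-flight
        # write's result. This model only counts the event.
        self.concurrent_hash_collisions += 1
        return "collision"

    def finish_write(self, data: bytes) -> None:
        # Stop tracking the in-flight write once it completes.
        h = toy_hash(data)
        if self.in_flight.get(h) == data:
            del self.in_flight[h]


stats = ConcurrentStats()
print(stats.start_write(b"\xde\xad\xbe\xef"))  # index
print(stats.start_write(b"\xde\xad\xbe\xef"))  # match
print(stats.start_write(b"\xef\xbe\xad\xde"))  # collision (same byte sum)
stats.finish_write(b"\xde\xad\xbe\xef")
print(stats.start_write(b"\xde\xad\xbe\xef"))  # index again
```

Keeping matches and collisions as separate counters means every write that meets an in-flight write of the same hash lands in exactly one bucket, which is the "complete coverage" the proposal asks for.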

Additional info:

Comment 3 sclafani 2018-03-05 22:12:50 UTC
I have fixes ready to merge if this gets ack'ed.

Comment 5 Jakub Krysl 2018-07-11 09:59:25 UTC
I tested with kmod-kvdo-6.1.1.99 and vdo-6.1.1.99. I noticed 2 things using the reproducer:

There is no mention of the new fields in the vdostats manpage.

Testing with kmod-kvdo-6.1.0.176 showed 'dedupe advice valid' of 405 and 76 in 2 runs. Doing the same with kmod-kvdo-6.1.1.99 showed 1 in both runs.
I got 10485760 'bios in/out' and 10485759 (10485758 in the second run) for 'concurrent data matches'. 'saving percent' is 99 in all cases.

Comment 6 sclafani 2018-07-18 20:22:12 UTC
I missed that these needed to be documented in the manpage. I'll remedy that. Please re-open this so it can be merged back to 7.6.

6.1.0.176 has the changes to concurrent dedupe, but not the stats fix, so I think what you're seeing is exactly the problem the new stats are addressing: if you write many copies of the same block concurrently, they aren't counted in 'dedupe advice valid', nor anywhere else. In that version, if I write the same block over and over, but very slowly (not concurrently), all the dedupe should be accounted for in 'dedupe advice valid'. If I write them very quickly, some might be counted there, but most will not, which is why you're seeing '405' and '76'.

In 6.1.1.84+, if I write the same block over and over, non-concurrently, the dedupe will still be counted in 'dedupe advice valid' (because when each write arrives, we have to go to the UDS index to find the dedupe candidate and validate it). But if another write arrives with the same data while a write for that data is still pending, we don't have to use index advice at all, and it's counted in 'concurrent data matches'. It sounds like that's what you saw in your test.
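The difference between slow and concurrent writes described in the two paragraphs above can be modeled with a short simulation. This is an illustrative sketch under simplifying assumptions, not kvdo code: every write carries the same data and is an (arrival, completion) interval; a write arriving while any write of that data is still in flight dedupes against it in memory; otherwise it consults the index, which (in this model) knows the data once any earlier write of it exists. The hypothetical `count_concurrent` flag switches between the 6.1.0.176 behaviour (concurrent dedupe counted nowhere) and the 6.1.1.84+ behaviour.

```python
from typing import Dict, List, Tuple


def simulate(writes: List[Tuple[float, float]],
             count_concurrent: bool = True) -> Dict[str, int]:
    """Count how repeated writes of one identical block are classified."""
    stats = {"dedupe advice valid": 0, "concurrent data matches": 0, "new blocks": 0}
    in_flight_until = float("-inf")  # when the last pending write of this data completes
    written_before = False           # stands in for the index knowing this data
    for arrival, completion in sorted(writes):
        if arrival < in_flight_until:
            # Same data still pending: dedupe in memory, no index lookup.
            # Before the stats fix this dedupe was not counted anywhere.
            if count_concurrent:
                stats["concurrent data matches"] += 1
        elif written_before:
            # Nothing pending, but the data exists: index advice is valid.
            stats["dedupe advice valid"] += 1
        else:
            # First copy of the data: a new block is written.
            stats["new blocks"] += 1
            written_before = True
        in_flight_until = max(in_flight_until, completion)
    return stats


slow = [(i * 10.0, i * 10.0 + 1.0) for i in range(5)]  # writes never overlap
fast = [(i * 0.1, i * 0.1 + 1.0) for i in range(5)]    # writes overlap heavily
print(simulate(slow))
# {'dedupe advice valid': 4, 'concurrent data matches': 0, 'new blocks': 1}
print(simulate(fast))
# {'dedupe advice valid': 0, 'concurrent data matches': 4, 'new blocks': 1}
print(simulate(fast, count_concurrent=False))
# {'dedupe advice valid': 0, 'concurrent data matches': 0, 'new blocks': 1}
```

In this model the total dedupe found is 'dedupe advice valid' plus 'concurrent data matches' (4 in both counted runs), while the uncounted run reports no dedupe at all for the fast workload.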

Comment 7 Jakub Krysl 2018-07-23 11:38:54 UTC
Thanks for the explanation. Handing back to fix the manpage.

Comment 8 Jakub Krysl 2018-08-30 15:02:13 UTC
Tested with vdo-6.1.1.120-3.el7; the new values are now explained in the manpage:
       concurrent data matches
              The number of writes with the same data as another in-flight write.

       concurrent hash collisions
              The number of writes whose hash collided with an in-flight write.

Comment 10 errata-xmlrpc 2018-10-30 09:38:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3094