Bug 1870660

Summary: High CPU load caused by indexW kernel threads while vdo volume is unused
Product: Red Hat Enterprise Linux 8
Component: kmod-kvdo
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Target Release: 8.0
Reporter: Andy Walsh <awalsh>
Assignee: Ken Raeburn <raeburn>
QA Contact: Filip Suba <fsuba>
Docs Contact: Marek Suchánek <msuchane>
CC: awalsh, bgurney, corwin, fsuba, nikolay, zinchukpavlo
Flags: pm-rhel: mirror+
Fixed In Version: 6.2.4.11
Doc Type: Bug Fix
Doc Text: In certain cases, the index kernel threads for a VDO volume would use a high amount of CPU time while idle. The behavior of the index threads has been adjusted to reduce CPU usage while idle.
Type: Bug
Last Closed: 2021-05-18 14:39:44 UTC

Description Andy Walsh 2020-08-20 14:27:48 UTC
Description of problem:
This issue was originally reported on github at https://github.com/dm-vdo/kvdo/issues/32

Hello team,

I have noticed that the indexW kernel threads consume a lot of CPU time while the VDO volume is idle (for example, when it has just been started and has not been used since):

CPU time increase, as reported by 'top':

$ top -bw | grep indexW
6608 root      20   0       0      0      0 S   6.2   0.0   0:01.69 kvdo0:indexW
6609 root      20   0       0      0      0 S   6.2   0.0   0:01.69 kvdo0:indexW
6611 root      20   0       0      0      0 S   6.2   0.0   0:01.69 kvdo0:indexW
6612 root      20   0       0      0      0 S   6.2   0.0   0:01.66 kvdo0:indexW
6613 root      20   0       0      0      0 S   6.2   0.0   0:01.68 kvdo0:indexW
6610 root      20   0       0      0      0 S   0.0   0.0   0:01.69 kvdo0:indexW
6611 root      20   0       0      0      0 S   4.6   0.0   0:01.83 kvdo0:indexW
6608 root      20   0       0      0      0 S   4.3   0.0   0:01.82 kvdo0:indexW
6609 root      20   0       0      0      0 S   4.3   0.0   0:01.82 kvdo0:indexW
6610 root      20   0       0      0      0 S   4.3   0.0   0:01.82 kvdo0:indexW
6612 root      20   0       0      0      0 S   4.3   0.0   0:01.79 kvdo0:indexW
6613 root      20   0       0      0      0 S   4.3   0.0   0:01.81 kvdo0:indexW
6610 root      20   0       0      0      0 S   4.6   0.0   0:01.96 kvdo0:indexW
6613 root      20   0       0      0      0 S   4.6   0.0   0:01.95 kvdo0:indexW
6608 root      20   0       0      0      0 S   4.3   0.0   0:01.95 kvdo0:indexW
6609 root      20   0       0      0      0 S   4.3   0.0   0:01.95 kvdo0:indexW
6611 root      20   0       0      0      0 S   4.3   0.0   0:01.96 kvdo0:indexW
6612 root      20   0       0      0      0 S   4.0   0.0   0:01.91 kvdo0:indexW
6613 root      20   0       0      0      0 S   4.6   0.0   0:02.09 kvdo0:indexW
6608 root      20   0       0      0      0 S   4.3   0.0   0:02.08 kvdo0:indexW
6609 root      20   0       0      0      0 S   4.3   0.0   0:02.08 kvdo0:indexW
6610 root      20   0       0      0      0 S   4.3   0.0   0:02.09 kvdo0:indexW
6611 root      20   0       0      0      0 S   4.3   0.0   0:02.09 kvdo0:indexW
6612 root      20   0       0      0      0 S   4.3   0.0   0:02.04 kvdo0:indexW
VDO statistics after collecting the usage above:

# vdostats --verbose | grep -e 'bios in\|bios out'
  bios in read                        : 0
  bios in write                       : 0
  bios in discard                     : 0
  bios in flush                       : 0
  bios in fua                         : 0
  bios in partial read                : 0
  bios in partial write               : 0
  bios in partial discard             : 0
  bios in partial flush               : 0
  bios in partial fua                 : 0
  bios out read                       : 0
  bios out write                      : 0
  bios out discard                    : 0
  bios out flush                      : 0
  bios out fua                        : 0
  bios out completed read             : 0
  bios out completed write            : 0
  bios out completed discard          : 0
  bios out completed flush            : 0
  bios out completed fua              : 0
  bios in progress read               : 0
  bios in progress write              : 0
  bios in progress discard            : 0
  bios in progress flush              : 0
  bios in progress fua                : 0
System information:
GNU/Gentoo Linux (amd64)
kvdo version: 6.2.3.114 on 5.8.0 kernel (amd64) (kvdo-corp)
GCC: 10.2

The VDO device is started on top of a dm-crypt encrypted partition, i.e.:

    vdo_storage: !VDOService
      _operationState: finished
      ackThreads: 1
      activated: enabled
      bioRotationInterval: 64
      bioThreads: 4
      blockMapCacheSize: 128M
      blockMapPeriod: 16380
      compression: enabled
      cpuThreads: 2
      deduplication: enabled
      device: /dev/mapper/cryptstorage
      hashZoneThreads: 1
      indexCfreq: 0
      indexMemory: 0.25
      indexSparse: disabled
      indexThreads: 0
      logicalBlockSize: 4096
      logicalSize: 1T
      logicalThreads: 1
      maxDiscardSize: 4K
      name: vdo_storage
      physicalSize: 306742596K
      physicalThreads: 1
      slabSize: 2G
      uuid: null
      writePolicy: async
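
For reference, a stack like this could be assembled roughly as follows (a sketch only: the partition path is a placeholder, and the `vdo create` options mirror the configuration above):

# cryptsetup open /dev/nvme0n1p3 cryptstorage
# vdo create --name=vdo_storage --device=/dev/mapper/cryptstorage --vdoLogicalSize=1T --indexMem=0.25
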
Hardware specs of the machine:
Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
32GB memory
disk partition on NVME disk

There is currently no virtualization in use, though KVM and linux-containers are compiled in as modules.

If there is anything else you need to know, do let me know.


Version-Release number of selected component (if applicable):
6.2.3.114

How reproducible:
Easily

Steps to Reproduce:
1. Create a VDO volume (see the example commands after this list)
2. Don't write any data to it
3. Run `top -bw | grep indexW`
4. Run `vmstat 1 10`
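
A minimal end-to-end sketch of these steps, reusing the creation command quoted from the GitHub thread below (the device path is a placeholder):

# vdo create --name=vdo1 --device=/dev/sdX --vdoLogicalSize=1T
# top -bw | grep indexW
# vmstat 1 10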

Actual results:
3-6% CPU usage on kernel threads that are supposed to be mostly idle.
A significant number of context switches as well.

Expected results:
A more acceptable level of CPU usage and fewer context switches on an idle volume.

Additional info:
Other comments from GitHub:

From bgurney-rh:
> Hello,
> 
> Thanks for the report; I was able to reproduce this behavior on a system running Fedora 32 with a 5.7 kernel.
> 
> Here's some output from "vmstat 1" after creating a new VDO volume directly on a test block device (i.e.: no layers below the VDO volume), with the command `vdo create --name=vdo1 --device=<testdevice> --vdoLogicalSize=1T`:
> 
> ```
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0 30105680  66792 1406196    0    0     0   388 85875 185522  0  1 99  0  0
>  1  0      0 30105680  66792 1406204    0    0     0     0 85105 184946  0  1 99  0  0
>  2  0      0 30105680  66792 1406204    0    0     0     0 93577 193874  0  1 99  0  0
>  1  0      0 30105680  66792 1406204    0    0     0     0 99219 199766  0  1 99  0  0
>  1  0      0 30105680  66792 1406204    0    0     0     0 99274 199817  0  1 99  0  0
>  1  0      0 30105680  66792 1406204    0    0     0     0 99055 199591  0  1 99  0  0
>  1  0      0 30105680  66792 1406204    0    0     0     0 99294 199809  0  1 99  0  0
>  1  0      0 30105680  66800 1406204    0    0     0    16 97547 197933  0  2 98  0  0
> ```
> 
> Note the high number of context switches per second ("cs"). If you run `vmstat 1` on your system with the VDO volume remaining idle, do you see something similar?
> 
> (In my case, the VDO volume's index also had 6 zones; the test system has a total of 12 CPUs.)

From nikichukov:
> Hello Bryan,
> 
> This is confirmed: I saw an immediate increase of approximately 180,000 in the number of context switches just by bringing up the VDO device. Turning it off brings the number of context switches back to normal.

Comment 5 Nikolay Kichukov 2020-10-07 08:11:14 UTC
Thanks for the patches.
I have compiled 6.2.4.14 and the indexW threads have gone quiet.

So this seems to have been resolved.

However, now that I look at the number of context switches, starting VDO volumes that are not in use on an idle system increases the number of context switches by around 8,000 per second.

VDO devices not started: 90 context switches per second
VDO devices started but not used (2 devices): 8,000 context switches per second
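
One way to attribute the switches to individual kernel threads is per-thread task-switching statistics, for example with `pidstat` from the sysstat package (a sketch):

# pidstat -w -t 1 5 | grep kvdo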

The threads that consume the CPU seem to be:

 [kvdo2:packerQ]
 [kvdo2:ackQ]
 [kvdo2:cpuQ0]
 [kvdo2:cpuQ1]
 [kvdo3:dedupeQ]
 [kvdo3:journalQ]
 [kvdo3:hashQ0]
 [kvdo3:bioQ0]
 [kvdo3:bioQ1]
 [kvdo3:bioQ2]
 [kvdo3:bioQ3]
 [kvdo3:cpuQ0]
 [kvdo3:cpuQ1]

Do we need to implement something similar for those too or would that be too much of a performance hit?

Comment 6 Filip Suba 2020-10-20 08:59:44 UTC
Verified with vdo-6.2.4.14-14.el8. Regression testing passed.

Comment 7 Pavel Zinchuk 2020-12-11 09:55:56 UTC
I've tested with package vdo-6.2.4.14-14.el8.
The issue still persists: there is high CPU context switching.
Filip, can you please recheck, because the issue is not fixed?

(In reply to Filip Suba from comment #6)
> Verified with vdo-6.2.4.14-14.el8. Regression testing passed.

Comment 8 Ken Raeburn 2020-12-11 11:04:38 UTC
(In reply to Pavel Zinchuk from comment #7)
> I've tested with package vdo-6.2.4.14-14.el8.
> The issue still persists: there is high CPU context switching.
> Filip, can you please recheck, because the issue is not fixed?

The context switches from the other threads have a different cause and are tracked in BZ1886738. They will cause a small CPU load but it should not be a high one -- I typically see something like 0.3% per thread depending on the platform.

The package vdo-6.2.4.14-14.el8 should fix the much higher context switch load that was previously being caused by the indexW thread, and the high CPU load that it triggered.

Are you seeing a high CPU load with this version of the package?

Comment 9 Pavel Zinchuk 2020-12-11 16:02:21 UTC
Yes,
I see a high CPU load in a VM under oVirt virtualization. It doesn't matter how many CPU cores I allocate to the VM, 4 or 8.
oVirt always detects a CPU load in the range of 50-80% when VDO is enabled but not used. I see constant high CPU context switching from VDO, which causes a high load in the oVirt virtualization. The high CPU load stops only when I disable the VDO service.
It shouldn't be like this, I guess.

Comment 10 Ken Raeburn 2020-12-14 20:05:42 UTC
(In reply to Pavel Zinchuk from comment #9)
> Yes,
> I see a high CPU load in a VM under oVirt virtualization. It doesn't matter
> how many CPU cores I allocate to the VM, 4 or 8.
> oVirt always detects a CPU load in the range of 50-80% when VDO is enabled
> but not used. I see constant high CPU context switching from VDO, which
> causes a high load in the oVirt virtualization. The high CPU load stops only
> when I disable the VDO service.
> It shouldn't be like this, I guess.

Just to clarify -- a high load reported *within* the virtual machine, or a high load in the hypervisor environment, caused by the high timer interrupt rate keeping virtualization threads busy? I've seen the latter happen (and it's part of the reason BZ1886738 needs fixing), but if it's the former, what threads within the VM are active and how busy are they?
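
If it is unclear which is happening, the timer interrupt rate can be sampled from /proc/interrupts (a sketch; "LOC" is the local timer line, and the counts are cumulative per CPU):

# grep 'LOC:' /proc/interrupts; sleep 1; grep 'LOC:' /proc/interrupts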

Comment 11 Pavel Zinchuk 2020-12-18 08:28:24 UTC
(In reply to Ken Raeburn from comment #10)
> (In reply to Pavel Zinchuk from comment #9)
> > Yes,
> > I see a high CPU load in a VM under oVirt virtualization. It doesn't
> > matter how many CPU cores I allocate to the VM, 4 or 8.
> > oVirt always detects a CPU load in the range of 50-80% when VDO is enabled
> > but not used. I see constant high CPU context switching from VDO, which
> > causes a high load in the oVirt virtualization. The high CPU load stops
> > only when I disable the VDO service.
> > It shouldn't be like this, I guess.
> 
> Just to clarify -- a high load reported *within* the virtual machine, or a
> high load in the hypervisor environment, caused by the high timer interrupt
> rate keeping virtualization threads busy? I've seen the latter happen (and
> it's part of the reason BZ1886738 needs fixing), but if it's the former,
> what threads within the VM are active and how busy are they?

Hi Ken,
The high load is reported by the hypervisor environment. Inside the VM I don't see really high CPU usage; I only see a high amount of CPU context switching.
The high CPU context switching causes load in the hypervisor environment, which is why the hypervisor reports high CPU usage for the VM.

Comment 13 errata-xmlrpc 2021-05-18 14:39:44 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (kmod-kvdo bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1588