Bug 1961299 - degraded thin lv performance while using nvme backing device
Summary: degraded thin lv performance while using nvme backing device
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Mikuláš Patočka
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-17 16:30 UTC by Rupesh Girase
Modified: 2021-09-03 12:38 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-30 20:04:36 UTC
Target Upstream Version:
Embargoed:



Description Rupesh Girase 2021-05-17 16:30:46 UTC
Description of problem:

Thin LV performance degrades compared to a thick LV when using an NVMe backing device,
but we do not see the same performance difference between thin and thick LVs while using ramdisk multipath devices.


Version-Release number of selected component (if applicable):
rhel7.9

How reproducible:
always

Steps to Reproduce:
1. Create a thin LV backed by NVMe devices
2. Run fio tests (a rough sketch follows below)
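
A rough reproduction sketch (device names, VG/LV names, sizes, and the fio job below are placeholders, not taken from the original report):

# assumed 8 NVMe member devices; adjust to the real hardware
pvcreate /dev/nvme{0..7}n1
vgcreate nvmevg /dev/nvme{0..7}n1

# thick (plain striped) LV as the baseline
lvcreate -L 100G -i 8 -n thicklv nvmevg

# thin pool and thin LV striped across the same PVs
lvcreate -L 100G -i 8 -T nvmevg/thinpool
lvcreate -V 100G -T nvmevg/thinpool -n thinlv

# run the same fio job against both and compare throughput
fio --name=thick --filename=/dev/nvmevg/thicklv --direct=1 --rw=write --bs=1M \
    --iodepth=32 --numjobs=8 --ioengine=libaio --runtime=60 --time_based --group_reporting
fio --name=thin --filename=/dev/nvmevg/thinlv --direct=1 --rw=write --bs=1M \
    --iodepth=32 --numjobs=8 --ioengine=libaio --runtime=60 --time_based --group_reporting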


Actual results:
Performance drops significantly in the case of the thin LV.

Expected results:
we should not see significant performance drops

Additional info:
RHEL 8 seems fine compared to RHEL 7.9.

Comment 4 Zdenek Kabelac 2021-05-18 08:03:09 UTC
When you use a bigger chunk size and need the best performance, you have to disable zeroing:

lvcreate -Zn -T -Lpoolsize .....

When zeroing is enabled, each provisioned chunk is zeroed for all unwritten sectors.

If you need to keep zeroing, the fio measurement should happen on already-provisioned blocks.

But in all cases the performance will be lower than a plain striped target; however, with chunk sizes >= 512K it should be pretty close.
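
For illustration, a minimal sketch of the above (the VG name, sizes, and fio job are placeholders, not the reporter's actual setup):

# thin pool with zeroing disabled and a larger chunk size
lvcreate -Zn -T -L 100G -c 512k nvmevg/thinpool
lvcreate -V 100G -T nvmevg/thinpool -n thinlv

# if zeroing must stay enabled, provision the blocks first with one full
# write pass, then run the actual measurement on the provisioned LV
fio --name=prefill --filename=/dev/nvmevg/thinlv --direct=1 --rw=write --bs=1M
fio --name=measure --filename=/dev/nvmevg/thinlv --direct=1 --rw=randwrite --bs=4k \
    --iodepth=32 --numjobs=8 --ioengine=libaio --runtime=60 --time_based --group_reporting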

Comment 5 Zdenek Kabelac 2021-05-18 08:17:12 UTC
Note: there may be 'a mistake' in your script, as using a 512M thin-pool chunk size likely consumes a lot of space in the thin pool for the data written (this can be useful only in some limited use cases).

8 stripes on NVMe, which typically comes with a 512k optimal I/O size, should give you 8 * 0.5M => a 4M chunk size to keep the best alignment and well-usable discards. Still, such large chunk sizes are not very practical with snapshots...

Finding the optimal striping pattern may require several benchmarking rounds.

It is also possible to use striped 'metadata' for the highest performance; however, in this case you have to create the data and metadata LVs separately and join them into a thin-pool volume via lvconvert. If you seek the best possible performance, you may need to tune this part as well.
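
Roughly, under the same placeholder naming as above (sizes are only an illustration):

# striped data and metadata LVs created separately
lvcreate -L 500G -i 8 -I 512k -n pool_data nvmevg
lvcreate -L 2G -i 8 -I 512k -n pool_meta nvmevg

# join them into a thin pool; chunk size chosen as 8 * 512k = 4M
lvconvert -y --type thin-pool --chunksize 4M --poolmetadata nvmevg/pool_meta nvmevg/pool_data

# the pool keeps the data LV's name; disable zeroing on it if throughput matters most
lvchange -Zn nvmevg/pool_data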

Comment 6 Zdenek Kabelac 2021-05-18 08:21:10 UTC
Forgot to mention: --stripesize 512k may help you tune the best stripe size for your NVMe devices
(but, as said, what gives the best throughput depends on the use case and hardware).
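
Something like the following could be used for those benchmarking rounds (the LV name, sizes, and fio job are placeholders; a scratch striped LV is created and removed for each stripe size tried):

# try several stripe sizes on a scratch striped LV and compare fio throughput
for ss in 64k 128k 256k 512k; do
    lvcreate -L 100G -i 8 -I $ss -n stripe_test nvmevg
    fio --name=ss_$ss --filename=/dev/nvmevg/stripe_test --direct=1 --rw=write --bs=1M \
        --iodepth=32 --numjobs=8 --ioengine=libaio --runtime=30 --time_based --group_reporting
    lvremove -y nvmevg/stripe_test
done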

Comment 7 Rupesh Girase 2021-05-21 09:53:53 UTC
Hello Zdenek,

I tried disabling zeroing after creating the thin pool, as shown in internal comment #2.

Is that equivalent, or does zeroing need to be disabled at creation time only?


** disabled zeroing

[root@hp-dl380g10-1 ~]# lvchange -Zn nvmevg/glide_thinpool
  Logical volume nvmevg/glide_thinpool changed.
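
(A generic way to confirm the pool's current zeroing setting, not part of the original session: list the attribute bits; the eighth character of lv_attr is 'z' while zeroing is enabled and '-' once it is off.)

lvs -o lv_name,lv_attr nvmevg/glide_thinpool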

Also, we do not see such a performance difference while using ramdisk devices.
We see this behavior only for thin pools/LVs backed by NVMe devices.


Thanks,
Rupesh

Comment 13 Mikuláš Patočka 2021-08-26 13:07:40 UTC
In my opinion, the user has unrealistic performance expectations here.

dm-thin takes locks: it takes pmd->pool_lock for read for each I/O, and then it takes the dm-bufio lock for each btree node. If you submit massively parallel I/O to dm-thin, the cache lines containing the locks will bounce between the CPU cores, and this will cause performance degradation.

The faster the underlying device is, the more performance degradation you will see due to lock bouncing. A ramdisk has low throughput (the benchmarks here show about 360MiB/s), so you won't see much degradation on it. The 8-leg NVMe RAID-0 has high throughput (3GiB/s), so the performance degradation is high.

If you want to achieve 3GiB/s, you must make sure that the I/O path is lockless.

It is not easy to avoid the locks and fix this. It could be fixed by leveraging Intel's transactional memory instructions (TSX), but that would be a lot of work and it certainly could not be done in RHEL 7.

Comment 14 Jonathan Earl Brassow 2021-08-30 20:04:36 UTC
(In reply to Mikuláš Patočka from comment #13)

I would like to mention that there is ongoing work to improve thin-p performance and scalability, but that too would be too much to backport to RHEL7.  These changes are intended to land sometime in RHEL9.

