Bug 1961299
| Summary: | degraded thin lv performance while using nvme backing device | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Rupesh Girase <rgirase> |
| Component: | lvm2 | Assignee: | Mikuláš Patočka <mpatocka> |
| lvm2 sub component: | Thin Provisioning | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED DEFERRED | Docs Contact: | |
| Severity: | unspecified | | |
| Priority: | unspecified | CC: | agk, heinzm, jbrassow, loberman, msnitzer, prajnoha, thornber, zkabelac |
| Version: | 7.9 | | |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-08-30 20:04:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Rupesh Girase
2021-05-17 16:30:46 UTC
When you use a bigger chunk size and need the best performance, you have to disable zeroing:

lvcreate -Zn -T -L<poolsize> ...

When zeroing is enabled, every newly provisioned chunk is zeroed for all unwritten sectors. If you need to keep zeroing, the fio measurement should happen on already-provisioned blocks. In all cases the performance will be lower than a plain striped target, but with chunk sizes >= 512K it should be pretty close.

Note: there may be a mistake in your script, as using a 512M thin-pool chunk likely consumes a lot of thin-pool space for the written data (this is useful only in some limited use cases). 8 stripes on NVMe, which typically comes with a 512k optimal I/O size, should give you 8 * 0.5M => a 4M chunk size, which keeps the best alignment and well-usable discards. Still, such large chunk sizes are not very practical with snapshots. Finding the most optimal striping pattern may require several benchmarking rounds.

It is also possible to use striped metadata for the highest performance; in that case the data and metadata LVs have to be created separately and joined into a thin-pool volume via lvconvert. If you are after the best possible performance, you may need to tune this part as well.

Forgot to mention: --stripesize 512k may help you tune the stripe size for your NVMe devices (but, as said, what gives the best throughput depends on the use case and hardware).
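Putting the suggestions above together, a rough sketch of the commands could look like the following (the pool, metadata and thin LV sizes, the 4M chunk size, and every name other than nvmevg/glide_thinpool are placeholders, not the configuration actually used in this bug):

```
# Thin pool with zeroing disabled, data striped across 8 NVMe PVs,
# 512k stripe size and a 4M chunk size (8 * 512k) for alignment.
lvcreate --type thin-pool -Zn -L 500G -c 4M -i 8 -I 512k \
         -n glide_thinpool nvmevg

# Alternative: create striped data and metadata LVs separately,
# then join them into a thin pool with lvconvert.
lvcreate -L 500G -i 8 -I 512k -n glide_pool_data nvmevg
lvcreate -L 16G  -i 8 -I 512k -n glide_pool_meta nvmevg
lvconvert --type thin-pool -c 4M \
          --poolmetadata nvmevg/glide_pool_meta nvmevg/glide_pool_data

# If zeroing must stay enabled, provision the blocks first so that fio
# measures steady-state throughput rather than first-write zeroing cost.
lvcreate -V 100G --thinpool glide_thinpool -n glide_thinlv nvmevg
dd if=/dev/zero of=/dev/nvmevg/glide_thinlv bs=4M oflag=direct status=progress
```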
Hello Zdenek,

Tried disabling zeroing after creating the thin pool, as shown in internal comment #2. Is that the same, or does zeroing need to be disabled at creation time only?

** disabled zeroing
[root@hp-dl380g10-1 ~]# lvchange -Zn nvmevg/glide_thinpool
  Logical volume nvmevg/glide_thinpool changed.

Also, we do not see such a performance difference while using ramdisk devices. We are seeing this behavior only for a thin pool/LV backed by NVMe devices.

Thanks,
Rupesh

In my opinion, the user has unrealistic performance expectations here.

dm-thin takes locks - it takes pmd->pool_lock for read for each I/O and then it takes the dm-bufio lock for each btree node. If you submit massively parallel I/O to dm-thin, the cache lines containing the locks will bounce between the CPU cores and this will cause performance degradation.

The faster the underlying device is, the more performance degradation due to lock bouncing you will see. The ramdisk has slow performance (the benchmarks here show about 360MiB/s), so you won't see much degradation on it. The 8-leg NVMe RAID-0 has high performance (3GiB/s), so the performance degradation is high.

If you want to achieve 3GiB/s, you must make sure that the I/O path is lockless.

It is not easy to avoid the locks and fix this. It could be fixed by leveraging the Intel transaction instructions (TSX), but that would be a lot of work and it could certainly not be done in RHEL7.

(In reply to Mikuláš Patočka from comment #13)
> In my opinion, the user has unrealistic performance expectations here.
>
> dm-thin takes locks - it takes pmd->pool_lock for read for each I/O and then
> it takes the dm-bufio lock for each btree node. If you submit massively
> parallel I/O to dm-thin, the cache lines containing the locks will bounce
> between the CPU cores and this will cause performance degradation.
>
> The faster the underlying device is, the more performance degradation due
> to lock bouncing you will see. The ramdisk has slow performance (the
> benchmarks here show about 360MiB/s), so you won't see much degradation on
> it. The 8-leg NVMe RAID-0 has high performance (3GiB/s), so the performance
> degradation is high.
>
> If you want to achieve 3GiB/s, you must make sure that the I/O path is
> lockless.
>
> It is not easy to avoid the locks and fix this. It could be fixed by
> leveraging the Intel transaction instructions (TSX), but that would be a lot
> of work and it could certainly not be done in RHEL7.

I would like to mention that there is ongoing work to improve thin-p performance and scalability, but that too would be too much to backport to RHEL7. These changes are intended to land sometime in RHEL9.
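One way to check whether the NVMe-backed runs are indeed spending their time in the locking described in comment #13 is to profile the host while the fio workload is running. A minimal sketch, assuming perf is available on the system (the 30-second window is arbitrary):

```
# System-wide profile with call graphs while fio is running against the thin LV.
perf record -a -g -- sleep 30
# Look for lock slowpath symbols (e.g. osq_lock, queued_spin_lock_slowpath)
# attributed to dm-thin / dm-bufio call chains in the report.
perf report --sort symbol
```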