Bug 1973181
| Summary: | Very poor performance of random writes to a RAID 1 logical volume based on NVMeoF block devices | | |
|---|---|---|---|
| Product: | [Community] LVM and device-mapper | Reporter: | vbponomarev |
| Component: | lvm2 | Assignee: | Heinz Mauelshagen <heinzm> |
| lvm2 sub component: | Mirroring and RAID | QA Contact: | cluster-qe <cluster-qe> |
| Status: | NEW --- | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | agk, heinzm, jbrassow, msnitzer, ncroxon, prajnoha, vbponomarev, xni, zkabelac |
| Version: | 2.02.185 | Flags: | pm-rhel: lvm-technical-solution? pm-rhel: lvm-test-coverage? |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
vbponomarev 2021-06-17 11:25:08 UTC
Not having NVMeoF access at this point, I tried to reproduce this (randwrite, 8K, as you did) on iSCSI LUs but failed. With your fio job from above, running against two 10GiB devices in parallel (i.e. fio started on both legs at the same time), I get:

```
write: IOPS=20.2k, BW=158MiB/s (165MB/s)(325MiB/2065msec); 0 zone resets
WRITE: bw=158MiB/s (165MB/s), 158MiB/s-158MiB/s (165MB/s-165MB/s), io=325MiB (341MB), run=2065-2065msec

write: IOPS=24.8k, BW=194MiB/s (203MB/s)(389MiB/2008msec); 0 zone resets
WRITE: bw=194MiB/s (203MB/s), 194MiB/s-194MiB/s (203MB/s-203MB/s), io=389MiB (408MB), run=2008-2008msec
```

On a 'raid1' on top of those two:

```
write: IOPS=18.0k, BW=140MiB/s (147MB/s)(282MiB/2007msec); 0 zone resets
WRITE: bw=140MiB/s (147MB/s), 140MiB/s-140MiB/s (147MB/s-147MB/s), io=282MiB (295MB), run=2007-2007msec
```

Mind that raid1 duplicates writes (which is why I ran the fio job in parallel on the two iSCSI devices) and also stores write-intent bitmap metadata, hence it throttles a bit; that shows nicely in my 'raid1' fio measurements above. It could be that the transport throttles more drastically on NVMeoF with parallel I/O to multiple targets than anticipated, so please try running the two fio jobs in parallel on both of your targets to tell whether they still hold up.

I ran two fio jobs in parallel on both targets and got results like below:

```
Jobs: 100 (f=100): [w(100)][10.5%][w=443MiB/s][w=56.8k IOPS][eta 08m:58s]
...
Jobs: 100 (f=100): [w(100)][14.6%][w=463MiB/s][w=59.3k IOPS][eta 08m:33s]
...
```

```
dstat -rd
--io/total- -dsk/total-
 read  writ| read  writ
   0   117k|   0   917M
   0   116k|   0   899M
   0   115k|   0   897M
   0   115k|   0   896M
   0   116k|   0   904M
   0   116k|   0   907M
   0   109k|   0   854M
....
```

Then I had a try with the LVM raid1:

```
Jobs: 100 (f=100): [w(100)][24.3%][w=25.5MiB/s][w=3262 IOPS][eta 07m:34s]
...
```

```
psn -G syscall,wchan

Linux Process Snapper v1.1.0 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, syscall, wchan for 5 seconds... finished.

=== Active Threads ===================================================================

 samples | avg_threads | comm  | state                  | syscall   | wchan
---------------------------------------------------------------------------------------
    3348 |       95.66 | (fio) | Disk (Uninterruptible) | io_submit | rq_qos_wait
     142 |        4.06 | (fio) | Disk (Uninterruptible) | io_submit | md_super_wait
...
```

Kstack for the first is:

```
64_sys_io_submit()
io_submit_one()
aio_write()
blkdev_write_iter()
blk_finish_plug()
blk_flush_plug_list()
raid1_unplug()
flush_bio_list()
generic_make_request()
nvme_ns_head_make_request()
direct_make_request()
blk_mq_make_request()
__rq_qos_throttle()
wbt_wait()
rq_qos_wait()
```

Also, the raid1 region size may be rather small given the relatively small LV size, hence throttling because of too many write-intent bitmap updates. Try `lvconvert -R 512M vgt/lvt` and retry your fio test.
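As a rough sketch of that last suggestion (the `vgt/lvt` name and the 512M value come from the comment above; the inspection step is my own assumption, not part of the original report), the current region size can be checked and then grown like this:

```sh
# Show the raid1 LV's current region size and the PVs its images sit on.
lvs -a -o name,size,region_size,devices vgt

# Grow the region size; a larger region means far fewer write-intent
# bitmap updates for the same random-write workload.
lvconvert -R 512M vgt/lvt
```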
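Separately, since the stack above ends in wbt_wait()/rq_qos_wait(), it may be worth checking whether block-layer writeback throttling (WBT) on the underlying devices is what the raid1 writes queue behind. This is an assumed diagnostic step, not something requested in the report, and the device names below are placeholders:

```sh
# Placeholder device names; substitute the actual NVMeoF namespaces backing
# the raid1 legs. wbt_lat_usec is the WBT latency target for each queue.
for dev in nvme1n1 nvme2n1; do
    echo -n "$dev: "; cat /sys/block/$dev/queue/wbt_lat_usec
done

# Writing 0 disables writeback throttling on that queue, which isolates its
# effect on the raid1 random-write numbers; writing -1 restores the default.
echo 0 > /sys/block/nvme1n1/queue/wbt_lat_usec
```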
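Finally, a minimal sketch of rerunning the 8K random-write job in parallel on both legs, as suggested earlier in the thread. The device paths, iodepth, numjobs and runtime here are assumptions; the reporter's actual fio job file is not shown in this report:

```sh
# Hypothetical reconstruction of the 8K randwrite test, one fio instance
# per leg so both targets are loaded at the same time.
fio --name=leg1 --filename=/dev/nvme1n1 --rw=randwrite --bs=8k \
    --ioengine=libaio --direct=1 --iodepth=32 --numjobs=4 \
    --time_based --runtime=60 --group_reporting &
fio --name=leg2 --filename=/dev/nvme2n1 --rw=randwrite --bs=8k \
    --ioengine=libaio --direct=1 --iodepth=32 --numjobs=4 \
    --time_based --runtime=60 --group_reporting &
wait
```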