Bug 1613425
Field | Value
---|---
Summary | Low sequential write throughput with VDO in RHHI 2.0
Product | Red Hat Enterprise Linux 7
Component | vdo
Status | CLOSED WONTFIX
Severity | medium
Priority | medium
Version | 7.5
Target Milestone | rc
Target Release | ---
Hardware | Unspecified
OS | Unspecified
Reporter | Sahina Bose <sabose>
Assignee | Andy Walsh <awalsh>
QA Contact | vdo-qe
CC | awalsh, bgurney, dkeefe, esandeen, godas, guillaume.pavese, pasik, psuriset, rhandlin, rhs-bugs, rsussman, sabose, sasundar, ykaul
Doc Type | If docs needed, set a value
Story Points | ---
Clone Of | 1613389
Last Closed | 2020-12-07 07:08:44 UTC
Type | Bug
Regression | ---
Mount Type | ---
Documentation | ---
oVirt Team | ---
Category | ---
Cloudforms Team | ---
Bug Blocks | 1565467, 1613389
Attachments | Smerf_Machine_dmidecode, vdo_status_ouput, VM_sample.xml
Description (Sahina Bose, 2018-08-07 14:09:42 UTC)
I don't see details on the hardware config, tuned profile, or VDO thread config used in your test. Having that info associated with the BZ will be helpful. I cloned this bug, so adding needinfo on the original reporter.

tuned profile -> virtual-host
vdo options   -> blockmapcachesize=128M readcache=enabled readcachesize=20M emulate512=on writepolicy=auto slabsize=32G
Threads are not reconfigured; they are left at the default counts that VDO sets.
Hardware: attaching dmidecode output for that. Please let me know if any other info is needed.

Created attachment 1474880 [details]
Smerf_Machine_dmidecode
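(For reference, a rough sketch of how the options reported above would typically map onto a single vdo create invocation with the RHEL 7 vdo manager CLI; the volume and device names are placeholders, not taken from this setup, and the thread counts are left at VDO's defaults as reported:)

    # placeholder names; example only
    # vdo create --name=vdo_sdb --device=/dev/sdb \
          --blockMapCacheSize=128M --readCache=enabled --readCacheSize=20M \
          --emulate512=enabled --writePolicy=auto --vdoSlabSize=32G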
A small correction to the fio command provided in the description: the throughput difference was obtained with an fio sequential write. The corrected command is:

    fio --name=writetest --ioengine=sync --rw=write --direct=0 --create_on_open=1 --fsync_on_close=1 --bs=128k --directory=<test-dir-gluster> --filename_format=f.\$jobnum.\$filenum --filesize=16g --size=16g --numjobs=4

Are there any tunables to reduce the performance impact we're seeing?

Do you have test results with 4K block writes? Do you see the same drop? Do you see a drop with a non-gluster setup for a similar test? Since we don't know where the bottleneck is coming from, it is difficult to say whether VDO tunables will help increase performance. Do you know if there was an increase in aggregate performance when running this test on multiple VMs? If so, then the layers above VDO might not be filling VDO's queues. It is recommended to do a full suite of performance tests on each layer of the stack, tuning along the way, to find the best performance. If not having enough threads in VDO is the issue, then you might see better performance by increasing the logical and physical threads to 2 or 4. We can do some test runs using this fio command on our non-gluster systems and provide results. Can you provide details as to why this fio command is interesting? What type of workload is it simulating?

Ran some tests on a Samsung 850 Pro to compare the performance of VDO vs. baseline using the fio command listed above. The setup does not involve Gluster or a virtual machine; it was run directly on hardware. Here is the setup:

1. vdo create --name=vdo1 --device=$drive --emulate512=enabled
2. mkfs.xfs -K -i size=512 -n size=8192 <block device> -f
3. fio --name=write --ioengine=sync --rw=write --direct=0 --create_on_open=1 --fsync_on_close=1 --bs=128k --directory=<dir> --filesize=10g --size=10g --numjobs=4

Here are the results. Baseline test name: <test|size|jobs|device name>; vdo test name: <test|size|jobs|physical threads|logical threads|device name>.

    Test                 MB/s   % diff
    baseline-10-4-dm-2   697    --
    vdo-10-4-1-1-dm-2    545    21.81%

Results when increasing the logical and physical threads:

    vdo-10-4-1-2-dm-2    539    22.67%
    vdo-10-4-1-4-dm-2    516    25.97%
    vdo-10-4-2-1-dm-2    563    19.23%
    vdo-10-4-2-2-dm-2    550    21.09%
    vdo-10-4-2-4-dm-2    529    24.10%
    vdo-10-4-4-1-dm-2    540    22.53%
    vdo-10-4-4-2-dm-2    533    23.53%
    vdo-10-4-4-4-dm-2    540    22.53%

I'm not seeing the large performance hit at this level. I'll stand up a VM to see how it performs on top of baseline and VDO. Could someone verify that the filesystem configuration is correct?

For another perspective, running the test you provided on a laptop with a single Samsung SSD and two cores I see:

    320 MB/s without VDO
    166 MB/s with VDO

That's with the virtual-host tuned profile. I'm pretty sure I'll never get the throughput that Dennis is seeing because my laptop only has two physical cores. My experience is that a physical core can process somewhere between 30K and 50K 4K blocks per second; at roughly 40K blocks/s x 4 KiB, that translates to about 156 MB/s, which is right in the range I'm seeing.

I would like to see the full config+stats output for VDO and the logs.
1. Can you paste the output of vdo status?
2. Also, can you attach /var/log/messages?
Thank you.
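(For convenience, the corrected sequential-write workload quoted earlier in this report can also be kept as an fio job file so the identical test can be rerun at each layer of the stack. This is just a restatement of the command above; the file name and directory are placeholders:)

    ; writetest.fio (name is a placeholder) -- run with: fio writetest.fio
    [writetest]
    ioengine=sync
    rw=write
    direct=0
    create_on_open=1
    fsync_on_close=1
    bs=128k
    directory=/path/to/gluster/test/dir
    filename_format=f.$jobnum.$filenum
    filesize=16g
    size=16g
    numjobs=4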
Thanks, Sahina. I was looking for the VDO write policy configuration, which can change depending on the media used. While bug 1565467 has the system configuration, I didn't see how VDO was configured on both the SSD and the HDD. This could be important if one device was in sync mode and the other was in async mode.

    # egrep "write policy|REQ_F" /var/log/messages
    Apr  9 11:15:23 rhsqa-grafton8 kernel: kvdo0:dmsetup: starting device 'vdo_sdb' device instantiation 0 write policy auto
    Apr  9 11:15:23 rhsqa-grafton8 kernel: kvdo0:dmsetup: underlying device, REQ_FLUSH: not supported, REQ_FUA: not supported

vdo status output:

    VDO status:
      Date: '2018-04-10 11:42:08+05:30'
      Node: <host name removed>
    Kernel module:
      Loaded: true
      Name: kvdo
      Version information:
        kvdo version: 6.1.0.153
    Configuration:
      File: /etc/vdoconf.yml
      Last modified: '2018-04-09 11:15:25'
    VDOs:
      vdo_sdb:
        Acknowledgement threads: 1
        Activate: enabled
        Bio rotation interval: 64
        Bio submission threads: 4
        Block map cache size: 128M
        Block map period: 16380
        Block size: 4096
        CPU-work threads: 2
        Compression: enabled
        Configured write policy: auto
        Deduplication: enabled
        Device mapper status: 0 419430400000 vdo /dev/sdb albserver online cpu=2,bio=4,ack=1,bioRotationInterval=64
        Emulate 512 byte: enabled
        Hash zone threads: 1
        Index checkpoint frequency: 0
        Index memory setting: 0.25
        Index parallel factor: 0
        Index sparse: disabled
        Index status: online
        Logical size: 200000G
        Logical threads: 1
        Physical size: 20979200M
        Physical threads: 1
        Read cache: enabled
        Read cache size: 20M
        Slab size: 32G
        Storage device: /dev/sdb
        VDO statistics:
          /dev/mapper/vdo_sdb:
            write policy: sync
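(A quick way to see both the configured and the effective write policy on a running volume, using only what is already shown above; the volume name is a placeholder:)

    # vdo status --name=vdo_sdb | grep -i "write policy"
      Configured write policy: auto
          write policy: sync

The first line is the configured setting; the second, from the statistics section, is what "auto" actually resolved to on this device (sync here, because the underlying device reports REQ_FLUSH and REQ_FUA as not supported).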
Testing results

In testing VDO with a Samsung 850, the storage stack was configured with VDO, XFS, and then a RHEL 7.5 VM.

[VM]
1. fio was installed in the VM
2. a default virtio block device was created on the VDO storage device (Samsung 850)
3. the virtio block device was formatted with "mkfs.xfs -K -i size=512 -n size=8192 <block device> -f"
4. fio --name=write --ioengine=sync --rw=write --direct=0 --create_on_open=1 --fsync_on_close=1 --bs=128k --directory=<dir> --filesize=10g --size=10g --numjobs=4

Software stack:
[VM]  RHEL 7.5 server - virtio 100G block device, qcow2
[XFS] mkfs.xfs -K -i size=512 -n size=8192 <block device> -f
[VDO] vdo create --name=vdo1 --device=$drive --emulate512=enabled

Here are the changes between tests:
Read Cache      - read cache was enabled with 20MB
No Read Cache   - read cache was disabled
L+P = 2 vm-vdo  - VDO's logical and physical threads were increased from 1 to 2
VM Cores 1 -> 4 - the VM's cores were increased from 1 to 4
async -> sync   - VDO's write policy was changed from async to sync (the default is auto: if the device supports flush or FUA, or both, VDO sets the policy to async; if the device supports neither, VDO sets it to sync)

Results:

    Test              MB/s   % diff
    VM's Baseline     507    --
    Read Cache        347    31.56%   (this is closest to the RHHI config, though without Gluster)
    No Read Cache     348    31.36%
    L+P = 2 vm-vdo    371    26.82%
    VM Cores 1 -> 4   351    30.77%
    async -> sync     151    70.22%   (this result is interesting, and the reason I want to know how the devices are configured with respect to VDO's write policy)

Looking at the difference between tests. Test descriptions:
baseline-10-4-dm-2 - baseline test result from comment 10
VM's Baseline      - baseline result from within a VM, reported in this comment
Read Cache         - VDO test result from within a VM when the backing storage is VDO, reported in this comment
vdo-10-4-1-1-dm-2  - VDO test result from comment 10

Comparing baseline-10-4-dm-2 (referred to as baseline below) to the other results we see:

    Test                 MB/s   % diff
    baseline-10-4-dm-2   697    --
    VM's Baseline        507    27.26%
    vdo-10-4-1-1-dm-2    545    21.81%
    Read Cache           347    50.22%

- A 21.81% performance drop was seen when comparing baseline to VDO
- A 27.26% performance drop was seen when comparing baseline to baseline within a VM
- A 50.22% performance drop was seen when comparing baseline to a VM volume backed by VDO
  - however, this is a 36.33% performance drop from VDO's performance outside of a VM (vdo-10-4-1-1-dm-2)
  - and a 31.56% performance drop from the VM's baseline within the VM

Current path of investigating performance is:

1. Understand whether write policy is impacting performance (affected by flushes or FUA)
   - Testing from the RHHI team will be needed to identify whether changing the write policy improves performance
2. Are sub-4K writes impacting performance? Louis has some questions out to the XFS team.
3. Some gains have been seen by increasing VDO's threads in the 1 VM case (4%); while this isn't much, when using multiple VMs the increases should be greater.
   - The RHHI team will have to test performance when increasing VDO threads

Regarding the questions about sub-4K IO: if xfs's mkfs-time sector size is 512, it will do metadata IOs (at least to the log) in 512-byte units. What does:

    # lsblk -t <vdo device>

say about the device topology? Also, what does xfs_info say about the slow xfs filesystem in question?

Raising needinfo on the original reporter.
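(For reference, the fields of interest in the two commands requested above; the device path and mount point are placeholders, not taken from this setup:)

    # lsblk -t /dev/mapper/vdo_sdb
      -> check the LOG-SEC, PHY-SEC, MIN-IO, and OPT-IO columns; with 512-byte
         emulation enabled, LOG-SEC will be 512 even though VDO stores data in
         4 KiB blocks internally
    # xfs_info <brick mount point>
      -> check "sectsz=" in the meta-data and log sections; a 512-byte sector
         size means the XFS log can be written in sub-4K units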
(In reply to Dennis Keefe from comment #15)
> 1. Understand whether write policy is impacting performance (affected by
>    flushes or FUA)
>    - Testing from the RHHI team will be needed to identify whether changing
>      the write policy improves performance

Currently it is auto; should it be set to sync or async?

> 3. Some gains have been seen by increasing VDO's threads in the 1 VM case
>    (4%); while this isn't much, when using multiple VMs the increases should
>    be greater.
>    - The RHHI team will have to test performance when increasing VDO threads

What should the increased thread count be? Should each thread count be doubled?

(In reply to Nikhil Chawla from comment #19)
> Currently it is auto and it should be set to sync/async ?

In comment 13 you can find how to find out what "auto" resolved to from the logs, or you can use the vdo status command and look for "write policy". My suggestion is that you find out what this is configured to, then run another test after switching modes to see if it helps with performance. This can be done online with the command "vdo changeWritePolicy --name=<volume name> --writePolicy=<sync|async>".

> What should be the increased thread_count ? Should set each thread to double ?

This can only be determined by running performance tests and scaling threads and VMs on the system configuration you expect customers to use. A guess as to what might work for SSD and NVMe in RHHI would be 4 logical, 4 physical, 5 CPU. Please let us know how those results work out for you. The other advantage of a higher number of threads is that they spread the load better and thus have less impact on other processes. The downside is that if you have too many threads and not enough processor cores, it can actually slow you down.

From comment 19, can you tell me what the write policy mode is on the SSD configuration of VDO?

(In reply to Dennis Keefe from comment #22)
> from comment 19 can you tell me what the write policy mode is on the SSD
> configuration of VDO?

For all the devices, currently it is auto.

Nikhil,
When configured for auto, VDO checks the device before creating the volume to determine whether to set the write policy to sync or async mode. I would like to know what VDO selected for the test run with SSDs. The output of vdo status will show what the write policy is set to.

Created attachment 1478031 [details]
vdo_status_ouput
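(Putting the two suggestions above into concrete commands, as a sketch only; the volume name is a placeholder, and, as far as I recall the RHEL 7 vdo manager, thread-count changes made with vdo modify only take effect after the volume is restarted, while changeWritePolicy applies online:)

    # switch write policy online (placeholder volume name)
    # vdo changeWritePolicy --name=vdo_sdb --writePolicy=async

    # apply the suggested thread counts; takes effect on the next start of the volume,
    # so unmount anything on top of it before stopping
    # vdo modify --name=vdo_sdb --vdoLogicalThreads=4 --vdoPhysicalThreads=4 --vdoCpuThreads=5
    # vdo stop --name=vdo_sdb && vdo start --name=vdo_sdb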
Nikhil,

Have you tried the tuning suggested by Dennis above yet, increasing the thread counts for VDO to 4 logical, 4 physical, and 5 CPU? Looking at the SSD numbers, that would seem like the next logical step here. We all knew going in that hard drives will take a substantial hit on sequential IO, so spending time on that doesn't make much sense, at least until you've fully explored the SSD case.

I also notice you've set a read cache, which is not something we do by default. Have you checked to see if that is helping performance for your test cases?

Nikhil,
Thanks for the output of vdo status. I see that the vdo volume was configured to sync by VDO. The fio command that you use generates compressible data (about 49%). You could also increase performance for this test case by changing the write policy to async.

Example of the fio test writing compressible data:

    # vdo create --name=vdo --device=/dev/dm-2
    Creating VDO vdo
    Starting VDO vdo
    Starting compression on VDO vdo
    VDO instance 14 volume is ready at /dev/mapper/vdo

    # vdostats --hum
    Device            Size    Used   Available  Use%  Space saving%
    /dev/mapper/vdo   171.9G  4.0G   167.9G     2%    N/A

    # mkfs.xfs -K /dev/mapper/vdo ; mount /dev/mapper/vdo /vdo1
    meta-data=/dev/mapper/vdo    isize=512    agcount=4, agsize=10988974 blks
             =                   sectsz=4096  attr=2, projid32bit=1
             =                   crc=1        finobt=0, sparse=0
    data     =                   bsize=4096   blocks=43955896, imaxpct=25
             =                   sunit=0      swidth=0 blks
    naming   =version 2          bsize=4096   ascii-ci=0 ftype=1
    log      =internal log       bsize=4096   blocks=21462, version=2
             =                   sectsz=4096  sunit=1 blks, lazy-count=1
    realtime =none               extsz=4096   blocks=0, rtextents=0

    # vdostats --hum
    Device            Size    Used   Available  Use%  Space saving%
    /dev/mapper/vdo   171.9G  4.0G   167.9G     2%    99%

    # fio --name=write --ioengine=sync --rw=write --direct=0 --create_on_open=1 --fsync_on_close=1 --bs=128k --directory=/vdo1 --filesize=3g --size=3g --numjobs=4

    # vdostats --hum
    Device            Size    Used   Available  Use%  Space saving%
    /dev/mapper/vdo   171.9G  10.1G  161.8G     5%    49%

If you do not want to write compressible data in this testing, then add the --refill_buffers=1 option to the fio command. The space savings is then 0, since the data is unique and not compressible:

    # fio --name=write --ioengine=sync --rw=write --direct=0 --create_on_open=1 --fsync_on_close=1 --bs=128k --directory=/vdo1 --filesize=3g --size=3g --numjobs=4 --refill_buffers=1

    # vdostats --hum
    Device            Size    Used   Available  Use%  Space saving%
    /dev/mapper/vdo   171.9G  16.0G  155.9G     9%    0%

(In reply to Louis Imershein from comment #26)
> Have you tried the tuning suggested by Dennis above yet of increasing the
> thread counts for VDO to 4 logical, 4 physical, and 5 CPU?

I've not tried the recently suggested tunings because I don't have a set of machines right now, but I should get my reservation in about two weeks.

> I also notice you've set a readcache which is not something we do by
> default? Have you checked to see if that is helping performance for your
> test cases?

readcache comes enabled by default in RHHI, so I'll add this to the list of experiments as well.
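(Side note: if the goal is to control compressibility rather than just eliminate it, fio also has a buffer_compress_percentage option that can be combined with refill_buffers, for example to target roughly 50% compressible data like the default pattern shown above. This is taken from fio's documented options rather than verified on this setup:)

    # fio --name=write --ioengine=sync --rw=write --direct=0 --create_on_open=1 \
          --fsync_on_close=1 --bs=128k --directory=/vdo1 --filesize=3g --size=3g \
          --numjobs=4 --refill_buffers=1 --buffer_compress_percentage=50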
(In reply to Dennis Keefe from comment #27)
> The FIO command that you use generates compressible data (about 49%). You
> could also increase performance for this test case by changing the write
> policy to async.
> [...]
> If you do not want to write compressible data in this testing then add the
> --refill_buffers=1 option to the fio command

Thanks for letting me know. I'll use this command for the next round of runs.

From the initial bug description, step number 2:

2. Provision a VM and add a (RAID6/SSD) storage disk to the VM (virtio-blk, thin-provisioned)

What is the disk configuration for the virtio-blk device? Is there an xml available for this VM?

(In reply to Dennis Keefe from comment #30)
> What is the disk configuration for the virtio-blk? Is there an xml
> available for this VM?

I am attaching the VM xml file to the bz. You can check for "vda"; that is a virtio-blk device in a VM created on one of my setups.

Created attachment 1481322 [details]
VM_sample.xml
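(For reference, a quick way to pull just the disk definitions out of a VM's libvirt config when reviewing an XML like the one attached; the VM name is a placeholder. The attributes worth checking on the vda entry are the driver type (raw vs. qcow2) and the cache= and io= modes on the <driver> element, since those affect write throughput:)

    # virsh dumpxml <VM name> | awk '/<disk /,/<\/disk>/'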
Thanks. What is the size of "vda"?

(In reply to Dennis Keefe from comment #33)
> Thanks. What is the size of "vda"?

Since it's an OS disk, I usually keep it at 40-50GB.

Correct. I think it is important to know whether there are performance changes when changing the write policy and/or the threads.

We know that VDO does not deliver significant performance in RHHI-V. Right now we are only looking at high-priority customer bug fixes, and we have no plan for VDO fixes at this time, so I am closing this bug for now; we will reopen it if any customer comes with the same issue.