Bug 1691405 - Document how to change Hosted Engine I/O scheduler
Summary: Document how to change Hosted Engine I/O scheduler
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-appliance
Classification: oVirt
Component: Documentation
Version: 4.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.6
Assignee: Steve Goodman
QA Contact: Lukas Svaty
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-21 14:40 UTC by Strahil Nikolov
Modified: 2021-06-02 17:00 UTC (History)
CC List: 12 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2021-06-02 17:00:21 UTC
oVirt Team: Docs
Embargoed:
mlehrer: needinfo-
sgoodman: needinfo-
sbonazzo: needinfo-
sgoodman: needinfo-
sbonazzo: ovirt-4.4?
sbonazzo: planning_ack?
pm-rhel: devel_ack+
sbonazzo: testing_ack?



Description Strahil Nikolov 2019-03-21 14:40:01 UTC
Description of problem:
The Hosted Engine has an unsuitable I/O scheduler. My observations show that 'mq-deadline' does not provide any benefit, while 'none' brings better performance.

Version-Release number of selected component (if applicable):
All


How reproducible:
Every time

Steps to Reproduce:
1.  Install a Hosted Engine
2.  Create a VM and check the time needed for adding a disk
3.  Change scheduler to 'none' and repeat step 2
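
For reference, a minimal sketch of a non-persistent, runtime scheduler change for step 3 (assuming the Manager VM's disk appears as /dev/vda; adjust the device name as needed):

# show the available schedulers; the bracketed one is active
cat /sys/block/vda/queue/scheduler
# switch to 'none' until the next reboot
echo none > /sys/block/vda/queue/scheduler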

Actual results:
With 'mq-deadline' the engine feels slow, and guest disk creation takes longer than with 'none'.

Expected results:
The I/O scheduler should be set to 'none' by default.

Comment 2 Sandro Bonazzola 2019-03-21 15:34:12 UTC
Roy, any insight from scale team?

Comment 3 Strahil Nikolov 2019-03-21 15:43:46 UTC
Just some info:
Setting 'elevator=none' does not set the devices' scheduler to 'none'.
According to https://lists.debian.org/debian-kernel/2018/11/msg00141.html, there is no equivalent to 'elevator=noop'.
Most probably a udev rule can do the trick.
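
A quick way to confirm this on a booted guest (assuming a virtio disk named vda):

cat /proc/cmdline                     # 'elevator=none' is present on the kernel command line
cat /sys/block/vda/queue/scheduler    # yet the bracketed (active) entry is still not 'none'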

Comment 4 Strahil Nikolov 2019-03-21 16:11:07 UTC
The following udev rules are working:
ACTION=="add|change", KERNEL=="sd*", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="vd*", ATTR{queue/scheduler}="none"
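
A minimal sketch of installing and applying these rules without a reboot; the file name 60-io-scheduler.rules is only an example:

cat > /etc/udev/rules.d/60-io-scheduler.rules << 'EOF'
ACTION=="add|change", KERNEL=="sd*", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="vd*", ATTR{queue/scheduler}="none"
EOF
udevadm control --reload-rules
udevadm trigger --type=devices --action=change
cat /sys/block/vda/queue/scheduler    # the bracketed entry should now be [none]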

Comment 5 Michal Skrivanek 2019-03-22 05:35:00 UTC
Tuning the I/O scheduler is not the way HE performance issues should be approached. I would not suggest changing it without fully understanding why it is beneficial.

Comment 6 Strahil Nikolov 2019-03-22 05:56:08 UTC
Hi Michal,
So you would recommend delaying and queuing our I/O in the mq-deadline scheduler, just to queue it again on the host?
That doesn't make sense.

I am pretty sure I know why we got better performance: noop & none do not reorder any I/O, they just stack it. The host then reorders and stacks I/O from all guests (which is the expected and wanted behaviour).

Best Regards,
Strahil Nikolov

Comment 8 Sandro Bonazzola 2019-03-27 07:13:07 UTC
(In reply to Strahil Nikolov from comment #6)
> I am pretty sure I know why we got better performance: noop & none do not
> reorder any I/O, they just stack it. The host then reorders and stacks I/O
> from all guests (which is the expected and wanted behaviour).

If this is the case, I guess a bug should be filed against the tuned package so all guests will be tuned correctly.
I'd like to get input from the scale team anyway. Roy?

Comment 9 Roy Golan 2019-04-14 13:19:56 UTC
A suitable scheduler is a matter of backing storage and a matter of use case. Setting the noop scheduler makes sense for fast disks with heavy writes. We just completed some testing with the throughput-performance tuned profile on the hosted engine, hosting 5000 VMs. It works well and shows improvements for some scenarios on our internal storage (the tuned disk plugin sets the I/O scheduler to deadline).
If your storage doesn't need reordering because you have almost no seek cost, then you need noop.

I suggest closing this bug and backing up all our info with a KB article with recommendations (a few general ones already exist).

Comment 10 Strahil Nikolov 2019-04-14 13:28:00 UTC
I opened this bug because I saw benefits when running the engine & host, with the default scheduler, on top of Gluster (bricks on SATA3 consumer hardware), which is one of the slowest setups available. I doubt anyone will use IDE.
Switching from mq-deadline to none brought better responsiveness, and now with a consumer SATA3 SSD it is even better.
In general, mq-deadline & deadline are recommended for VMs hosting databases, but in my experience, no matter what kind of DB is running, performance is always better if we only stack I/O without reordering (noop/none).

If you see any benefits with mq-deadline, please share what kind of backend storage was used, so I can keep it in mind.

The proposal for a KB article is nice.

Comment 11 Strahil Nikolov 2019-04-14 13:30:39 UTC
It seems that my phone autocorrected some of my words:
The Gluster bricks were SATA3 rotational consumer disks, while now it is a consumer SATA3 SSD.

Comment 16 David Vaanunu 2019-06-25 09:39:12 UTC
Results from scale testing showed no significant improvement in RHV Engine application performance from using noop over deadline. While higher disk utilization was seen with noop, the overall resources used and overall RHV Engine performance did not improve significantly enough to recommend noop as the default I/O scheduler for the Hosted Engine at this time.

The scale environment tested both conditions, 'none' & 'deadline', while running engine actions in the background (create VM, copy/move disk, migrate VM).
Each test was run for 3 hours, and each server was monitored using nmon (3-second interval).
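
For reference, one plausible nmon invocation matching that interval and duration (the exact flags used in the test are not recorded in this bug):

nmon -f -s 3 -c 3600    # capture to a file: one snapshot every 3 seconds, 3600 snapshots = 3 hours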

Environment description:
1 host with HE (including DWH)
2 hosts with 200 VMs

During the tests, only the HE was running on the 1st host. All the existing VMs and the new VMs ran on the 2nd and 3rd hosts.

Comparing the results of the actions (response time), there is no major gap (~1-2 seconds).
Regarding server results (CPU, memory, disk activity):
      Host level:
                CPU & memory - results are almost the same
                Disk activity - 'deadline' consumes ~10% more.

      HE level:
               CPU & memory - results are almost the same
               Disk activity:
                       DiskRead (Kb/s) - 'none' is higher than 'deadline' (3.33 vs 1)
                       DiskWrite (Kb/s) - 'deadline' is higher than 'none' by ~10% (270 vs 240)
                       Disk utilization - the major gap is while using 'none' (80% vs 0.5%)

The extra disk utilization with noop was caused by postgres write activity.
The HE engine disk came from an FC storage domain.

Comment 17 Sandro Bonazzola 2019-07-11 07:02:16 UTC
Re-targeting to 4.3.6, as this is not identified as a blocker for 4.3.5.

Comment 18 Sandro Bonazzola 2019-07-30 08:21:03 UTC
So it looks like 'none' is better, since it causes less disk activity on the host. Let's adopt it.

Comment 19 Sandro Bonazzola 2020-02-26 14:28:21 UTC
Since there's no clear gain from using one scheduler over the other, we can document this in the documentation / a KB article.
Moving accordingly.

Comment 20 Steve Goodman 2020-07-12 09:00:29 UTC
What is the user trying to accomplish?

What exactly do you want to tell our users?

Is this something to adjust after installation?

Comment 21 Strahil Nikolov 2020-07-12 16:31:11 UTC
In this particular case, the Hosted Engine's I/O scheduler reorders I/O requests, and then the hypervisor that is hosting the HE does the same with its own I/O scheduler.
This double reordering delays I/O requests to the storage layer, so performance is not optimal.

The initial request was to use either the 'noop' (no multiqueue) or 'none' (when multiqueue is used) scheduler, which only merges I/O requests without any reordering (thus no delays).

Based on my experience as a user, adopting noop/none brings better performance on the engine, but I guess it depends on the infrastructure and the whole setup.

Comment 22 Sandro Bonazzola 2020-07-14 10:59:11 UTC
(In reply to Steve Goodman from comment #20)
> What is the user trying to accomplish?

Strahil Nikolov explained in comment #21


> What exactly do you want to tell our users?

That depending on their data center, they may gain performance by changing the I/O scheduler to noop/none.


> Is this something to adjust after installation?

yes, after installation.

Comment 23 Steve Goodman 2020-10-22 17:08:57 UTC
Is this how to change the I/O scheduler to noop/none?



Add elevator=noop to GRUB_CMDLINE_LINUX in /etc/default/grub as shown below.

# cat /etc/default/grub
[...]
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=vg00/lvroot rhgb quiet elevator=noop"
[...]

    After the entry has been created/updated, rebuild the /boot/grub2/grub.cfg file to include the new configuration with the added parameter:
        On BIOS-based machines: ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
        On UEFI-based machines: ~]# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
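
If that approach is used, a quick post-reboot sanity check might look like this (a sketch; on current kernels the elevator parameter may simply be ignored, see the next comment):

grep -o 'elevator=[^ ]*' /proc/cmdline    # the parameter reached the kernel command line
cat /sys/block/vda/queue/scheduler        # the bracketed entry shows the scheduler actually in use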


From https://access.redhat.com/solutions/5427

Comment 24 Strahil Nikolov 2020-10-22 17:57:16 UTC
Based on https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html, it seems that the 'elevator' parameter no longer exists, so I prefer to use the udev rules approach.

Comment 25 Steve Goodman 2020-10-28 15:49:24 UTC
Meital,

Who on your team can help me with this? How do you change the I/O scheduler to noop/none?

Comment 27 Steve Goodman 2021-05-18 14:11:25 UTC
I'll stick the following note at the end of https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/installing_red_hat_virtualization_as_a_self-hosted_engine_using_the_command_line/index#Deploying_the_Self-Hosted_Engine_Using_the_CLI_install_RHVM

[NOTE]
====
Both the {engine-name}'s I/O scheduler and the hypervisor that is hosting the {engine-name} reorder I/O requests. This double reordering delays I/O requests to the storage layer, impacting performance.

Depending on your data center, you might gain performance by changing the I/O scheduler to `none`. For more information, see "Available disk schedulers" [1] in _Monitoring and managing system status and performance_ for RHEL.
====

[1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/setting-the-disk-scheduler_monitoring-and-managing-system-status-and-performance
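
For illustration, one way the linked RHEL procedure can be applied on the Manager VM is through a custom TuneD profile. This is only a sketch: the profile name virtual-guest-none and the parent profile virtual-guest are assumptions (use whatever profile tuned-adm active reports as the parent):

# /etc/tuned/virtual-guest-none/tuned.conf  (hypothetical profile name)
[main]
include=virtual-guest

[disk]
# the TuneD disk plugin applies the scheduler through its 'elevator' option
elevator=none

# then switch to the new profile
tuned-adm profile virtual-guest-none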

Comment 28 Steve Goodman 2021-05-18 14:19:45 UTC
Sandro,

Please review the merge request:
https://gitlab.cee.redhat.com/rhci-documentation/docs-Red_Hat_Enterprise_Virtualization/-/merge_requests/1980

Give feedback on:
- The location of the note
- The text of the note

Comment 30 Sandro Bonazzola 2021-05-24 14:40:46 UTC
Looks good to me; the position of the note also looks good.

Comment 31 Steve Goodman 2021-05-26 09:56:04 UTC
Richard,

Can you please do a peer review?

Comment 32 Richard Hoch 2021-05-31 08:03:51 UTC
(In reply to Steve Goodman from comment #31)
> Richard,
> 
> Can you please do a peer review?

Steve, done.

Donna -- one small issue and a suggestion. Aside from that, LGTM.

Comment 33 Steve Goodman 2021-06-01 12:51:50 UTC
Merged.

