Bug 1247135 - [RFE] Ensure quality of service for sanlock io when using file-based storage [NEEDINFO]
Status: NEW
Product: vdsm
Classification: oVirt
Component: RFEs
---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Assigned To: Nir Soffer
Gil Klein
Keywords: FutureFeature
Depends On:
Blocks:
 
Reported: 2015-07-27 08:15 EDT by Nir Soffer
Modified: 2018-07-18 10:41 EDT (History)
13 users

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
dfediuck: needinfo? (vbellur)
ylavi: ovirt-future?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments

  None
Description Nir Soffer 2015-07-27 08:15:53 EDT
Description of problem:

Sanlock writes and reads 1 MiB to every storage domain, using a
10 second io timeout. If reads or writes to storage are too slow for
40 seconds (4 retries at the 10 second timeout), sanlock will fence
the SPM (or, in 3.6, any host holding the domain resource), aborting
the current vdsm operations on that host.
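
The 40 second figure follows from the numbers above; the variables
below simply restate the 10 second io timeout and the 4 retries:

```shell
# Sanlock failure window: 4 failed renewal attempts at a 10 second
# io timeout, after which the host is fenced.
IO_TIMEOUT=10
RETRIES=4
WINDOW=$((IO_TIMEOUT * RETRIES))
echo "$WINDOW"   # prints 40
```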

We are ok with storage becoming slow for a while; an io operation
taking many minutes or hours will simply be slower. However, because
of sanlock, such an operation running on the SPM (or any host holding
a resource on the domain) will fail when io is too slow.

We want to ensure that sanlock io has priority over other io in
the system.

For block-based storage, we use ionice with ioclass=Idle when doing
io-heavy operations like copying and converting images (e.g. using dd
and qemu-img). This helps only when using the cfq scheduler, but we
are using the deadline scheduler, which ignores io priority. However,
the deadline scheduler should serve sanlock io requests on time, since
they are small. We should probably revisit the need for using ionice.

For file-based storage such as NFS and gluster, we are not doing
anything to ensure that heavy io operations do not delay sanlock io.

It seems that we can use cgroups for this, as described in:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Resource_Management_Guide/sec-prioritizing_network_traffic.html
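
As a rough sketch of the approach in the linked guide (the cgroup v1
net_prio controller), the steps below would give sanlock's traffic a
higher priority on the NIC carrying NFS traffic. The group name,
interface, and priority value are illustrative assumptions, and the
commands are only printed here, since running them needs root and a
mounted net_prio controller:

```shell
CG=/sys/fs/cgroup/net_prio/sanlock-io   # hypothetical group name
IFACE=eth0                              # NIC carrying NFS traffic
PRIO=5                                  # above the default priority of 0
SETUP="mkdir -p $CG
echo '$IFACE $PRIO' > $CG/net_prio.ifpriomap
echo \$SANLOCK_PID > $CG/tasks"
echo "$SETUP"   # printed here instead of executed
```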

Version-Release number of selected component (if applicable):
3.6
Comment 1 Nir Soffer 2015-07-27 08:18:48 EDT
David, can you take a look and add missing information?
Comment 2 Nir Soffer 2015-07-27 08:21:30 EDT
Sahina, can you add more details regarding glusterfs?
Comment 3 Nir Soffer 2015-07-27 08:23:05 EDT
This may be related to bug 1243935.
Comment 4 Sahina Bose 2016-12-06 09:13:00 EST
Vijay, is there a way to prioritize some I/O over others in gluster? Will the QoS feature in gluster address this?
Comment 5 Vijay Bellur 2017-01-23 23:32:08 EST
(In reply to Sahina Bose from comment #4)
> Vijay, is there a way to prioritize some I/O over others in gluster? Will the
> QoS feature in gluster address this?

Currently we do not have any internal mechanism in Gluster to prioritize I/O based on client or application. The proposed QoS feature will provide some means, but we have yet to finalize the design.
Comment 6 Doron Fediuck 2018-05-27 09:06:56 EDT
Vijay, any updates?
