Bug 707757

Summary: cfq-iosched: Set group_isolation tunable to 1 by default
Product: Red Hat Enterprise Linux 6
Component: kernel
Version: 6.2
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Reporter: Vivek Goyal <vgoyal>
Assignee: Vivek Goyal <vgoyal>
QA Contact: Red Hat Kernel QE team <kernel-qe>
Docs Contact:
CC: jeder, juzhang
Whiteboard:
Fixed In Version: kernel-2.6.32-163.el6
Doc Type: Bug Fix
Doc Text:
The default for CFQ's group_isolation variable has been changed from 0 to 1 (/sys/block/<device>/queue/iosched/group_isolation). After further testing and numerous user reports, it was found that a default of 1 is more useful. When set to 0, all random I/O queues become part of the root cgroup rather than the actual cgroup the application belongs to; consequently, there is no service differentiation between applications.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-12-06 13:08:08 UTC
Type: ---
Regression: ---
Mount Type: ---

Description Vivek Goyal 2011-05-25 20:42:42 UTC
Description of problem:

CFQ has a tunable, group_isolation, which is 0 by default. This means that, for the I/O controller, I/O from any process doing random reads or writes is accounted to the root group rather than to the cgroup the process is running in.

Upstream has removed this tunable altogether; the behaviour there is that all I/O is accounted to the cgroup the task is running in.

For the majority of the people I have talked to, I have had to recommend setting group_isolation=1 to get any kind of isolation, since a lot of real-world I/O is random.

Hence, setting group_isolation=1 by default is the logical choice; making it the default will make the I/O controller easier to use.

We probably cannot remove the tunable entirely, as upstream did, due to kernel ABI concerns, so change the default instead.
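
For reference, a minimal sketch of how the tunable can be inspected and changed through sysfs. The device name "sda" and the helper names are illustrative assumptions, not part of this bug; writing the file requires root, and the device must be using the cfq scheduler.

    #!/usr/bin/env python3
    # Sketch: read and (optionally) set CFQ's group_isolation tunable via sysfs.
    # Assumes the device uses the cfq I/O scheduler; "sda" is only an example name.
    import sys
    from pathlib import Path

    def group_isolation_path(device):
        return Path(f"/sys/block/{device}/queue/iosched/group_isolation")

    def get_group_isolation(device):
        # Current value, 0 or 1.
        return int(group_isolation_path(device).read_text().strip())

    def set_group_isolation(device, value):
        # Writing requires root privileges; value should be 0 or 1.
        group_isolation_path(device).write_text(f"{int(value)}\n")

    if __name__ == "__main__":
        dev = sys.argv[1] if len(sys.argv) > 1 else "sda"
        print(f"{dev}: group_isolation={get_group_isolation(dev)}")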

Comment 2 Vivek Goyal 2011-06-21 13:57:52 UTC
This is also causing issues with virtualization and virtio disks. I exported a virtio disk and launched a simple dd in the guest. With group_isolation=0, overall throughput drops from 15 MB/s to 4-5 MB/s.

The reason is that qemu runs many I/O threads, and some of those threads end up in the root group while others end up in the guest's cgroup. In 6.1, libvirt started making use of the blkio controller and putting each guest in a cgroup of its own. Because the threads can end up in separate groups, we do not preempt a thread that is idling and not doing I/O. This leads to excessive idling in CFQ and, in turn, to poor performance.

Hence it is important to set this tunable to 1 by default.
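
To make that thread placement visible, here is a small diagnostic sketch; the qemu pid argument and the libvirt cgroup layout mentioned in the comment are assumptions about a typical setup, not something recorded in this bug. It lists the blkio cgroup each thread of a given process belongs to, using the per-task cgroup files under /proc:

    #!/usr/bin/env python3
    # Sketch: print the blkio cgroup of every thread of a process (e.g. qemu-kvm),
    # to see whether some threads landed in the root group while others landed in
    # the guest's cgroup. Pass the qemu pid as the first argument.
    import sys
    from pathlib import Path

    def blkio_cgroup(task_dir):
        # Each line of .../cgroup looks like "<id>:<controllers>:<path>",
        # e.g. "3:blkio:/libvirt/qemu/guest1" (the path shown is hypothetical).
        for line in (task_dir / "cgroup").read_text().splitlines():
            _, controllers, path = line.split(":", 2)
            if "blkio" in controllers.split(","):
                return path
        return "(no blkio hierarchy)"

    if __name__ == "__main__":
        pid = sys.argv[1]
        for task in sorted(Path(f"/proc/{pid}/task").iterdir()):
            print(f"tid {task.name}: blkio cgroup {blkio_cgroup(task)}")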

Comment 3 Vivek Goyal 2011-06-21 14:44:30 UTC
Patch posted to rhkernel list.

Comment 4 RHEL Program Management 2011-06-21 15:00:30 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 5 Vivek Goyal 2011-06-22 13:21:27 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The default for CFQ's group_isolation variable has been changed from 0 to 1 (/sys/block/<device>/queue/iosched/group_isolation). After more testing and user reports, it was found that a default of 1 is more useful. Otherwise, all random I/O queues become part of the root cgroup rather than the actual cgroup the application is part of, which leads to no service differentiation between applications. Most users were setting it to 1 explicitly anyway, so it has now been made the default. It can always be set back to 0 if need be.

Comment 6 Aristeu Rozanski 2011-06-27 19:15:28 UTC
Patch(es) available on kernel-2.6.32-163.el6

Comment 9 Mike Gahagan 2011-10-03 15:32:28 UTC
Confirmed that group_isolation is set to 1 by default on the -201 kernel.

Comment 10 Martin Prpič 2011-11-10 12:30:30 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,6 +1 @@
-The default for cfq's group_isolation variable has been changed from 0 to 1 (/sys/block/<device>/queue/iosched/group_isoaltion). After more testing and user reports it
-was found that having default 1 is more useful. Otherwise
-all random IO queues become part of root cgroup and not the
-actual cgroup application is part of. And that leads to
-no service differentiation for applications. So most of
-the users were anyway setting it to 1 explicitly. Now it has been made the default. One can always set it back to 0 if need be.
+The default for CFQ's group_isolation variable has been changed from 0 to 1 (/sys/block/<device>/queue/iosched/group_isoaltion). After various testing and numerous user reports, it was found that having default 1 is more useful. When set to 0, all random I/O queues become part of the root cgroup and not the actual cgroup which the application is part of. Consequently, this leads to no service differentiation for applications.

Comment 11 errata-xmlrpc 2011-12-06 13:08:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html