Bug 173491 - cfq: Missing struct cfq_queue allocation
Summary: cfq: Missing struct cfq_queue allocation
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Alasdair Kergon
QA Contact: Brian Brock
Depends On:
Reported: 2005-11-17 16:05 UTC by Alasdair Kergon
Modified: 2011-02-10 00:45 UTC (History)

Clone Of:
Last Closed: 2011-02-10 00:45:29 UTC

Attachments
cfq patch (827 bytes, patch)
2005-11-17 16:05 UTC, Alasdair Kergon

Description Alasdair Kergon 2005-11-17 16:05:18 UTC
[cloned from bug 151324]

__make_request() first calls get_request() with GFP_ATOMIC.
If that fails and without read ahead set, it calls

There is a bug in the CFQ scheduler here.
These functions call cfq_set_request() which in turn calls __cfq_get_queue().

The first time __cfq_get_queue() is called for a process, there is no existing
queue, and because __GFP_WAIT is not set the call always fails, which is not
the intended behaviour.  This test for __GFP_WAIT should be removed.

However, __cfq_get_queue() is also called from cfq_enqueue() with GFP_ATOMIC.
To retain the existing (correct) behaviour here this call can be replaced with
a direct call to cfq_find_cfq_hash().
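
The two changes described above can be illustrated with a small userspace
model.  This is only a sketch of the logic as the report describes it, not the
kernel code or the attached patch: every sketch_* name, the flag values, the
one-bucket list and the use of malloc() are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for kernel allocation flags; the values are arbitrary. */
#define SKETCH_GFP_WAIT   0x10u
#define SKETCH_GFP_ATOMIC 0x20u   /* deliberately lacks SKETCH_GFP_WAIT */

/* Hypothetical per-process queue, keyed by an integer. */
struct sketch_queue {
    int key;
    struct sketch_queue *next;
};

static struct sketch_queue *sketch_hash;   /* one-bucket stand-in for the hash */

/* Lookup only, never allocates: the role cfq_find_cfq_hash() plays. */
struct sketch_queue *sketch_find(int key)
{
    struct sketch_queue *q;

    for (q = sketch_hash; q != NULL; q = q->next)
        if (q->key == key)
            return q;
    return NULL;
}

/* Allocate and link a queue; malloc() stands in for the kernel allocator. */
struct sketch_queue *sketch_alloc(int key)
{
    struct sketch_queue *q;

    assert(sketch_find(key) == NULL);   /* callers look up first */
    q = malloc(sizeof(*q));
    if (q == NULL)
        return NULL;
    q->key = key;
    q->next = sketch_hash;
    sketch_hash = q;
    return q;
}

/* Behaviour described in the report: a missing queue is only created
 * when the caller is allowed to sleep. */
struct sketch_queue *sketch_get_queue_old(int key, unsigned int gfp)
{
    struct sketch_queue *q = sketch_find(key);

    if (q != NULL)
        return q;
    if (!(gfp & SKETCH_GFP_WAIT))
        return NULL;                    /* first atomic request always fails */
    return sketch_alloc(key);
}

/* With the test removed, the allocation is attempted for any mask. */
struct sketch_queue *sketch_get_queue_new(int key, unsigned int gfp)
{
    struct sketch_queue *q = sketch_find(key);

    (void)gfp;   /* malloc() has no equivalent of the mask, so it is unused */
    return q != NULL ? q : sketch_alloc(key);
}
```

In this model, sketch_get_queue_old(1, SKETCH_GFP_ATOMIC) returns NULL for a
key with no queue yet, while sketch_get_queue_new() creates one.  A caller
that only wants to know whether a queue already exists, as cfq_enqueue() does,
would use sketch_find() and so never allocates.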

The patch is included upstream (2.6.12).

As RHEL4 is based on an older kernel, it needs an additional fix:

> The second part of this patch is a fix for cfq_enqueue() which uses
> __cfq_get_queue() just to find whether "struct cfq_queue" has already
> existed.  So it should be OK to call cfq_find_cfq_hash().
> Nobody calls __cfq_get_queue() just to search except this part.

For the combination of the cfq patch and the EWOULDBLOCK patch,
we ran the following tests on RHEL4 U1
and found no problems:
  - Heavy I/O (parallel combination of dd, cp, mv, diff, rm) to
    6 SCSI partitions + 4 FC partitions on a 16-CPU IPF system for 80 hours.
  - The same type of heavy I/O under high memory pressure for 15 hours.

The performance test results are below.
I re-ran the measurement using exactly the same disk partition for
both cases.
                  2.6.9-11.32.EL    2.6.9-11.32.EL + cfq-patch
        elapsed    447.44[sec]            423.19[sec]
        user        23.36[sec]             24.56[sec]
        system      64.44[sec]             64.75[sec]
The values above are averages over ten trials.
With the cfq fix, elapsed time improves by about 5% (447.44 s to 423.19 s).

o How to test
1. mke2fs /dev/sdb1
2. mount /dev/sdb1 /mnt/0
3. cd /mnt/0
4. /usr/bin/time /tmp/mkdirtest.sh 10000
--------- mkdirtest.sh ---------
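#!/bin/sh
numdir=$1    # assumed: directory count from the command line (10000 above)
i=0          # assumed: counter starts at zero
while [ $i -lt $numdir ]; do
        mkdir tmp${i}
        i=`expr $i + 1`
done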

Comment 1 Alasdair Kergon 2005-11-17 16:05:18 UTC
Created attachment 121195 [details]
cfq patch

Comment 2 Alasdair Kergon 2011-02-10 00:45:29 UTC
