Bug 232170 - I/O time is too long at high offset on multi-threaded random read
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.4
Hardware: i386 Linux
Priority: medium  Severity: medium
Assigned To: Eric Sandeen
QA Contact: Martin Jenner
Reported: 2007-03-13 22:18 EDT by Watanabe Takashi
Modified: 2008-12-04 16:57 EST (History)
Doc Type: Bug Fix
Last Closed: 2008-12-04 16:57:20 EST
Attachments
test program for multi-threaded random read (2.17 KB, application/octet-stream)
2007-03-13 22:18 EDT, Watanabe Takashi

Description Watanabe Takashi 2007-03-13 22:18:15 EDT
Description of problem:
When a multi-threaded application has each thread call pread() at random
offsets, some pread() calls take far too long to complete. The slow calls are
the ones with high offsets.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-42.EL

How reproducible:
Run the attached test program.

Steps to Reproduce:
1. gcc -Wall -pthread rr.c -o rr  (attached)
2. ./rr /dev/sda 16               (device, threads)
3. wait 60 seconds
4. view each thread's maximum response time and the offsets of the late reads

Actual results:
Some reads have a very long response time.

Expected results:
Every read has a fair response time.

Additional info:
I'm using the CFQ scheduler (the default).
This problem does not appear in some other environments, e.g. the CFQ scheduler
in Fedora Core 5 (2.6.19-1.2288.2.4.fc5smp) or the deadline scheduler in RHEL 4.4.
Comment 1 Watanabe Takashi 2007-03-13 22:18:15 EDT
Created attachment 150020 [details]
test program for multi-threaded random read
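
The attachment itself is not reproduced on this page. As a rough illustration of the kind of test described above, here is a minimal sketch in C. It is not the attached rr.c: the names `struct reader`, `reader_main` and `run_readers` are hypothetical, and the choice to take the upper offset bound from `lseek()` is an assumption. Each thread issues 4096-byte pread() calls at random byte offsets for a fixed time and records its worst latency and the offset that caused it.

```c
/* Hypothetical sketch of a multi-threaded random-read latency test.
 * This is NOT the attached rr.c; all names here are illustrative. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600
#endif
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on i386 for offsets past 2 GiB */
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

struct reader {
    int fd;              /* shared read-only descriptor */
    off_t max_off;       /* exclusive upper bound for read offsets */
    double run_secs;     /* how long this thread keeps reading */
    unsigned int seed;   /* per-thread state for rand_r() */
    double max_latency;  /* result: slowest pread() in seconds */
    off_t worst_off;     /* result: offset of that slowest pread() */
    long reads;          /* result: number of completed reads */
    int failed;          /* result: set if pread() returned an error */
};

static double now_secs(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

static void *reader_main(void *arg)
{
    struct reader *r = arg;
    char buf[BLOCK_SIZE];
    double start = now_secs();

    do {
        /* Combine two rand_r() results so offsets can exceed RAND_MAX. */
        uint64_t rnd = ((uint64_t)rand_r(&r->seed) << 31)
                     | (uint64_t)rand_r(&r->seed);
        off_t off = (off_t)(rnd % (uint64_t)r->max_off);
        double t0 = now_secs();

        if (pread(r->fd, buf, sizeof buf, off) < 0) {
            r->failed = 1;
            break;
        }
        double dt = now_secs() - t0;
        r->reads++;
        if (dt > r->max_latency) {
            r->max_latency = dt;
            r->worst_off = off;
        }
    } while (now_secs() - start < r->run_secs);
    return NULL;
}

/* Run nth reader threads against path for secs seconds each.
 * Fills out[0..nth-1]; returns 0 on success, -1 on any error. */
static int run_readers(const char *path, int nth, double secs,
                       struct reader *out)
{
    int i, started = 0, rc = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    off_t size = lseek(fd, 0, SEEK_END);   /* file or block device size */
    pthread_t *tids = malloc((size_t)nth * sizeof *tids);
    if (size <= 0 || tids == NULL) {
        free(tids);
        close(fd);
        return -1;
    }
    for (i = 0; i < nth; i++) {
        out[i] = (struct reader){ .fd = fd, .max_off = size,
                                  .run_secs = secs, .seed = 1000u + i };
        if (pthread_create(&tids[i], NULL, reader_main, &out[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (out[i].failed)
            rc = -1;
    }
    free(tids);
    close(fd);
    return rc;
}
```

A small driver could call something like `run_readers("/dev/sda", 16, 60.0, results)` and print each thread's `max_latency` and `worst_off`, which roughly corresponds to steps 2 to 4 of the report. Built with `gcc -Wall -pthread`, the sketch needs no other libraries.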
Comment 2 Jason Baron 2007-06-21 11:10:33 EDT
This may be fixed in U5, i.e. kernel 2.6.9-55.EL or later. Can you please try
to reproduce this problem on that kernel? Thanks.
Comment 3 Watanabe Takashi 2007-06-22 04:29:03 EDT
This problem is not fixed in Update 5 (kernel 2.6.9-55.ELsmp).

(cfq)
# dmesg | grep sched
Using cfq io scheduler
# uname -r
2.6.9-55.ELsmp
# ./rr /dev/sda 16
(large offset preads take >30 sec)
file=/dev/sda, maxoff=17179869184, sz=4096, tm=60.000000, nth=16
late: off=17163519926 time=31.459874
maxt[0xb6108ba0]=26.216165
maxt[0xb7f0bba0]=5.358786
maxt[0xb06ffba0]=2.237537
maxt[0xb1b01ba0]=0.431105
maxt[0xb1100ba0]=31.459874
maxt[0xb750aba0]=5.227217
maxt[0xb4305ba0]=11.970557
maxt[0xafcfeba0]=12.182925
maxt[0xaf2fdba0]=3.282174
late: off=17167991596 time=45.895693
maxt[0xb5707ba0]=45.895693
late: off=17164109599 time=56.973540
maxt[0xb2502ba0]=56.973540
late: off=17175656410 time=46.314144
maxt[0xb4d06ba0]=46.314144
maxt[0xb3904ba0]=16.404330
late: off=17172275062 time=48.144867
maxt[0xb2f03ba0]=48.144867
maxt[0xb6b09ba0]=14.050469
late: off=17178060908 time=41.265806
maxt[0xae8fcba0]=41.265806

(deadline)
# dmesg| grep sched
Using deadline io scheduler
# uname -r
2.6.9-55.ELsmp
# ./rr /dev/sda 16
(all preads take <0.3 sec)
file=/dev/sda, maxoff=17179869184, sz=4096, tm=60.000000, nth=16
maxt[0xb7550ba0]=0.204223
maxt[0xae942ba0]=0.183707
maxt[0xafd44ba0]=0.179676
maxt[0xb2f49ba0]=0.217834
maxt[0xb0745ba0]=0.194200
maxt[0xb7f51ba0]=0.184763
maxt[0xaf343ba0]=0.190673
maxt[0xb4d4cba0]=0.200575
maxt[0xb1146ba0]=0.202907
maxt[0xb574dba0]=0.218257
maxt[0xb394aba0]=0.216692
maxt[0xb434bba0]=0.179260
maxt[0xb614eba0]=0.199719
maxt[0xb6b4fba0]=0.244612
maxt[0xb1b47ba0]=0.207025
maxt[0xb2548ba0]=0.180778
Comment 4 Eric Sandeen 2007-10-16 23:38:51 EDT
The post-RHEL4 fix was likely part of "[PATCH] cfq-v2 I/O scheduler update"
which was a massive cfq rewrite in 2.6.10... extracting the relevant bits will
be interesting.
Comment 5 Eric Sandeen 2007-10-18 11:50:28 EDT
I feel that backporting the very large CFQV2 rewrite patch is a risky approach,
because it will change behavior of our default IO scheduler, well into the RHEL4
lifecycle.  It's more change than we would normally like to make in the RHEL series.

And, trying to extract only the part which limits this latency may not be
possible, since it is not easily extracted from the rest of the rewrite.

Since the deadline scheduler does not exhibit the problematic behavior, I would
like to suggest using it as a workaround for this sort of workload, rather than
attempting to modify cfq.

Does this sound acceptable?

Thanks,
-Eric
Comment 6 Watanabe Takashi 2007-10-25 01:03:27 EDT
OK, I understand.

Thank you.
Comment 7 Eric Sandeen 2008-12-04 16:57:20 EST
Because a workaround for this exists (namely, other schedulers work better for this sort of workload), we do not plan to make changes to CFQ in RHEL4 to address this issue.  I hope this is acceptable to the reporter.

Thanks,
-Eric
