Bug 518287 - Performance problem moving application from RH6.2 -> RHEL4.7
Summary: Performance problem moving application from RH6.2 -> RHEL4.7
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Josef Bacik
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-08-19 18:15 UTC by Flavio Leitner
Modified: 2018-10-20 04:08 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-07 14:41:40 UTC


Attachments:
iozone results RHEL3u9, 2.4.34.5, RHEL4, RHEL5 on UP,i686,512Mb,IDE (15.31 KB, application/x-bzip2)
2009-08-25 14:39 UTC, Flavio Leitner
iozone results RHEL5 2.6.18-164.el5 on UP,i686,512Mb,IDE (25.59 KB, application/octet-stream)
2009-08-25 22:17 UTC, Flavio Leitner
comparing results between customers and similar system (2.28 KB, application/x-bzip2)
2009-08-25 22:34 UTC, Flavio Leitner

Comment 12 Flavio Leitner 2009-08-19 18:28:01 UTC
ioz.results.data.ordered.081309
iozone -Ra -g 2G -i0 -i1 for regular ext3 RHEL4 mounted with data=ordered

ioz.results.2048
iozone -Ra -g 2G -i0 -i1 for regular ext3 RHEL4 blocksize of 2048 on filesystem 


IOZONE data collected from customer:

ioz.results.data.normal.081309
iozone -Ra -g 2G -i0 -i1 for regular ext3

ioz.results.data.normal.deadline.081309
iozone -Ra -g 2G -i0 -i1 for regular ext3 RHEL4 Deadline elevator

ioz.results.data.normal.noop.081309
iozone -Ra -g 2G -i0 -i1 for regular ext3 RHEL4 NOOP elevator

ioz.results.data.writeback.081309   
iozone -Ra -g 2G -i0 -i1 for regular ext3 RHEL4 mounted with data=writeback

Comment 13 Ric Wheeler 2009-08-19 19:12:33 UTC
Random writes to a file system should be a seek-bound workload.

If you can get a blktrace/seekwatcher graph from this workload, it would be very interesting.

In a possibly similar customer app, we got a huge performance improvement when we modified the app to have its multiple threads start writing to a shared directory for a specific amount of time before they all jumped over to another shared directory. This let ext* do better allocation on disk.

One other scheme that normally helps with random write workloads is data journal mode since these random IO's will land in the fs journal and then get pushed out to their permanent locations in the background.

Flavio, have you been able to reproduce this in house on a similar box (similar storage is the most critical thing here)?

Comment 16 Issue Tracker 2009-08-20 18:48:49 UTC
Event posted on 08-20-2009 02:48pm EDT by jagee

This event sent from IssueTracker by jagee
issue 330821
it_file 248410

Comment 23 Josef Bacik 2009-08-24 13:57:27 UTC
Has somebody had them test against the disk directly?  iozone can use just the raw disk, so let's try that and see whether it's a filesystem-specific problem or not.

Comment 24 Flavio Leitner 2009-08-24 14:19:45 UTC
Hi Josef,

In comment#19 you have some outputs available from an in-house system.
It's still missing some, though.

Also, I wrote a script to compare two iozone outputs and print the 
change between them as a percentage. It's available here:
http://people.redhat.com/~fleitner/iozone-cmp

Example:
  $ ./iozone-cmp ioz.results.data.normal.081309 redhat/iozone.rhel4u7.hpdl380g5.ext3.txt

  results in:
    vv Re-writer vv
    128       11.6  16.2  15.5  18.5  18.3
    256       16.1  17.7  18.1  18.5  17.5  27.5
    512       18.9  18.1  16.3  18.0  18.5  16.9  17.5
    <snipped>
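
For readers without access to the script, the comparison can be sketched
roughly as below. This is not the real iozone-cmp: the function names are
hypothetical, the input is assumed to be already parsed into
{file size: [throughput per record size]} tables, and the formula
(first - second) / first * 100 is an assumption chosen so that positive
numbers mean the first file is faster. Cells with no data in the first
file are printed as 0.0, which is also an assumption.

```python
def percent_change(base, other):
    """Percent by which `base` beats `other`; 0.0 when base has no data."""
    if base == 0:
        return 0.0
    return (base - other) / base * 100.0


def compare_tables(base_table, other_table):
    """Compare two {file_size_kb: [throughput, ...]} tables cell by cell."""
    result = {}
    for file_kb, base_row in base_table.items():
        other_row = other_table.get(file_kb)
        if other_row is None:
            continue  # file size missing from the second run, skip the row
        result[file_kb] = [
            round(percent_change(b, o), 1) for b, o in zip(base_row, other_row)
        ]
    return result


def format_table(name, table):
    """Render a comparison in the 'vv name vv' layout used in this bug."""
    lines = ["vv %s vv" % name]
    for file_kb in sorted(table):
        cells = "".join("%6.1f" % value for value in table[file_kb])
        lines.append("%-8d%s" % (file_kb, cells))
    lines.append("^^ %s ^^" % name)
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up throughput numbers, only to show the calling convention.
    first = {128: [1000.0, 2000.0], 256: [1000.0, 0.0]}
    second = {128: [900.0, 1500.0], 256: [500.0, 100.0]}
    print(format_table("Re-writer", compare_tables(first, second)))
```

Parsing the iozone -R report into such tables is left out on purpose,
since the exact report layout is not shown in this bug.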

I'm installing RHEL3, RHEL4, and RHEL5 on a UP, i686 box with 512MB of RAM and IDE 
to see whether it shows the same or a proportional slowdown.

Flavio

Comment 27 Flavio Leitner 2009-08-25 14:39:57 UTC
Created attachment 358578 [details]
iozone results RHEL3u9, 2.4.34.5, RHEL4, RHEL5 on UP,i686,512Mb,IDE

Hi,

I did some testing with another system here (UP, i686, 512MB ram, IDE)
on ext3 formatted with RHEL3u9.
Below are the general impressions and the 8k results, in %.
Positive numbers: RHEL3u9 wins
Negative numbers: Other wins

RHEL3u9 stock X RHEL3u9 using 2.4.34.5 compiled with their config.
-Re-writer: very good results for RHEL3u9 stock with only some spots
            where 2.4.34.5 wins (mostly >=524288)
 8k: 10.7% 9.8% 4.7% 7.3% 5.6% 8.8% 7.6% 2.1% 3.7% 4.8% 1.1% 6.2%

-Writer: RHEL3u9 wins with a visual (not calculated) average of 20%
 8k: 23.0 22.8 19.5 19.2 18.8 20.8 19.7 16.9 17.7 17.9 15.5 17.8

-Re-Reader: few variations, 2.4.34.5 seems a bit faster.
 8k: 3.6 3.7 1.4 1.2 1.0 -3.8 0.4 -2.9 -2.6 -0.8 -5.4 1.2

-Reader: same as previous, but more significant on some spots.
 8k: 3.6 3.7 1.4 1.2 1.0 -3.8 0.4 -2.9 -2.6 -0.8 -5.4 1.2

RHEL3u9 X RHEL4u (42.EL)
- Re-writer: RHEL3u9 stock is faster by visually estimated ~15%.
 8k: 4.3   5.9   4.7   5.0   8.5   8.2   8.7   8.5  12.3   7.6   6.3   3.3

- Writer: the same as previous one, but more significant:
 8k: 18.6  12.2  11.5  10.5  12.7  11.5  11.6  12.0  13.8  10.6   9.1   6.8

- Re-reader: close to each other.
 8k: 9.2   4.1   3.7   5.8   3.7   8.8   6.8   2.0   5.9   0.8  -0.4  -3.9

- Reader: RHEL4 seems faster in most spots, but RHEL3 wins with big writes.
 8k: -6.3  -7.4  -9.2 -10.8  -9.3  -3.5  -0.5  -1.5   3.2  -1.7  -3.3  -6.3

RHEL3u9 X RHEL5 (-8.el5)
- Re-writer: more constant, rhel3u9 still wins most spots
 8k: 5.8   7.3   4.3   4.0   9.3   9.6   7.1   7.4   8.4   3.6   5.9   4.2

- Writer: RHEL3u9 wins by ~15%
 8k: 16.4  15.7  13.3  11.5  15.4  15.2  12.6  13.1  13.7  10.0  11.7   9.8

- Re-Reader: closer to each other, RHEL3 still wins
 8k: 10.2   6.8   7.1   3.4   4.3  11.5   5.6   3.4   4.3  -1.2   1.8  -0.2

- Reader: closer to each other, RHEL5 wins.
 8k: -1.7  -5.1  -8.7 -10.4  -9.5  -0.4  -2.2   0.3   1.8  -4.0   0.1  -2.5
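
Since the averages quoted above are visual estimates, here is a small
sketch that computes the mean of the 8k Writer rows from the three
comparisons. It covers only the 8k rows pasted in this comment, not the
full tables in the attachment, so it is not a replacement for the
overall impressions. The labels and function name are illustrative.

```python
from statistics import mean

# 8k Writer rows copied from this comment (percent, positive = RHEL3u9 wins).
WRITER_8K = {
    "RHEL3u9 vs 2.4.34.5": [23.0, 22.8, 19.5, 19.2, 18.8, 20.8,
                            19.7, 16.9, 17.7, 17.9, 15.5, 17.8],
    "RHEL3u9 vs RHEL4": [18.6, 12.2, 11.5, 10.5, 12.7, 11.5,
                         11.6, 12.0, 13.8, 10.6, 9.1, 6.8],
    "RHEL3u9 vs RHEL5": [16.4, 15.7, 13.3, 11.5, 15.4, 15.2,
                         12.6, 13.1, 13.7, 10.0, 11.7, 9.8],
}


def row_means(rows):
    """Return the arithmetic mean of each row, rounded to one decimal."""
    return {label: round(mean(values), 1) for label, values in rows.items()}


if __name__ == "__main__":
    for label, average in row_means(WRITER_8K).items():
        print("%-22s %5.1f%%" % (label, average))
```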

These results show that writing is faster on stock RHEL3, but the gap is
not as large as in their results. So either the limited resources of this
box are hiding an additional problem, or the issue is more 
hardware related.

Perhaps testing on a system with more resources, different from what the 
customer is using, would help us understand this better.

Thoughts? Josef?

thanks,
Flavio

Comment 28 Josef Bacik 2009-08-25 15:02:48 UTC
yeah, will you test with the latest rhel5 kernel in brew?  I backported a huge rewrite of the write path which I think will give us numbers closer to RHEL3.

Comment 29 Flavio Leitner 2009-08-25 20:32:57 UTC
Their results:

rh62.2.4.34.5-1-i686 X RHEL4
Re-writer: Numbers of rh62.2.4.34.5-1-i686 are consistently better, by 45% or more.
8k: 45.3 45.9 45.1 37.9 43.5 43.7 37.0 42.8 43.7 48.2 35.6 32.3

Writer: even better than before, 55-60% or more.
8k: 56.6 63.9 63.4 57.8 63.4 63.2 58.4 63.2 62.9 61.0 55.9 52.5

Re-Reader: the same, ~25%
8k: 34.5 32.9 31.7 30.5 30.1 29.4 29.6 29.6 29.5 41.8 20.0 14.3

Reader: the same as above
8k: 32.0 29.9 28.6 27.1 26.1 25.7 26.1 26.3 26.8 38.7 19.5 13.1

The default I/O scheduler is better than deadline or noop, but there is
almost no significant change, except for 4096, which is worse.

Using data=ordered or writeback didn't improve anything either.
(almost no change, defaults are better)

I'm also running the test with the latest RHEL5 kernel.
Flavio

Comment 30 Flavio Leitner 2009-08-25 22:17:59 UTC
Created attachment 358644 [details]
iozone results RHEL5 2.6.18-164.el5 on UP,i686,512Mb,IDE

RHEL5 -8.el5 X -164.el5 (latest on brewroot/packages/kernel/2.6.18/)

Re-writer:
8k: -2.1  -2.0  -0.7  -0.3  -1.0  -0.6  -0.3   2.2   4.2   1.2  -1.2  -2.8
Writer:
8k: 0.4   0.3   2.0   1.9   0.9   0.8   1.1   3.2   4.0   2.5   0.4  -0.5
Re-Reader:
8k: 4.6   3.9  -1.9   1.6   4.6  -0.1  -2.4   1.4   2.9   0.0  -2.1  -4.4
Reader:
8k: 1.4   1.6   1.1   1.9   3.2  -0.8  -2.0   1.0   2.7   0.1  -3.7  -4.6

Almost no change on my test system.

Comment 31 Flavio Leitner 2009-08-25 22:34:33 UTC
Created attachment 358646 [details]
comparing results between customers and similar system

(ext3 only)
On the Re-writer test, we didn't reproduce the same issue: the numbers are
small, sometimes faster and sometimes slower.

The Writer results indicate the opposite. While performance has improved
in-house, their results show it twice as slow.

The Re-Reader results seem to align: both show a performance degradation.

The Reader results are very interesting: with small write blocks, the
in-house system shows half the performance degradation that their system does.
However, with large write blocks the performance problem shrinks
significantly.

I'm assuming the in-house system has the same HW or very close.

So, I'd say neither my system nor the lab's system has reproduced the problem
entirely. I'll run it again to make sure.

Flavio

Comment 32 Flavio Leitner 2009-08-28 17:56:20 UTC
e-mail copy&paste:

On Fri, Aug 28, 2009 at 12:00:44AM -0500, Carlson, Scott wrote:
>    Flavio,
>
>    I noticed in your updates to the Bug Report that you may be having
>    difficulty duplicating the hardware environment.  Is there any information
>    that I can gather for you that would help in duplicating the environment? 
>    Our numbers appear to be a consistent multiple slower in a couple of use
>    cases, and I’d like to make sure that we not only see the trend, but if
>    possible, the same factor.  
>
>    Can you please let me know if any more information would be useful?

The test results show some performance variation, which sometimes
reproduces the consistent multiple slowdown, but only in a few outputs.
That means we are likely hitting some other bottleneck.

However, the latest iozone.redhat_hp-dl380g5-paypal_rhl6.2img.ext2.txt
shows the following when compared with
iozone.rhel3u9.hp-dl380g5.nolvm.ext3.txt:

vv Re-writer vv
128       57.5  50.6  45.4  43.8  42.3  41.8
256       56.2  51.1  45.5  42.9  41.4  41.1  41.8
512       55.5  50.9  45.5  42.3  42.8  40.0  41.7  40.1
1024      53.5  50.6  45.1  43.0  41.7  40.8  41.3  39.9  39.2
2048      52.9  49.2  45.5  43.2  41.4  41.2  40.8  39.8  37.3  30.6
4096      51.4  46.3  42.5  40.4  39.2  38.2  38.2  37.8  34.6  29.7 29.3
8192      50.9  45.4  41.0  39.1  37.8  37.4  37.1  36.5  34.3  29.2 28.9  28.9
16384     50.3  44.7  40.5  38.6  37.5  35.0  37.5  35.9  34.2  28.9 28.4  29.3  28.3
32768      0.0   0.0   0.0   0.0  37.5  36.4  36.7  35.5  33.9  28.9 37.9  28.1  28.6
65536      0.0   0.0   0.0   0.0  36.4  35.6  35.8  35.3  33.7  28.5 28.4  28.3  27.7
131072     0.0   0.0   0.0   0.0  37.1  40.2  35.6  35.4  33.8  28.3 28.3  28.1  28.0
262144     0.0   0.0   0.0   0.0  35.8  33.6  34.8  33.7  32.5  25.5 25.1  26.5  26.4
524288     0.0   0.0   0.0   0.0  32.3  40.1  31.3  31.4  29.9  25.8 24.3  24.1  24.8
1048576    0.0   0.0   0.0   0.0  92.4  92.1  92.3  92.3  91.7  91.7 89.9  91.4  89.9
2097152    0.0   0.0   0.0   0.0  51.6  52.0  52.4  51.6  51.7  51.7 51.5  51.4  51.6
^^ Re-writer ^^

The above looks very close to the issue the customer is seeing, which
is good.

But I see this is another disk, and I'm not sure what else changed.
Can you confirm? If changing disks reproduces the problem then we have a
good hint here.

Is it possible to run RHEL3u9 on the same disk/HW? I would like to compare
again and rule out any hardware differences before we start digging into the
software stack. If RHEL3u9 shows the same problem, we have finally
reproduced this and we can move forward.

Regards,
--
Flavio

