Bug 146772

Summary: RHEL 3.0 is a lot slower than RHEL 2.1 when accessing directories with a lot of files
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Status: CLOSED WONTFIX
Severity: low
Priority: medium
Reporter: Paolo Campegiani <p.campegiani>
Assignee: Larry Woodman <lwoodman>
QA Contact: Brian Brock <bbrock>
CC: petrides, riel
Doc Type: Bug Fix
Last Closed: 2007-07-10 15:47:05 UTC

Description Paolo Campegiani 2005-02-01 15:50:28 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3)
Gecko/20041020

Description of problem:
We have a dual Xeon machine, 6 GB RAM, 5 U320 SCSI disks, software
RAID-5. We are using it for "Oracle Applications" (ERP application).

I noticed this problem during the installation of the Oracle
software: the installation takes 3-4 hours with RHEL 3.0 but only
1 hour with RHEL 2.1 (we had to downgrade the operating system
because of Oracle-related issues that do not affect this bug).

During the installation, I noted that almost all the time was spent
untarring (not gunzipping) the enormous number of very small files
that Oracle Applications is made of, so I decided to run some more
tests.

I wrote two small shell scripts (provided below):

The first creates 100k files, each with a name of 12 random characters,
say D1 D2 ... D12. For each name it runs mkdir -p /oracle/D1/D2/D3 and
then touch /oracle/D1/D2/D3/D1D2...D12 (/oracle being the mount point
for the software RAID).

The second creates 10k files, each with a random name, all in the same
directory.

These are the results:

RedHat 2.1 (2.4.9-27 + glibc 2.1.3)   First test:  645 secs   Second: 28 secs
RedHat 3.0 (2.4.21-20 + glibc 2.2)    First test: 1175 secs   Second: 36 secs


First script

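#!/bin/bash
# Needs bash: uses substring expansion, $RANDOM and let.

MATRIX="01234567890ABCDEFGHIJKLMNOPQRSTUVZabcdefghijklmnopqrstuvz"
LENGTH="12"
BASEDIR="/oracle/tmp/"

for i in $(seq 1 100000); do
  # Build a name of $LENGTH random characters taken from $MATRIX.
  PASS=""
  n=1
  while [ "$n" -le "$LENGTH" ]
  do
      PASS="$PASS${MATRIX:$(($RANDOM%${#MATRIX})):1}"
      let n+=1
  done
  # The first three characters select the three-level directory.
  D1=${PASS:0:1}
  D2=${PASS:1:1}
  D3=${PASS:2:1}
  mkdir -p "$BASEDIR/$D1/$D2/$D3"
  touch "$BASEDIR/$D1/$D2/$D3/$PASS"
done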


Second script
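#!/bin/bash
# Needs bash: uses substring expansion, $RANDOM and let.

MATRIX="01234567890ABCDEFGHIJKLMNOPQRSTUVZabcdefghijklmnopqrstuvz"
LENGTH="12"
BASEDIR="/oracle/tmp/"

for i in $(seq 1 10000); do
  # Build a name of $LENGTH random characters taken from $MATRIX.
  PASS=""
  n=1
  while [ "$n" -le "$LENGTH" ]
  do
      PASS="$PASS${MATRIX:$(($RANDOM%${#MATRIX})):1}"
      let n+=1
  done
  # All files go into the same directory.
  touch "$BASEDIR$PASS"
done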
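
For anyone reproducing this: the report does not say how the timings were taken, so the wrapper below is only a sketch of one way to get wall-clock seconds for either script. The function name elapsed_secs and the script file name in the usage comment are assumptions, not part of the original report.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print its wall-clock
# duration in whole seconds. The command's own output is discarded.
elapsed_secs() (
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
)

# Usage (first_test.sh is an assumed file name for the first script):
#   elapsed_secs ./first_test.sh
```

Whole-second resolution is enough here, since the runs take from tens of seconds to many minutes.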


Important note: I've also measured software RAID-5 performance with
very big files (bonnie++), and there RHEL 3.0 is roughly 30% faster
than RHEL 2.1, so the slowdown seems to be related to directory
access.

Version-Release number of selected component (if applicable):
kernel 2.4.21-20

How reproducible:
Always

Steps to Reproduce:
1. Use the two provided scripts

    

Actual Results:  RHEL 3.0 is slower than RHEL 2.1 when manipulating
heavily populated directories.

Additional info:

Could it be related to bdflush tuning? I have not tuned or touched it
since the installation.
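
To compare the bdflush settings on the two systems, something like the following could be run on each. It is a minimal sketch that assumes the parameters are exposed at /proc/sys/vm/bdflush, as on 2.4-series kernels; the function name show_bdflush is made up for illustration.

```shell
#!/bin/sh
# Print the current bdflush parameters if the kernel exposes them.
# Assumption: /proc/sys/vm/bdflush is the location (2.4-series kernels).
show_bdflush() {
    f=${1:-/proc/sys/vm/bdflush}
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "bdflush parameters not available at $f"
    fi
}

show_bdflush
```

Diffing the output from the RHEL 2.1 and RHEL 3.0 installations would show whether the defaults differ.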

Comment 1 Ernie Petrides 2005-02-02 00:11:34 UTC
Hello, Paolo.  I believe this problem has already been fixed in RHEL3 U4.
Please retry this with the latest RHEL3 kernel (which is 2.4.21-27.0.2.EL)
and let us know if the installation time is significantly reduced.

Thanks in advance.  -ernie


Comment 3 Paolo Campegiani 2005-02-03 11:14:14 UTC
Hello Ernie, I've installed 2.4.21-27.0.2.ELsmp and run the two
tests again: things are better, but the RHEL 2.1 kernel is still faster:

RHEL 3.0 with 2.4.21-27.0.2smp        First test:  881 secs   Second: 32 secs
RedHat 2.1 (2.4.9-27 + glibc 2.1.3)   First test:  645 secs   Second: 28 secs
RedHat 3.0 (2.4.21-20 + glibc 2.2)    First test: 1175 secs   Second: 36 secs
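
To put the figures above in relative terms, the slowdown of each RHEL 3 kernel against the RHEL 2.1 baseline can be computed directly. This is just arithmetic on the timings reported in this bug; the function name slowdown is illustrative.

```shell
#!/bin/sh
# Slowdown relative to the RHEL 2.1 baseline (645 secs and 28 secs),
# using the timings reported in this bug.
slowdown() {
    # $1 = kernel label, $2 = first test secs, $3 = second test secs
    awk -v label="$1" -v t1="$2" -v t2="$3" -v b1=645 -v b2=28 \
        'BEGIN { printf "%s: first %.2fx, second %.2fx\n", label, t1 / b1, t2 / b2 }'
}

slowdown "2.4.21-20" 1175 36       # prints "2.4.21-20: first 1.82x, second 1.29x"
slowdown "2.4.21-27.0.2" 881 32    # prints "2.4.21-27.0.2: first 1.37x, second 1.14x"
```

So the updated kernel narrows the gap on the first test from about 1.8x to about 1.4x, but does not close it.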


The only big difference in the system is that the software RAID is
now filling up (though we still have nearly 50 GB of free space); in
the first test (the 2.1 one) it was almost empty. I don't believe this
can affect performance so heavily.
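
To record how full the filesystem is alongside each timing run, a small sketch like the one below could be used. The function name fs_use_percent is hypothetical; on this system the path of interest would be /oracle, and the example defaults to / only so that it runs anywhere.

```shell
#!/bin/sh
# Print the "Use%" column for the filesystem holding the given path.
# df -P forces the POSIX one-line-per-filesystem output format.
fs_use_percent() {
    df -P "$1" | awk 'NR == 2 { print $5 }'
}

fs_use_percent "${1:-/}"
```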