Bug 121029 - "low memory" bug on Tiger4 with 32G
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: ia64 Linux
Priority: medium, Severity: medium
Assigned To: Jason Baron
Reported: 2004-04-16 08:12 EDT by Tim Burke
Modified: 2013-03-06 00:56 EST

Doc Type: Bug Fix
Last Closed: 2004-09-02 00:31:21 EDT


Attachments
Attached "readme" showing system config info (5.10 KB, text/plain)
2004-04-16 08:14 EDT, Tim Burke
test program (288.91 KB, text/plain)
2004-04-16 08:15 EDT, Tim Burke
syslog output (129.06 KB, text/plain)
2004-04-16 08:16 EDT, Tim Burke
meminfo (2.20 KB, text/plain)
2004-04-16 08:17 EDT, Tim Burke
/var/log/messages after machine HANG (def 121029) (474.81 KB, text/plain)
2004-06-17 05:27 EDT, Claude BRUNET
trace files after HANG (with "echo m > /proc/sysrq-trigger") (879.23 KB, text/plain)
2004-06-17 08:15 EDT, Claude BRUNET

Description Tim Burke 2004-04-16 08:12:09 EDT
Tim,
I work in Pierre Salkazanov's team at Bull. I am contacting you to
provide all the information necessary to reproduce and fix the
"low memory" bug discussed during the meeting between Red Hat and
Bull held at Westford on 17-18 March.

In short, the bug behaves as follows:

    When running the test program dbgen on a Tiger machine with 32 GB of
memory, after a while (about 2 hours) the server is no longer
accessible: it still answers ping, but we cannot log in, ssh, or do
anything else.
    On the console we get messages such as:
           journal_write_metadata_buffer : ENOMEM at get_unused_buffer_head, trying again.
           Out of Memory Killed process 6170 (nohup). ...
    The only thing to do is to reboot the machine.

Attached are some documents, such as a readme for starting the test case
and information collected after the "system hang", which should allow
you to reproduce and debug the problem.
I hope all this information is sufficient to help you fix the bug;
nevertheless, feel free to ask me if you need more details.

I have also sent you the 32 GB of memory needed to reproduce the bug
quickly (16 DIMMs of 2 GB).
Comment 1 Tim Burke 2004-04-16 08:14:57 EDT
Created attachment 99475 [details]
Attached "readme" showing system config info

Comment 2 Tim Burke 2004-04-16 08:15:52 EDT
Created attachment 99476 [details]
test program
Comment 3 Tim Burke 2004-04-16 08:16:44 EDT
Created attachment 99477 [details]
syslog output
Comment 4 Tim Burke 2004-04-16 08:17:28 EDT
Created attachment 99478 [details]
meminfo
Comment 5 Tim Burke 2004-04-16 08:20:48 EDT
Here's who the original problem description came from:

Serge CHABUEL             LINUX & HPC Project Manager
Bull S.A.                 1, Rue de Provence 38432 ECHIROLLES
Frec B1-247               Tel: +33 (0)4 76 29 77 36 / fax : 04 76 29 75 18
Comment 6 Tim Burke 2004-04-16 08:22:40 EDT
Oh, his email is serge.chaubel@bull.net.  Another Bull person on the
original mail is pierre.salkazanov@bull.net.  The sales person is
fmeyer@redhat.com.

I tried to add all these guys to the `cc` but apparently, none of the
above have established bugzilla accounts.
Comment 7 Tim Burke 2004-04-16 08:24:01 EDT
Bull claims that this problem can be reproduced on a stock Intel
Tiger4.  So we don't need to wait for a complete Bull system to test
it out; rather we can run it on our existing systems.  (Once the
memory arrives.)
Comment 8 Jason Baron 2004-04-16 15:25:08 EDT
We really should first try this out with Larry's change that moves the
page structs out of the DMA zone.
Comment 9 Tim Burke 2004-04-16 16:21:40 EDT
Sure.  Good idea.
Comment 12 Jason Baron 2004-05-18 12:15:41 EDT
ok, i've run this test on a 32GB tiger system, saw no -ENOMEM in the
logs. Also, the dbgen processes are still running, going on 3 days now...
Comment 14 Larry Woodman 2004-05-21 11:24:26 EDT
Pierre, the kernel that includes the change which allocates the
physical pages for the virtual mem_map from the normal zone instead of
the DMA zone can be downloaded from here:

http://people.redhat.com/~lwoodman/.IA64/


BTW, this change will save ~250MB of DMA zone memory, so it should
postpone the ENOMEM failures and the OOM killing of processes to the
point that you never see them.  Can you try out that kernel and let us
know if it fixes your problem?


Thanks, Larry Woodman


Comment 15 Jason Baron 2004-05-25 10:42:15 EDT
The kernel that Larry recommends has made a big difference for other
customers who were seeing memory exhaustion. Can you please evaluate
this kernel? Adding Tim to the cc list so that we can make some
progress on this... it seems like Bull is not seeing this bug...
Comment 16 Jason Baron 2004-06-01 11:16:43 EDT
If we don't get moving on this issue in the next week or so, a
resolution in time for U3 is in serious jeopardy.
Comment 17 Pierre Fumery 2004-06-02 13:04:01 EDT
Test is currently running to reproduce the original problem and to
verify that the RHEL3 Update 3 pre-beta kernel fixes it.
More details on configuration and test results (hopefully good !)
should be posted by tomorrow.
Comment 18 Jérôme ALEXANDRE 2004-06-03 12:30:39 EDT
Unfortunately, the test fails!
The test was started at 9:30 am and was still running at 11:30 am, but
at 12:30 the machine hung.
With the new kernel (rpm: kernel-2.4.21-15.5.EL.ia64.rpm), the
machine cannot boot if the SCSI controller is an LSI one, but it boots
fine with an Adaptec.
(LSI: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual
Ultra320 SCSI (rev 07); Adaptec: Adaptec AHA-39160D / AIC-7899A
U160/m (rev 01)).

Comment 19 Susan Denham 2004-06-03 13:00:40 EDT
-----Forwarded Message-----
From: Pierre Fumery <pierre.fumery@bull.net>
To: Susan S. Denham <sdenham@redhat.com>
Cc: bnocera@redhat.com
Subject: (121029 good and ... bad) Re: RHN access keys (product IDs)
for RHEL 4 alpha and beta access
Date: 03 Jun 2004 17:33:38 +0200

Hi Sue,
.
.
.

About 121029, to let you directly know:

There are improvements: the machine hung after about three hours (RHEL3
Update 3) instead of one hour previously (same configuration with
RHEL3 Update 1).

Memory doesn't seem to be exhausted as it was previously. So things have
obviously changed, but this doesn't seem to address the whole problem?
Or there is another one.

I have already asked people to post more details directly in 121029. It
should be done very soon.

We'll need Red Hat's help to figure out the best way to give you enough
information to go further on this investigation. On our side, we'll
try to set up a Tiger configuration with 32GB here again, to reproduce
this problem in a configuration similar to yours at Red Hat. The
current test was done on an available NovaScale machine which
already had 32GB installed.

Pierre.
Comment 20 Pierre Fumery 2004-06-03 13:31:37 EDT
Could you please explain what kind of fix has been integrated? There is
not much explanation above, and http://people.redhat.com/~lwoodman/.IA64/
looks empty to me.

Also, any recommendation on how to get further traces or logs would be
helpful. Then we could provide them to you to help the investigation.
Comment 21 Jason Baron 2004-06-03 17:33:28 EDT
The change was to not allocate the 'page structs' out of lowmem.
This should leave a much larger set of lowmem to work with.

So in this hang there is plenty of lowmem. Alt-SysRq-m should
provide this data.

However, it's been mentioned that without the fancyiommu patch, these
tests do not hang... so it is a bit confusing to me where to
attack this problem. I guess if we also got an Alt-SysRq-t, and the
time of the hang, that would be helpful.
Comment 22 Jason Baron 2004-06-07 10:35:07 EDT
So the first thing we need to determine is whether this hang is still a
lowmem issue. This can be found by doing:

1. echo 1 > /proc/sys/kernel/sysrq

2. echo m > /proc/sysrq-trigger

The second command gives us a good breakdown of the memory free on the
system. You can put this in a script.
Comment 23 Pierre Fumery 2004-06-08 04:43:01 EDT
The test was launched yesterday evening. Unfortunately, a bad
configuration made it fail! ... and we found nothing this morning,
as the test aborted by itself.
The test is currently being relaunched, and we have set up several traces
(memory, top, ...) to track pertinent information in order to catch
what goes wrong.
Comment 24 Pierre Fumery 2004-06-08 04:46:48 EDT
BTW, I asked our kernel people to have a look if/when the machine hangs.
They asked for a kernel debugger or other tools available on RHEL3.
We use KDB on our systems, but they ported it to 2.6 as we're mainly
using that kernel level.

Which kernel debugger/tools do you use on RHEL3 to debug such
problems? Any information (other than the memory traces described
above) would be helpful.
Thanks.
Comment 25 Jason Baron 2004-06-08 10:47:22 EDT
As mentioned on the call today, a useful tool for getting system
information is sysreport. It's just /usr/sbin/sysreport, which creates
a .bz2 file containing the 'proc' filesystem, system logs, etc.
Comment 26 Pierre Fumery 2004-06-08 12:45:19 EDT
As mentioned on the call today, the test is still running on an 8-way
system with 32GB. We have passed 7 hours now and we'll let it run for
some more time.

On this same system, the same test (dbgen) crashed RHEL3 Update 1 in
about one hour.
On this same system, dbgen crashed RHEL3 Update 3 once, in about 3
hours, but without the memory problems that were seen with RHEL3
Update 1.

I will post more information in a following note.

We plan to let it run until tomorrow (our time), and we will then
restart it on this same machine running as a 16-way with 64 GB of
memory.
Comment 27 Pierre Fumery 2004-06-08 12:53:05 EDT
With RHEL3 Update 1, dbgen was crashing our machine (8-way, 32GB) after
about one hour.

The traces below were recorded at the beginning and at the end (machine
hang) of a run of this script:

# Start with an empty trace file.
> meminfo.txt
while :
do
        # Call date once so month, day and time come from the same instant.
        set -- `date`
        mois=$2     # month
        jour=$3     # day
        heure=$4    # time (HH:MM:SS)
        # /proc/meminfo reports HighFree and LowFree in kB.
        HighFree=`awk '/^HighFree:/ {print $2}' /proc/meminfo`
        LowFree=`awk '/^LowFree:/ {print $2}' /proc/meminfo`
        echo "$mois $jour $heure $HighFree $LowFree" >> meminfo.txt
        sleep 300
done

Jun 2 17:10:03 31109424 101136
...
Jun 2 18:06:31 3817472 20272
Comment 28 Pierre Fumery 2004-06-08 12:56:36 EDT
With RHEL3 Update 3, dbgen crashed our machine (8-way, 32GB) once,
after about three hours.

The same script (see previous note) was used to record traces, and we got
the following at the beginning and at the end (machine hang):

Jun 3 09:28:15 30906672 1567648
...
Jun 3 12:33:30 29187264 1354816

Obviously, the HighFree and LowFree values were still high, so it seems
there was another problem that we still do not understand.
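
To quantify how far the counters moved during a run, the first and last
samples of a trace can be compared. The following is a minimal sketch,
assuming the five-column format written by the trace script (month, day,
time, HighFree in kB, LowFree in kB). The helper name trace_delta is
illustrative only and is not part of the original test kit.

```shell
#!/bin/sh
# Print the change in HighFree and LowFree (kB) between the first and
# last sample of a trace. Lines without five fields, such as "...",
# are ignored. trace_delta is a hypothetical helper name.
trace_delta() {
    awk 'NF == 5 {
             if (!seen) { h0 = $4; l0 = $5; seen = 1 }
             h1 = $4; l1 = $5
         }
         END { if (seen) print h1 - h0, l1 - l0 }'
}

# The two samples from the Update 3 run that hung after three hours.
trace_delta <<'EOF'
Jun 3 09:28:15 30906672 1567648
...
Jun 3 12:33:30 29187264 1354816
EOF
```

For these two samples it prints "-1719408 -212832": both counters fell,
but neither was anywhere near zero when the machine hung.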
Comment 29 Pierre Fumery 2004-06-08 12:58:51 EDT
Today, with RHEL3 Update 3 again, dbgen is still running on our machine
(8-way, 32GB) after more than 7 hours.

The trace script has been improved following Jason's recommendation
(thanks!) as follows:

# Enable the SysRq interface so the memory dump can be triggered below.
echo 1 > /proc/sys/kernel/sysrq
# Start with an empty trace file.
> meminfo.txt
while :
do
        # Call date once so month, day and time come from the same instant.
        set -- `date`
        mois=$2     # month
        jour=$3     # day
        heure=$4    # time (HH:MM:SS)
        # /proc/meminfo reports HighFree and LowFree in kB.
        HighFree=`awk '/^HighFree:/ {print $2}' /proc/meminfo`
        LowFree=`awk '/^LowFree:/ {print $2}' /proc/meminfo`
        echo "$mois $jour $heure $HighFree $LowFree" >> meminfo.txt
        # Ask the kernel to log a "Show Memory" dump to /var/log/messages.
        echo m > /proc/sysrq-trigger
        sleep 300
done
Comment 30 Pierre Fumery 2004-06-08 13:00:37 EDT
Today, with RHEL Update 3 again, dbgen is still running on our machine
(8-way, 32GB) after more than 7 hours.

Corresponding traces are the following ones:

Jun 8 10:43:32 30906656 1084320
...
Jun 8 13:44:05 29071696 753744
...
Jun 8 17:44:53 27146880 724208

Trace in /var/log/messages:

Jun  8 10:43:32 eroski kernel: SysRq : Show Memory
Jun  8 10:43:32 eroski kernel: Mem-info:
Jun  8 10:43:32 eroski kernel: Zone:DMA freepages: 67778 min:  1279
low:  3050 high:  4063
Jun  8 10:43:32 eroski kernel: Zone:Normal freepages:     0 min:     0
low:     0 high:     0
Jun  8 10:43:32 eroski kernel: Zone:HighMem freepages:1931655 min:  
255 low: 30718 high: 46077
Jun  8 10:43:32 eroski kernel: Free pages:      1999433 (1931655 HighMem)
Jun  8 10:43:32 eroski kernel: ( Active: 17229/13431,
inactive_laundry: 2527, inactive_clean: 1829, free: 1999433 )
Jun  8 10:43:32 eroski kernel:   aa:0 ac:2107 id:11463 il:1911 ic:1824
fr:67778
Jun  8 10:43:32 eroski kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Jun  8 10:43:32 eroski kernel:   aa:8852 ac:6270 id:1968 il:616 ic:5
fr:1931655
Jun  8 10:43:32 eroski kernel: 13968*16kB 11263*32kB 5259*64kB
813*128kB 50*256kB 6*512kB 5*1024kB 3*2048kB 2*4096kB 1*8192kB
1*16384kB 0*32768kB 0*65536kB 0*131072kB 0*262144kB = 1084448kB)
Jun  8 10:43:32 eroski kernel: 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB
0*512kB 0*1024kB 1*2048kB 1*4096kB 0*8192kB 0*16384kB 1*32768kB
1*65536kB 1*131072kB 117*262144kB = 30906480kB)
Jun  8 10:43:32 eroski kernel: Swap cache: add 0, delete 0, find 0/0,
race 0+0
Jun  8 10:43:32 eroski kernel: 18538 pages of slabcache
Jun  8 10:43:32 eroski kernel: 252 pages of kernel stacks
Jun  8 10:43:32 eroski kernel: 0 lowmem pagetables, 404 highmem pagetables
Jun  8 10:43:32 eroski kernel: Free swap:       2040176kB
Jun  8 10:43:33 eroski kernel: 2095752 pages of RAM
Jun  8 10:43:33 eroski kernel: 14993 reserved pages
Jun  8 10:43:33 eroski kernel: 35383 pages shared
Jun  8 10:43:33 eroski kernel: 0 pages swap cached
Jun  8 10:43:33 eroski kernel: 245 pages in page table cache
Jun  8 10:43:33 eroski kernel: Buffer memory:    33712kB
Jun  8 10:43:33 eroski kernel: Cache memory:   389152kB
Jun  8 10:43:33 eroski kernel:   CLEAN: 754 buffers, 3016 kbyte, 72
used (last=754), 0 locked, 0 dirty 0 delay
Jun  8 10:43:33 eroski kernel:  LOCKED: 1 buffers, 4 kbyte, 1 used
(last=1), 0 locked, 0 dirty 0 delay
Jun  8 10:43:33 eroski kernel:   DIRTY: 51 buffers, 204 kbyte, 51 used
(last=51), 0 locked, 37 dirty 0 delay
Jun  8 10:44:12 eroski kernel: keyboard.c: can't emulate rawmode for
keycode 272
Jun  8 10:45:25 eroski last message repeated 4 times
Jun  8 10:46:28 eroski last message repeated 14 times
Jun  8 10:47:30 eroski last message repeated 20 times
Jun  8 10:48:06 eroski last message repeated 21 times
Jun  8 10:48:33 eroski kernel: SysRq : Show Memory



...



Jun  8 17:04:46 eroski kernel: SysRq : Show Memory
Jun  8 17:04:46 eroski kernel: Mem-info:
Jun  8 17:04:46 eroski kernel: Zone:DMA freepages: 45851 min:  1279
low:  3050 high:  4063
Jun  8 17:04:46 eroski kernel: Zone:Normal freepages:     0 min:     0
low:     0 high:     0
Jun  8 17:04:46 eroski kernel: Zone:HighMem freepages:1717724 min:  
255 low: 30718 high: 46077
Jun  8 17:04:46 eroski kernel: Free pages:      1763577 (1717724 HighMem)
Jun  8 17:04:46 eroski kernel: ( Active: 38960/166207,
inactive_laundry: 48360, inactive_clean: 1813, free: 1763577 )
Jun  8 17:04:46 eroski kernel:   aa:0 ac:10784 id:11463 il:1882
ic:1808 fr:45851
Jun  8 17:04:46 eroski kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Jun  8 17:04:46 eroski kernel:   aa:10456 ac:17708 id:154744 il:46478
ic:5 fr:1717724
Jun  8 17:04:46 eroski kernel: 519*16kB 6954*32kB 5282*64kB 817*128kB
51*256kB 6*512kB 5*1024kB 3*2048kB 2*4096kB 1*8192kB 1*16384kB
0*32768kB 0*65536kB 0*131072kB 0*262144kB = 733616kB)
Jun  8 17:04:46 eroski kernel: 214*16kB 41*32kB 1*64kB 0*128kB 1*256kB
1*512kB 0*1024kB 1*2048kB 0*4096kB 0*8192kB 1*16384kB 0*32768kB
1*65536kB 1*131072kB 104*262144kB = 27483584kB)
Jun  8 17:04:46 eroski kernel: Swap cache: add 0, delete 0, find 0/0,
race 0+0
Jun  8 17:04:47 eroski kernel: 29698 pages of slabcache
Jun  8 17:04:47 eroski kernel: 310 pages of kernel stacks
Jun  8 17:04:47 eroski kernel: 0 lowmem pagetables, 541 highmem pagetables
Jun  8 17:04:47 eroski kernel: Free swap:       2040176kB
Jun  8 17:04:47 eroski kernel: 2095752 pages of RAM
Jun  8 17:04:47 eroski kernel: 14993 reserved pages
Jun  8 17:04:48 eroski kernel: 201630 pages shared
Jun  8 17:04:48 eroski kernel: 0 pages swap cached
Jun  8 17:04:48 eroski kernel: 815 pages in page table cache
Jun  8 17:04:48 eroski kernel: Buffer memory:   174256kB
Jun  8 17:04:48 eroski kernel: Cache memory:   3607024kB
Jun  8 17:07:54 eroski sshd(pam_unix)[15071]: authentication failure;
logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=129.183.160.3  user=root
Jun  8 17:08:01 eroski sshd(pam_unix)[15071]: session opened for user
root by (uid=0)
Jun  8 17:09:27 eroski sshd(pam_unix)[28237]: session opened for user
root by (uid=0)
Jun  8 17:09:47 eroski kernel: SysRq : Show Memory
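
Each per-zone line of the form "N*16kB N*32kB ..." in the Show Memory
output lists how many free blocks of each size the zone holds, followed
by the total. The total can be re-derived from the buckets, which is a
quick consistency check when a line has been wrapped or truncated. A
minimal sketch follows; the helper name buddy_total_kb is an assumption
made for illustration.

```shell
#!/bin/sh
# Sum a free-block list such as "13968*16kB 11263*32kB ..." into kB.
# Tokens that are not of the form <count>*<size>kB (timestamps, the
# trailing "= ...kB)" total) are skipped.
# buddy_total_kb is a hypothetical helper name.
buddy_total_kb() {
    tr ' ' '\n' |
    awk -F'[*k]' '/^[0-9]+\*[0-9]+kB$/ { sum += $1 * $2 }
                  END { print sum + 0 }'
}

# First free-block line of the Jun 8 10:43:32 dump, rejoined onto one line.
echo "13968*16kB 11263*32kB 5259*64kB 813*128kB 50*256kB 6*512kB 5*1024kB 3*2048kB 2*4096kB 1*8192kB 1*16384kB 0*32768kB 0*65536kB 0*131072kB 0*262144kB = 1084448kB)" | buddy_total_kb
```

This prints 1084448, which agrees with the total the kernel printed at
the end of that line.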
Comment 31 Pierre Fumery 2004-06-08 13:15:40 EDT
Today, with RHEL Update 3 again, dbgen is still running on our machine
(8-way, 32GB) after more than 7 hours.

HighFree is still going down slowly from 30906656 to 27146880 (vs.
from 31109424 to 3817472 with RHEL3 Update 1).
LowFree is still going down slowly from 1084320 to 724208 (vs. from
101136 to 20272 with RHEL3 Update 1).

However, with current RHEL3 Update 3:
DMA freepages: 67778 is going down (after more than 7 hours) to 45851.
Zone:HighMem freepages:1931655 is going down (after more than 7 hours)
to 1717724
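
The zone counters in the Show Memory output are in pages, while HighFree
and LowFree are in kB. Assuming a 16 kB page size (an inference from the
smallest bucket, 16kB, in the traces, not a value stated in this report),
the two can be compared directly. The helper name pages_to_kb is
illustrative only.

```shell
#!/bin/sh
# Convert a page count to kB.
# PAGE_KB=16 is an assumption inferred from the traces in this report.
PAGE_KB=16

# pages_to_kb is a hypothetical helper name.
pages_to_kb() {
    echo $(( $1 * PAGE_KB ))
}

pages_to_kb 67778     # DMA zone free pages at 10:43:32
pages_to_kb 1931655   # HighMem zone free pages at 10:43:32
```

This prints 1084448 and 30906480, close to the LowFree (1084320) and
HighFree (30906656) values sampled at the same time, so on this system
the DMA zone appears to be what /proc/meminfo reports as low memory.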
Comment 32 Pierre Fumery 2004-06-08 13:19:00 EDT
We'll see where these values stand tomorrow morning (our time).

But please let us know whether it is normal behaviour for these values
to go down slowly.

We're not sure we are looking at the right indicators, and any advice
will be welcome. Thanks in advance for your analysis and your feedback.
Comment 33 Susan Denham 2004-06-08 13:22:29 EDT
Pierre and team:

Since 16-way support in U3 is a goal for Bull, may I suggest that you
also simultaneously kick off this testing (that is, immediately,
please!) on your 16-way system.   Running these tests in parallel on
the 8-way and 16-way -- and hopefully, they'll run successfully --
will be the most efficient way to gain confidence before our 15 June
U3 code freeze that the problem has been addressed.

Thanks very much,
Sue
Comment 35 Pierre Fumery 2004-06-08 13:34:25 EDT
This is currently one and the same machine: we don't have both an 8-way
and a 16-way available right now.

Our machine will be upgraded from 8-way/32GB to 16-way/64GB
tomorrow morning.

But in the meantime, I'm working on getting another machine, an NS4040
(4-way), upgraded to 32GB too. We will then start another campaign on
this other configuration.

To summarize, I expect to run dbgen with RHEL3 Update 3 on both one
NS5160 (16-way/64GB) and one NS4040 (4-way/32GB) by tomorrow.
Comment 36 Jason Baron 2004-06-08 16:42:25 EDT
The freepages going down simply means that the caches (page cache and
buffer cache) are growing as the test performs I/O. No abnormal
behavior there.
Comment 37 Pierre Fumery 2004-06-09 05:31:33 EDT
dbgen is a test that performs I/O. But I've been told it restarts
tests when the previous ones have completed.
So we could expect the kernel to clean up all system resources
before they are re-acquired for the next loop.

If I understood correctly, we should not see freepages going down in
this case.

Our 8-way system is still running, and we decided this morning to let
it run a little longer, as freepages are still going down. We'd like
to see whether something happens (a daemon killing other processes?)
when a minimum threshold is reached.
Comment 38 Pierre Fumery 2004-06-09 05:33:52 EDT
In the mean time, we're setting up another machine NS4040 with 32GB
(done) and with enough disks (> 200 GB, in progress) to run dbgen on
such a configuration for several days too.
Comment 39 Jason Baron 2004-06-09 08:34:55 EDT
The buffer and page caches are 'independent' of any specific
process, so even if dbgen cleans up after itself, the caches are not
completely flushed. The system will only begin to aggressively scrub
these caches when there is memory pressure.
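
One way to see this in the numbers is to add the cache sizes back to the
free memory. The following is a rough sketch, assuming /proc/meminfo-style
input with MemFree, Buffers and Cached fields. The helper name is
hypothetical, and treating all cached memory as reclaimable is a
simplification. The sample values are derived from the Jun 9 15:37:33
Show Memory dump in this report (free pages converted at an assumed
16 kB per page, plus the "Buffer memory" and "Cache memory" lines).

```shell
#!/bin/sh
# Sum free memory plus buffer cache and page cache (kB) from
# meminfo-style text on stdin.
# free_plus_cache_kb is a hypothetical helper name.
free_plus_cache_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { print sum + 0 }'
}

# Values derived from the Jun 9 15:37:33 dump (see the assumptions above).
free_plus_cache_kb <<'EOF'
MemFree:  20150832 kB
Buffers:    596864 kB
Cached:   10903008 kB
EOF
```

On a live system the same function can be fed directly with
"free_plus_cache_kb < /proc/meminfo".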
Comment 40 Pierre Fumery 2004-06-09 09:15:37 EDT
OK, I got it.

The test is currently running and we'll see if there is a problem when
"the system will begin to aggressively scrub these caches when there is
memory pressure."

If there is such a problem (to be confirmed!), we need to test on a
16-way system with 64GB to analyze if and when it could occur.

On the 8-way with 32GB of memory, no problem so far (> 24 hours). Good thing!
Comment 41 Pierre Fumery 2004-06-09 09:46:42 EDT
Our 8-way system is still running and LowFree memory is going up and
down, which confirms your explanation. It seems to work well now.

So we will upgrade our system to a 16-way. I have just asked people to
do so, but it involves some specific settings.

Anyway, here are the latest trace results:

Jun 9 14:57:12 20229648 33440
Jun 9 15:02:15 20197088 29008
Jun 9 15:07:16 20181040 25568
Jun 9 15:12:18 20169440 23472
Jun 9 15:17:20 20158288 37712
Jun 9 15:22:22 20152496 36304
Jun 9 15:27:26 20146304 39056
Jun 9 15:32:29 20123056 34656
Jun 9 15:37:33 20117200 33504


Jun  9 15:37:33 eroski kernel: SysRq : Show Memory
Jun  9 15:37:33 eroski kernel: Mem-info:
Jun  9 15:37:33 eroski kernel: Zone:DMA freepages:  2101 min:  1279
low:  3050 high:  4063
Jun  9 15:37:33 eroski kernel: Zone:Normal freepages:     0 min:     0
low:     0 high:     0
Jun  9 15:37:33 eroski kernel: Zone:HighMem freepages:1257325 min:  
255 low: 30718 high: 46077
Jun  9 15:37:33 eroski kernel: Free pages:      1259427 (1257325 HighMem)
Jun  9 15:37:33 eroski kernel: ( Active: 68914/516749,
inactive_laundry: 150760, inactive_clean: 4308, free: 1259427 )
Jun  9 15:37:33 eroski kernel:   aa:0 ac:37209 id:10673 il:1584
ic:1618 fr:2102
Jun  9 15:37:33 eroski kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Jun  9 15:37:33 eroski kernel:   aa:13363 ac:18347 id:506076 il:149176
ic:2690 fr:1257325
Jun  9 15:37:33 eroski kernel: 530*16kB 70*32kB 2*64kB 14*128kB
2*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB 0*8192kB 1*16384kB
0*32768kB 0*65536kB 0*131072kB 0*262144kB = 33632kB)
Jun  9 15:37:33 eroski kernel: 1*16kB 0*32kB 1*64kB 1*128kB 0*256kB
1*512kB 1*1024kB 0*2048kB 1*4096kB 1*8192kB 1*16384kB 1*32768kB
0*65536kB 1*131072kB 76*262144kB = 20117200kB)
Jun  9 15:37:33 eroski kernel: Swap cache: add 0, delete 0, find 0/0,
race 0+0
Jun  9 15:37:35 eroski kernel: 49259 pages of slabcache
Jun  9 15:37:35 eroski kernel: 310 pages of kernel stacks
Jun  9 15:37:37 eroski kernel: 0 lowmem pagetables, 541 highmem pagetables
Jun  9 15:37:37 eroski kernel: Free swap:       2040176kB
Jun  9 15:37:37 eroski kernel: 2095752 pages of RAM
Jun  9 15:37:37 eroski kernel: 14993 reserved pages
Jun  9 15:37:37 eroski kernel: 575762 pages shared
Jun  9 15:37:37 eroski kernel: 0 pages swap cached
Jun  9 15:37:37 eroski kernel: 1389 pages in page table cache
Jun  9 15:37:37 eroski kernel: Buffer memory:   596864kB
Jun  9 15:37:38 eroski kernel: Cache memory:   10903008kB
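
The min/low/high figures on each Zone line are that zone's watermarks.
As a general rule for 2.4-era kernels, reclaim is woken when free pages
drop below "low", and below "min" only privileged allocations are
expected to succeed. The exact behaviour of the RHEL3 VM is not described
in this report, so the labels below are indicative only. A minimal
sketch that classifies each zone, assuming the wrapped log lines have
first been rejoined onto one line each; the helper name zone_status and
the "unused" label for a zone with all-zero watermarks are assumptions.

```shell
#!/bin/sh
# Classify each "Zone:<name> freepages: N min: N low: N high: N" line
# read from stdin. Output: <zone> <freepages> <state>.
# zone_status is a hypothetical helper name.
zone_status() {
    sed -n 's/.*Zone:\([A-Za-z]*\) *freepages: *\([0-9]*\) *min: *\([0-9]*\) *low: *\([0-9]*\) *high: *\([0-9]*\).*/\1 \2 \3 \4 \5/p' |
    awk '{
        state = "ok"
        if ($2 < $4) state = "below-low"
        if ($2 < $3) state = "below-min"
        if ($5 == 0) state = "unused"   # all-zero watermarks (assumption)
        print $1, $2, state
    }'
}

# Zone lines of the Jun 9 15:37:33 dump, rejoined onto one line each.
zone_status <<'EOF'
Jun  9 15:37:33 eroski kernel: Zone:DMA freepages:  2101 min:  1279 low:  3050 high:  4063
Jun  9 15:37:33 eroski kernel: Zone:Normal freepages:     0 min:     0 low:     0 high:     0
Jun  9 15:37:33 eroski kernel: Zone:HighMem freepages:1257325 min:   255 low: 30718 high: 46077
EOF
```

For this dump the DMA zone is reported as below-low, the Normal zone as
unused, and HighMem as ok.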
Comment 42 Susan Denham 2004-06-09 10:14:10 EDT
Just to repeat:  claiming victory on the 8-way with 32Gb.  So the
problem is considered resolved on the 8-way for U3.  On to the 16-way
testing!
Comment 44 Susan Denham 2004-06-09 10:22:27 EDT
to further clarify:  This is where Bull is right now:  

Per Pierre:  To summarize, we will now run dbgen with RHEL3 Update 3
on both one NS5160 (16-way/64GB) and one NS4040 (4-way/32GB).
Comment 45 Susan Denham 2004-06-09 10:25:11 EDT
Pierre expects to have the NS5160 testing underway today, with results
we hope by tomorrow.
Comment 46 Pierre Fumery 2004-06-09 12:08:43 EDT
Following our teleconf., Jason will provide us the latest RHEL3 Update
3 available kernel. We may want to use it for the NS5160 testing.
Comment 47 Jason Baron 2004-06-09 19:53:58 EDT
let's just stick with the 15.5 kernel for now, since the later ones
might introduce additional variables at this point.
Comment 48 Pierre Fumery 2004-06-10 04:42:01 EDT
As we were waiting for a new kernel, we didn't stop our 8-way machine,
and it ran for another night without major problems. Only swap
problems seem to appear, which prevent us from getting full traces
when reading the trace files.

But here is the trace information from this morning:
Jun 10 08:29:06 17440080 23920
Jun 10 08:36:02 17440880 23840
Jun 10 08:42:30 17439040 23504
Jun 10 08:49:46 17435584 24944
Jun 10 08:56:11 17434624 24496

Jun 10 03:05:40 eroski kernel: SysRq : Show Memory
Jun 10 03:05:40 eroski kernel: Mem-info:
Jun 10 03:05:40 eroski kernel: Zone:DMA freepages:  2401 min:  1279
low:  3050 high:  4063
Jun 10 03:05:40 eroski kernel: Zone:Normal freepages:     0 min:     0
low:     0 high:     0
Jun 10 03:05:40 eroski kernel: Zone:HighMem freepages:1090496 min:  
255 low: 30718 high: 46077
Jun 10 03:05:40 eroski kernel: Free pages:      1092895 (1090496 HighMem)
Jun 10 03:05:40 eroski kernel: ( Active: 70710/644199,
inactive_laundry: 183090, inactive_clean: 10195, free: 1092895 )
Jun 10 03:05:40 eroski kernel:   aa:0 ac:37850 id:10673 il:1584
ic:1618 fr:2399
Jun 10 03:05:40 eroski kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Jun 10 03:05:40 eroski kernel:   aa:14516 ac:18346 id:633526 il:181506
ic:8577 fr:1090496
Jun 10 03:05:41 eroski kernel: 891*16kB 40*32kB 1*64kB 14*128kB
2*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB 0*8192kB 1*16384kB
0*32768kB 0*65536kB 0*131072kB 0*262144kB = 38384kB)
Jun 10 03:05:41 eroski kernel: 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
0*512kB 1*1024kB 1*2048kB 1*4096kB 1*8192kB 0*16384kB 0*32768kB
0*65536kB 1*131072kB 66*262144kB = 17447936kB)
Jun 10 03:05:41 eroski kernel: Swap cache: add 0, delete 0, find 0/0,
race 0+0
Jun 10 03:05:46 eroski kernel: 48565 pages of slabcache
Jun 10 03:05:49 eroski kernel: 310 pages of kernel stacks
Jun 10 03:05:53 eroski kernel: 0 lowmem pagetables, 554 highmem pagetables
Jun 10 03:05:55 eroski kernel: Free swap:       2040176kB
Jun 10 03:05:58 eroski kernel: 2095752 pages of RAM
Jun 10 03:06:04 eroski kernel: 14993 reserved pages
Jun 10 03:06:09 eroski kernel: 703255 pages shared
Jun 10 03:06:14 eroski kernel: 0 pages swap cached
Jun 10 03:06:16 eroski kernel: 286 pages in page table cache
Jun 10 03:06:20 eroski kernel: Buffer memory:   607120kB
Comment 49 Pierre Fumery 2004-06-10 04:43:21 EDT
If we don't expect a new kernel, let's move on setting our machine as
a 16-way/64GB machine right now.
Comment 50 Pierre Fumery 2004-06-10 04:56:04 EDT
Results on NS4040 (4-way, 32 GB): still running since yesterday
afternoon, i.e. a run of about 17 hours, without all traces on.

HighFree and LowFree values are still going down, and we expect them
to reach a stable point as they did on our 8-way/32GB machine. The
values should then go up and down around an average, without
hanging our machine. Let's see ...

Jun 9 17:51:45 30921008 1301648
Jun 9 17:52:15 30542112 1298704
Jun 9 17:52:45 30127408 1297824
Jun 9 17:53:15 29720000 1296976
Jun 9 17:53:45 29317440 1296016


Jun 10 10:16:26 24384 25280
Jun 10 10:19:16 29216 24912
Jun 10 10:21:18 28688 23472
Jun 10 10:23:37 28096 23296
Jun 10 10:25:21 27728 22784
Jun 10 10:27:38 31664 22144
Jun 10 10:29:22 31664 20880
Jun 10 10:32:14 26560 21808
Jun 10 10:34:51 26560 20992
Jun 10 10:36:49 24736 20592
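
Since LowFree is the figure that collapsed under Update 1 (down to
20272 kB at the hang), the lowest LowFree sample of a run is a useful
number to pull out of a trace file. A minimal sketch, assuming the
five-column trace format (month, day, time, HighFree, LowFree); the
helper name lowfree_min is illustrative only.

```shell
#!/bin/sh
# Print the timestamp and value of the lowest LowFree sample (kB) in a
# trace read from stdin. Lines without five fields are ignored.
# lowfree_min is a hypothetical helper name.
lowfree_min() {
    awk 'NF == 5 && (!seen || $5 + 0 < min) {
             min = $5 + 0; ts = $1 " " $2 " " $3; seen = 1
         }
         END { if (seen) print ts, min }'
}

# Last five samples of the NS4040 trace.
lowfree_min <<'EOF'
Jun 10 10:27:38 31664 22144
Jun 10 10:29:22 31664 20880
Jun 10 10:32:14 26560 21808
Jun 10 10:34:51 26560 20992
Jun 10 10:36:49 24736 20592
EOF
```

For these samples it prints "Jun 10 10:36:49 20592". Against a full
trace file it would be run as "lowfree_min < meminfo.txt".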
Comment 51 Pierre Fumery 2004-06-10 13:01:15 EDT
After some problems upgrading our system to 16 ways (an encrypted
validation key had to be renewed and downloaded), we succeeded in
starting dbgen with RHEL3 Update 3 (release 15.5, as mentioned by
Jason).
Comment 52 Pierre Fumery 2004-06-11 05:37:47 EDT
Results on NS5160 (16-way, 64GB): Started yesterday and still running
that means more than 18 hours run, with traces on.

Jun 10 16:59:21 64206496 1412272
Jun 10 17:04:23 64069088 1338640
Jun 10 17:09:24 64002208 1296800
Jun 10 17:14:26 63951952 1288128
Jun 10 17:19:28 63905200 1287472


Jun 11 10:27:09 55825088 72976
Jun 11 10:32:12 55778992 58560
Jun 11 10:37:15 55736608 48112
Jun 11 10:42:17 55733200 64320
Jun 11 10:47:19 55679680 54144
Jun 11 10:52:22 55653808 49696
Jun 11 10:57:26 55616496 48784
Jun 11 11:02:28 55610176 51664
Jun 11 11:07:33 55588816 53760
Jun 11 11:12:37 55552464 48208


Jun 10 16:59:21 eroski kernel: SysRq : Show Memory
Jun 10 16:59:21 eroski kernel: Mem-info:
Jun 10 16:59:21 eroski kernel: Zone:DMA freepages: 88275 min:  1279
low:  3050 high:  4063
Jun 10 16:59:21 eroski kernel: Zone:Normal freepages:     0 min:     0
low:     0 high:     0
Jun 10 16:59:21 eroski kernel: Zone:HighMem freepages:4012883 min:  
255 low: 63486 high: 95229
Jun 10 16:59:21 eroski kernel: Free pages:      4101158 (4012883 HighMem)
Jun 10 16:59:21 eroski kernel: ( Active: 16571/1961, inactive_laundry:
629, inactive_clean: 0, free: 4101158 )
Jun 10 16:59:21 eroski kernel:   aa:0 ac:2102 id:0 il:9 ic:0 fr:88275
Jun 10 16:59:21 eroski kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Jun 10 16:59:21 eroski kernel:   aa:8393 ac:6076 id:1961 il:620 ic:0
fr:4012883
Jun 10 16:59:21 eroski kernel: 1*16kB 3*32kB 1*64kB 1*128kB 0*256kB
0*512kB 1*1024kB 1*2048kB 0*4096kB 0*8192kB 0*16384kB 1*32768kB
1*65536kB 0*131072kB 5*262144kB = 1412400kB)
Jun 10 16:59:21 eroski kernel: 1*16kB 1*32kB 0*64kB 0*128kB 1*256kB
0*512kB 1*1024kB 0*2048kB 1*4096kB 1*8192kB 0*16384kB 1*32768kB
1*65536kB 3*131072kB 243*262144kB = 64206128kB)
Jun 10 16:59:21 eroski kernel: Swap cache: add 0, delete 0, find 0/0,
race 0+0
Jun 10 16:59:21 eroski kernel: 5645 pages of slabcache
Jun 10 16:59:21 eroski kernel: 284 pages of kernel stacks
Jun 10 16:59:21 eroski kernel: 0 lowmem pagetables, 397 highmem pagetables
Jun 10 16:59:21 eroski kernel: Free swap:       2040176kB
Jun 10 16:59:23 eroski kernel: 4192882 pages of RAM
Jun 10 16:59:23 eroski kernel: 29463 reserved pages
Jun 10 16:59:23 eroski kernel: 24006 pages shared
Jun 10 16:59:23 eroski kernel: 0 pages swap cached
Jun 10 16:59:23 eroski kernel: 148 pages in page table cache
Jun 10 16:59:23 eroski kernel: Buffer memory:    34032kB
Jun 10 16:59:23 eroski kernel: Cache memory:   142448kB
Jun 10 16:59:23 eroski kernel:   CLEAN: 723 buffers, 2892 kbyte, 60
used (last=723), 0 locked, 0 dirty 0 delay
Jun 10 16:59:23 eroski kernel:   DIRTY: 171 buffers, 684 kbyte, 171
used (last=171), 0 locked, 150 dirty 0 delay



Jun 11 11:27:44 eroski kernel: SysRq : Show Memory
Jun 11 11:27:44 eroski kernel: Mem-info:
Jun 11 11:27:44 eroski kernel: Zone:DMA freepages:  4294 min:  1279
low:  3050 high:  4063
Jun 11 11:27:44 eroski kernel: Zone:Normal freepages:     0 min:     0
low:     0 high:     0
Jun 11 11:27:44 eroski kernel: Zone:HighMem freepages:3468949 min:  
255 low: 63486 high: 95229
Jun 11 11:27:44 eroski kernel: Free pages:      3473243 (3468949 HighMem)
Jun 11 11:27:44 eroski kernel: ( Active: 41523/419463,
inactive_laundry: 123401, inactive_clean: 2475, free: 3473245 )
Jun 11 11:27:44 eroski kernel:   aa:0 ac:23471 id:4992 il:702 ic:820
fr:4296
Jun 11 11:27:44 eroski kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Jun 11 11:27:44 eroski kernel:   aa:11000 ac:7064 id:414471 il:122699
ic:1655 fr:3468949
Jun 11 11:27:44 eroski kernel: 1070*16kB 420*32kB 36*64kB 2*128kB
1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB 0*8192kB 0*16384kB
1*32768kB 0*65536kB 0*131072kB 0*262144kB = 68704kB)
Jun 11 11:27:44 eroski kernel: 1347*16kB 149*32kB 2*64kB 0*128kB
0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB
1*32768kB 0*65536kB 1*131072kB 211*262144kB = 55503184kB)
Jun 11 11:27:44 eroski kernel: Swap cache: add 0, delete 0, find 0/0,
race 0+0
Jun 11 11:27:44 eroski kernel: 57970 pages of slabcache
Jun 11 11:27:44 eroski kernel: 348 pages of kernel stacks
Jun 11 11:27:44 eroski kernel: 0 lowmem pagetables, 562 highmem pagetables
Jun 11 11:27:44 eroski kernel: Free swap:       2040176kB
Jun 11 11:27:45 eroski kernel: 4192882 pages of RAM
Jun 11 11:27:45 eroski kernel: 29463 reserved pages
Jun 11 11:27:45 eroski kernel: 465186 pages shared
Jun 11 11:27:45 eroski kernel: 0 pages swap cached
Jun 11 11:27:45 eroski kernel: 82 pages in page table cache
Jun 11 11:27:45 eroski kernel: Buffer memory:   455824kB
Jun 11 11:27:45 eroski kernel: Cache memory:   8760672kB
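
As a cross-check on the dumps above, the per-order free lists can be
summed and compared with the reported freepages count. The following is
a minimal Python sketch, not part of the original traces; the helper
name `buddy_total_kb` and the 16 kB page size (taken from the smallest
block size listed) are assumptions.

```python
import re

# Assumption: 16 kB pages, inferred from the smallest block size in the dump.
PAGE_KB = 16

def buddy_total_kb(line: str) -> int:
    """Sum every 'count*sizekB' term in a SysRq-m free-list line."""
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)

# First free-list line of the Jun 11 11:27:44 dump, unwrapped.
dma_line = (
    "1070*16kB 420*32kB 36*64kB 2*128kB 1*256kB 1*512kB 0*1024kB "
    "1*2048kB 0*4096kB 0*8192kB 0*16384kB 1*32768kB 0*65536kB "
    "0*131072kB 0*262144kB = 68704kB)"
)

total = buddy_total_kb(dma_line)
print(total, total // PAGE_KB)
```

For this line the sum is 68704 kB, i.e. 4294 pages of 16 kB, which
agrees with the "Zone:DMA freepages:  4294" line of the same dump.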
Comment 53 Pierre Fumery 2004-06-11 05:42:33 EDT
Results on NS4040 (4-way, 32GB): test started two days ago and still
running, which means a run of more than 41 hours, without all traces on.

HighFree and LowFree values have now stabilized; they fluctuate up and
down. System performance has slowed down, but the system seems stable.

Jun 9 17:51:45 30921008 1301648
Jun 9 17:52:15 30542112 1298704
Jun 9 17:52:45 30127408 1297824
Jun 9 17:53:15 29720000 1296976
Jun 9 17:53:45 29317440 1296016


Jun 11 11:04:44 25392 22464
Jun 11 11:06:53 27056 21616
Jun 11 11:09:01 27488 21344
Jun 11 11:11:08 27488 21392
Jun 11 11:13:06 26688 21792
Jun 11 11:15:45 26608 20944
Jun 11 11:17:42 24400 33424
Jun 11 11:20:18 24640 31936
Jun 11 11:22:41 28192 29952
Jun 11 11:24:53 26912 30016
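
To follow the trend in these traces, the lines can be parsed into
numbers. This is a minimal Python sketch, assuming each line is
"month day time HighFree LowFree" with both values in kB (the column
order is inferred from the SysRq output in comment 52, not stated
explicitly); `Sample` and `parse_trace` are illustrative names.

```python
from typing import List, NamedTuple

class Sample(NamedTuple):
    stamp: str          # e.g. "Jun 11 11:15:45"
    high_free_kb: int   # assumed to be the first numeric column
    low_free_kb: int    # assumed to be the second numeric column

def parse_trace(text: str) -> List[Sample]:
    """Parse trace lines, skipping blank or malformed ones."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        if not (fields[3].isdigit() and fields[4].isdigit()):
            continue
        stamp = " ".join(fields[:3])
        samples.append(Sample(stamp, int(fields[3]), int(fields[4])))
    return samples

# Three lines copied from the trace above.
trace = """\
Jun 11 11:15:45 26608 20944
Jun 11 11:17:42 24400 33424
Jun 11 11:20:18 24640 31936
"""

samples = parse_trace(trace)
low = [s.low_free_kb for s in samples]
print(min(low), max(low))
```

On the three sample lines, the LowFree column ranges from 20944 kB to
33424 kB, which is the kind of up-and-down movement described above.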
Comment 54 Pierre Fumery 2004-06-11 05:56:33 EDT
As a reminder,
Results on NS5160 (configured as 8-way, 32GB): dbgen ran for more than
48 hours without a crash.
Comment 55 Pierre Fumery 2004-06-15 07:52:51 EDT
Good and bad news:

****** GOOD ******
NS4040 (4-way, 32GB) : still running (6 days), few traces on.

NS5160 (16-way, 64GB) : still running (about 21 hours), few traces on.


****** BAD ******
NS5160 (16-way, 64GB) (same as above) : crashed/hung over last
weekend after a 5-hour run, few traces on.
Jun 11 15:18:42 63983232 1145136
...
Jun 11 20:23:50 60436976 932480

This hang/crash looks like the same problem we first hit on 2004-06-08.
Comment 56 Pierre Fumery 2004-06-15 07:56:16 EDT
We definitely have another *hidden* problem somewhere.

But this problem no longer seems to be directly linked to the
*well-known low memory* bug.

I'd like to close this defect and open another one in which we will
focus on re-creating these unpredictable hangs/crashes.
Comment 57 Jason Baron 2004-06-15 10:07:05 EDT
ok, changing state to modified. Also, it's important that we try to
reproduce the hang on the latest U3 candidate kernel, as the issue
might already be addressed.
Comment 58 Pierre Fumery 2004-06-15 12:28:34 EDT
Where can we get this latest U3 kernel?
The previous one was picked up from the ftp partners site
(2.4.21-15.5.EL). Please give me a new pointer to get it.

Also, we just received our NS6160 and it's currently being unpacked at
our facilities. Hopefully, we should be able to use it soon.
Comment 59 Jason Baron 2004-06-15 16:27:42 EDT
here is a link to the latest U3 ia64 kernel, please test with this and
let me know if there are any problems. thanks.

http://people.redhat.com/~jbaron/.private/u3/ia64/
Comment 60 Pierre Fumery 2004-06-16 06:16:27 EDT
Got it. We will restart the dbgen test with this kernel on our NS5160
16 ways, 64GB.

FYI, we got another crash/hang last night after (apparently) only 10
minutes ... It was with your previous kernel. We'll see with this new
version.
Comment 61 Pierre Fumery 2004-06-16 13:23:05 EDT
Bad and very bad news !

Directly from Claude, who performed the tests and sent the results to me
(translated from French):
- kernel 2.4.21-15.5.EL
      Failed twice after 10 minutes of testing.
- kernel 2.4.21-15.11.EL
      Failed after 10 minutes of testing.
      I am leaving the machine as it is until tomorrow morning.

It seems the settings were incomplete during the first test campaign, so
fewer files than expected were created by dbgen. Claude restored the
settings and a full stress test is now running and ... the machine
crashes/hangs in about 10 minutes each time.
Same bad news with both kernels that were provided.

Until now, our traces gave information every 5 minutes (the machines
ran for several hours), but that is obviously not enough when the
machine crashes/hangs after only 10 minutes.
We'll set more frequent traces (about every 30 seconds) to try to
better figure out what's going on.

Comment 62 Susan Denham 2004-06-16 14:36:31 EDT
: (

Okay, we'll wait for additional info.  Can you also please send us (if
this is a hang) the Alt-SysRq-T data?   And anything else that you
can throw at us (/var/log/messages, etc.).

Any chance that there's a firmware component to what you're seeing? 
What SCSI adapters are you using?
Comment 63 Claude BRUNET 2004-06-17 05:27:09 EDT
Created attachment 101210 [details]
/var/log/messages after machine HANG (def 121029)
Comment 64 Claude BRUNET 2004-06-17 05:47:09 EDT
Some more information:

- the /var/log/messages file given in comment #63 comes from our
NS5160 16 ways 64GB after the HANG using the dbgen test

- SCSI Adapter= Adaptec
     Content of /proc/pci:

  Bus  4, device   1, function  0:
    SCSI storage controller: Adaptec AHA-3960D / AIC-7899A U160/m (rev 1).
      IRQ 54.
      Master Capable.  Latency=64.  Min Gnt=40.Max Lat=25.
      I/O at 0xc400 [0xc4ff].
      Non-prefetchable 64 bit memory at 0xfa6fe000 [0xfa6fefff].
  Bus  4, device   1, function  1:
    SCSI storage controller: Adaptec AHA-3960D / AIC-7899A U160/m (#2)
(rev 1).
      IRQ 55.
      Master Capable.  Latency=64.  Min Gnt=40.Max Lat=25.
      I/O at 0xc800 [0xc8ff].
      Non-prefetchable 64 bit memory at 0xfa6ff000 [0xfa6fffff].
Comment 65 Claude BRUNET 2004-06-17 08:15:28 EDT
Created attachment 101217 [details]
trace files after HANG (with "echo m > /proc/sysrq-trigger")

The attached "compressed tar" file contains trace files from the dbgen HANG on
an NS5160.

This last test has been done with traces taken every 30s and including the 
"echo m > /proc/sysrq-trigger".
The machine "broke" after 40 minutes.

The tar file includes:
- meminfo.sh: script that takes the traces
- meminfo.txt: output from meminfo.sh
- top.txt: result of the "top" command run during the test
- messages: /var/log/messages saved after rebooting the machine.
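
For readers without access to the attachment, the kind of periodic
sampling described here can be sketched as follows. This is a
hypothetical Python illustration, not the contents of meminfo.sh: the
function names, the output file name and the 30-second default are
assumptions, and the HighFree/LowFree fields only appear in
/proc/meminfo on kernels that report them.

```python
import os
import re
import time

FIELDS = ("HighFree", "LowFree")  # field names as used in this bug's traces

def parse_meminfo(text):
    """Return the kB value of each field in FIELDS found in /proc/meminfo text."""
    values = {}
    for name in FIELDS:
        match = re.search(rf"^{name}:\s+(\d+)\s+kB", text, re.MULTILINE)
        if match:
            values[name] = int(match.group(1))
    return values

def dump_memory_state():
    """Ask the kernel for a 'Show Memory' dump in syslog (needs root)."""
    with open("/proc/sysrq-trigger", "w") as trigger:
        trigger.write("m")

def sample_forever(interval_s=30, out="meminfo.txt"):
    """Append one 'date HighFree LowFree' line per interval (hypothetical format)."""
    while True:
        with open("/proc/meminfo") as meminfo:
            values = parse_meminfo(meminfo.read())
        stamp = time.strftime("%b %d %H:%M:%S")
        with open(out, "a") as log:
            log.write(f"{stamp} {values.get('HighFree', 0)} {values.get('LowFree', 0)}\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk in case the machine hangs
        dump_memory_state()
        time.sleep(interval_s)
```

Calling `sample_forever()` as root would start the loop. Syncing after
every line is a deliberate choice here: when the machine hangs, the last
samples before the hang are the ones that matter most.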
Comment 66 Pierre Fumery 2004-06-23 09:11:57 EDT
Did you get a chance to have a look at the previous traces?
Did you find anything wrong?
As already mentioned, we have reproduced this problem several times now.
Comment 67 Pierre Fumery 2004-07-05 05:13:21 EDT
This bug can be set to "closed"; the remaining problem is tracked
under 126998.
Comment 68 Bastien Nocera 2004-07-05 08:34:46 EDT
Thanks Pierre, closing now.
Comment 69 Ernie Petrides 2004-07-05 20:51:42 EDT
Thanks for confirming that the original problem has been
resolved.  The fix has already been committed to the RHEL3
U3 patch pool, but the bug should remain in MODIFIED state
until the U3 errata is "pushed" (released) on RHN (at which
time it will be set automatically to CLOSED/ERRATA).
Comment 70 John Flanagan 2004-09-02 00:31:21 EDT
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html
