Bug 204489

Summary: Computer slowing down and then freezing
Product: Red Hat Enterprise Linux 4 Reporter: Frederic Medery <redhat-bugzilla>
Component: kernel Assignee: Konrad Rzeszutek <konradr>
Status: CLOSED WORKSFORME QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 4.4 CC: bbs2web, darrick, herrold, wilburn
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-07-17 17:30:43 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Frederic Medery 2006-08-29 14:54:12 UTC
Description of problem:
Since the update to 4.4, my computer becomes very slow and then freezes every
day. Nothing shows up in dmesg or /var/log/messages. Every morning, when I try
to unlock the station, it slows down and then freezes. The problem can also
happen during working hours. There is no problem when booting from 4.3 kernels.

Version-Release number of selected component (if applicable):
kernel*-2.6.9-42.0*

How reproducible:


Steps to Reproduce:
1. Boot with the latest kernel
2. Wait several hours, usually more than 8
3. The computer slows down and then freezes
4. Unable to use ctrl-sysrq-{t,m}
Actual results:

A hard reboot is required.

Expected results:


Additional info:
Applications in use:
firefox
thunderbird
gnome-terminal

[mederyf@trieste ~]$ lspci
-bash: lspci: command not found
[mederyf@trieste ~]$ /sbin/lspci
00:00.0 Host bridge: Intel Corporation 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 01)
00:01.0 PCI bridge: Intel Corporation 82845G/GL[Brookdale-G]/GE/PE Host-to-AGP Bridge (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corporation 82801DB/DBL (ICH4/ICH4-L) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
01:00.0 VGA compatible controller: nVidia Corporation NV17GL [Quadro4 200/400 NVS] (rev a3)
02:08.0 Ethernet control


I do not use any non-Red Hat kernel modules.

Comment 1 David Herselman 2006-09-13 08:36:52 UTC
We have over 30 servers running RHEL4.4 now and nearly all of them exhibit
a massive slowdown in I/O performance. The servers aren't necessarily
crashing, but the load average is *MUCH* higher, with 'top' showing massive
iowait.

This occurred after completing the RHEL4.4 update and then booting off the
new kernel...
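
To put a number on the iowait that 'top' reports, the aggregate `cpu` line in /proc/stat can be sampled twice and compared. The following is only a minimal sketch and is not taken from this report: the function names are hypothetical, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, and steal on newer kernels).

```python
def parse_cpu_line(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples.

    Assumed field order: user nice system idle iowait irq softirq [steal].
    Only the first eight fields are summed, so guest counters on newer
    kernels (already included in user/nice) are not counted twice.
    """
    deltas = [b - a for a, b in zip(before, after)][:8]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def read_proc_stat(path="/proc/stat"):
    """Read the current contents of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

One way to use it is to call `parse_cpu_line(read_proc_stat())` twice, a few seconds apart, under the old and the new kernel, and pass the two results to `iowait_percent`.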

Comment 2 David Herselman 2006-09-13 09:17:35 UTC
The slowdown is most dramatic on Serial ATA drives, most of which are attached
to nVidia chipsets and therefore use the sata_nv module.

----------------------------------[ lspci ]----------------------------------
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
----------------------------------[ lspci ]----------------------------------

-------------------[ grep -i 'sata\|sda' /var/log/dmesg ]-------------------
sata_nv 0000:00:07.0: version 0.8
ata1: SATA max UDMA/133 cmd 0x9F0 ctl 0xBF2 bmdma 0xD800 irq 169
ata2: SATA max UDMA/133 cmd 0x970 ctl 0xB72 bmdma 0xD808 irq 169
ata1: SATA link up 1.5 Gbps (SStatus 113)
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
scsi0 : sata_nv
ata2: SATA link up 1.5 Gbps (SStatus 113)
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
scsi1 : sata_nv
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
 sda:<4>nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
 sda1 sda2 sda3
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
 sdb:<4>nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
ata3: SATA max UDMA/133 cmd 0x9E0 ctl 0xBE2 bmdma 0xC400 irq 177
ata4: SATA max UDMA/133 cmd 0x960 ctl 0xB62 bmdma 0xC408 irq 177
ata3: SATA link down (SStatus 0)
scsi2 : sata_nv
ata4: SATA link down (SStatus 0)
scsi3 : sata_nv
-------------------[ grep -i 'sata\|sda' /var/log/dmesg ]-------------------


When idle:
---------------------[ hdparm -t -T /dev/sda /dev/sdb ]---------------------
/dev/sda:
 Timing cached reads:   3068 MB in  2.00 seconds = 1533.47 MB/sec
 Timing buffered disk reads:  156 MB in  3.01 seconds =  51.75 MB/sec

/dev/sdb:
 Timing cached reads:   3068 MB in  2.00 seconds = 1533.47 MB/sec
 Timing buffered disk reads:  156 MB in  3.01 seconds =  51.75 MB/sec
---------------------[ hdparm -t -T /dev/sda /dev/sdb ]---------------------


Comment 3 Jason Corley 2006-10-03 13:35:13 UTC
I have a nagging suspicion that this is related to:
    http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=207244
The updated kernel in the bug report fixed the issue on our production mail server.

Comment 4 Frederic Medery 2006-10-06 12:09:51 UTC
No luck for me, the updated kernel did not fix the problem.
So I went back to 2.6.9.32.0.2
The problem only affects IBM NetVista stations. All HP and Dell machines are OK.

Comment 5 Konrad Rzeszutek 2006-10-12 16:53:34 UTC
Have you tried to pass in 'noapic' during bootup for the RHEL4 U4 kernel?

Comment 6 wilburn 2006-10-12 18:01:49 UTC
I have tried 'noapic'. It did not help.

I have 5 IBM NetVista machines with this problem

Comment 7 Konrad Rzeszutek 2006-10-12 18:44:14 UTC
Can you provide the model type of the NetVista machine, please? Also, try the
'nolapic' boot argument. Thanks
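
When testing options such as 'noapic' and 'nolapic', it helps to confirm that the running kernel was really booted with them, which /proc/cmdline shows. A small sketch under stated assumptions: the helper names are hypothetical and the sample command line in the note below is made up for illustration, not taken from any machine in this report.

```python
def boot_options_present(cmdline, wanted=("noapic", "nolapic")):
    """Map each wanted option to True if it appears as a whole token."""
    tokens = cmdline.split()
    return {option: option in tokens for option in wanted}


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()
```

For example, `boot_options_present("ro root=LABEL=/ rhgb quiet noapic")` reports 'noapic' as present and 'nolapic' as absent; on a live system, pass `read_cmdline()` instead of the sample string.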

Comment 8 Frederic Medery 2006-10-12 19:25:57 UTC
I also have this problem with IBM NetVista computers:
Model: 8307-81U

I will try with nolapic and noapic

Comment 9 wilburn 2006-10-12 20:23:20 UTC
The type is 8310-47U.
I have rebooted with 'nolapic'. I should know by monday if there is any change.

Comment 10 wilburn 2006-10-13 21:31:57 UTC
'nolapic' does not help.

Comment 11 Konrad Rzeszutek 2006-10-16 14:47:00 UTC
Darrick,

Do you have any idea who would have tested this machine for RHEL4 U4? Thank you.

Comment 12 Darrick Wong 2006-10-16 22:55:12 UTC
Nope.  There weren't any reports of slowdowns with System X hardware, but
NetVistas are IBM PC Division/Lenovo products, which means they have different
motherboards and different BIOSes.

Comment 13 Konrad Rzeszutek 2006-12-04 16:32:45 UTC
Copying this from another BZ that deals also with NetVista machines:
"
This sounds like the BIOS bug seen on a number of Thinkcentre boxes (my desktop
included). Basically a chunk of SMM (BIOS) code runs and corrupts the local apic
registers that define the tick frequency, causing time to increase *very*
slowly. With the recent timeofday work (2.6.18+), time should continue to
increase properly, but increased latencies will be noticed. 

Booting w/ noapic will work around the problem, but the correct fix has been to
update the BIOS, but it seems the BIOS fix has not yet been implemented for this
hardware. The issue should be brought up w/ the hardware folks.

See OSDL bugs:
http://bugme.osdl.org/show_bug.cgi?id=2544
http://bugme.osdl.org/show_bug.cgi?id=6296

The last of which has a patch that functions as a workaround. I'm not sure
however if that patch should go mainline or not (the original developer of the
patch just blamed the BIOS and didn't want to push the patch).
"

However, using 'noapic' did not help you. Have you tried a more recent
version of the kernel, 2.6.18 for example, to see whether that makes the
problem go away?
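
If the quoted BIOS issue applies here, the system clock should advance more slowly than real time before the freeze. One way to check is to note the system time and an independent reference time (for instance the hardware clock, or a clock on another machine) at two moments and compare the elapsed intervals. The sketch below is an assumption-laden illustration, not something from either bug report: the function names and the 5% tolerance are arbitrary choices.

```python
def drift_ratio(sys_start, sys_end, ref_start, ref_end):
    """Ratio of system-clock elapsed time to reference elapsed time.

    All arguments are timestamps in seconds. A healthy clock gives a ratio
    close to 1.0; a clock that ticks too slowly gives a ratio below 1.0.
    """
    ref_elapsed = ref_end - ref_start
    if ref_elapsed <= 0:
        raise ValueError("reference clock did not advance")
    return (sys_end - sys_start) / ref_elapsed


def looks_slow(ratio, tolerance=0.05):
    """True if the system clock lags the reference by more than `tolerance`."""
    return ratio < 1.0 - tolerance
```

For instance, if the reference advanced by 3600 seconds while the system clock advanced by only 1800, `drift_ratio` returns 0.5 and `looks_slow` flags it.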

Comment 14 Konrad Rzeszutek 2007-07-17 14:26:22 UTC
ping?

Comment 15 Frederic Medery 2007-07-17 14:57:05 UTC
Hello,
Just to let you know that the noapci noacpi options resolved my problem.

Anyway we are migrating stations to RHEL5 now.

Comment 16 Konrad Rzeszutek 2007-07-17 17:30:43 UTC
OK. Closing BZ as WORKSFORME.

There was an update to RHEL5 U1 to solve a timer problem on the NetVista. I am
not sure of the BZ at this