Description of problem:
Since 4.4, my computer becomes very slow every day and then freezes. There is nothing in dmesg or /var/log/messages. Every morning, when I try to unlock the workstation, it slows down and then freezes. The problem can also happen during working hours. There is no problem when booting the 4.3 kernels.

Version-Release number of selected component (if applicable):
kernel*-2.6.9-42.0*

How reproducible:

Steps to Reproduce:
1. Boot the latest kernel.
2. Wait several hours, most of the time more than 8 hours.
3. The computer slows down and then freezes.
4. Ctrl-SysRq-{t,m} does not respond.

Actual results:
A hard reboot is required.

Expected results:

Additional info:
Applications used: firefox, thunderbird, gnome-terminal

[mederyf@trieste ~]$ lspci
-bash: lspci: command not found
[mederyf@trieste ~]$ lspci
[mederyf@trieste ~]$ /sbin/lspci
00:00.0 Host bridge: Intel Corporation 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 01)
00:01.0 PCI bridge: Intel Corporation 82845G/GL[Brookdale-G]/GE/PE Host-to-AGP Bridge (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corporation 82801DB/DBL (ICH4/ICH4-L) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
01:00.0 VGA compatible controller: nVidia Corporation NV17GL [Quadro4 200/400 NVS] (rev a3)
02:08.0 Ethernet control

I do not use any non-Red Hat kernel modules.
We have over 30 servers running RHEL4.4 now, and nearly all of them exhibit a massive slowdown in I/O performance. The servers aren't necessarily crashing, but the load average is *MUCH* higher, with 'top' showing massive iowait. This occurred after completing the RHEL4.4 update and booting the new kernel...
The slowdown is most dramatic on Serial ATA drives, most of which are on nVidia chipsets and therefore use the sata_nv module.

----------------------------------[ lspci ]----------------------------------
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
----------------------------------[ lspci ]----------------------------------

-------------------[ grep -i 'sata\|sda' /var/log/dmesg ]-------------------
sata_nv 0000:00:07.0: version 0.8
ata1: SATA max UDMA/133 cmd 0x9F0 ctl 0xBF2 bmdma 0xD800 irq 169
ata2: SATA max UDMA/133 cmd 0x970 ctl 0xB72 bmdma 0xD808 irq 169
ata1: SATA link up 1.5 Gbps (SStatus 113)
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
scsi0 : sata_nv
ata2: SATA link up 1.5 Gbps (SStatus 113)
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
scsi1 : sata_nv
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
sda:<4>nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
sda1 sda2 sda3
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
sdb:<4>nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
ata3: SATA max UDMA/133 cmd 0x9E0 ctl 0xBE2 bmdma 0xC400 irq 177
ata4: SATA max UDMA/133 cmd 0x960 ctl 0xB62 bmdma 0xC408 irq 177
ata3: SATA link down (SStatus 0)
scsi2 : sata_nv
ata4: SATA link down (SStatus 0)
scsi3 : sata_nv
-------------------[ grep -i 'sata\|sda' /var/log/dmesg ]-------------------

When idle:

---------------------[ hdparm -t -T /dev/sda /dev/sdb ]---------------------
/dev/sda:
 Timing cached reads: 3068 MB in 2.00 seconds = 1533.47 MB/sec
 Timing buffered disk reads: 156 MB in 3.01 seconds = 51.75 MB/sec

/dev/sdb:
 Timing cached reads: 3068 MB in 2.00 seconds = 1533.47 MB/sec
 Timing buffered disk reads: 156 MB in 3.01 seconds = 51.75 MB/sec
---------------------[ hdparm -t -T /dev/sda /dev/sdb ]---------------------
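The "massive iowait" that top reports is derived from the cumulative per-state CPU counters in /proc/stat. A minimal sketch, assuming a Python 3 interpreter is available on the server, of computing the same percentage from two samples so that a report like this one can carry concrete numbers; the function names are hypothetical and not part of any Red Hat tool.

```python
import time

# Fields of the aggregate "cpu" line in /proc/stat on 2.6 kernels:
# user nice system idle iowait irq softirq [steal] [guest guest_nice]
IOWAIT_INDEX = 4
# guest/guest_nice (present on newer kernels) are already counted in user/nice,
# so at most the first eight counters are summed to avoid double counting.
MAX_FIELDS = 8


def parse_cpu_line(line):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' line samples."""
    prev = parse_cpu_line(before)
    curr = parse_cpu_line(after)
    count = min(len(prev), len(curr), MAX_FIELDS)
    if count <= IOWAIT_INDEX:
        raise ValueError("no iowait counter in this /proc/stat line")
    deltas = [curr[i] - prev[i] for i in range(count)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[IOWAIT_INDEX] / total


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return the iowait %."""
    def first_line():
        with open(path) as stat:
            return stat.readline()  # the first line is the aggregate "cpu" line

    before = first_line()
    time.sleep(interval)
    after = first_line()
    return iowait_percent(before, after)
```

For example, running `print("%.1f%% iowait" % sample_iowait(5.0))` on one of the slow RHEL4.4 servers and on one still booted from the older kernel would put a number on the difference.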
I have a nagging suspicion that this is related to: http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=207244 The updated kernel in the bug report fixed the issue on our production mail server.
No luck for me; the updated kernel did not fix the problem, so I went back to 2.6.9.32.0.2. The problem occurs only on IBM NetVista stations; all the HP and Dell machines are OK.
Have you tried passing 'noapic' at boot for the RHEL4 U4 kernel?
I have tried 'noapic'. It did not help. I have 5 IBM NetVista machines with this problem.
Can you provide the model type of the NetVista machine, please? Also, try the 'nolapic' boot argument. Thanks.
I also have this problem with IBM NetVista computers. Model: 8307-81U. I will try with nolapic and noapic.
The type is 8310-47U. I have rebooted with 'nolapic'. I should know by Monday if there is any change.
'nolapic' does not help.
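Since several of the options tried in this bug ('noapic', 'nolapic', and later 'noacpi') are easy to mistype in grub.conf, it may help to confirm what the running kernel was actually booted with by checking /proc/cmdline. A minimal sketch, assuming Python 3; the helper names are hypothetical, and it only looks for the bare tokens, not alternative spellings such as 'acpi=off'.

```python
# Boot options discussed in this bug; the tuple is illustrative, not exhaustive.
DEFAULT_OPTIONS = ("noapic", "nolapic", "noacpi")


def boot_options_present(cmdline, wanted=DEFAULT_OPTIONS):
    """Map each wanted option to True if it appears as a whole token in cmdline."""
    tokens = cmdline.split()
    return {option: option in tokens for option in wanted}


def running_kernel_options(path="/proc/cmdline", wanted=DEFAULT_OPTIONS):
    """Check the command line the running kernel was booted with."""
    with open(path) as cmdline_file:
        return boot_options_present(cmdline_file.read(), wanted)
```

Matching whole tokens rather than substrings avoids counting an option as present just because its name appears inside some longer, unrelated parameter.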
Darrick, do you have any idea who would have tested this machine for RHEL4 U4? Thank you.
Nope. There weren't any reports of slowdowns with System X hardware, but NetVistas are IBM PC Division/Lenovo products, which means they have different motherboards and different BIOSes.
Copying this from another BZ that also deals with NetVista machines:

"This sounds like the BIOS bug seen on a number of Thinkcentre boxes (my desktop included). Basically a chunk of SMM (BIOS) code runs and corrupts the local apic registers that define the tick frequency, causing time to increase *very* slowly. With the recent timeofday work (2.6.18+), time should continue to increase properly, but increased latencies will be noticed. Booting w/ noapic will work around the problem, but the correct fix has been to update the BIOS, but it seems the BIOS fix has not yet been implemented for this hardware. The issue should be brought up w/ the hardware folks. See OSDL bugs: http://bugme.osdl.org/show_bug.cgi?id=2544 http://bugme.osdl.org/show_bug.cgi?id=6296 The last of which has a patch that functions as a workaround. I'm not sure however if that patch should go mainline or not (the original developer of the patch just blamed the BIOS and didn't want to push the patch)."

However, 'noapic' did not help you. Have you tried a more recent kernel, for example 2.6.18, to see whether that makes the problem go away?
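The quoted explanation says the corrupted local APIC timer makes system time advance very slowly. One rough way to check for that symptom on an affected NetVista is to note the system clock against an independent reference (for example an NTP-synchronized machine, or simply a wristwatch) at the start and end of an interval and compare the elapsed times. A minimal sketch, assuming Python 3; `clock_rate` and `looks_slow` are hypothetical helpers, and the 1% tolerance is an arbitrary illustration, not a value from this bug.

```python
def clock_rate(ref_start, ref_end, sys_start, sys_end):
    """Elapsed system-clock seconds per elapsed reference second.

    About 1.0 means the system clock keeps pace with the reference; a value
    well below 1.0 means system time is advancing too slowly.
    """
    ref_elapsed = ref_end - ref_start
    if ref_elapsed <= 0:
        raise ValueError("reference interval must be positive")
    return (sys_end - sys_start) / ref_elapsed


def looks_slow(rate, tolerance=0.01):
    """Flag rates more than `tolerance` (illustrative default: 1%) below 1.0."""
    return rate < 1.0 - tolerance
```

For example, if an hour passes on the reference while `date` on the workstation advances only six minutes, `clock_rate(0, 3600, 0, 360)` is 0.1, which `looks_slow` flags.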
ping?
Hello, just to let you know that the 'noapic noacpi' options resolved my problem. Anyway, we are now migrating the stations to RHEL5.
OK. Closing BZ as WORKSFORME. There was an update to RHEL5 U1 to solve a timer problem on the NetVista. I am not sure of the BZ at this