Bug 204489
Summary: | Computer slowing down and then freez | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Frederic Medery <redhat-bugzilla> |
Component: | kernel | Assignee: | Konrad Rzeszutek <konradr> |
Status: | CLOSED WORKSFORME | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | | |
Version: | 4.4 | CC: | bbs2web, darrick, herrold, wilburn |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2007-07-17 17:30:43 UTC | Type: | --- |
Description
Frederic Medery
2006-08-29 14:54:12 UTC
We have over 30 servers running RHEL4.4 now and nearly all of them exhibit a massive slowdown in their I/O performance. Servers aren't necessarily crashing, but the load average is *MUCH* higher, with 'top' showing massive iowait. This occurred after completing the RHEL4.4 update and then booting off the new kernel...

Most dramatic is the slowdown on Serial ATA drives, most of which are running on nVidia chipsets and so utilising the sata_nv module.

----------------------------------[ lspci ]----------------------------------
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
----------------------------------[ lspci ]----------------------------------

-------------------[ grep -i 'sata\|sda' /var/log/dmesg ]-------------------
sata_nv 0000:00:07.0: version 0.8
ata1: SATA max UDMA/133 cmd 0x9F0 ctl 0xBF2 bmdma 0xD800 irq 169
ata2: SATA max UDMA/133 cmd 0x970 ctl 0xB72 bmdma 0xD808 irq 169
ata1: SATA link up 1.5 Gbps (SStatus 113)
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
scsi0 : sata_nv
ata2: SATA link up 1.5 Gbps (SStatus 113)
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
scsi1 : sata_nv
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
sda:<4>nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
sda1 sda2 sda3
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
sdb:<4>nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
ata3: SATA max UDMA/133 cmd 0x9E0 ctl 0xBE2 bmdma 0xC400 irq 177
ata4: SATA max UDMA/133 cmd 0x960 ctl 0xB62 bmdma 0xC408 irq 177
ata3: SATA link down (SStatus 0)
scsi2 : sata_nv
ata4: SATA link down (SStatus 0)
scsi3 : sata_nv
-------------------[ grep -i 'sata\|sda' /var/log/dmesg ]-------------------

When idle:
---------------------[ hdparm -t -T /dev/sda /dev/sdb ]---------------------
/dev/sda:
 Timing cached reads: 3068 MB in 2.00 seconds = 1533.47 MB/sec
 Timing buffered disk reads: 156 MB in 3.01 seconds = 51.75 MB/sec

/dev/sdb:
 Timing cached reads: 3068 MB in 2.00 seconds = 1533.47 MB/sec
 Timing buffered disk reads: 156 MB in 3.01 seconds = 51.75 MB/sec
---------------------[ hdparm -t -T /dev/sda /dev/sdb ]---------------------

I have a nagging suspicion that this is related to:
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=207244

The updated kernel in the bug report fixed the issue on our production mail server.

No luck for me, the updated kernel did not fix the problem. So I went back to 2.6.9.32.0.2.
The problem is just with IBM NetVista stations. All HP and Dell machines are OK.

Have you tried to pass in 'noapic' during bootup for the RHEL4 U4 kernel?

I have tried 'noapic'. It did not help. I have 5 IBM NetVista machines with this problem.

Can you pass along the model type of the NetVista machine, please? Also, try the 'nolapic' bootup argument. Thanks.

I also have this problem with IBM NetVista computers:
Model: 8307-81U
I will try with nolapic and noapic.

The type is 8310-47U. I have rebooted with 'nolapic'. I should know by Monday if there is any change.

'nolapic' does not help.

Darrick,
Do you have any idea who would have tested this machine for RHEL4 U4? Thank you.

Nope.
There weren't any reports of slowdowns with System X hardware, but NetVistas are IBM PC Division/Lenovo products, which means they have different motherboards and different BIOSes.

Copying this from another BZ that deals also with NetVista machines:

"
This sounds like the BIOS bug seen on a number of Thinkcentre boxes (my desktop included). Basically a chunk of SMM (BIOS) code runs and corrupts the local apic registers that define the tick frequency, causing time to increase *very* slowly. With the recent timeofday work (2.6.18+), time should continue to increase properly, but increased latencies will be noticed.

Booting w/ noapic will work around the problem, but the correct fix has been to update the BIOS, but it seems the BIOS fix has not yet been implemented for this hardware. The issue should be brought up w/ the hardware folks.

See OSDL bugs:
http://bugme.osdl.org/show_bug.cgi?id=2544
http://bugme.osdl.org/show_bug.cgi?id=6296

The last of which has a patch that functions as a workaround. I'm not sure however if that patch should go mainline or not (the original developer of the patch just blamed the BIOS and didn't want to push the patch).
"

However, using 'noapic' did not help you. I was wondering if you had tried to use a more recent version of the kernel - 2.6.18 for example and see if that makes the problem go away?

ping?

Hello,
Just to let you know that the noapci noacpi options resolved my problem. Anyway we are migrating stations to RHEL5 now.

OK. Closing BZ as WORKSFORME. There was an update to RHEL5 U1 to solve a timer problem on the NetVista. I am not sure of the BZ at this