| Summary: | Server frequently goes read only mode on Intel Corporation 5 Series/3400 Series Chipset | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | yolte <burak> | ||||
| Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 5.7 | ||||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-02-14 16:07:03 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
Created attachment 564423
lspci output
You're getting media errors from the disk drive(s). This is a hardware issue, not a software issue.
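The diagnosis above rests on the `(media error)` marker that libata writes to the kernel log when a drive reports an uncorrectable read. As a minimal sketch for checking how often that happens on a given server, the helper below tallies those reports per ATA device. The function name `count_media_errors` is hypothetical, and the default path `/var/log/messages` is an assumption about where syslog stores kernel messages on the affected machines.

```shell
#!/bin/sh
# Count libata "(media error)" reports per ATA device in a syslog-style file.
# Hypothetical helper for illustration; the default log path is an assumption.
count_media_errors() {
    logfile="${1:-/var/log/messages}"
    # The "res ... (media error)" line carries no device prefix of its own,
    # so remember the most recent "ataN.NN:" prefix seen on earlier lines.
    awk '
        match($0, /ata[0-9]+\.[0-9]+:/) { dev = substr($0, RSTART, RLENGTH - 1) }
        /\(media error\)/               { count[dev]++ }
        END { for (d in count) print d, count[d] }
    ' "$logfile"
}
```

Calling `count_media_errors` with no argument reads the assumed default log; passing a path reads that file instead. A non-zero count for a device points at the drive behind that port rather than at the controller or the kernel.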
Description of problem:
We have 250+ Fujitsu RX100S6 servers running CentOS 5.7 x64. These servers go into read-only mode, I think under high load.

Version-Release number of selected component (if applicable):
2.6.18-274.12.1.el5

How reproducible:
It happens on all CentOS 5.5, 5.6 and 5.7 based servers. These are web hosting servers running the Plesk, DirectAdmin or cPanel control panels.

Steps to Reproduce:
1. Not sure. I think it happens under high server load, for example while running a backup task or copying or moving files, so it appears to be related to disk I/O.

Actual results:
Feb 11 14:37:51 server kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Feb 11 14:37:51 server kernel: ata1.00: irq_stat 0x40000008
Feb 11 14:37:51 server kernel: ata1.00: cmd 60/08:00:e7:4f:55/00:00:16:00:00/40 tag 0 ncq 4096 in
Feb 11 14:37:51 server kernel:          res 41/40:00:e7:4f:55/00:00:16:00:00/40 Emask 0x409 (media error) <F>
Feb 11 14:37:51 server kernel: ata1.00: status: { DRDY ERR }
Feb 11 14:37:51 server kernel: ata1.00: error: { UNC }
Feb 11 14:37:51 server kernel: ata1.00: configured for UDMA/133
Feb 11 14:37:51 server kernel: ata1: EH complete
Feb 11 14:37:51 server kernel: SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
Feb 11 14:37:51 server kernel: sda: Write Protect is off
Feb 11 14:37:51 server kernel: SCSI device sda: drive cache: write back
Feb 11 14:38:56 server kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Feb 11 14:38:56 server kernel: ata1.00: cmd 61/a8:00:4f:5d:02/03:00:00:00:00/40 tag 0 ncq 479232 out
Feb 11 14:38:56 server kernel:          res 40/00:00:e7:4f:55/00:00:16:00:00/40 Emask 0x4 (timeout)
Feb 11 14:38:56 server kernel: ata1.00: status: { DRDY }
Feb 11 14:38:56 server kernel: ata1: hard resetting link
Feb 11 14:38:57 server kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 11 14:38:57 server kernel: ata1.00: configured for UDMA/133
Feb 11 14:38:57 server kernel: ata1: EH complete
Feb 11 14:38:57 server kernel: SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
Feb 11 14:38:57 server kernel: sda: Write Protect is off
Feb 11 14:38:57 server kernel: SCSI device sda: drive cache: write back

Expected results:
The servers should not go into read-only mode.

Additional info:
As you can see in the attached lspci output, these servers have the following SATA controller: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller. Maybe this controller has a problem with CentOS. I also tried to turn off NCQ on the servers with the command below (also added to rc.local), but it does not help:

echo 1 > /sys/block/sda/device/queue_depth