Bug 795397 - Server frequently goes read only mode on Intel Corporation 5 Series/3400 Series Chipset
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-02-20 12:20 UTC by yolte
Modified: 2013-02-14 16:07 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-14 16:07:03 UTC
Target Upstream Version:


Attachments
lspci output (42.37 KB, text/plain)
2012-02-20 12:21 UTC, yolte

Description yolte 2012-02-20 12:20:35 UTC
Description of problem:
We have 250+ Fujitsu RX100S6 servers running CentOS 5.7 x86_64. These servers go into read-only mode, I think under high load.


Version-Release number of selected component (if applicable):
2.6.18-274.12.1.el5

How reproducible:
It happens on all CentOS 5.5, 5.6, or 5.7 based servers. These are web hosting servers running the Plesk, DirectAdmin, or cPanel control panels.

Steps to Reproduce:
1. Not sure. I think it happens under high server load, for example while running a backup task or copying or moving files, so it appears to be related to disk I/O.
  
Actual results:
Feb 11 14:37:51 server kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Feb 11 14:37:51 server kernel: ata1.00: irq_stat 0x40000008
Feb 11 14:37:51 server kernel: ata1.00: cmd 60/08:00:e7:4f:55/00:00:16:00:00/40 tag 0 ncq 4096 in
Feb 11 14:37:51 server kernel:          res 41/40:00:e7:4f:55/00:00:16:00:00/40 Emask 0x409 (media error) <F>
Feb 11 14:37:51 server kernel: ata1.00: status: { DRDY ERR }
Feb 11 14:37:51 server kernel: ata1.00: error: { UNC }
Feb 11 14:37:51 server kernel: ata1.00: configured for UDMA/133
Feb 11 14:37:51 server kernel: ata1: EH complete
Feb 11 14:37:51 server kernel: SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
Feb 11 14:37:51 server kernel: sda: Write Protect is off
Feb 11 14:37:51 server kernel: SCSI device sda: drive cache: write back
Feb 11 14:38:56 server kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Feb 11 14:38:56 server kernel: ata1.00: cmd 61/a8:00:4f:5d:02/03:00:00:00:00/40 tag 0 ncq 479232 out
Feb 11 14:38:56 server kernel:          res 40/00:00:e7:4f:55/00:00:16:00:00/40 Emask 0x4 (timeout)
Feb 11 14:38:56 server kernel: ata1.00: status: { DRDY }
Feb 11 14:38:56 server kernel: ata1: hard resetting link
Feb 11 14:38:57 server kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 11 14:38:57 server kernel: ata1.00: configured for UDMA/133
Feb 11 14:38:57 server kernel: ata1: EH complete
Feb 11 14:38:57 server kernel: SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
Feb 11 14:38:57 server kernel: sda: Write Protect is off
Feb 11 14:38:57 server kernel: SCSI device sda: drive cache: write back
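The "cmd" and "res" lines in the log above carry the ATA taskfile registers as hex bytes, which include the sector address the drive reported. The following is a minimal Python sketch for pulling that address out of such a line. It assumes the byte order libata uses when printing these lines (status/error:nsect:lbal:lbam:lbah / hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah / device); the function name `taskfile_lba` is made up for illustration and is not part of any tool.

```python
import re

# Twelve hex bytes separated by "/" or ":", preceded by "cmd" or "res".
TASKFILE_RE = re.compile(r"\b(cmd|res) ([0-9a-f]{2}(?:[/:][0-9a-f]{2}){11})")


def taskfile_lba(line):
    """Return the 48-bit LBA from a libata cmd/res log line, or None if absent."""
    match = TASKFILE_RE.search(line)
    if match is None:
        return None
    fields = [int(b, 16) for b in re.split(r"[/:]", match.group(2))]
    # Assumed field order:
    #   0 command/status, 1 feature/error, 2 nsect, 3 lbal, 4 lbam, 5 lbah,
    #   6 hob_feature, 7 hob_nsect, 8 hob_lbal, 9 hob_lbam, 10 hob_lbah, 11 device
    lbal, lbam, lbah = fields[3:6]
    hob_lbal, hob_lbam, hob_lbah = fields[8:11]
    return (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )


# The "media error" result line from the log above.
line = "res 41/40:00:e7:4f:55/00:00:16:00:00/40 Emask 0x409 (media error) <F>"
print(taskfile_lba(line))
```

If the assumed layout holds, the printed sector number falls inside the 488397168 sectors reported for sda, which would point at one specific unreadable location on the disk.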

Expected results:
The servers should not go into read-only mode.

Additional info:
As you can see in the attached lspci output, these servers have this SATA controller: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller. Maybe this controller has a problem with CentOS.
I also tried to turn off NCQ on the servers with the command below, but it does not help:
echo 1 > /sys/block/sda/device/queue_depth (also added to rc.local)
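To confirm that the queue depth setting actually took effect after a reboot, the same sysfs file can be read back. This is a small Python sketch under the assumption that the path from the command above exists on the server; `read_queue_depth` and its `sysfs_root` parameter are hypothetical names, the latter added only so the function can be exercised against a test directory.

```python
from pathlib import Path


def read_queue_depth(device="sda", sysfs_root="/sys"):
    """Return the queue depth for a block device, or None if it is not exposed."""
    path = Path(sysfs_root) / "block" / device / "device" / "queue_depth"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not a plain integer.
        return None


def ncq_disabled(device="sda", sysfs_root="/sys"):
    """True when the queue depth is 1, i.e. only one command is outstanding."""
    return read_queue_depth(device, sysfs_root) == 1
```

Calling `ncq_disabled("sda")` on an affected server would show whether the rc.local entry was applied. A depth of 1 only limits command queuing; it would not be expected to change how the drive handles an unreadable sector.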

Comment 1 yolte 2012-02-20 12:21:36 UTC
Created attachment 564423 [details]
lspci output

lspci

Comment 2 Jes Sorensen 2013-02-14 16:07:03 UTC
You're getting media errors from the disk drive(s) - this is a hardware issue
not a software issue.
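With 250+ servers, it may help to tally which error classes each machine logs before deciding which disks to test or replace. The sketch below is one possible way to do that in Python, assuming the kernel messages keep the "Emask 0x... (description)" form seen in this report; `tally_ata_errors` is an illustrative name, not an existing utility.

```python
import re
from collections import Counter

# Matches e.g. "Emask 0x409 (media error)" and captures the description.
# "exception Emask 0x0 SAct ..." lines have no parenthesised text and are skipped.
EMASK_RE = re.compile(r"Emask 0x[0-9a-f]+ \(([^)]+)\)")


def tally_ata_errors(lines):
    """Count libata error descriptions found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = EMASK_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0",
    "kernel:          res 41/40:00:e7:4f:55/00:00:16:00:00/40 Emask 0x409 (media error) <F>",
    "kernel:          res 40/00:00:e7:4f:55/00:00:16:00:00/40 Emask 0x4 (timeout)",
]
print(tally_ata_errors(sample))
```

Servers whose logs show "media error" entries would be the first candidates for a disk surface test or replacement.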

