Bug 618793 - Intel SSD resets
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All Linux
Priority: low, Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Anton Arapov
QA Contact: Red Hat Kernel QE team
Docs Contact:
Depends On:
Blocks:
Reported: 2010-07-27 14:22 EDT by Florian La Roche
Modified: 2014-06-18 04:02 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-10-10 05:25:08 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Florian La Roche 2010-07-27 14:22:21 EDT
Description of problem:

/dev/sdb is the following Intel SSD:
[root@xen /]# hdparm -i /dev/sdb

/dev/sdb:

 Model=INTEL SSDSA2M080G2GC                    , FwRev=2CV102HD, SerialNo=CVPO9342002A080BGN
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=0kB, MaxMultSect=16, MultSect=?1?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=156299375
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-7 T13 1532D revision 1:  ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

 * signifies the current active mode





Jul  3 06:40:31 xen kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul  3 06:40:31 xen kernel: ata3.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Jul  3 06:40:31 xen kernel:          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul  3 06:40:31 xen kernel: ata3.00: status: { DRDY }
Jul  3 06:40:31 xen kernel: ata3: hard resetting link
Jul  3 06:40:31 xen kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul  3 06:40:31 xen kernel: ata3.00: configured for UDMA/133
Jul  3 06:40:31 xen kernel: ata3: EH complete
Jul  3 06:40:31 xen kernel: SCSI device sdb: 156299375 512-byte hdwr sectors (80025 MB)
Jul  3 06:40:31 xen kernel: sdb: Write Protect is off
Jul  3 06:40:31 xen kernel: SCSI device sdb: drive cache: write back





Searching Google turns up suggestions to disable the writeback cache, disable
command queuing, or reduce the transfer speed, but none of these helped.

Disabling smartd for this device made all problems go away, so maybe some of
the commands issued by smartd still lead to these issues.
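The failing command in the kernel log above can be decoded to check the smartd connection. The sketch below is a minimal illustration, not part of the original report: it assumes the libata error line lists the taskfile bytes in the order command/feature:nsect:lbal:lbam:lbah, and the helper name decode_taskfile and the lookup table are hypothetical. Under that assumption, b0/d5 with 4f:c2 in the LBA mid/high bytes is an ATA SMART READ LOG command, which is consistent with smartd polling the drive.

```python
import re

# Assumed field order of the libata "cmd" error line:
# command/feature:nsect:lbal:lbam:lbah/<five HOB bytes>/device
TF_RE = re.compile(
    r"cmd (?P<command>[0-9a-f]{2})/(?P<feature>[0-9a-f]{2}):"
    r"(?P<nsect>[0-9a-f]{2}):(?P<lbal>[0-9a-f]{2}):"
    r"(?P<lbam>[0-9a-f]{2}):(?P<lbah>[0-9a-f]{2})/"
)

SMART_COMMAND = 0xB0            # ATA SMART command opcode
SMART_SIGNATURE = (0x4F, 0xC2)  # required LBA mid / LBA high values

# SMART subcommands, selected by the feature register
SMART_SUBCOMMANDS = {
    0xD0: "SMART READ DATA",
    0xD5: "SMART READ LOG",
    0xD6: "SMART WRITE LOG",
    0xD8: "SMART ENABLE OPERATIONS",
    0xD9: "SMART DISABLE OPERATIONS",
    0xDA: "SMART RETURN STATUS",
}


def decode_taskfile(line):
    """Return the decoded taskfile fields of a libata 'cmd' line, or None."""
    m = TF_RE.search(line)
    if m is None:
        return None
    tf = {name: int(value, 16) for name, value in m.groupdict().items()}
    is_smart = (
        tf["command"] == SMART_COMMAND
        and (tf["lbam"], tf["lbah"]) == SMART_SIGNATURE
    )
    tf["is_smart"] = is_smart
    tf["subcommand"] = (
        SMART_SUBCOMMANDS.get(tf["feature"], "unknown") if is_smart else None
    )
    return tf


# The line reported in this bug
line = "ata3.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in"
info = decode_taskfile(line)
print(info["subcommand"], "log address 0x%02x" % info["lbal"])
```

For SMART READ LOG the LBA low byte carries the log address, so 0x06 here would be a read of the SMART self-test log, and the command timing out is what triggers the hard link reset.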

This is reported mostly as an FYI. I'll be glad to test newer kernel patches;
otherwise, running with smartd disabled gives me a stable system.


regards,

Florian La Roche




This happens even if the writeback cache or command queueing is disabled.



Comment 1 Andrew Jones 2010-07-28 05:22:07 EDT
I see xen in the output. Which distro and kernel are running on your host and on your guest? Is the problem on the host or the guest?

Thanks,
Drew
Comment 2 Florian La Roche 2010-07-28 10:38:51 EDT
This happens on the Xen host, with several Xen guests active and writing
data to the SSD. It happens with RHEL5.5, the current RHEL5.5 update kernel,
and also the newest kernel from http://people.redhat.com/jwilson/el5/.

regards,

Florian La Roche
Comment 3 Andrew Jones 2010-07-28 11:34:15 EDT
Do you see the same problems if you boot this machine into the bare-metal kernel? Any interesting logs in 'xm dmesg'?
Comment 4 Florian La Roche 2010-07-28 13:50:07 EDT
This also happens just the same on non-xen bare-metal kernels.

regards,

Florian La Roche
Comment 5 Anton Arapov 2013-10-10 05:25:08 EDT
This is an SSD firmware issue. Updating the firmware will fix it.