Description of problem:
On an Intel S2600GZ4 system, running "smartctl -x /dev/sd[a-z]" on a SATA disk triggers a "hard resetting link" error.

Version-Release number of selected component (if applicable):
smartmontools-5.43-1.fc16.x86_64

How reproducible:
Run "smartctl -x /dev/sda", for example.

Steps to Reproduce:
1. Run "smartctl -x /dev/sda" on a SATA disk

Actual results:
You will see error messages in /var/log/messages such as this:

Oct 25 10:25:07 ti19 kernel: [ID kern.err] [ 2024.947186] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Oct 25 10:25:07 ti19 kernel: [ID kern.err] [ 2024.954477] ata8.00: failed command: SMART
Oct 25 10:25:07 ti19 kernel: [ID kern.err] [ 2024.959130] ata8.00: cmd b0/d6:01:e0:4f:c2/00:00:00:00:00/00 tag 0 pio 512 out
Oct 25 10:25:07 ti19 kernel: [ID kern.err] [ 2024.959130] res d0/00:01:e0:4f:c2/00:00:00:00:00/00 Emask 0x2 (HSM violation)
Oct 25 10:25:07 ti19 kernel: [ID kern.err] [ 2024.976330] ata8.00: status: { Busy }
Oct 25 10:25:07 ti19 kernel: [ID kern.info] [ 2024.980525] ata8: hard resetting link
Oct 25 10:25:07 ti19 kernel: [ID kern.info] [ 2025.143312] ata8.00: configured for UDMA/133
Oct 25 10:25:07 ti19 kernel: [ID kern.info] [ 2025.148175] ata8: EH complete
Oct 25 10:25:07 ti19 kernel: [ID kern.err] [ 2025.152192] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Oct 25 10:25:07 ti19 kernel: [ID kern.err] [ 2025.159468] ata8.00: failed command: SMART
Oct 25 10:25:07 ti19 kernel: [ID kern.err] [ 2025.164140] ata8.00: cmd b0/d6:01:e0:4f:c2/00:00:00:00:00/00 tag 0 pio 512 out
Oct 25 10:25:07 ti19 kernel: [ID kern.err] [ 2025.164140] res d0/00:01:e0:4f:c2/00:00:00:00:00/00 Emask 0x2 (HSM violation)
Oct 25 10:25:07 ti19 kernel: [ID kern.err] [ 2025.181354] ata8.00: status: { Busy }
Oct 25 10:25:07 ti19 kernel: [ID kern.info] [ 2025.185518] ata8: hard resetting link
Oct 25 10:25:08 ti19 kernel: [ID kern.info] [ 2025.640134] ata8.00: configured for UDMA/133
Oct 25 10:25:08 ti19 kernel: [ID kern.info] [ 2025.645017] ata8: EH complete

If a SMART self-test is running at the time, it fails with status "Interrupted (host reset)".

Expected results:
No error.

Additional info:
This system is running the latest kernel, 3.6.2-1.fc16.x86_64. I have seen this on 4 different types of SATA disks.
Note: this problem does not occur when running "smartctl -a /dev/sda". Regards, Andy
"-x" is the same as "-H -i -g all -c -A -f brief -l xerror,error -l xselftest,selftest -l selective -l directory -l scttemp -l scterc -l devstat -l sataphy"; see the man page.

According to the kernel logs above, the hard resets occur after a failing SMART WRITE LOG command to the SCT COMMAND log at address 0xe0. The smartctl "-l sct..." options use this command to READ(!) the info from the drive.

Please test whether
- the problem occurs if only "-l scterc" or "-l scttemp" is used, and
- the problem does not occur if any or all of the other options (included in "-x" but not in "-a") are used: "-l xerror -l xselftest -l directory -l devstat -l sataphy".

If this is the case, I presume there is a problem in the (PIO) DATA OUT pass-through implementation of this specific SATA driver. Smartmontools uses common code for generic Linux SATA: it translates ATA commands into SAT PASS-THROUGH SCSI commands and posts these via the SG_IO ioctl.
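For readers who want to see how the "cmd" line in the kernel log leads to "SMART WRITE LOG to log address 0xe0", here is a minimal Python sketch that decodes it. It assumes the libata error-handler layout of command/feature:count:lba_low:lba_mid:lba_high/(five extended bytes)/device; the function name decode_taskfile and the small lookup tables are illustrative and are not part of smartmontools or the kernel.

```python
import re

SMART_COMMAND = 0xB0  # ATA SMART command opcode

# SMART subcommands, selected by the feature register (subset only).
SMART_FEATURES = {
    0xD0: "SMART READ DATA",
    0xD5: "SMART READ LOG",
    0xD6: "SMART WRITE LOG",
    0xDA: "SMART RETURN STATUS",
}

# SCT log addresses, carried in the LBA low register for READ/WRITE LOG.
SCT_LOGS = {
    0xE0: "SCT Command/Status",
    0xE1: "SCT Data Transfer",
}

# Twelve hex bytes separated by '/' or ':' following the word "cmd".
TASKFILE_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def decode_taskfile(line):
    """Decode a libata 'cmd' log line; return None if the line has no taskfile."""
    match = TASKFILE_RE.search(line)
    if match is None:
        return None
    fields = [int(byte, 16) for byte in re.split(r"[/:]", match.group(1))]
    command, feature, count, lba_low, lba_mid, lba_high = fields[:6]
    if command != SMART_COMMAND:
        return {"command": command, "subcommand": None, "log": None,
                "sector_count": count}
    # The log address is only meaningful for the READ LOG / WRITE LOG subcommands.
    log = None
    if feature in (0xD5, 0xD6):
        log = SCT_LOGS.get(lba_low, "0x%02x" % lba_low)
    return {
        "command": command,
        "subcommand": SMART_FEATURES.get(feature, "unknown (0x%02x)" % feature),
        "log": log,
        "sector_count": count,
        # SMART commands carry the signature 0x4F/0xC2 in LBA mid/high.
        "smart_signature_ok": (lba_mid, lba_high) == (0x4F, 0xC2),
    }


if __name__ == "__main__":
    line = "ata8.00: cmd b0/d6:01:e0:4f:c2/00:00:00:00:00/00 tag 0 pio 512 out"
    print(decode_taskfile(line))
```

Applied to the failing command in the report, the feature byte d6 maps to SMART WRITE LOG and the LBA low byte e0 to the SCT Command/Status log, with a sector count of 1 (the "pio 512 out" transfer).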
1. Yes, "smartctl -l scterc /dev/sdb" causes the hard resetting link error.
2. Yes, "smartctl -l scttemp /dev/sdb" causes the hard resetting link error.
3. Correct, "smartctl -l xerror -l xselftest -l directory -l devstat -l sataphy /dev/sdb" does not cause any errors.

Thanks,
Andy
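The elimination behind these three tests can be written out as a short Python sketch. The "-x" log list is taken from the comment above; the "-a" log list (error, selftest, selective) is an assumption based on the smartctl man page for ATA devices and is not stated in this thread. The variable names are illustrative only.

```python
# "-l" log types requested by "-x", as quoted earlier in this thread.
X_LOGS = ["xerror", "error", "xselftest", "selftest", "selective",
          "directory", "scttemp", "scterc", "devstat", "sataphy"]

# "-l" log types requested by "-a" (assumption: from the smartctl man page).
A_LOGS = ["error", "selftest", "selective"]

# Log types that "-x" adds on top of "-a"; the trigger must be among these,
# because "-a" alone does not cause the reset.
extra_logs = [log for log in X_LOGS if log not in A_LOGS]

# Results reported above for /dev/sdb.
causes_reset = {"scterc", "scttemp"}
no_reset = {"xerror", "xselftest", "directory", "devstat", "sataphy"}

# Every extra log type has been tested, so nothing is left unexplained.
untested = [log for log in extra_logs if log not in causes_reset | no_reset]
suspects = sorted(log for log in extra_logs if log in causes_reset)

print("added by -x:", extra_logs)
print("untested:", untested)
print("suspects:", suspects)
```

Both remaining suspects are the SCT options, which is consistent with the SMART WRITE LOG (data out) command to log address 0xe0 seen in the kernel log.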
Hi Christian, thanks for looking at this. David: I've been told that you are our expert for this part of the kernel. Could you look at it?
Would you please attach the output of dmesg after bootup? This will show the controller, driver, and drives you are using.
Created attachment 636281: dmesg output from a system exhibiting the smartctl -x problem
Sorry for the delay. I was able to reproduce the problem on linux-3.6.6 using a Hitachi HDS721016CLA382. I verified that this upstream commit fixes the issue:

commit 49bd665c5407a453736d3232ee58f2906b42e83c
Author: Maciej Patelczyk <maciej.patelczyk>
Date:   Mon Oct 15 14:29:03 2012 +0200

    [SCSI] isci: copy fis 0x34 response into proper buffer

    SATA MICROCODE DOWNALOAD fails on isci driver. After receiving Register
    Device to Host (FIS 0x34) frame Initiator resets phy.
    In the frame handler routine response (FIS 0x34) was copied into wrong
    buffer and upper layer did not receive any answer which resulted in
    timeout and reset.
    This patch corrects this bug.

    Signed-off-by: Maciej Patelczyk <maciej.patelczyk>
    Signed-off-by: Lukasz Dorau <lukasz.dorau>
    Cc: <stable.org>
    Signed-off-by: James Bottomley <JBottomley>

diff --git a/drivers/scsi/isci/request.c b/drivers/scsi/isci/request.c
index c1bafc3..9594ab6 100644
--- a/drivers/scsi/isci/request.c
+++ b/drivers/scsi/isci/request.c
@@ -1972,7 +1972,7 @@ sci_io_request_frame_handler(struct isci_request *ireq,
 frame_index,
 (void **)&frame_buff
-sci_controller_copy_sata_response(&ireq->stp.req,
+sci_controller_copy_sata_response(&ireq->stp.rsp,
 frame_header,
 frame_buffer);
This message is a reminder that Fedora 16 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '16'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 16's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 16 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" and open it against that version of Fedora. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.