Created attachment 346009 [details]
smartctl -a /dev/sda output after a long test was run for over 50 minutes

Description of problem:
A long (full) test run with "smartctl -t long /dev/sda" on a Seagate 7200.12 ST31000528AS CC34 hangs showing 90% of the test remaining, even after 24 hours. The hang is at least somewhat drive-specific, since 3 older Seagate drives complete their tests in the expected 1-5 hours.

Version-Release number of selected component (if applicable):
libatasmart.x86_64 0.12-3.fc11
smartmontools.x86_64 1:5.38-11.fc11
kernel.x86_64 2.6.29.4-167.fc11

How reproducible:
always

Steps to Reproduce:
1. smartctl -t long /dev/sda
2. wait a few hours
3. smartctl -a /dev/sda
   notice that the test still says "90% remaining"

Actual results:
90% remaining shows after 24 hours.

Expected results:
0% remaining after <4 hours

Additional info:
similar reports:
freebsd: http://www.mail-archive.com/freebsd-hackers@freebsd.org/msg67741.html
debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=503439

The Debian report indicates it may be related to AHCI and that the tests run normally under the standalone SeaTools FreeDOS-based SMART tests.

Note that the attached file shows a long test 50 minutes into an estimated 200 minutes. It should be displaying at most 80% remaining.
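As a rough illustration of the arithmetic in the note above: assuming the drive reports remaining progress rounded up to 10% steps (an assumption based on the values seen in this report), a test that is 50 minutes into an estimated 200 should show at most 80%. The helper name `expected_remaining` is made up for this sketch.

```sh
# Hypothetical helper: upper bound on the "remaining" percentage a drive
# should report, ASSUMING progress is reported in 10% steps, rounded up.
# $1 = minutes elapsed, $2 = the drive's estimated total minutes.
expected_remaining() {
  elapsed=$1
  total=$2
  if [ "$elapsed" -ge "$total" ]; then
    echo 0
    return
  fi
  # Ceiling of the remaining fraction in tenths, then scaled to percent.
  tenths=$(( ((total - elapsed) * 10 + total - 1) / total ))
  echo $(( tenths * 10 ))
}
```

Calling `expected_remaining 50 200` prints 80, so a log entry still at 90% at that point is already behind the drive's own estimate.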
Hi, do you have any tests scheduled in smartd.conf or elsewhere? Do short tests work for you? Is this disk under heavy load or mostly idle? Can you try testing this disk while keeping it idle? (Use smartctl -X /dev/sda before the test.)
I have observed this with both scheduled tests and tests started from the command line. I do have scheduled tests in /etc/smartd.conf. I first noticed the problem when, after 24 hrs, the next scheduled test was aborted because the previous test didn't complete.

The disk is under very light load, but I can't make it totally idle since it is the / filesystem disk (i.e. the rootfs). The disk light on the computer case rarely flashes.

Short tests at first also hung, but they seem to be finishing now. I don't know what changed.

After 48 hours, this is what it now shows (lifetime hours has incremented, but we are still at 90% remaining):

SMART Self-test log structure revision number 1
Num  Test_Description    Status                         Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress  90%        329              -
# 2  Conveyance offline  Completed without error        00%        278              -
# 3  Short offline       Completed without error        00%        278              -
# 4  Conveyance offline  Aborted by host                90%        277              -
# 5  Short offline       Aborted by host                90%        273              -
# 6  Extended offline    Aborted by host                90%        273              -
# 7  Extended offline    Interrupted (host reset)       90%        252              -
# 8  Extended offline    Interrupted (host reset)       90%        219              -
# 9  Extended offline    Interrupted (host reset)       90%        164              -
#10  Extended offline    Interrupted (host reset)       90%        48               -

I've aborted the tests that have run for 48 hrs with smartctl -X /dev/sda and started a new test as requested.
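For anyone who wants to watch for this hang without reading the log by eye, here is a minimal sketch that pulls the "Remaining" value of the in-progress entry out of self-test log text. It assumes the log layout shown in this comment; the function name `remaining_percent` is hypothetical, and the function only parses text on stdin, so it can be tried on a saved log.

```sh
# Hypothetical helper: read self-test log text (as printed by
# "smartctl -l selftest") on stdin and print the Remaining percentage
# of the first entry whose status is "Self-test routine in progress".
# Prints nothing if no test is in progress.
remaining_percent() {
  awk '/Self-test routine in progress/ {
         for (i = 1; i <= NF; i++)
           if ($i ~ /^[0-9]+%$/) {   # first field that looks like "90%"
             sub(/%/, "", $i)
             print $i
             exit
           }
       }'
}

# Possible use (needs root and a real device, so left as a comment):
#   smartctl -l selftest /dev/sda | remaining_percent
```

Sampling this every hour or so and comparing successive values would show whether the counter ever moves off 90.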
I don't know what you are using this computer for, but is it possible to run the SMART test while the system is (almost) idle? If the system is idle, it doesn't matter that the root partition is on this disk.
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
See also Wolfgang Rupprecht's thread on the Seagate Forum: http://forums.seagate.com/stx/board/message?board.id=ata_drives&thread.id=13219&view=by_date_ascending&page=1 Another user is experiencing this in Gentoo. Rupprecht: we have to stop meeting like this (in bugzilla).
So, to be clear:

> The debian reports indicates it may be related to ahci and that
> the tests run normally under the standalone seatools freedos-based
> smart tests.

Did you personally try SeaTools, or smartctl with AHCI disabled? Have you tried this disk in a different machine, or a different disk in this machine?
> did you personally tried seatools or smartctl with disabled ahci? No. I'm not experiencing this problem: I don't even have one of these drives (Seagate 7200.12 series). I'm just trying to help figure out what is going on.
(In reply to comment #7)
> > did you personally tried seatools or smartctl with disabled ahci?
>
> No. I'm not experiencing this problem: I don't even have one of these drives
> (Seagate 7200.12 series). I'm just trying to help figure out what is going on.

OK, thanks, but this question was directed at Wolfgang :)
Just to re-iterate, the computer is my main machine, www, smtp and nfs server. Taking it single-user for a 5 hour test is going to be painful. The disk is formatted ext4 and the machine has 8 Gigs of memory, so things tend to be answered out of memory and disk writes only happen very infrequently. I have to watch for a very long time to even see the disk light flash once. For all practical purposes it looks like the disk is 99 percent idle. I don't think it is normal disk IO activity that is interfering with the test.
Can you answer my questions from comment #6? Thanks.
I thought I did answer it. No. I did not and don't want to take the computer off-line to run seatools for 5 hrs. And no, that includes taking the disk out of the computer.
(In reply to comment #11) > I thought I did answer it. No. I did not and don't want to take the computer > off-line to run seatools for 5 hrs. And no, that includes taking the disk out > of the computer. Are you able to test another disk in this machine?
Created attachment 350568 [details]
Output of smartctl /dev/sdb -a

I appear to be hitting this bug as well. I started an extended self-test using palimpsest, and after several hours, the test had not completed. Running "smartctl /dev/sdb -l selftest" showed 90% of the test remaining, even after several hours running.

Performing the long self-test using SeaTools completed in about 1h15m.

Device Model: ST3500410AS
Firmware Version: CC34

This is running on a Dell Inspiron 530. This is one of two drives that form part of a RAID-1 mirror (if that makes a difference).

Let me know if there's any other testing I can do.
(In reply to comment #12) > (In reply to comment #11) > > I thought I did answer it. No. I did not and don't want to take the computer > > off-line to run seatools for 5 hrs. And no, that includes taking the disk out > > of the computer. > > Are you able to test another disk in this machine? Yes, all 3 other drives (older seagates) complete the tests as expected (within the test duration estimates the drives give.) Here are the model numbers/firmware version: ST31500341AS SD3B ST3750640AS 3.AAE ST3250824AS 3.AAD
Created attachment 350572 [details] Output of smartctl /dev/sda -a Interesting...looks like a long test on an almost identical drive (7200.12 but different Model No. and different Firmware Rev) is about to complete (only 10% remaining). Device Model: ST3500418AS Firmware Version: CC44
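Since the comparison here comes down to model number and firmware revision, a small sketch for pulling both out of saved smartctl output may help when collecting reports from several drives. It assumes the "Device Model:" and "Firmware Version:" labels shown in this thread; the function name `model_fw` is hypothetical.

```sh
# Hypothetical helper: read smartctl identity output on stdin and print
# "<model> <firmware>" on one line, based on the "Device Model:" and
# "Firmware Version:" labels quoted in this thread.
model_fw() {
  awk -F': *' '
    /^Device Model:/     { model = $2 }
    /^Firmware Version:/ { fw = $2 }
    END { print model, fw }'
}

# Possible use (needs a real device, so left as a comment):
#   smartctl -i /dev/sda | model_fw
```

Collecting one such line per drive makes it easy to see which model/firmware pairs hang and which complete.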
(In reply to comment #13)
> Created an attachment (id=350568) [details]
> Output of smartctl /dev/sdb -a
>
> I appear to be hitting this bug as well. I started an extended self-test using
> palimpsest, and after several hours, the test had not complete.

That's another point suggesting this is not smartmontools' fault. Palimpsest does not depend on smartmontools.

> Running
> "smartctl /dev/sdb -l selftest" showed 90% of the test remaining, even after
> several hours running.
> Performing the long self-test using SeaTools completed in about 1h15m.

I guess it's firmware related and SeaTools just contains some workaround for this...

(In reply to comment #14)
> > Are you able to test another disk in this machine?
>
> Yes, all 3 other drives (older seagates) complete the tests as expected (within
> the test duration estimates the drives give.) Here are the model
> numbers/firmware version:
> ST31500341AS SD3B
> ST3750640AS 3.AAE
> ST3250824AS 3.AAD

And another thing pointing to firmware: since the other disks are working, it seems it's not motherboard/driver related. SMART commands are quite simple, so there is not much room for doing something wrong if other disks/models/firmwares are working.

(In reply to comment #15)
> Interesting...looks like a long test on an almost identical drive (7200.12 but
> different Model No. and different Firmware Rev) is about to complete (only 10%
> remaining).

And another one :)

Well... I'll try to ask upstream for some information, but I presume there will be no progress. If anyone here with a non-working disk is willing to test a smartmontools CVS snapshot, please let me know.
(In reply to comment #16)
> (In reply to comment #13)
> > Created an attachment (id=350568) [details] [details]
> > Output of smartctl /dev/sdb -a
> >
> > I appear to be hitting this bug as well. I started an extended
> > self-test using palimpsest, and after several hours, the test had
> > not complete.
>
> thats another point for not smartmontools fault. Palimpsest does not
> depend on smartmontools.

True. However, I've also reproduced the problem using just "smartctl /dev/sdb --test=long".

> (In reply to comment #15)
> > Interesting...looks like a long test on an almost identical drive
> > (7200.12 but different Model No. and different Firmware Rev) is
> > about to complete (only 10% remaining).
>
> and another one :)

I think this is the most compelling evidence that this is a firmware issue. I've been searching for ways to upgrade the firmware on the problematic drive, but have been unsuccessful.

> If anyone here with not working disks is willing to test smartmontools cvs
> snapshot, please let me know.

I can do that.
(In reply to comment #17) > > If anyone here with not working disks is willing to test smartmontools cvs > > snapshot, please let me know. > > I can do that. you can find new packages here: http://koji.fedoraproject.org/koji/taskinfo?taskID=1465244
Thanks! Testing now...
No joy. Still several hours and stuck at 90%.
Pity, but it was expected. There is a related problem on the smartmontools mailing list (different Seagate disk model). The answer there was:

> This is probably a firmware bug.
>
> If this disk supports selective self-tests, please try if this type of
> test also hangs.
>
> This tests the first 25GB:
>
> # smartctl -t select,0-49999999 /dev/ice
>
> If that finishes, you can test the next 25GB with:
>
> # smartctl -t select,50000000-99999999 /dev/ice
>
> or:
>
> # smartctl -t select,next /dev/ice

The original reporter did not reply to this. Could you test whether this works?
(In reply to comment #21)
> original reporter does not replied to this, could you test if this works?

I tested, but it didn't work:

# smartctl -t select,0-49999999 /dev/sdb
smartctl 5.39 2009-07-07 19:28 [x86_64-redhat-linux-gnu] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Selective self-test routine immediately in off-line mode".
SPAN  STARTING_LBA  ENDING_LBA
   0             0    49999999
Drive command "Execute SMART Selective self-test routine immediately in off-line mode" successful.
Testing has begun.

[wait several hours]

# smartctl -l selftest /dev/sdb
smartctl 5.39 2009-07-07 19:28 [x86_64-redhat-linux-gnu] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description   Status                         Remaining  LifeTime(hours)  LBA
# 1  Selective offline  Self-test routine in progress  90%        352              -
This was more or less expected; S.M.A.R.T. commands are quite simple, so there is not much room for doing something wrong. Thanks for testing anyway. I'll reply to the old email and see if upstream can advise something.
Unfortunately no good news. Nothing useful came from asking upstream nor from searching for any firmware update related information.
I wasn't able to find any information about a firmware upgrade on the Seagate website. This is a firmware issue; can't fix.
I'm having the same issue with a Seagate Barracuda 7200.9 (ST3250624AS 3.AAE), where the long test gets stuck at 90% remaining.

In comment #14, Wolfgang Rupprecht stated he had this exact same model and firmware version and the long test completed fine for him. Maybe firmware is not the issue after all.

Seagate's site says my drive (based on serial number) does not require a firmware update.

Is there anything you want me to try?
> Is there anything you want me to try?

Unfortunately there is nothing new that could help with this problem.
My brother has seen this with a Hitachi Deskstar drive. Attaching SMART info. Dunno if reproducing it on another drive helps anyone, but here it is. The drive lost some of Windows' important files (this was generated from an Ubuntu CD). The system log isn't more informative.
Created attachment 479190 [details] output from smartctl -a /dev/sda
I believe I have found that this is not a bug in smartmontools and appears to be a drive anomaly that can come and go on its own (without applying updates, cold/hard rebooting, etc).

I was experiencing the same problem as Wolfgang Rupprecht and found this bug entry via a Google search. It seems to be rare on any given drive, yet it has been reported on a number of Seagate models. Out of all the drives I have monitored over the years, this is the only one I've ever had with this issue.

A few weeks ago I noticed in the logwatch output that the nightly tests from the previous night were not completing on my Seagate Maxtor DiamondMax 21, model STM3320620AS. I manually ran the short, conveyance and extended tests one at a time, numerous times, and sure enough they did not complete even if I waited for several days.

Last night I ran selective offline tests, incrementing the range of LBAs to be tested on each run, to see where it hung, as this would help identify the LBA region with the problem. The selective tests hung somewhere between LBA 4999 and 49999 with the 90% remaining issue.

Today, after trying a couple dozen short tests (all of which hung at 90% remaining), I started last night's selective tests over again in smaller increments to try to isolate the LBA region with the problem, and all tests completed successfully, instead of the "90% remaining" that had been the result for all previously attempted tests. I then tried a short test and it completed in the usual 1 minute. Interesting! I tried the conveyance test, which completed in 6 minutes, and then the extended "long" test (it took about 2 hrs). All completed successfully! But I changed nothing.

So for three weeks all tests got stuck at 90% remaining, yet after the selective offline tests, all tests subsequently succeeded. I am not sure whether the selective offline LBA range tests helped to resolve whatever the issue was (some sort of SMART "wake up call" for the hard drive?) or whether it is just an interesting coincidence.
My gut feeling, though, is that this is a symptom tied to the individual drive (some type of drive anomaly) and likely has nothing to do with smartmontools. The drive looks to be very healthy per SMART output and test results, but if this "getting stuck at 90% remaining" issue comes back, I'll probably just replace the drive for peace of mind.

Hopefully this additional behavioural information is of some use to others wondering what's going on and how to proceed. I recommend a selective offline test using ranges to isolate the problem. If all succeed, great: move on to the short test, and so on. If all is well and the issue never happens again, you may want to keep the drive. If you cannot get the tests to complete afterwards, or if the issue resurfaces, drive replacement should be considered. In all cases, back up your data.

I'll continue to monitor this one drive and will report back with any other developments.
Based on the update from Joseph Pingenot a few days ago on 2/16/2011, his brother had this issue and lost some data. This might just be how some Seagate drives flake out when some component is failing, so I am probably going to view this as a "pre-failure" symptom and replace the drive in the near future. Yes, my data is backed up, and if you're having this issue, you will want to do the same for yourself.

In case you want to try the selective offline mode to test your drive in LBA ranges, here is an example (see man smartctl for more info):

smartctl -t select,0-50000 /dev/sd<X>
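To step through a drive in LBA ranges as described above, the spans can be generated instead of typed by hand. The following is a minimal sketch; the function name `lba_spans` is hypothetical, and the total sector count and chunk size are whatever you choose to pass in. It only prints the spans and does not start any test itself.

```sh
# Hypothetical helper: print consecutive "START-END" LBA spans covering
# LBAs 0 .. total-1 in chunks of the given size, one span per line.
# $1 = total number of LBAs, $2 = chunk size in LBAs.
lba_spans() {
  total=$1
  chunk=$2
  start=0
  while [ "$start" -lt "$total" ]; do
    end=$(( start + chunk - 1 ))
    # Clamp the final span to the last LBA.
    if [ "$end" -ge "$total" ]; then
      end=$(( total - 1 ))
    fi
    echo "${start}-${end}"
    start=$(( end + 1 ))
  done
}

# Possible use (each selective test must finish, or be aborted with
# "smartctl -X", before the next one is started):
#   smartctl -t select,"$(lba_spans 100000 50000 | head -n 1)" /dev/sdX
```

For example, `lba_spans 120000 50000` prints 0-49999, 50000-99999 and 100000-119999 on separate lines. Note that these spans are consecutive and non-overlapping, whereas the example command above ends at LBA 50000.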