Bug 169201 - SATA drives fail on laptop suspend (patch included)
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Jeff Garzik
QA Contact: Brian Brock
Duplicates: 161712 162112 171524
Reported: 2005-09-24 13:24 EDT by Dana Canfield
Modified: 2014-01-21 17:52 EST
Fixed In Version: FC5
Doc Type: Bug Fix
Last Closed: 2006-09-16 23:13:28 EDT
Description Dana Canfield 2005-09-24 13:24:35 EDT
Description of problem:

Various ThinkPads with SATA controllers (actually, they apparently have PATA
hard drives behind PATA-to-SATA adapters) cannot recover from suspend to RAM due
to the lack of power management support in SATA.  There is discussion at
http://marc.theaimsgroup.com/?l=linux-kernel&m=111504542402455&w=2
and a patch available at
http://shamrock.dyndns.org/~ln/linux/sata_pm.2.6.13-rc5.diff

I would like to nominate this for inclusion in the Fedora kernels.
Unfortunately, I have no kernel experience, so I have no idea what the
ramifications are.  The corresponding patch for 2.6.12 applied cleanly to the
Fedora update kernels.

I assume this patch would be helpful to all users of SATA devices, not just
ThinkPad users.  I do not know whether this patch is on the roadmap for upstream
inclusion or, if it isn't, why not.

Thanks
Comment 1 Dave Jones 2005-09-30 02:10:01 EDT
Mass update to all FC4 bugs:

An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream
kernel (2.6.13.2). As there were ~3500 changes upstream between this and the
previous kernel, it's possible your bug has been fixed already.

Please retest with this update, and update this bug if necessary.

Thanks.
Comment 2 Dave Jones 2005-11-05 01:49:23 EST
*** Bug 161712 has been marked as a duplicate of this bug. ***
Comment 3 redhat-bugs2eran 2005-11-05 07:52:30 EST
This page summarizes the problem and tracks updated patches:

http://thinkwiki.org/wiki/Problems_with_SATA_and_Linux#Hang_on_resume_from_suspend_to_RAM
Comment 4 Dana Canfield 2005-11-09 00:19:36 EST
Quoting from Bug 161712:
"To clarify: on the ThinkPad T43, by default 2.6.13-1.1526_FC4 uses the IDE
driver to handle the disk (making it /dev/hda) so resume works because the SATA
system is not involved. But if I add "hda=noprobe" then the SATA system takes
over like in previous kernel (and the disk is /dev/sda as before), and in this
case resume is broken."

Does anyone know if there is something in particular that triggers the "old" IDE
driver instead of the SATA system?  I would be content with using IDE as a
workaround to get suspend working, but under kernel 1532 my ThinkPad T43 still
uses /dev/sda.  I'm curious what the difference is between our two systems.
Comment 5 Dave Jones 2005-11-09 01:36:40 EST
Dana, that bug was fixed in 2.6.13-1.1531_FC4 and all newer kernels.

It's also totally unrelated to this bug.
Comment 6 Dave Jones 2005-11-10 14:07:58 EST
2.6.14-1.1637_FC4 has been released as an update for FC4.
Please retest with this update, as a large amount of code has been changed in
this release, which may have fixed your problem.

Thank you.
Comment 7 Paul W. Frields 2005-11-16 19:57:18 EST
I have a ThinkPad T43 (2687-D3U) with the SATA controller in question.  The
behavior still occurs with kernel 2.6.14-1.1637_FC4.  
Comment 8 Paul W. Frields 2005-11-16 21:08:45 EST
FWIW, I applied the patch referenced in the link in comment #3 above to the
2.6.14-1.1637_FC4 kernel (all hunks succeeded cleanly), and ACPI sleep now works
as desired.
Comment 9 Paul W. Frields 2005-12-03 09:58:05 EST
Amending comment #8, the same problem occurs in the 2.6.14-1.1644_FC4 kernel,
but the 2.6.14 patch referenced in comment #3 applies cleanly, and fixes the
SATA resume behavior in that kernel as well.
Comment 10 Paul W. Frields 2005-12-16 12:12:12 EST
I notice the reporter has still not provided any additional info.  (Dana, are
you there?)  At any rate, the same problem occurs for me in the
2.6.14-1.1653_FC4 kernel.  The aforementioned patch from comment #3 still
applies cleanly, and still fixes the SATA resume behavior, FWIW.  Can I perform
any additional testing to help nail down this bug?
Comment 11 Dana Canfield 2005-12-16 12:24:44 EST
Last time I visited this bug, the option to change the status wasn't there.  Now
that it's back, I'll switch it back to Assigned.  I don't know what else we can
do as end users except beg.  Everything I've read about the patch implies the
following:

- The patch is stable, but not the "best" way of doing things.
- The best way requires a lot of work in the SCSI code, which is on nobody's
immediate to-do list.  
- Linus OK'ed the first two points and asked someone to submit the patch.
- Two alternate patches then appeared.
- The conversation seems to have been dropped again at that point.

That said, the patch attached here seems to be the one "all the other"
distributions are using and doesn't seem to be causing trouble.  There is also a
note in the ThinkWiki that 2.6.15 fixes some of the Linux SATA problems, but it
doesn't say which fixes are specifically included, and I lack the skill to
figure it out myself.
Comment 12 Paul W. Frields 2005-12-16 12:42:35 EST
The fact that this bug is now blocking FCMETA_SATA is a good sign.  All
sycophancy aside, I'm sure Jeff et al. are working to make sure that when this
gets fixed, it gets fixed properly, which is almost always better than just
being fixed quickly.  I'm also certain he or others will comment when there's
something useful to say.
Comment 13 jonathan baron 2005-12-17 11:30:38 EST
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171524
(In reply to comment #4)
> Quoting from Bug 161712:
> "To clarify: on the ThinkPad T43, by default 2.6.13-1.1526_FC4 uses the IDE
> driver to handle the disk (making it /dev/hda) so resume works because the SATA
> system is not involved. But if I add "hda=noprobe" then the SATA system takes
> over like in previous kernel (and the disk is /dev/sda as before), and in this
> case resume is broken."

I am using the 1.1526_FC4 kernel on a T43, and resume works, and it fails
with all subsequent kernels up to 1.1653, but I see no evidence of the disk
being /dev/sda, unless I'm looking at the wrong thing.  If I say df, it is
/dev/mapper/VolGroup00-LogVol00
                      35740376   7340592  26554948  22% /
which is just what it usually is.  Still, the story you tell makes sense.
It fits with what I report in bug 171524, which now seems totally redundant.
Comment 14 Paul W. Frields 2005-12-17 13:36:29 EST
*** Bug 171524 has been marked as a duplicate of this bug. ***
Comment 15 cam 2005-12-17 13:53:42 EST
Another data point.

On the Dell Inspiron 6000, suspend/resume fails with kernel-2.6.14-1.1653_FC4.

After patching the kernel with a patch found through the ThinkWiki site
(http://lkml.org/lkml/2005/9/23/97), ACPI suspend and resume now work.

The hardware is Intel 915PM with 128 MB Radeon X300 graphics. Dell BIOS A09 and
the latest fglrx driver (fglrx_6_8_0-8.20.8-1) are also significant.

-Cam
Comment 16 Paul W. Frields 2005-12-17 14:08:13 EST
Re: comment #13 - If you run:

su - -c 'dmsetup table'

...you'll see the device field in the fourth position after the mapped label
(3:1, for example, would be major=3, minor=1, or /dev/hda1).  Alternatively, you
could look at /proc/partitions.
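
As a rough sketch of the check described above, the major numbers in the
dmsetup output can be classified with a small shell function.  It assumes
linear device-mapper targets (the usual case for a default LVM layout) and
relies on the standard Linux device numbering, where major 3 is the first IDE
channel (/dev/hda, /dev/hdb) and major 8 is SCSI disks (/dev/sda and so on,
which is where libata drives show up).  The function name classify_dm_backing
is made up for illustration.

```shell
# Sketch only: reads `dmsetup table` output on stdin and reports, for each
# linear target, the backing major:minor pair and which driver family owns it.
# Assumed line layout for a linear target:
#   <name>: <start> <length> linear <major>:<minor> <offset>
classify_dm_backing() {
    awk '$4 == "linear" {
        name = $1
        sub(/:$/, "", name)          # drop the trailing colon from the label
        split($5, dev, ":")          # dev[1] = major, dev[2] = minor
        kind = "other"
        if (dev[1] + 0 == 3)      kind = "ide"          # /dev/hda, /dev/hdb
        else if (dev[1] + 0 == 8) kind = "scsi/libata"  # /dev/sd*
        print name, $5, kind
    }'
}
```

Usage would be something like: su - -c 'dmsetup table' | classify_dm_backing
A result of "ide" means the old IDE driver is handling the disk, so the SATA
suspend path is not involved at all.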

Re: comment #15 - Looks like, from a quick scout, the Inspiron 6000 also has the
ICH6 SATA controller in question.  Does your 'lspci' agree?
Comment 17 Dave Jones 2005-12-24 15:38:29 EST
*** Bug 162112 has been marked as a duplicate of this bug. ***
Comment 18 Dave Jones 2006-02-03 01:32:11 EST
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.
Comment 19 jonathan baron 2006-02-03 05:32:22 EST
Still doesn't work on the IBM T43.  Took longer to crash
than previous versions.  I could use "ls" repeatedly after
resume, but no other commands I tried worked ("w" for
example).  I could also switch viewports with metacity
for a while.
Comment 20 Dana Canfield 2006-02-03 08:11:18 EST
I can't confirm one way or the other as I can no longer get video to restore on
my T43.  It doesn't matter whether I suspend while X is running or not.  If I
suspend/resume at a non-X VT prompt and then blind-type after a resume, the hard
drive acts as though it's back up and running (ls will run the drive for a
while, shutdown flashes things as I might expect, etc.).  Beyond that I can't
say one way or another without video. :-)
Comment 21 Bertil Askelid 2006-02-04 15:01:28 EST
Dell Inspiron 9300 w/ Fujitsu MHV2100AH 100 GByte disk:

ata1: SATA max UDMA/133 cmd 0x1F0 ctl 0x3F6 bmdma 0xBFA0 irq 14
ata1: dev 0 cfg 49:2b00 82:346b 83:5b29 84:6003 85:346b 86:9a09 87:6003 88:203f
ata1: dev 0 ATA-6, max UDMA/100, 195371568 sectors: LBA
ata1(0): applying bridge limits
ata1: dev 0 configured for UDMA/100
scsi0 : ata_piix
  Vendor: ATA       Model: FUJITSU MHV2100A  Rev: 0000
  Type:   Direct-Access                      ANSI SCSI revision: 05

Suspends to RAM, but resumes with disk lamp constantly lit and no disk access
allowed. Screen comes back in X all right, mouse moves, keyboard OK. Tried it on
the latest 2.6.15-1.1830_FC4 kernel.
Comment 22 Dana Canfield 2006-02-23 14:27:30 EST
As of FC5T3, this works for me.  My video suspend issue was resolved by
commenting out DRI in the xorg.conf, and with that I can confirm that suspend
and resume works repeatedly with my T43 SATA-based laptop.

I'm marking the ticket as resolved, but there could potentially be issues with
other laptops, so others may want to weigh in if appropriate. 
Comment 23 Axel Thimm 2006-02-23 17:56:01 EST
Please reopen this bug. "Closed rawhide" means the fix is intended for rawhide
and future releases, implying that there will be no backported fix for FC4. Dave
will probably not look at closed bugs during the next FC4 kernel update.
Comment 24 Dana Canfield 2006-02-25 19:43:50 EST
Re-opening per Axel's request, but my understanding is that this is a mainline
kernel patch, so it should appear in FC4 whenever the next version bump happens
there.
Comment 25 Nick Lamb 2006-03-26 14:49:45 EST
I'm running FC5 w/ release kernel kernel-2.6.15-1.2054_FC5 on a (Thinkpad) Z60m

The above commentary is a little confusing, but I get the impression that SATA
is supposed to survive suspend and restore with this kernel + hardware
combination. If so, I have bad news.

About 50% of the time restore appears to work, but actually I/O is failing with
SATA (libata) errors. The laptop is then useless until I reboot.

Should I file a new bug, or track it here?
Comment 26 Klaus Weidner 2006-03-28 19:41:16 EST
I can confirm that the bug isn't fixed, or there's a new bug with similar
symptoms. On my Thinkpad T60, the SATA disk is always dead when waking up:

        ata1: handling error/timeout
        ata1: port reset, ...
        ata1: status=0x50 { DriveReady SeekComplete}

How to reproduce: suspend using Fn-F4 button, or "suspend" Gnome menu entry,
both behave the same

I've tried the 2.6.15-1.2054_FC5smp kernel shipped with FC5, and also
2.6.16-1.2088_FC6smp from rawhide.

Comment 27 Klaus Weidner 2006-03-29 01:03:11 EST
See also bug #183138; the drive appears to require special ACPI power-on treatment.
Comment 28 Nick Lamb 2006-04-03 19:38:33 EDT
Updated kernel to 2.6.16-1.2080_FC5

Situation after attempting to restore is the same, or possibly worse. Sometimes I
see "kernel: journal commit I/O error" as a console message, and almost always
dmesg shows endless I/O errors such as:

end_request: I/O error, dev sda, sector 121135224
sd 0:0:0:0: SCSI error return code: 0x4000

(copied from paper notes, the disk is no longer writable when the error happens)

There seem to be patches on the web that just turn up some or all of the
timeouts in the SATA layer. That's not ideal, but it's definitely better than
losing all my work. Are any of these patches included in FC5? Will they be?
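
For anyone comparing kernels, one way to quantify the failure after a resume is
to count the block-layer errors per device.  This is a minimal sketch, assuming
the log lines have the same form as the end_request line quoted above; the
function name count_io_errors is hypothetical, and it is only useful if dmesg
can still be run once the disk has stopped responding.

```shell
# Sketch only: reads kernel log text on stdin and prints "<device> <count>"
# for every device that appears in lines of the form
#   end_request: I/O error, dev sda, sector 121135224
count_io_errors() {
    awk '/end_request: I\/O error, dev / {
        for (i = 1; i < NF; i++)
            if ($i == "dev") {
                d = $(i + 1)
                sub(/,$/, "", d)     # "sda," -> "sda"
                n[d]++
            }
    }
    END { for (d in n) print d, n[d] }'
}
```

Usage would be something like: dmesg | count_io_errors
No output means no matching errors were logged.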
Comment 29 Klaus Weidner 2006-04-04 00:18:34 EDT
Here's a proposed patch by Greg KH:
https://bugzilla.novell.com/show_bug.cgi?id=162090
Comment 30 Jon Escombe 2006-04-13 13:24:21 EDT
I can also report that suspend/resume isn't working on T60 (AHCI) with
2.6.15-1.2080_FC5smp kernel. I don't think the patch in comment 29 is relevant,
as MSI isn't enabled in FC kernel?

I have applied the AHCI suspend/resume patches from the SUSE development tree,
and this fixes suspend to RAM, but suspend to disk still fails on resume. (A
full SUSE kernel works for both RAM and disk, so I'm obviously missing something.)

For reference, suspend/resume is still working (for me) on a T43 (ATA_PIIX) with
2.6.15-1.2080_FC5 kernel.
Comment 31 Erik Webb 2006-06-06 11:11:06 EDT
I've added an identical bug report for RHEL4, since we have systems that need
the 5-year support window. If anyone has something to add to a RHEL4 version of
this bug, please help me out. Thanks.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=193586
Comment 32 Nick Lamb 2006-07-19 14:42:29 EDT
Updated to 2.6.17-1.2157_FC5

Problem is gone or much less frequent (if there are no follow-ups from me in the
next few days assume gone) on this Thinkpad Z60m.

I don't know if this is a result of improvements in the mainline (thanks Linus?) or
a vendor patch between 2080 and 2157. If you're reading this and are responsible
for the fix, thanks.

Can one of the T60 owners tell us whether this works for them too?
Comment 33 Dave Jones 2006-09-16 21:40:21 EDT
[This comment added as part of a mass-update to all open FC4 kernel bugs]

FC4 has now transitioned to the Fedora legacy project, which will continue to
release security related updates for the kernel.  As this bug is not security
related, it is unlikely to be fixed in an update for FC4, and has been migrated
to FC5.

Please retest with Fedora Core 5.

Thank you.
Comment 34 jonathan baron 2006-09-16 21:56:49 EDT
This has been fixed for some time, at least on the IBM T43.

Comment 35 Dave Jones 2006-09-16 23:13:28 EDT
Ok, if anyone else who experienced this problem is still seeing it in
FC5+updates, please file a new bug.

Thanks.
