Bug 473219 - Regression: Updated sata/acpi/undock causes crashes in undock, suspend, and hibernate on at least Lenovo Thinkpad laptops (can be solved with workarounds).
Product: Fedora
Classification: Fedora
Component: kernel
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-11-27 01:51 EST by Constantine Gavrilov
Modified: 2009-12-18 01:58 EST
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2009-12-18 01:58:08 EST

Attachments
dock/undock event script (2.88 KB, text/plain)
2008-11-27 01:51 EST, Constantine Gavrilov
UDEV rules for dock/undock (103 bytes, text/plain)
2008-11-27 01:53 EST, Constantine Gavrilov
dock/undock helper script for CRT (6.51 KB, text/plain)
2008-11-27 01:54 EST, Constantine Gavrilov
a working dock/undock event script (3.56 KB, text/plain)
2009-02-01 07:01 EST, Constantine Gavrilov
rc.local script that conditionally disables immediate undocking (349 bytes, text/plain)
2009-02-01 07:06 EST, Constantine Gavrilov
updated rc.local script to set undock type and save CDROM SCSI host (1.41 KB, text/plain)
2009-02-17 02:08 EST, Constantine Gavrilov
updated dock/undock event handler (4.33 KB, text/plain)
2009-02-17 02:10 EST, Constantine Gavrilov
suspend/hibernate/resume script -- takes care to reset SCSI host of cdrom device (3.15 KB, text/plain)
2009-02-17 02:12 EST, Constantine Gavrilov
updated suspend/hibernate/resume script (3.15 KB, text/plain)
2009-02-17 02:24 EST, Constantine Gavrilov
updated dock/undock event handler (3.24 KB, text/plain)
2009-02-17 02:36 EST, Constantine Gavrilov

Description Constantine Gavrilov 2008-11-27 01:51:30 EST
Created attachment 324839
dock/undock event script

AFAIK, undocking stopped working as of kernel 2.6.24. Judging from reports on the Internet, many other laptops (IBM and Dell) are affected. It seems to be a generic Linux kernel problem: I could not get undock to work on any of several distributions (Ubuntu, Fedora, Debian) that use a kernel newer than 2.6.23.

I first saw this in Fedora 9 and did not use Fedora 9 because of it.

Fedora 7 ( kernel) works without a glitch. I have written a script, hooked into udev, that runs on dock/undock events. The script (attached) disables/enables the CDROM/DVD and enables/disables the CRT.

I want to stress that it works fine in all situations and combinations: power-on state (it does not matter whether the laptop is docked or undocked when I power it up), hot dock/undock, and hibernation (I can hibernate undocked, power on docked, and it still gets the dock event). I have been using it for half a year without a single glitch, docking/undocking and hibernating multiple times every day.

I recently tried Fedora 10. With the default kernel configuration, the laptop locks up immediately when I press the undock button. If I pass the immediate_undock=N parameter to the docking driver, the undock event is handled, and the script that disables the DVD/CRT works properly as before.
However, most of the time when I dock, no dock event is generated. Even when it is generated, the USB ports on the docking station (keyboard and mouse) are not reconnected, although the CDROM/DVD and CRT work, thanks to the docking script.

When undocking with immediate_undock=N, the flashing green LED never stops blinking, even after my undock script has finished running. With, the LED stopped blinking shortly after I pressed the undock button.

I have also tried writing to the /sys/devices/platform/dock.0/undock file from the undock script before exiting. However, this seems to generate undock events in a loop, and the script is called again and again.
Comment 1 Constantine Gavrilov 2008-11-27 01:53:02 EST
Created attachment 324840
UDEV rules for dock/undock
Comment 2 Constantine Gavrilov 2008-11-27 01:54:28 EST
Created attachment 324841
dock/undock helper script for CRT
Comment 3 Constantine Gavrilov 2008-11-27 02:06:01 EST
I was concerned about the stability of laptop features in Fedora 10, but I wanted the new Intel driver (for DRI and suspend/resume support). As a secondary goal, I also wanted to test the new kernel (better wireless, wireless LEDs, etc.). So I selectively upgraded the following pieces to their Fedora 10 versions, to try just the new kernel and X:

* pciaccess, drm, mesa, Xorg and drivers
* mkinitrd, initscripts, upstart, udev, hal, util-linux-ng, SysVinit-tools, alsa, mdadm
* some other minor dependencies

This has allowed me to stay with Fedora 7 while testing the Fedora 10 kernel and X.

While this is not a complete Fedora 10 system, I do not think it matters, judging from posts on the Internet.

Again, everything works fine with the Fedora 10 components and the Fedora 7 kernel.
Comment 4 Constantine Gavrilov 2008-11-27 02:11:29 EST
After seemingly successful dock/undock operations (either just undocking, or undocking and then docking), resume from suspend does not work: the screen is black and there is no reaction to key presses.

The same works fine with the Fedora 7 kernel and the same Xorg binaries.
Comment 5 Constantine Gavrilov 2009-02-01 06:56:20 EST
I was able to find a workaround. The immediate_undock=N parameter must be used, and the undocking script must write to the /sys/devices/platform/dock.0/undock file before exiting. This generates a new undock event, and the next invocation of the undocking script must do nothing except return status 0.

If the script does not write to the /sys/devices/platform/dock.0/undock file, the undock is not completed properly, and subsequent dockings leave the hardware non-operational, as described previously.

On the other hand, if the undocking script does not skip its work on the follow-up undock event, the system gets stuck: the undock script runs again and again.
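A minimal sketch of this two-pass logic follows; it is not the attached script. The lock file path, the 10-second window, the SYSFS override and the do_undock_work placeholder are hypothetical, and reading /sys/module/dock/parameters/immediate_undock assumes the dock driver exposes its parameter there. With immediate_undock=N, the first event does the real work, stamps a lock file and writes to the undock file; the event raised by that write finds a fresh stamp and just returns 0.

```shell
#!/bin/bash
# Sketch of a two-pass undock handler; not the attached script.
# SYSFS, LOCK_FILE, LOCK_WINDOW and do_undock_work are hypothetical, and the
# immediate_undock parameter path is an assumption.
SYSFS="${SYSFS:-/sys}"
LOCK_FILE="${LOCK_FILE:-/var/run/undock.lock}"
LOCK_WINDOW="${LOCK_WINDOW:-10}"   # seconds

do_undock_work() {
    # Placeholder for the real work: disable the CDROM/DVD, switch off the CRT, ...
    :
}

immediate_undock_enabled() {
    local p="$SYSFS/module/dock/parameters/immediate_undock"
    # A missing parameter file is treated as the default (immediate undock on).
    [ ! -r "$p" ] || [ "$(cat "$p")" = "Y" ]
}

handle_undock() {
    local stamp
    if immediate_undock_enabled; then
        # The kernel has already undocked; only local cleanup is left.
        do_undock_work
        return 0
    fi
    stamp=$(cat "$LOCK_FILE" 2>/dev/null || true)
    if [ -n "$stamp" ] && [ $(( $(date +%s) - stamp )) -lt "$LOCK_WINDOW" ]; then
        # Follow-up event raised by our own write to the undock file: do nothing.
        rm -f "$LOCK_FILE"
        return 0
    fi
    do_undock_work
    # Stamp the time, then complete the undock. The write raises one more
    # undock event, which the check above turns into a no-op.
    date +%s > "$LOCK_FILE"
    echo 1 > "$SYSFS/devices/platform/dock.0/undock"
    return 0
}
```

With immediate_undock=Y the handler only does local cleanup, since in that mode the kernel has already undocked by the time udev runs the script.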

I wonder why this undocking behavior/requirement is not documented.

I also wonder what changed in Linux undock/hotplug after kernel 2.6.23 that made immediate undocking stop working on IBM/Lenovo Thinkpads.

On my Thinkpad X61, the ata_generic and pata_acpi drivers are used (not ahci) because of the way the BIOS is configured. Maybe the changes are due to SATA hotplug, and the AHCI driver would work? I will continue testing.

It is still a regression, since the machine locks up and it did not use to.

The attached new dock/undock script makes dock/undock work again. It checks whether immediate_undock=N is set and uses lock files with timestamps to decide whether it is the first run or the second, "spurious" run.

Tested with kernel

Also attached is an rc.local script that sets the immediate_undock=N parameter depending on the kernel version.
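The version check in such an rc.local script might look like the sketch below; it is not the attached script. The 2.6.24 cut-off comes from this report, while the module-parameter path /sys/module/dock/parameters/immediate_undock, the SYSFS override and the function names are assumptions.

```shell
#!/bin/bash
# Sketch: choose the dock driver's immediate_undock value from the kernel version.
# SYSFS and the parameter path are assumptions, not taken from the attachment.
SYSFS="${SYSFS:-/sys}"

# Succeeds if version $1 (e.g. 2.6.27.5-117.fc10.i686) is older than $2.
kernel_older_than() {
    local -a a b
    local i
    IFS=. read -r -a a <<< "${1%%-*}"   # drop the release suffix, split on dots
    IFS=. read -r -a b <<< "$2"
    for i in 0 1 2; do
        if [ "${a[i]:-0}" -lt "${b[i]:-0}" ]; then return 0; fi
        if [ "${a[i]:-0}" -gt "${b[i]:-0}" ]; then return 1; fi
    done
    return 1
}

# Print Y (immediate undock) for kernels before 2.6.24, N otherwise.
choose_immediate_undock() {
    if kernel_older_than "$1" 2.6.24; then echo Y; else echo N; fi
}

# Write the choice for kernel $1 (default: the running kernel) if the
# parameter file is writable.
apply_undock_mode() {
    local p="$SYSFS/module/dock/parameters/immediate_undock"
    [ -w "$p" ] || return 1
    choose_immediate_undock "${1:-$(uname -r)}" > "$p"
}
```

Called without an argument from rc.local (as root), apply_undock_mode uses the running kernel's version from uname -r.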
Comment 6 Constantine Gavrilov 2009-02-01 07:01:49 EST
Created attachment 330546
a working dock/undock event script
Comment 7 Constantine Gavrilov 2009-02-01 07:06:14 EST
Created attachment 330547
rc.local script that conditionally disables immediate undocking
Comment 8 Constantine Gavrilov 2009-02-08 10:25:39 EST
Correction to comment #5: in the configuration where immediate_undock=Y does not work, the kernel uses the built-in ata_piix driver.

I have set the BIOS to AHCI mode. This caused the kernel to choose the built-in ahci driver instead. With the ahci driver, both immediate_undock=Y and immediate_undock=N work.

So the regression is probably due to the SATA "hotplug ACPI" changes. If the ata_piix driver is used, we get a crash with immediate_undock=Y, which is the default.
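If the choice between immediate and non-immediate undocking is to follow the driver, as the results above suggest, the driver name of each SCSI host can be read from its proc_name attribute in sysfs. This is a sketch under that assumption: it prefers immediate undock whenever any host is driven by ahci, and the SYSFS override and function names are hypothetical.

```shell
#!/bin/bash
# Sketch: decide whether immediate undocking is safe from the SATA driver in use.
# SYSFS and the function names are hypothetical.
SYSFS="${SYSFS:-/sys}"

# Print the low-level driver name of each SCSI host (e.g. ata_piix, ahci).
scsi_host_drivers() {
    local f
    for f in "$SYSFS"/class/scsi_host/host*/proc_name; do
        [ -r "$f" ] && cat "$f"
    done
    return 0
}

# Per this report: immediate undock works with ahci but locks up with ata_piix.
safe_immediate_undock() {
    if scsi_host_drivers | grep -qx ahci; then echo Y; else echo N; fi
}
```

The check is deliberately coarse; restricting it to the host that carries the dock's CDROM would be more precise on a system with several controllers.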
Comment 9 Constantine Gavrilov 2009-02-17 02:06:28 EST
An additional regression with undock is that suspend and hibernate stop working after undocking. Suspend locks up the machine on resume, and hibernate locks it up during hibernation.

This happens unless, before hibernating, the SCSI host to which the CDROM was connected before undocking is rescanned. It happens both with immediate undocking (when the kernel detaches the CDROM before udev is called) and with non-immediate undocking (when the ACPI callback properly powers off and detaches the CDROM, and the Ultrabay if necessary).
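The rescan itself can be requested through the SCSI host's scan attribute in sysfs, where "- - -" is a wildcard for all channels, targets and LUNs. A minimal sketch, assuming the host number is already known (for example, saved while docked); the SYSFS override and the function name are hypothetical, and on a real system it must run as root.

```shell
#!/bin/bash
# Sketch: rescan one SCSI host through sysfs before suspending or hibernating.
# SYSFS and rescan_scsi_host are hypothetical names.
SYSFS="${SYSFS:-/sys}"

# Rescan SCSI host number $1; "- - -" means all channels, targets and LUNs.
rescan_scsi_host() {
    local scan="$SYSFS/class/scsi_host/host$1/scan"
    if [ ! -w "$scan" ]; then
        echo "rescan_scsi_host: host$1 not found" >&2
        return 1
    fi
    echo "- - -" > "$scan"
}
```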

Hibernation and suspend used to work without a hitch after undock at least with kernel, so it is a regression.

Attached are updated scripts that solve the hibernate and suspend problems after undock.

rc.local: if we boot docked, we save the SCSI host of the CDROM. The kernel version and the use of the ahci driver determine whether immediate or non-immediate undocking is set.

dock/undock handler: supports both immediate and non-immediate undocking. For non-immediate undocking, we try to power off the device and then detach it. We also try to save the SCSI host of the CDROM before undock and after dock.

suspend/hibernate/resume ACPI handler (new): a sample script for suspend/hibernate/resume that rescans the SCSI host of the CDROM device before suspending or hibernating.
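Saving the CDROM's SCSI host could be done by resolving the sysfs device link of the CD block device: the last component of the resolved path has the form host:channel:target:lun, the same form used for ${DEV} under /sys/class/scsi_device. This is one possible approach, not the attached code; the default device name sr0, the state file path and the SYSFS override are hypothetical.

```shell
#!/bin/bash
# Sketch: find and remember the SCSI host of the dock's CD drive.
# SYSFS, STATE_FILE and the default device name sr0 are hypothetical.
SYSFS="${SYSFS:-/sys}"
STATE_FILE="${STATE_FILE:-/var/run/cdrom-scsi-host}"

# Print the SCSI host number of block device $1 (e.g. sr0).
scsi_host_of() {
    local dev
    dev=$(readlink -f "$SYSFS/block/$1/device") || return 1
    dev=${dev##*/}              # host:channel:target:lun, e.g. 1:0:0:0
    case $dev in
        *:*:*:*) echo "${dev%%:*}" ;;
        *) return 1 ;;
    esac
}

# Save the host number so it can be rescanned after the drive is gone.
save_cdrom_host() {
    local host
    host=$(scsi_host_of "${1:-sr0}") || return 1
    echo "$host" > "$STATE_FILE"
}
```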
Comment 10 Constantine Gavrilov 2009-02-17 02:08:33 EST
Created attachment 332182
updated rc.local script to set undock type and save CDROM SCSI host
Comment 11 Constantine Gavrilov 2009-02-17 02:10:01 EST
Created attachment 332183
updated dock/undock event handler
Comment 12 Constantine Gavrilov 2009-02-17 02:12:29 EST
Created attachment 332184
suspend/hibernate/resume script -- takes care to reset SCSI host of cdrom device
Comment 13 Constantine Gavrilov 2009-02-17 02:24:49 EST
Created attachment 332185
updated suspend/hibernate/resume script
Comment 14 Constantine Gavrilov 2009-02-17 02:25:51 EST
In the suspend/hibernate/resume script, we call reset_cdrom_device() before hibernating or suspending.

The script relied on an additional lock/unlock utility to serialize calls (as a workaround for buggy ACPI that generated extra events after resume).

The newly attached version runs even if the utility is not present.
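The "serialize if possible, otherwise just run" behaviour could look like the sketch below. It substitutes flock(1) from util-linux for the unnamed lock/unlock utility the script originally used; run_serialized and the lock path passed to it are hypothetical.

```shell
#!/bin/bash
# Sketch: serialize handler runs with flock(1) when it is installed, and run
# unserialized when it is not. run_serialized is a hypothetical name.
run_serialized() {
    local lock=$1
    shift
    if command -v flock >/dev/null 2>&1; then
        # Hold an exclusive lock on fd 9 while the command (or function) runs.
        ( flock -x 9 && "$@" ) 9>"$lock"
    else
        "$@"
    fi
}
```

Using the file-descriptor form of flock, rather than passing the command to flock, lets the protected step be a shell function such as a CDROM reset routine.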
Comment 15 Constantine Gavrilov 2009-02-17 02:32:03 EST
Changed name of the bug to be more descriptive.
Comment 16 Constantine Gavrilov 2009-02-17 02:36:26 EST
Created attachment 332186
updated dock/undock event handler

Oops, the previous update was the same version.
Comment 17 Chuck Ebbert 2009-02-17 23:33:20 EST
The docking code has been rewritten in 2.6.29. Can you try a kernel from koji? You will have to manually download the kernel and kernel-firmware packages and install them with rpm:

Comment 18 Constantine Gavrilov 2009-02-23 01:45:03 EST
I have already tested kernel-PAE-2.6.29-0.99.rc4.git1.fc11.i686.rpm from rawhide.

It has the same problems.

It seems the problem is not in the undock code but in the (s)ata hotplug layer.

We have two problems here:

* immediate undock locks up the machine if ata_piix is used
  (BIOS is in non-AHCI mode)
* removing a device with
  echo 1 > /sys/class/scsi_device/${DEV}/device/delete
  followed by physically removing the device causes hibernate
  to lock up if the SCSI host was not rescanned after the physical removal
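The detach step in the second item can be wrapped so that the caller also learns which host to rescan once the drive has been pulled. A sketch only: detach_scsi_device and the SYSFS override are hypothetical names, and it assumes ${DEV} is the host:channel:target:lun name used under /sys/class/scsi_device.

```shell
#!/bin/bash
# Sketch of the detach step from the report; names here are hypothetical.
SYSFS="${SYSFS:-/sys}"

# Detach SCSI device $1 (host:channel:target:lun) and print its host number.
# Per the report, that host must be rescanned after the drive is physically
# removed, or a later hibernate locks up.
detach_scsi_device() {
    local del="$SYSFS/class/scsi_device/$1/device/delete"
    [ -w "$del" ] || return 1
    echo 1 > "$del"
    echo "${1%%:*}"
}
```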

While ata_piix may have been fixed by the latest updates to work with immediate undocking, I believe the second problem is not related to docking. It happens with both drivers (ahci and ata_piix), with immediate undocking (when the kernel removes the device and the callbacks do not even see it) and with non-immediate undocking (when the callbacks have a chance to power off and remove the device), and, as far as I can see, hibernate locks up in "core" code. All of this points to a generic problem in the SATA subsystem: something causes the suspend handlers to lock up after a device was removed and the SCSI host was not rescanned. I believe it stopped working when SATA hotplug support was added. It is probably easy to fix, too.

If you think I am wrong and there is a specific 2.6.29 kernel that fixes it, I am willing to try it. I have tried koji updates many times in the past looking for a solution, so it should not be a problem.
Comment 19 Daniel Gnoutcheff 2009-06-10 15:11:56 EDT
Constantine Gavrilov, I suspect that you are encountering this bug:

Comment 20 Bug Zapper 2009-11-18 02:59:26 EST
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
Comment 21 Bug Zapper 2009-12-18 01:58:08 EST
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
