Bug 98849 - (ACPI) ACPI oops on ThinkPad T30, T40
Status: CLOSED RAWHIDE
Product: Red Hat Linux Beta
Classification: Retired
Component: kernel
Version: alpha 3
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Jeff Garzik
QA Contact: Brian Brock
Duplicates: 100667 102581
Blocks: CambridgeBlocker
Reported: 2003-07-09 11:16 EDT by Bill Nottingham
Modified: 2014-03-16 22:37 EDT (History)
Doc Type: Bug Fix
Last Closed: 2003-10-21 16:11:00 EDT

Attachments
look, it blew up (12.60 KB, text/plain)
2003-07-09 11:17 EDT, Bill Nottingham
lspci output (5.61 KB, text/plain)
2003-07-09 11:32 EDT, Bill Nottingham
acpidmp output (210.52 KB, text/plain)
2003-07-11 14:56 EDT, Bill Nottingham
another oops (2.88 KB, text/plain)
2003-07-16 21:49 EDT, Bill Nottingham
patch for the bug (2.01 KB, patch)
2003-07-31 21:37 EDT, shaohua li

Description Bill Nottingham 2003-07-09 11:16:53 EDT
Description of problem:

Load ac module. Kaboom.

Similarly for some of the other modules (processor, battery).
Comment 1 Bill Nottingham 2003-07-09 11:17:17 EDT
Created attachment 92833
look, it blew up
Comment 2 Bill Nottingham 2003-07-09 11:32:19 EDT
Created attachment 92834
lspci output
Comment 3 Bill Nottingham 2003-07-09 11:33:04 EDT
SMBIOS 2.33 present.
DMI 0.0 present.
56 structures occupying 1943 bytes.
DMI table at 0x000E0010.
Handle 0x0000
        DMI type 0, 20 bytes.
        BIOS Information Block
                Vendor: IBM
                Version: 1RET32WW (1.03 )
                Release: 03/04/2003
                BIOS base: 0xDC000
                ROM size: 960K
                Capabilities:
                        Flags: 0x000000007D09DF80
Comment 4 Bill Nottingham 2003-07-09 12:44:54 EDT
I upgraded the BIOS to:

SMBIOS 2.33 present.
DMI 0.0 present.
56 structures occupying 1943 bytes.
DMI table at 0x000E0010.
Handle 0x0000
        DMI type 0, 20 bytes.
        BIOS Information Block
                Vendor: IBM
                Version: 1RET34WW (1.05 )
                Release: 05/15/2003
                BIOS base: 0xDC000
                ROM size: 960K
                Capabilities:
                        Flags: 0x000000007D09DF80

ACPI behaves similarly with regard to loading the modules (it still oopses), with
the added benefit that if you run /sbin/hwclock, the machine locks up. (hwclock
runs fine with acpi=off.)
Comment 5 Len Brown 2003-07-11 12:49:47 EDT
From the 1st attachment:

ACPI-0165: *** Warning: The ACPI AML in your computer contains errors, please 
nag the manufacturer to correct it.
ACPI-0168: *** Warning: Allowing relaxed access to fields; turn on 
CONFIG_ACPI_DEBUG for details.

Okay, we're out on a limb on this box...

ACPI: Embedded Controller [EC] (gpe 28)
    ACPI-0345: *** Error: Handler for [EmbeddedControl] returned AE_TIME
    ACPI-1121: *** Error: Method execution failed 
[\_SB_.PCI0.LPC_.EC__.PUBS._STA] (Node dff55884), AE_TIME
    ACPI-0345: *** Error: Handler for [EmbeddedControl] returned AE_TIME
    ACPI-1121: *** Error: Method execution failed 
[\_SB_.PCI0.LPC_.EC__.BAT0._STA] (Node dff5435c), AE_TIME
    ACPI-0345: *** Error: Handler for [EmbeddedControl] returned AE_TIME
    ACPI-1121: *** Error: Method execution failed 
[\_SB_.PCI0.LPC_.EC__.BAT1._STA] (Node dff5453c), AE_TIME

Looking not good.  Possible that the later crash is related to this failure.  
Maybe we should take evasive action when fed methods that don't run?

Bill, which IBM laptop is this?  I'm wondering if it is one that UnitedLinux 
already blacklisted.  Can you attach the DSDT?

Comment 6 Bill Nottingham 2003-07-11 14:53:24 EDT
IBM T40p is the laptop. I'll attach the DSDT at some point when I boot back into
ACPI.
Comment 7 Bill Nottingham 2003-07-11 14:56:37 EDT
Created attachment 92886
acpidmp output
Comment 8 Bill Nottingham 2003-07-11 15:17:35 EDT
2.5.75 oopses as well, but does *not* have the bad interactions with hwclock.

I'm 99% sure that these modules (processor, ac, etc.) worked in a previous 2.5
release (2.5.5x? 2.5.6x?)
Comment 9 Andy Grover 2003-07-11 16:39:34 EDT
cool, we have a T40p so we should be able to duplicate this.

I might need help with hwclock issues, though.
Comment 10 Bill Nottingham 2003-07-16 21:49:22 EDT
Created attachment 92974
another oops

This just happened in the course of normal use... since it was keventd, it took
the keyboard with it.
Comment 11 Len Brown 2003-07-21 13:03:29 EDT
Where can we get a copy of the kernel that is failing?
dmesg shows 2.4.21-1.2023, including ACPICA 20030522, built 7/7/2003 --
which is much newer than Cambridge alpha-3.
Comment 12 Bill Nottingham 2003-07-21 13:16:29 EDT
Cambridge B1 has something slightly newer than that.
Comment 13 Andy Grover 2003-07-24 20:03:05 EDT
I can duplicate this on my T40 (2373-72U). Yeah, not good. I didn't get an oops
but I got a hang by pressing the power button. There are a lot of symptoms
listed in this bug, but I'm going to start with the EC errors on boot. Upgrade
to BIOS 1.07 didn't fix anything in this area. I have a msg to the FreeBSD ACPI
folks too -- maybe they can shed some light.

Comment 14 shaohua li 2003-07-30 23:38:49 EDT
I reproduced it on my T40 and found that the oops occurs only when the
ACPI_DEBUG option is on. The EC errors are generated because reads and writes
on the EC's operation region fail. More precisely, the parameter
'handler_context' (I call it the context below) of the routine
'acpi_ec_space_handler' is wrong, which causes reads and writes on the EC's
operation region to fail. If the context is correct (in my test, I used a
temporary variable to achieve this), everything is OK. The incorrect context
appears only after an operation region's handler is removed and reinstalled.
So I guess the routine 'acpi_remove_address_space_handler' has a bug.
Comment 15 shaohua li 2003-07-31 21:37:48 EDT
Created attachment 93319
patch for the bug

I made a patch. With this patch, the kernel boots correctly on my T40. All
ACPI modules can be loaded without error, and '/proc/acpi' contains correct info.
Maybe this is what we need.
Comment 16 Andy Grover 2003-08-01 01:48:10 EDT
great debugging!

So, it doesn't look like your patch fixes acpi_remove_address_space_handler. 
So is it just a workaround? How much trouble would a real fix be? I'm on 
vacation right now (HAHA) so I can't look at the code but if it's just some 
incorrect dereferencing then we should fix it.

Comment 17 Len Brown 2003-08-06 00:19:27 EDT
*** Bug 100667 has been marked as a duplicate of this bug. ***
Comment 18 Len Brown 2003-08-21 20:51:14 EDT
*** Bug 102581 has been marked as a duplicate of this bug. ***
Comment 19 Robert de Rooy 2003-08-23 22:37:41 EDT
As requested, I tried the patch on my T30.

Good news all round.
I no longer get the EC errors when booting, and the system no longer panics on halt.
In effect the patch solves the bug I reported (102581)
Comment 20 Andy Grover 2003-08-24 08:02:50 EDT
looks to me like the evregion.c part of the patch has no effect. Can you 
verify that the modifications to ec.c are sufficient to fix things? Thanks.

BTW when we apply this to ec.c we will want to fix the whitespace.
Comment 21 Robert de Rooy 2003-08-24 15:25:49 EDT
with just the ec.c patch I again get these errors on boot

ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: Embedded Controller [EC] (gpe 28)
    ACPI-0345: *** Error: Handler for [EmbeddedControl] returned AE_TIME
    ACPI-1121: *** Error: Method execution failed
[\_SB_.PCI0.LPC_.EC__.PUBS._STA] (Node f7ff560c), AE_TIME
    ACPI-0345: *** Error: Handler for [EmbeddedControl] returned AE_TIME
    ACPI-1121: *** Error: Method execution failed
[\_SB_.PCI0.LPC_.EC__.BAT0._STA] (Node f7ff5ecc), AE_TIME
    ACPI-0345: *** Error: Handler for [EmbeddedControl] returned AE_TIME
    ACPI-1121: *** Error: Method execution failed
[\_SB_.PCI0.LPC_.EC__.BAT1._STA] (Node f7ff429c), AE_TIME
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.AGP_._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1._PRT]
PCI: Probing PCI hardware
ACPI: PCI Interrupt Link [LNKF] enabled at IRQ 10
ACPI: PCI Interrupt Link [LNKG] enabled at IRQ 9
ACPI: PCI Interrupt Link [LNKH] enabled at IRQ 5
PCI: Using ACPI for IRQ routing
PCI: if you experience problems, try using option 'pci=noacpi' or even 'acpi=off'

I have not yet tried to do a halt, but I suspect it will panic again.
Comment 22 Robert de Rooy 2003-08-24 15:30:35 EDT
confirmed, I get the panic on halt again. If you want the logs, let me know and
I will hook up the serial terminal again to capture the output like I did before.
Comment 23 Thomas M Steenholdt 2003-09-05 02:56:55 EDT
FYI/A
Problem hasn't changed at all on my IBM ThinkPad-T30 with the latest rawhide
kernel 2.4.22-20.1.2024.2.36.nptl

let me know if you need any logs or need me to try something out!
Comment 24 Bill Nottingham 2003-09-05 11:02:11 EDT
I don't think we've integrated that patch yet, as that patch has yet to be
integrated into upstream ACPI code.
Comment 25 Andy Grover 2003-09-05 18:06:23 EDT
I must be experiencing code-blindness :) Can someone take the time to tell me 
how the evregion.c part of the patch fixes things for people, as it clearly 
does but I don't see why. Either I have a ** or a * which I then take the 
address of when I use it - should be the same, eh?
Comment 26 shaohua li 2003-09-05 23:41:31 EDT
In file evregion.c, acpi_ev_detach_region:
  line 397: region_context = region_obj2->extra.region_context;
  line 452: status = region_setup (region_obj, ACPI_REGION_DEACTIVATE,
            handler_obj->address_space.context, &region_context);
This code seems intended to change 'region_obj2->extra.region_context', but
'region_context' is defined as a local 'void *', so the call only changes the
local copy and does nothing to the region object. In my patch, I just let it
do the right thing, that is, the code can actually change
'region_obj2->extra.region_context'.
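
A minimal sketch of the difference being described, not the actual ACPICA code. The struct, the function names, and the assumption that the setup callback clears the context on deactivate are all hypothetical stand-ins chosen for illustration. It contrasts taking the address of a local copy with taking the address of the field itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the region object; region_context models
 * region_obj2->extra.region_context from the comment above. */
struct region {
    void *region_context;
};

enum { REGION_ACTIVATE, REGION_DEACTIVATE };

/* Assumed behavior of a setup callback: on deactivate it clears the
 * context through the pointer-to-pointer it was given. */
static void region_setup(int function, void **region_context)
{
    if (function == REGION_DEACTIVATE)
        *region_context = NULL;
}

/* Pattern described as broken: the callback only sees a local copy. */
static void detach_via_local_copy(struct region *r)
{
    void *region_context = r->region_context;   /* copies the pointer value */
    region_setup(REGION_DEACTIVATE, &region_context);
    /* The local is now NULL, but r->region_context is unchanged. */
}

/* Pattern described as the fix: pass the address of the field itself. */
static void detach_via_field_address(struct region *r)
{
    region_setup(REGION_DEACTIVATE, &r->region_context);
}

/* Returns 1 if the region's context is NULL after the given detach. */
int context_cleared_after(void (*detach)(struct region *))
{
    int ec_state = 0;                 /* stands in for the driver's data */
    struct region r = { &ec_state };
    detach(&r);
    return r.region_context == NULL;
}

/* Self-check tying the two cases together. */
void demo(void)
{
    assert(!context_cleared_after(detach_via_local_copy));
    assert(context_cleared_after(detach_via_field_address));
}
```

Under these assumptions, the local-copy variant leaves the old context attached to the region after the handler is removed, which would match the stale context seen after removing and reinstalling a handler.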
Comment 27 Bill Nottingham 2003-10-21 16:11:00 EDT
Works for me now with .208x or later.
Comment 28 Thomas M Steenholdt 2003-10-21 17:05:54 EDT
Well, ACPI was completely removed from the kernel, right (at least for these
machines or something)? My acpid no longer starts but apmd does. The
system doesn't crash though, and that's nice!
If I didn't miss something completely, I wouldn't say that circumventing a
problem means it's resolved. Could someone please comment on this?
Comment 29 Bill Nottingham 2003-10-21 17:24:00 EDT
ACPI is included, it just must be explicitly enabled with acpi=on. The fix for
the crash *is* included in the latest ACPI code.
Comment 30 Thomas M Steenholdt 2003-10-22 01:20:46 EDT
Ahh, I see, thanks for clearing that up for me!
