Bug 146457

Summary: Toshiba Tecra M2: panic after long APM suspend in cpufreq_resume -> set_rtc_mmss
Product: [Fedora] Fedora Reporter: Jesse Glick <typrase>
Component: kernelAssignee: Dave Jones <davej>
Status: CLOSED DUPLICATE QA Contact:
Severity: high Docs Contact:
Priority: medium    
Version: 3CC: pfrields, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-03-06 16:53:39 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jesse Glick 2005-01-28 14:59:08 UTC
Description of problem:

I have a Toshiba Tecra M2 laptop running Fedora Core 3 (all updates
applied). I use APM, not ACPI, since I need suspend to RAM to work.
Running apm -s does work to suspend the laptop and if it is turned
back on again within a short time (a few minutes? an hour? I'm not
sure), Linux resumes correctly. However if the machine is left in
suspend mode for a longer time, it does not resume correctly and I
have to cold boot it.

Normally it just comes up with a blank screen and no apparent activity
after a resume failure, but this time I suspended when the console was
active, rather than X, and I saw a kernel panic displayed when I
resumed, so I copied down most of it.

Version-Release number of selected component (if applicable):

kernel-2.6.10-1.741_FC3, apmd-3.0.2-24

How reproducible:

Always - for me, that is.

Steps to Reproduce:
1. Suspend machine with apm -s.
2. Leave suspended for a while.
3. Resume.
  
Actual results:

Screen blank, machine unresponsive, Caps Lock light flashing.

Expected results:

Regular resume, as after a short time in suspension.

Additional info:

I have the cpufreq daemon running, which normally seems to work,
though I seem to recall error messages on console even after a working
resume.

Upon first boot after the crash, dmesg reads:

---%<---
Linux version 2.6.10-1.741_FC3 (bhcompile.redhat.com) (gcc
version 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)) #1 Thu Jan 13 16:38:22
EST 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 00000000000eee00 (reserved)
 BIOS-e820: 00000000000eee00 - 00000000000ef000 (ACPI NVS)
 BIOS-e820: 00000000000ef000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003ffc0000 (usable)
 BIOS-e820: 000000003ffc0000 - 000000003ffd0000 (reserved)
 BIOS-e820: 000000003ffd0000 - 000000003ffe0000 (ACPI data)
 BIOS-e820: 000000003ffe0000 - 0000000040000000 (reserved)
 BIOS-e820: 00000000feda0000 - 00000000fedc0000 (reserved)
 BIOS-e820: 00000000ffb00000 - 00000000ffc00000 (reserved)
 BIOS-e820: 00000000ffe80000 - 0000000100000000 (reserved)
127MB HIGHMEM available.
896MB LOWMEM available.
On node 0 totalpages: 262080
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 225280 pages, LIFO batch:16
  HighMem zone: 32704 pages, LIFO batch:7
DMI 2.3 present.
Built 1 zonelists
Kernel command line: ro root=LABEL=/ quiet apm=on acpi=off
Initializing CPU#0
CPU 0 irqstacks, hard=c03cb000 soft=c03ca000
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 1696.137 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1034808k/1048320k available (2011k kernel code, 12696k
reserved, 653k data, 164k init, 130816k highmem)
Checking if this processor honours the WP bit even in supervisor
mode... Ok.
Calibrating delay loop... 3358.72 BogoMIPS (lpj=1679360)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: a7e9f9bf 00000000 00000000 00000000
CPU: After vendor identify, caps:  a7e9f9bf 00000000 00000000 00000000
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 1024K
CPU: After all inits, caps:        a7e9f1bf 00000000 00000000 00000040
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: Intel(R) Pentium(R) M processor 1700MHz stepping 05
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
checking if image is initramfs... it is
Freeing initrd memory: 386k freed
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfd67c, last bus=5
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20041105
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: ACPI disable
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: Transparent bridge - 0000:00:1e.0
PCI: Using IRQ router PIIX/ICH [8086/24cc] at 0000:00:1f.0
PCI: IRQ 0 for device 0000:02:0b.0 doesn't match PIRQ mask - try
pci=usepirqmask
PCI: Found IRQ 11 for device 0000:02:0b.0
PCI: Sharing IRQ 11 with 0000:00:1d.0
PCI: Sharing IRQ 11 with 0000:02:0d.0
PCI: IRQ 0 for device 0000:02:0b.1 doesn't match PIRQ mask - try
pci=usepirqmask
PCI: Found IRQ 11 for device 0000:02:0b.1
PCI: Sharing IRQ 11 with 0000:00:1d.1
apm: BIOS version 1.2 Flags 0x02 (Driver version 1.16ac)
audit: initializing netlink socket (disabled)
audit(1106922626.058:0): initialized
highmem bounce pool size: 64 pages
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
SELinux:  Registering netfilter hooks
Initializing Cryptographic API
ksign: Installing public key data
Loading keyring
- Added public key 613E889161DD0D7B
- User ID: Red Hat, Inc. (Kernel Module GPG key)
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Real Time Clock Driver v1.12
Linux agpgart interface v0.100 (c) Dave Jones
agpgart: Detected an Intel 855PM Chipset.
agpgart: Maximum main memory to use for agp memory: 941M
agpgart: AGP aperture is 256M @ 0xe0000000
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 76 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
PCI: Found IRQ 11 for device 0000:00:1f.6
PCI: Sharing IRQ 11 with 0000:00:1f.5
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
divert: not allocating divert_blk for non-ethernet device lo
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with
idebus=xx
ICH4: IDE controller at PCI slot 0000:00:1f.1
PCI: Found IRQ 11 for device 0000:00:1f.1
PCI: Sharing IRQ 11 with 0000:00:1d.2
PCI: Sharing IRQ 11 with 0000:02:05.0
ICH4: chipset revision 3
ICH4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xbfa0-0xbfa7, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xbfa8-0xbfaf, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: TOSHIBA MK6026GAX, ATA DISK drive
elevator: using anticipatory as default io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
Probing IDE interface ide1...
hda: max request size: 128KiB
hda: 117210240 sectors (60011 MB), CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 >
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard on isa0060/serio0
input: PS/2 Generic Mouse on isa0060/serio1
md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP: routing cache hash table of 2048 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 37449)
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
speedstep-centrino: obtaining ACPI data failed
Freeing unused kernel memory: 164k freed
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: hda6: orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 384182
ext3_orphan_cleanup: deleting unreferenced inode 384025
ext3_orphan_cleanup: deleting unreferenced inode 1212731
ext3_orphan_cleanup: deleting unreferenced inode 1212730
EXT3-fs: hda6: 4 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
SELinux:  Disabled at runtime.
SELinux:  Unregistering netfilter hooks
floppy0: no floppy controllers found
ath_hal: module license 'Proprietary' taints kernel.
ath_hal: 0.9.12.14 (AR5210, AR5211, AR5212)
wlan: 0.8.4.5 (EXPERIMENTAL)
ath_rate_onoe: 1.0
ath_pci: 0.9.4.12 (EXPERIMENTAL)
PCI: Found IRQ 11 for device 0000:02:05.0
PCI: Sharing IRQ 11 with 0000:00:1d.2
PCI: Sharing IRQ 11 with 0000:00:1f.1
divert: allocating divert_blk for ath0
ath0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
ath0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
ath0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps
24Mbps 36Mbps 48Mbps 54Mbps
ath0: turbo rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
ath0: mac 5.6 phy 4.1 5ghz radio 1.7 2ghz radio 2.3
ath0: 802.11 address: 00:90:96:c8:ea:8f
ath0: Use hw queue 0 for WME_AC_BE traffic
ath0: Use hw queue 1 for WME_AC_BK traffic
ath0: Use hw queue 2 for WME_AC_VI traffic
ath0: Use hw queue 3 for WME_AC_VO traffic
ath0: Atheros 5212: mem=0xfcef0000, irq=11
Intel(R) PRO/1000 Network Driver - version 5.5.4-k2-NAPI
Copyright (c) 1999-2004 Intel Corporation.
PCI: Found IRQ 11 for device 0000:02:09.0
divert: allocating divert_blk for eth0
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
PCI: Found IRQ 11 for device 0000:00:1f.5
PCI: Sharing IRQ 11 with 0000:00:1f.6
PCI: Setting latency timer of device 0000:00:1f.5 to 64
intel8x0_measure_ac97_clock: measured 49266 usecs
intel8x0: clocking to 48000
PCI: Found IRQ 11 for device 0000:00:1f.6
PCI: Sharing IRQ 11 with 0000:00:1f.5
PCI: Setting latency timer of device 0000:00:1f.6 to 64
hw_random hardware driver 1.0.0 loaded
PCI: Found IRQ 11 for device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: EHCI Host Controller
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: irq 11, pci mem 0xfcfffc00
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
PCI: cache line size of 32 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: USB 2.0 initialized, EHCI 1.00, driver 26 Oct 2004
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 6 ports detected
USB Universal Host Controller Interface driver v2.2
PCI: Found IRQ 11 for device 0000:00:1d.0
PCI: Sharing IRQ 11 with 0000:02:0b.0
PCI: Sharing IRQ 11 with 0000:02:0d.0
uhci_hcd 0000:00:1d.0: UHCI Host Controller
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: irq 11, io base 0xefe0
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
PCI: Found IRQ 11 for device 0000:00:1d.1
PCI: Sharing IRQ 11 with 0000:02:0b.1
uhci_hcd 0000:00:1d.1: UHCI Host Controller
PCI: Setting latency timer of device 0000:00:1d.1 to 64
uhci_hcd 0000:00:1d.1: irq 11, io base 0xef80
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
PCI: Found IRQ 11 for device 0000:00:1d.2
PCI: Sharing IRQ 11 with 0000:00:1f.1
PCI: Sharing IRQ 11 with 0000:02:05.0
uhci_hcd 0000:00:1d.2: UHCI Host Controller
PCI: Setting latency timer of device 0000:00:1d.2 to 64
uhci_hcd 0000:00:1d.2: irq 11, io base 0xef60
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
Linux Kernel Card Services
  options:  [pci] [cardbus] [pm]
PCI: Found IRQ 11 for device 0000:02:0b.0
PCI: Sharing IRQ 11 with 0000:00:1d.0
PCI: Sharing IRQ 11 with 0000:02:0d.0
Yenta: CardBus bridge found at 0000:02:0b.0 [1179:0001]
Yenta: ISA IRQ mask 0x00b8, PCI irq 11
Socket status: 30000007
PCI: Found IRQ 11 for device 0000:02:0b.1
PCI: Sharing IRQ 11 with 0000:00:1d.1
Yenta: CardBus bridge found at 0000:02:0b.1 [1179:0001]
Yenta: ISA IRQ mask 0x00b8, PCI irq 11
Socket status: 30000007
ieee1394: Initialized config rom entry `ip1394'
ohci1394: $Rev: 1223 $ Ben Collins <bcollins>
PCI: Found IRQ 11 for device 0000:02:07.0
ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[11] 
MMIO=[fceef800-fceeffff]  Max Packet=[2048]
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
ieee1394: Host added: ID:BUS[0-00:1023]  GUID[00003900005d0982]
EXT3 FS on hda6, internal journal
device-mapper: 4.3.0-ioctl (2004-09-30) initialised: dm-devel
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda7, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
ReiserFS: hda3: found reiserfs format "3.6" with standard journal
ReiserFS: hda3: using ordered data mode
ReiserFS: hda3: journal params: device hda3, size 8192, journal first
block 18, max trans len 1024, max batch 900, max commit age 30, max
trans age 30
ReiserFS: hda3: checking transaction log (hda3)
ReiserFS: hda3: Using r5 hash to sort names
NTFS driver 2.1.22 [Flags: R/W MODULE].
NTFS volume version 3.1.
Adding 1124540k swap on /dev/hda2.  Priority:-1 extents:1
ip_tables: (C) 2000-2002 Netfilter core team
ip_conntrack version 2.1 (8190 buckets, 65520 max) - 360 bytes per
conntrack
cs: IO port probe 0x0c00-0x0cff: clean.
cs: IO port probe 0x0100-0x04ff: excluding 0x170-0x177 0x1e0-0x1e7
0x370-0x377 0x4d0-0x4d7
cs: IO port probe 0x0a00-0x0aff: clean.
divert: not allocating divert_blk for non-ethernet device cipsec0
Cisco Systems VPN Client Version 4.0.4 (B) kernel module loaded
---%<---

I wrote down most of the stack trace, I hope enough (skipped some hex
numbers for less-interesting-looking parts):

---%<---
Badness in i8042_panic_blink at drivers/input/serio/i8042.c:914
[<c0230c97>] i8042_panic_blink+0x124/0x1f2
[<c011ac50>] panic+0xed/0x138
[<c010734e>] set_rtc_mmss+0x6c/0x204
[<c0107600>] timer_interrupt+0x177/0x199
             handle_IRQ_event
             __do_IRQ
             do_IRQ
=============================
             common_interrupt
[<c0107722>] get_cmos_time+0x9a/0x267
             cpufreq_cpu_put
             cpufreq_resume
             timer_resume
             sysdev_resume
             device_power_up
[<c0112d65>] suspend+0x38f/0x4c8
             do_ioctl
             recalc_task_prio
             schedule
             sys_ioctl
             syscall_call
---%<---

Comment 1 Matthew Saltzman 2005-02-07 15:43:39 UTC
This looks a whole lot like #144415.

Comment 2 Jesse Glick 2005-03-06 16:53:39 UTC
Yes, it does indeed, thanks for finding this!

Still reproducible on same machine using stock kernel-2.6.10-1.766_FC3.

*** This bug has been marked as a duplicate of 144415 ***

Comment 3 Jesse Glick 2005-03-06 16:59:15 UTC
Added note: I reproduced it today after unmounting all non-root FSs, unloading
all possible kernel modules, and killing off most noncritical daemons (incl.
cpuspeed and apmd but leaving X running in a GDM chooser), so I don't guess the
problem is specific to some buggy running software outside the kernel.

Comment 4 Jesse Glick 2005-03-06 17:27:57 UTC
Note: appears that the root problem is something to do with the spin lock
rtc_lock in arch/i386/kernel/time.c being misused, though I am not sure why this
would be - it is used only from set_rtc_mmss() (cause of error here) and
get_cmos_time(), both of which lock and unlock it with no other control flow.

That said, there looks to be a secondary buglet in the panic handling:

i8042.c:914 is in fact the i8042_panic_blink method and seems to be the line

	while (i8042_read_status() & I8042_STR_IBF)
		DELAY; // <-- HERE

Apparently we are in_irq() at this point (see delay.h:32; confirmed by the stack
trace) and mdelay() does not want to be called inside an IRQ. Meaning, I guess,
that calling panic() from inside an IRQ was not a good idea, or that
i8042_panic_blink is wrong to be using DELAY in this way, or something.