Bug 312871

Summary:

System exception when recovering from sleep [firewire_ohci]

Product:

[Fedora] Fedora

Reporter:

Ignacio Cárdenas <iakynet>

Component:

kernel

Assignee:

Jarod Wilson <jarod>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

high

Priority:

low

CC:

chris.brown, stefan-r-rhbz

Keywords:

Reopened

Hardware:

powerpc

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2009-01-26 14:46:19 UTC

Attachments:

Description	Flags
fw-ohci: PPC PMac platform code	none
fw-ohci: PPC PMac platform code	none

Description Ignacio Cárdenas 2007-09-30 09:03:48 UTC

Hello.

Description of problem:
When I put my PowerBook G4 to sleep by closing the lid (with pmud installed) 
and then open the lid, this exception appears:

Vector: 200 (Machine Check) at [d96cfd30]
       pc: f20cb03c: software_reset+0x3c/0x80 [firewire_ohci]
        lr: f20cb38c: ohci_enable+0x2c/0x318 [firewire_ohci]
       sp: d96cfde0
current = 0xdcddd590
       pid = 3053, comm=pmud
WARNING: exception is not recoverable, can't continue
enter ? for help
[d96cfdf0] f20cb38c ohci_enable+0x2c/0x318 [firewire_ohci]
[d96cfe10] c014b940 pci_device_resume+0x38/0x7c
[d96cfe20] c01e1bf4 resume_device+0x70/0x10c
[d96cfe40] c01e1ce8 dpm_resume+0x58/0xa8
[d96cfe60] c01e1d70 device_resume+0x38/0x54
[d96cfe80] c005bd14 suspend_devices_and_enter+0xc8/0xe4
[d96cfea0] c005be70 enter_state+0x140/0x1d4
[d96cfec0] c01e9394 pmu_ioctl+0x9c/0x21c
[d96cfed0] c00ad95c do_ioctl+0x6c/0x84
[d96cfee0] c00add44 vfs_ioctl+0x3d0/0x404
[d96cff10] c00adde0 sys_ioctl+0x68/0x98
[d96cff40] c0012d14 ret_from_syscall+0x0/0x38
--- Exception: c00 (System Call) at 0ff44c98
SP (7ff4a770) is in userspace
mon>_

Version-Release number of selected component (if applicable):
I think this is a kernel problem (with the firewire_ohci module). My 
kernel version is 2.6.23-0.214.rc8.git2.fc8, in f8test2.

My computer:
processor       : 0
cpu             : 7455, altivec supported
clock           : 867.000000MHz
revision        : 0.2 (pvr 8001 0302)
bogomips        : 86.51
timebase        : 33331438
platform        : PowerMac
machine         : PowerBook3,5
motherboard     : PowerBook3,5 MacRISC2 MacRISC Power Macintosh
detected as     : 80 (PowerBook Titanium IV)
pmac flags      : 0000001b
L2 cache        : 256K unified
pmac-generation : NewWorld

How reproducible:
Closing the lid, with the 'firewire_ohci' module loaded, and then resuming.
  
Actual results:
An ugly, unrecoverable exception :-)

Workaround:
Removing the 'firewire_ohci' module before going to sleep solves the problem.
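
A minimal sketch of scripting this workaround, assuming a POSIX shell. The function names, the MODPROBE and PROC_MODULES override variables, and the way these functions would be hooked into the sleep/wake handling are all assumptions made for illustration; nothing here is provided by pmud itself.

```shell
#!/bin/sh
# Sketch only: unload firewire_ohci before sleep, load it again after resume.
# MODPROBE and PROC_MODULES are hypothetical overrides so the functions can
# be dry-run (e.g. MODPROBE=echo); by default the real tool and path are used.
MODPROBE="${MODPROBE:-modprobe}"
PROC_MODULES="${PROC_MODULES:-/proc/modules}"
FW_MODULE="firewire_ohci"

fw_before_sleep() {
    # /proc/modules lists one module per line, with the name first.
    if grep -q "^${FW_MODULE} " "$PROC_MODULES"; then
        $MODPROBE -r "$FW_MODULE"
    fi
}

fw_after_resume() {
    $MODPROBE "$FW_MODULE"
}
```

Calling fw_before_sleep with MODPROBE=echo prints the modprobe arguments instead of unloading anything, which is a cheap way to check the logic before running it as root.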

Additional info:
This bug is similar to one I filed against openSUSE 10.2, which was solved with a 
kernel update. It was a problem with the firewire module, and the workaround 
was also to remove it.

https://bugzilla.novell.com/show_bug.cgi?id=227404

That is all. If you need more info, please ask.

Regards,
Ignacio.

Comment 1 Stefan Richter 2007-10-03 23:19:26 UTC

Looks like we need to copy all the "#ifdef CONFIG_PPC_PMAC"/"#endif" blocks from
ohci1394 to firewire-ohci.  I'll try to post a patch at the weekend.

Comment 2 Stefan Richter 2008-01-03 23:18:12 UTC

Re comment #1:
I was held up, then forgot about this, then remembered but was distracted
again... Will try to post something here RSN.

Comment 3 Christopher Brown 2008-02-03 22:30:37 UTC

(In reply to comment #2)
> Re comment #1:
> I was held up, then forgot about this, then remembered but was distracted
> again... Will try to post something here RSN.

RSN eh Stefan? :)

Okay, re-assigning to Jarod as per triage page...

Comment 4 Stefan Richter 2008-02-25 10:24:29 UTC

Jarod, can you point Ignacio to the latest and greatest kernel package to test?
 There was another suspend/resume bug fixed lately and I would like to know
whether this bug here is really platform specific or not.

Comment 5 Jarod Wilson 2008-02-25 14:30:35 UTC

Ignacio, can you still reproduce this problem with the latest kernel in Fedora 8
updates-testing? You should be able to install kernel 2.6.24.2-7.fc8 from there,
by simply running:

# yum --enablerepo=updates-testing upgrade kernel
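
After rebooting into the upgraded kernel, it may be worth confirming which kernel is actually running before retesting. A small sketch; the helper name running_kernel_is is made up for illustration, and the prefix match is an assumption meant to tolerate a possible architecture suffix in the `uname -r` output.

```shell
# Succeeds if the running-kernel string starts with the expected
# version-release. Hypothetical helper, not part of yum or rpm.
running_kernel_is() {
    expected="$1"
    running="${2:-$(uname -r)}"
    case "$running" in
        "$expected"*) return 0 ;;
        *) return 1 ;;
    esac
}

if running_kernel_is "2.6.24.2-7.fc8"; then
    echo "running the updates-testing kernel"
else
    echo "still running $(uname -r); reboot into the new kernel first"
fi
```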

Comment 6 Ignacio Cárdenas 2008-02-26 09:05:06 UTC

Hello.

I tried with the kernel from updates-testing (2.6.24.2-7.fc8), but the problem
is still the same. The workaround is also the same.

Comment 7 Jarod Wilson 2008-02-28 03:46:28 UTC

Okay, finally got around to doing a few suspend/resume cycles on my own
powerbook. Works just fine with firewire modules loaded on 2.6.23.15, 2.6.24.2
and 2.6.25-rc3-git1, so it looks like a very hardware-specific bug. Ignacio, can
you provide the output of:

lspci -v

lspci -v -n

(can trim that to just the parts for the FireWire controller).

Particularly curious to find out if it's the device ID 0x0018 UniNorth controller...
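
The trimming mentioned above could be done with a small filter rather than by hand. This is only a sketch: the function name is made up, and it assumes lspci separates devices with a blank line in verbose mode and that the FireWire controller shows up either with "FireWire" in its description or with class code 0c00 in numeric output.

```shell
# Print only the lspci stanzas for FireWire controllers. In awk's paragraph
# mode (RS="") each blank-line-separated device block is one record.
firewire_stanzas() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" } /FireWire| 0c00: /'
}

# Usage on the affected machine:
#   lspci -v    | firewire_stanzas
#   lspci -v -n | firewire_stanzas
```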

Comment 8 Jarod Wilson 2008-02-28 03:49:17 UTC

Just for the record, my powerbook is a c.2004 15" Aluminum, 1.67GHz G4 with an
Apple UniNorth 2 (rev 81) FireWire controller (which appears to be a
Lucent/Agere FW323 under the covers).

Comment 9 Ignacio Cárdenas 2008-02-28 20:36:34 UTC

Well, this is the output of the "lspci -v" command (only the firewire and 
UniNorth related parts):

0000:00:0b.0 Host bridge: Apple Computer Inc. UniNorth 1.5 AGP
        Flags: bus master, 66MHz, medium devsel, latency 16
        Capabilities: [80] AGP version 1.0
        Kernel driver in use: agpgart-uninorth

0001:10:0b.0 Host bridge: Apple Computer Inc. UniNorth 1.5 PCI
        Flags: bus master, 66MHz, medium devsel, latency 16

0002:24:0b.0 Host bridge: Apple Computer Inc. UniNorth 1.5 Internal PCI
        Flags: bus master, 66MHz, medium devsel, latency 16

0002:24:0e.0 FireWire (IEEE 1394): Agere Systems FW323 (prog-if 10 [OHCI])
        Subsystem: Agere Systems FW323
        Flags: medium devsel, IRQ 40
        Memory at f5000000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [44] Power Management version 2
        Kernel modules: firewire-ohci

And the same for "lspci -v -n":

0000:00:0b.0 0600: 106b:002d
        Flags: bus master, 66MHz, medium devsel, latency 16
        Capabilities: [80] AGP version 1.0
        Kernel driver in use: agpgart-uninorth

0001:10:0b.0 0600: 106b:002e
        Flags: bus master, 66MHz, medium devsel, latency 16

0002:24:0e.0 0c00: 11c1:5811 (prog-if 10 [OHCI])
        Subsystem: 11c1:5811
        Flags: medium devsel, IRQ 40
        Memory at f5000000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [44] Power Management version 2
        Kernel modules: firewire-ohci

Comment 10 Jarod Wilson 2008-02-28 21:53:23 UTC

So not the known-goofy controller; that one is apparently found in the Pismo
PowerBook G3. However, on the bright side, I think I found a PowerBook G4 here
in the office w/the same chipset as yours, so I'll see if I can reproduce the problem.
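
For reference, the check being discussed can be sketched as a filter over `lspci -n` output. The ID pair 106b:0018 is pieced together from the Apple vendor ID 106b visible in comment #9 and the device ID 0x0018 mentioned in comment #7, so treat it as an assumption; the function name is hypothetical.

```shell
# Reads `lspci -n` output on stdin and succeeds if any device line carries
# the assumed vendor:device pair 106b:0018.
is_uninorth_fw() {
    grep -Eq ' 106b:0018( |$)'
}

# Usage on the machine in question:
#   lspci -n | is_uninorth_fw && echo "UniNorth FireWire controller present"
```

Fed the FireWire line from comment #9 (11c1:5811), the function returns non-zero, which matches the conclusion above.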

Comment 11 Stefan Richter 2008-02-29 09:42:24 UTC

Created attachment 296318 [details]
fw-ohci: PPC PMac platform code

I am attaching an untested patch which adds all of ohci1394's PPC_PMAC platform
feature calls to firewire-ohci.  Jarod, if you don't find out something else on
the PPC machines available to you, could you validate that this doesn't add
runtime regressions to PPC_PMAC, and prepare a test package for Ignacio?

Comment 12 Stefan Richter 2008-03-01 01:55:19 UTC

Created attachment 296439 [details]
fw-ohci: PPC PMac platform code

Previous patch was bogus, didn't compile.

This one compiles and runs OK and is definitely necessary --- hopefully also
sufficient --- to fix machine check exceptions on PPC PMac/ PBook.

Comment 13 Jarod Wilson 2008-03-01 19:33:51 UTC

I was able to reproduce the panic-on-resume with a PowerBook G4/667, which is
(according to /proc/cpuinfo) a 3rd-gen Titanium, with the same devices as
Ignacio lists in comment #9, and have verified the patch in comment #12 does
indeed resolve the problem.

Comment 14 Jarod Wilson 2008-03-02 06:09:53 UTC

Patch added to rawhide, building in koji right now:

http://koji.fedoraproject.org/koji/taskinfo?taskID=483352

Ignacio, if you'd be so kind, please give that build a try once it's done to
verify it fixes suspend/resume on your end as well.

Comment 15 Jarod Wilson 2008-03-03 14:52:08 UTC

Hrm, it seems installing rawhide kernels is requiring more and more supporting
rawhide bits these days. Understandable if you'd rather wait for a Fedora 8
kernel w/this patch. Pretty sure this will fix your suspend/resume issues though.

Comment 16 Ignacio Cárdenas 2008-03-03 15:48:59 UTC

Yesterday I installed kernel version 2.6.25-0.81.rc3.git2.fc9. It requires
some dependencies from rawhide, but after enabling the "experimental" repo, yum
resolved all dependencies successfully.

Now, I have two pieces of news, one good and one bad. The good news is that it
solves the suspend/resume problem in almost all cases. The bad news is that it
solves the problem in _almost_ all cases. I noticed two resume exceptions after
the kernel installation, but most of the tests I did worked fine, and I don't
know how to reproduce the failures.

I will try more suspend/resume tests tonight, to see if I can reproduce the
problem. Could you also try some tests on your tiBook III? (Do something,
suspend, resume, and repeat.)

Anyway, my current situation is much better than before. Thank you for the help :-)

Comment 17 Jarod Wilson 2008-03-03 22:01:36 UTC

I've done a couple of suspend/resume iterations on the tibook III now, and it
has successfully resumed every time so far. What exactly was the nature of your
resume failures? Did you have to hard-reset the system, or were they less severe
(i.e., annoying spew that may have de-stabilized something, but still let you
try to cleanly reboot)? Also, did you have any sort of peripherals connected
(such as some firewire devices)?

Comment 18 Ignacio Cárdenas 2008-03-03 22:49:37 UTC

The failure was the same one I had at the beginning of the thread: a system 
exception with a hard reset needed. I do not have any peripherals connected.

While writing this text I have been testing some more suspend/resume cycles, 
and right now I've reproduced the problem! It is almost the same trace as in 
the first comment on this thread, but slightly different (smaller):

Vector: 300 (Data Access) at [eed97d80]
    pc: f20889c4: ohci_enable+0x2f8/0x3f0 [firewire_ohci]
    lr: f20889d4: ohci_enable+0x208/0x3f0 [firewire_ohci]
    sp: eed97e30
   msr: 200b032
   dar: 0
 dsisr: 40000000
  current = 0xef19f020
    pid   = 1807, comm = pmud
enter ? for help
[eed97e50] c015b220 pci_device_resume+0x38/0x80
[eed97e50] c01e7e98 device_resume+0x94/0x1f8
[eed97e50] c006209c suspend_devices_and_enter+0x164/0x19c
[eed97e50] c0062254 enter_state+0x138/0x1b0
[eed97e50] c01ef6dc pmu_ioctl+0x78/0x1d4
[eed97e50] c00bd418 vfs_ioctl+0x68/0x80
[eed97e50] c00bd7ec do_vfs_ioctl+0x3bc/0x3f4
[eed97e50] c00bd87c sys_ioctl+0x58/0x88
[eed97e50] c0012ae4 ret_from_syscall+0x0/0x38
--- Exception: c00 (System Call) at 0ff09798
SP (bf9106c0) is in userspace
mon>_

These are the steps that I followed:

 - Reboot the system.
 - Wait while KDM starts.
 - Close the lid before entering a user name or password.
 - Wait a few seconds.
 - Open the lid.

This does not happen every time I try it, but it's the second time I have seen 
this exception following these steps (after the kernel update), so this is 
the most reproducible way I know at the moment. It also seems that, if the 
first resume works fine, then there is no problem for the rest of the session: 
I mean, resume only fails the first time after booting the system (when it 
fails at all).
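
Because the failure is intermittent and leaves the machine needing a hard reset, one way to keep count is to log a line before each suspend and another after each successful resume; a "suspending" entry with no matching "resumed" entry then marks a failed resume. A rough sketch, assuming pm-suspend from pm-utils is the suspend command (a pmud setup may need a different one); the log path, variable names and function names are made up for illustration.

```shell
SUSPEND_LOG="${SUSPEND_LOG:-$HOME/suspend-test.log}"
SUSPEND_CMD="${SUSPEND_CMD:-pm-suspend}"   # assumption, override as needed

suspend_once() {
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) suspending" >> "$SUSPEND_LOG"
    sync    # flush the log to disk before the machine goes down
    $SUSPEND_CMD
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) resumed" >> "$SUSPEND_LOG"
}

failed_resumes() {
    # Count "suspending" entries that never got a "resumed" entry.
    awk '/ suspending$/ { s++ } / resumed$/ { r++ } END { print s - r }' "$SUSPEND_LOG"
}
```

After a crash and reboot, failed_resumes reports how many cycles so far ended in an exception, and the timestamps show whether the failure was the first suspend after boot.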

Comment 19 Jarod Wilson 2008-03-04 05:30:39 UTC

Ah, I'd not tried rebooting between any suspend/resume cycles. I'll have to try
again with some reboots mixed in. Also, fwiw, kernel-2.6.24.3-17.fc8 is
currently building in koji, and carries this fix (and then some) as well.

Comment 20 Jarod Wilson 2008-03-14 19:19:06 UTC

So I actually did try a good number of suspend/resume cycles last week,
intermixed with ten or so reboots, and never hit the system exception problem. I
haven't opened the lid on this thing in about a week, and when I did just now...
There's the exception. Huh. My trace looks nearly identical: the process name is
pm-pmu instead of pmud and some of the addresses are a bit different, but it's the
same call chain.
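
Comparing call chains across traces with different addresses is easier once the addresses are stripped. A sketch of one way to do that, assuming backtrace lines of the form shown in the traces in this bug ("[stack] address symbol+0xoffset/0xsize [module]"); the function name and the file names in the usage note are hypothetical.

```shell
# Reduce an xmon backtrace on stdin to just the function names, one per
# line, dropping stack pointers, addresses, offsets and module tags.
call_chain() {
    sed -n 's/^\[[0-9a-f]*\] [0-9a-f]* \([A-Za-z0-9_.]*\)+0x.*/\1/p'
}

# Usage, with each trace saved to a text file:
#   call_chain < trace-a.txt > chain-a.txt
#   call_chain < trace-b.txt > chain-b.txt
#   diff chain-a.txt chain-b.txt
```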

Comment 21 Bug Zapper 2008-11-26 07:53:18 UTC

This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 22 Bug Zapper 2009-01-09 07:17:30 UTC

Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 23 Jarod Wilson 2009-01-09 18:42:29 UTC

Needs to be retested w/a current F10 kernel (or rawhide).

Comment 24 Ignacio Cárdenas 2009-01-25 19:32:31 UTC

Hi all.

I have been trying Fedora 10 for some weeks and it seems that there are no suspend/resume problems anymore. The laptop is the same as in my original post... so I guess the problem is solved. IMO, the bug can be closed.

Thank you and regards,
Ignacio.

Comment 25 Jarod Wilson 2009-01-26 14:46:19 UTC

Ignacio,

Excellent, glad to hear it. There have been a number of assorted race condition fixes that have gone into the firewire stack in the past few months; I'd wager one of them had a positive effect here. :)