Bug 312871
Summary: | System exception when recovering from sleep [firewire_ohci]
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 10
Hardware: | powerpc
OS: | Linux
Status: | CLOSED CURRENTRELEASE
Severity: | high
Priority: | low
Keywords: | Reopened
Reporter: | Ignacio Cárdenas <iakynet>
Assignee: | Jarod Wilson <jarod>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | chris.brown, stefan-r-rhbz
Doc Type: | Bug Fix
Last Closed: | 2009-01-26 14:46:19 UTC
Description
Ignacio Cárdenas
2007-09-30 09:03:48 UTC
Looks like we need to copy all the "#ifdef CONFIG_PPC_PMAC"/"#endif" blocks from ohci1394 to firewire-ohci. I'll try to post a patch at the weekend.

Re comment #1: I was held up, then forgot about this, then remembered but was distracted again... Will try to post something here RSN.

(In reply to comment #2)
> Re comment #1:
> I was held up, then forgot about this, then remembered but was distracted
> again... Will try to post something here RSN.

RSN eh Stefan? :) Okay, re-assigning to Jarod as per triage page...

Jarod, can you point Ignacio to the latest and greatest kernel package to test? There was another suspend/resume bug fixed lately and I would like to know whether this bug here is really platform specific or not.

Ignacio, can you still reproduce this problem with the latest kernel in Fedora 8 updates-testing? You should be able to install kernel 2.6.24.2-7.fc8 from there, by simply running:

    # yum --enablerepo=updates-testing upgrade kernel

Hello. I tried with the kernel from updates-testing (2.6.24.2-7.fc8), but the problem is still the same. The workaround is also the same.

Okay, finally got around to doing a few suspend/resume cycles on my own powerbook. Works just fine with firewire modules loaded on 2.6.23.15, 2.6.24.2 and 2.6.25-rc3-git1, so it looks like a very hardware-specific bug. Ignacio, can you provide the output of:

    lspci -v
    lspci -v -n

(can trim that to just the parts for the FireWire controller). Particularly curious to find out if it's the device ID 0x0018 UniNorth controller...

Just for the record, my powerbook is a c.2004 15" Aluminum, 1.67GHz G4 with an Apple UniNorth 2 (rev 81) FireWire controller (which appears to be a Lucent/Agere FW323 under the covers).

Well, this is the output of the "lspci -v" command (only the firewire and UniNorth related parts):

    000:00:0b.0 Host bridge: Apple Computer Inc. UniNorth 1.5 AGP
        Flags: bus master, 66MHz, medium devsel, latency 16
        Capabilities: [80] AGP version 1.0
        Kernel driver in use: agpgart-uninorth
    0001:10:0b.0 Host bridge: Apple Computer Inc. UniNorth 1.5 PCI
        Flags: bus master, 66MHz, medium devsel, latency 16
    0002:24:0b.0 Host bridge: Apple Computer Inc. UniNorth 1.5 Internal PCI
        Flags: bus master, 66MHz, medium devsel, latency 16
    0002:24:0e.0 FireWire (IEEE 1394): Agere Systems FW323 (prog-if 10 [OHCI])
        Subsystem: Agere Systems FW323
        Flags: medium devsel, IRQ 40
        Memory at f5000000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [44] Power Management version 2
        Kernel modules: firewire-ohci

And the same for "lspci -v -n":

    0000:00:0b.0 0600: 106b:002d
        Flags: bus master, 66MHz, medium devsel, latency 16
        Capabilities: [80] AGP version 1.0
        Kernel driver in use: agpgart-uninorth
    0001:10:0b.0 0600: 106b:002e
        Flags: bus master, 66MHz, medium devsel, latency 16
    0002:24:0e.0 0c00: 11c1:5811 (prog-if 10 [OHCI])
        Subsystem: 11c1:5811
        Flags: medium devsel, IRQ 40
        Memory at f5000000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [44] Power Management version 2
        Kernel modules: firewire-ohci

So not the known-goofy controller, that's apparently found in the Pismo PowerBook G3. However, on the bright side, I think I found a PowerBook G4 here in the office w/the same chipset as you, so I'll see if I can reproduce the problem.

Created attachment 296318 [details]
fw-ohci: PPC PMac platform code
I am attaching an untested patch which adds all of ohci1394's PPC_PMAC platform
feature calls to firewire-ohci. Jarod, if you don't find out something else on
the PPC machines available to you, could you validate that this doesn't add
runtime regressions to PPC_PMAC, and prepare a test package for Ignacio?
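
For triage questions like the one raised earlier in this thread (is the FireWire controller the Apple UniNorth part with device ID 0x0018, or something else?), the relevant lines of "lspci -v -n" can be picked out mechanically. What follows is only a minimal Python sketch and not part of any patch attached here. The names `firewire_controllers` and `is_uninorth_0018` are made up for illustration, and the sample text is trimmed from the output quoted above. The Apple vendor ID 106b is taken from the host bridge lines in that same output.

```python
import re

# Trimmed sample of the `lspci -v -n` output quoted in this thread.
SAMPLE = """\
0000:00:0b.0 0600: 106b:002d
\tFlags: bus master, 66MHz, medium devsel, latency 16
0001:10:0b.0 0600: 106b:002e
\tFlags: bus master, 66MHz, medium devsel, latency 16
0002:24:0e.0 0c00: 11c1:5811 (prog-if 10 [OHCI])
\tSubsystem: 11c1:5811
\tKernel modules: firewire-ohci
"""

# Device header line: "<slot> <class>: <vendor>:<device>".
# Indented detail lines start with whitespace and therefore never match.
HEADER = re.compile(r"^(\S+)\s+([0-9a-f]{4}):\s+([0-9a-f]{4}):([0-9a-f]{4})")

# Class code shown for the FireWire (IEEE 1394) device in the output above.
FIREWIRE_CLASS = "0c00"


def firewire_controllers(lspci_text):
    """Return (slot, vendor_id, device_id) for each FireWire-class device."""
    found = []
    for line in lspci_text.splitlines():
        m = HEADER.match(line)
        if m and m.group(2) == FIREWIRE_CLASS:
            found.append((m.group(1), m.group(3), m.group(4)))
    return found


def is_uninorth_0018(vendor_id, device_id):
    """True for the Apple (106b) controller with device ID 0x0018 asked about above."""
    return vendor_id == "106b" and device_id == "0018"


if __name__ == "__main__":
    for slot, vendor, device in firewire_controllers(SAMPLE):
        print(slot, vendor, device, "UniNorth 0x0018:", is_uninorth_0018(vendor, device))
```

On the sample above this reports a single controller, 11c1:5811 (the Agere FW323), which is consistent with the conclusion in the thread that this is not the 0x0018 UniNorth part.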
Created attachment 296439 [details]
fw-ohci: PPC PMac platform code
Previous patch was bogus, didn't compile.
This one compiles and runs OK and is definitely necessary --- hopefully also
sufficient --- to fix machine check exceptions on PPC PMac/ PBook.
I was able to reproduce the panic-on-resume with a PowerBook G4/667, which is (according to /proc/cpuinfo) a 3rd-gen Titanium, with the same devices as Ignacio lists in comment #9, and have verified the patch in comment #12 does indeed resolve the problem. Patch added to rawhide, building in koji right now:

http://koji.fedoraproject.org/koji/taskinfo?taskID=483352

Ignacio, if you'd be so kind, please give that build a try once it's done to verify it fixes suspend/resume on your end as well.

Hrm, it seems installing rawhide kernels is requiring more and more supporting rawhide bits these days. Understandable if you'd rather wait for a Fedora 8 kernel w/this patch. Pretty sure this will fix your suspend/resume issues though.

Yesterday I installed kernel version 2.6.25-0.81.rc3.git2.fc9. It requires some dependencies from rawhide... but after enabling the "experimental" repo, yum resolved all dependencies successfully.

Now, I have two pieces of news, one good and one bad. The good news is that it solves the suspend/resume problem in almost all the cases. The bad news is that it solves the problem in _almost_ all the cases. I noticed two resume exceptions after the kernel installation... but most of the tests I did worked fine, and I don't know how to reproduce it.

I will try more suspend/resume tests tonight, to see if I can reproduce the problem. Maybe you can also try some tests on your tiBook III? (do something, suspend, resume, and repeat).

Anyway, my current situation is much better than before. Thank you for the help :-)

I've done a couple of suspend/resume iterations on the tibook III now, and it has successfully resumed every time so far. What exactly was the nature of your resume failures? Did you have to hard-reset the system, or were they less severe (i.e., annoying spew that may have de-stabilized something, but still let you try to cleanly reboot)? Also, did you have any sort of peripherals connected (such as some firewire devices)?
The failure was the same one I had at the beginning of the thread: system exception and hard-reset needed. And I do not have any peripherals connected.

While I'm writing this text I'm testing some more suspend/resume cycles, and right now I've reproduced the problem! It is almost the same trace as in the first comment on this thread, but slightly different (smaller):

    Vector: 300 (Data Access) at [eed97d80]
        pc: f20889c4: ohci_enable+0x2f8/0x3f0 [firewire_ohci]
        pr: f20889d4: ohci_enable+0x208/0x3f0 [firewire_ohci]
        sp: eed97e30
       msr: 200b032
       dar: 0
     dsisr: 40000000
      current = 0xef19f020
      pid = 1807, comm = pmud
    enter ? for help
    [eed97e50] c015b220 pci_device_resume+0x38/0x80
    [eed97e50] c01e7e98 device_resume+0x94/0x1f8
    [eed97e50] c006209c suspend_devices_and_enter+0x164/0x19c
    [eed97e50] c0062254 enter_state+0x138/0x1b0
    [eed97e50] c01ef6dc pmu_ioctl+0x78/0x1d4
    [eed97e50] c00bd418 vfs_ioctl+0x68/0x80
    [eed97e50] c00bd7ec do_vfs_ioctl+0x3bc/0x3f4
    [eed97e50] c00bd87c sys_ioctl+0x58/0x88
    [eed97e50] c0012ae4 ret_from_syscall+0x0/0x38
    --- Exception: c00 (System Call) at 0ff09798
    SP (bf9106c0) is in userspace
    mon>_

These are the steps that I followed:

- Reboot the system.
- Wait while KDM starts.
- Close the lid before entering user or password.
- Wait some seconds...
- Open the lid.

This does not happen every time I try it, but it's the second time I have seen this exception following these steps (after the kernel update)... So this is the most reproducible way I know at the moment...

It also seems that, if the first resume works fine, then there is no problem in the rest of the session: I mean, resume only fails the first time after booting the system (when it fails).

Ah, I'd not tried rebooting between any suspend/resume cycles. I'll have to try again with some reboots mixed in. Also, fwiw, kernel-2.6.24.3-17.fc8 is currently building in koji, and carries this fix (and then some) as well.
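
When a backtrace like the one Ignacio posted is compared against a trace from another machine or kernel build, the raw addresses will usually differ, so it can help to compare only the chain of function names. The snippet below is a rough Python sketch under that assumption. It is not an existing tool, the helper name `call_chain` is invented for illustration, and the regular expression only targets the "[stack] address symbol+offset/size" frame lines as they appear in the trace above.

```python
import re

# Frame lines copied from the trace posted above.
TRACE = """\
[eed97e50] c015b220 pci_device_resume+0x38/0x80
[eed97e50] c01e7e98 device_resume+0x94/0x1f8
[eed97e50] c006209c suspend_devices_and_enter+0x164/0x19c
[eed97e50] c0062254 enter_state+0x138/0x1b0
[eed97e50] c01ef6dc pmu_ioctl+0x78/0x1d4
[eed97e50] c00bd418 vfs_ioctl+0x68/0x80
[eed97e50] c00bd7ec do_vfs_ioctl+0x3bc/0x3f4
[eed97e50] c00bd87c sys_ioctl+0x58/0x88
[eed97e50] c0012ae4 ret_from_syscall+0x0/0x38
--- Exception: c00 (System Call) at 0ff09798
"""

# "[<stack>] <address> <symbol>+0x<offset>/0x<size>"; only the symbol is captured.
FRAME = re.compile(r"^\[[0-9a-f]+\]\s+[0-9a-f]+\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def call_chain(trace_text):
    """Return the function names of all frame lines, innermost first."""
    chain = []
    for line in trace_text.splitlines():
        m = FRAME.match(line.strip())
        if m:
            chain.append(m.group(1))
    return chain


if __name__ == "__main__":
    # Two traces show the same call chain if these lists are equal,
    # regardless of addresses, PID, or process name.
    print(" <- ".join(call_chain(TRACE)))
```

Two reports can then be compared with a plain `call_chain(a) == call_chain(b)`, which ignores differences in addresses and in the reporting process.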
So I actually did try a good number of suspend/resume cycles last week, intermixed with ten or so reboots, and never hit the system exception problem. I haven't opened the lid on this thing in about a week, and when I did just now... There's the exception. Huh. My trace looks nearly identical, but PID is pm-pmu instead of pmud and some of the addresses are a bit different, but same call chain.

This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Needs to be retested w/a current F10 kernel (or rawhide).

Hi all. I have been trying Fedora 10 for some weeks and it seems that there are no suspend/resume problems anymore. The laptop is the same as in my original post... so I guess the problem is solved. IMO, the bug can be closed.

Thank you and regards,
Ignacio.

Ignacio,

Excellent, glad to hear it. There have been a number of assorted race condition fixes that have gone into the firewire stack in the past few months, I'd wager one of them had a positive effect here. :)