With DMAR enabled, the kernel crashes in resume very early (before control is passed back to C code). Passing intel_iommu=off results in everything working fine. What information is needed?
Argh, HP. It'll be crashing in the BIOS -- and before we turn the IOMMU back on after resume. That's kind of confusing. We do turn the IOMMU off before we suspend; I'm not sure how it can even _tell_ that it was enabled. Can you show dmesg as it boots up?
Created attachment 368954: dmesg
dmesg from boot on an HP 2530p
This is an EFI boot, but the same happens using BIOS
dmesg with iommu enabled, please?
Also try booting with 'iommu=pt'.
Er, wait. That one _did_ have IOMMU enabled. And it did get back to C code.... and all the way back up. What was the problem, again?
Bizarre. I'm sure I didn't suspend that. Hang on, let me do this again - I've clearly screwed something up.
Created attachment 368962: Actual dmesg
Attached the wrong file before
Fails with iommu=pt
Others have seen this bug and allege that it's not actually crashing in the BIOS -- it's crashing in the kernel when it gets back to iommu_enable_translation() on resume. Can you confirm or deny that? A register dump of the entire DMAR would be a useful thing to see, both before and immediately after suspend. What happens if you boot with iommu=pt, and then comment out the call to init_iommu_hw() in iommu_resume(), so it seems to fail? I expect things will go south fairly shortly thereafter, but perhaps it'll come back up a little further?
*** Bug 539861 has been marked as a duplicate of this bug. ***
Would be useful to see the output with this patch applied: http://david.woodhou.se/suspend-hack Both with intel_iommu=igfx_off, and without. For the latter you'll probably need a serial console or USB debug cable, or maybe netconsole.
Created attachment 373772: dmesg with test patches
OK, I now have an HP6930p and booted it with a variant of my test patch. Because we have the GFX_WA config option enabled, the graphics device gets given a 1:1 mapping to all of memory. A small hack prevents the kernel from re-enabling the IOMMU which is dedicated to the graphics device on resume. And then everything works -- including the other IOMMUs. So the problem is that the dedicated GFX IOMMU is not properly set up on resume, somehow. The register dump looks sane, but something is wrong. When we set the 'translation enable' bit in the command register and wait for that to be reflected in the status register, the status register never changes. One for the chipset folks...
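For readers unfamiliar with the handshake described in the previous comment, here is a minimal userspace sketch of it. It is not the kernel's code: struct fake_iommu, fake_read(), fake_write() and the polling budget are hypothetical stand-ins for the real MMIO accesses, and only the register offsets and bit positions are taken from the VT-d register layout. The 'stuck' flag models the failure seen here, where the status bit never follows the command bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register offsets and bits from the VT-d register layout. */
#define DMAR_GCMD_REG 0x18u        /* Global Command register */
#define DMAR_GSTS_REG 0x1cu        /* Global Status register */
#define DMA_GCMD_TE   (1u << 31)   /* Translation Enable (command) */
#define DMA_GSTS_TES  (1u << 31)   /* Translation Enable Status */

/* Hypothetical stand-in for one IOMMU's register file. */
struct fake_iommu {
    uint32_t gsts;   /* value returned when the status register is read */
    bool stuck;      /* true: hardware ignores the enable command */
};

/* Simulated MMIO read; this sketch only ever reads the status register. */
uint32_t fake_read(struct fake_iommu *u, uint32_t off)
{
    assert(off == DMAR_GSTS_REG);
    return u->gsts;
}

/* Simulated MMIO write; a healthy unit mirrors TE into TES. */
void fake_write(struct fake_iommu *u, uint32_t off, uint32_t val)
{
    assert(off == DMAR_GCMD_REG);
    if (!u->stuck && (val & DMA_GCMD_TE))
        u->gsts |= DMA_GSTS_TES;
}

/*
 * Set the translation-enable bit, then poll the status register until the
 * hardware acknowledges it or the polling budget runs out.
 * Returns true if translation was reported as enabled.
 */
bool enable_translation(struct fake_iommu *u, unsigned max_polls)
{
    fake_write(u, DMAR_GCMD_REG, DMA_GCMD_TE);
    for (unsigned i = 0; i < max_polls; i++) {
        if (fake_read(u, DMAR_GSTS_REG) & DMA_GSTS_TES)
            return true;
    }
    return false;
}
```

With stuck set to false, enable_translation() returns true on the first poll. With stuck set to true it exhausts its budget and returns false, which corresponds to the symptom reported for the graphics IOMMU on resume.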
Hm. If I kexec back into the same kernel after a suspend/resume cycle, it works -- it comes back up OK, and re-initialises the hardware correctly. Which means that it _isn't_ just that the IOMMU hardware is set up wrong. Investigating further...
I can even re-enable translation with a userspace hack on /dev/mem, right after the suspend/resume cycle. I note that when it fails, the screen is still turned off from the suspend. It works after the video has been initialised again. Trying to narrow this down further right now, to prove that this observation is something other than just coincidence.
This excerpt from the above boot log shows the problem:

pci 0000:00:02.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
pci 0000:00:02.0: restoring config space at offset 0x8 (was 0x1, writing 0x7111)
pci 0000:00:02.0: restoring config space at offset 0x6 (was 0xc, writing 0x4000000c)
pci 0000:00:02.0: restoring config space at offset 0x4 (was 0x4, writing 0x58000004)
pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900000, writing 0x900403)

Before the contents of PCI config space for the graphics device (specifically, the BARs) are restored to their correct state, the IOMMU doesn't seem to function. But as soon as they are restored, everything is fine. If I put a hack into the IOMMU resume code to restore just the BARs (words #4 and #6), it all works fine.
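As a reading aid for the log lines in this comment: the "offset" printed there is a 32-bit word index into config space, not a byte offset, which is why words #4 and #6 are BARs. The sketch below, assuming the standard type 0 PCI header layout, maps a word index to its register and decodes a BAR value. The function names are hypothetical helpers written for this illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Convert a dword index (as printed in the log) to a byte offset. */
unsigned pci_dword_to_byte_offset(unsigned idx)
{
    assert(idx < 64);   /* standard config space is 256 bytes */
    return idx * 4;
}

/* Name of the type 0 header register at a given dword index. */
const char *pci_dword_name(unsigned idx)
{
    switch (idx) {
    case 0x0: return "vendor/device ID";
    case 0x1: return "command/status";
    case 0x4: return "BAR0";
    case 0x5: return "BAR1";
    case 0x6: return "BAR2";
    case 0x7: return "BAR3";
    case 0x8: return "BAR4";
    case 0x9: return "BAR5";
    case 0xf: return "interrupt line/pin";
    default:  return "other";
    }
}

/* Bit 0 set means the BAR describes I/O port space. */
bool bar_is_io(uint32_t v)
{
    return (v & 0x1u) != 0;
}

/* For memory BARs, type bits 2:1 == 10b mean a 64-bit BAR. */
bool bar_is_64bit_mem(uint32_t v)
{
    return !bar_is_io(v) && ((v >> 1) & 0x3u) == 0x2u;
}

/* Strip the flag bits, leaving the (low 32 bits of the) base address. */
uint32_t bar_address(uint32_t v)
{
    return bar_is_io(v) ? (v & ~0x3u) : (v & ~0xfu);
}
```

Applied to the log, word 0x4 is BAR0 at byte offset 0x10. Its pre-restore value 0x4 decodes as a 64-bit memory BAR with a zero base address, and the restored value 0x58000004 as the same BAR type at 0x58000000. In other words, the type bits survived the suspend but the address did not.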
Adding a call to this function into the loop in init_iommu_hw(), which happens on resume, seems to catch and fix the problem.

void cantiga_hp_hack(struct dmar_drhd_unit *drhd)
{
	int i;
	uint32_t mmiobar;

	for (i = 0; i < drhd->devices_cnt; i++) {
		/* Only the Intel (0x8086) integrated graphics device 0x2a42 */
		if (!drhd->devices[i] ||
		    drhd->devices[i]->vendor != 0x8086 ||
		    drhd->devices[i]->device != 0x2a42)
			continue;

		pci_read_config_dword(drhd->devices[i], PCI_BASE_ADDRESS_0,
				      &mmiobar);

		/* BAR address is zero although the kernel knows a resource */
		if (!(mmiobar & PCI_BASE_ADDRESS_MEM_MASK) &&
		    pci_resource_start(drhd->devices[i], 0)) {
			WARN(1, "BIOS failed to restore BARs for integrated graphics device\n");
			/* Write the known address back, keeping the flag bits */
			pci_write_config_dword(drhd->devices[i],
					       PCI_BASE_ADDRESS_0,
					       pci_resource_start(drhd->devices[i], 0) | mmiobar);
		}
	}
}
A 2.6.31.6-151.fc12 kernel with this fix is building at http://koji.fedoraproject.org/koji/taskinfo?taskID=1831944 There may be some refinement to come, but this should do the job for now. Please confirm that it fixes the problem for you. Please don't be distracted by bug #540218, which also affects some of the same machines.
I can confirm that it works OK on my HP 6730b: resume after suspend works, whether triggered by the suspend button or by closing the lid.
Confirmed: works with both suspend to RAM and suspend to disk on an HP EliteBook 6930p. Thanks.
Works fine on my 6930p too.
Thanks for testing. Out of interest, does the F12 kernel take ages to initialise for you on this hardware and run very slowly, repeatedly saying: '[drm] TV-20: set mode NTSC 480i 0'?
(In reply to comment #22)
> Thanks for testing. Out of interest, does the F12 kernel take ages to
> initialise for you on this hardware and run very slowly, repeatedly saying:
> '[drm] TV-20: set mode NTSC 480i 0'?
No, I get a couple of these messages, but it is far from slow: ~12-14 sec from grub to gdm, and after X is up compiz works just fine (yes, the GM45 version, not the ATI one). Using the F13 kernel I noticed something similar (X very slow); I just blamed the drm patches not being in sync with the X driver.
Sorry, I meant F13 (rawhide). Sounds like you're seeing what I'm seeing.
(In reply to comment #24)
> Sorry, I meant F13 (rawhide). Sounds like you're seeing what I'm seeing.
OK, yeah, it seems to be the same issue; I just confirmed it. Is there a bug open for that?
F12 is just fine, not slow at all. X seems stable, and suspend and resume are stable too: I tested them a few times in a row, to RAM and to disk. There are only about 8 of the 480i messages in the messages log, so all seems well right now.
In fact, this update also fixed bug 540218 for me.
Let us know if you need anything else.
(In reply to comment #26)
> F12 is just fine, not slow at all ... X seems stable, resume and suspend stable
> tested it a few times in a row to ram and disk, only about 8 480i messages in
> messages, all seems well right now
>
> In fact, this update also fixed 540218 bug for me
>
> Let us know if you need anything else
Yeah, F12 is fine; the bug is in F13 (rawhide).
With the new 2.6.31.6-151 kernel I can confirm that the display turns on again after resume. This works for me, but maybe this kernel warning will help you improve your excellent work. Feel free to contact me if you need more information.

x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Back to C!
------------[ cut here ]------------
WARNING: at drivers/pci/intel-iommu.c:3098 cantiga_gfx_bar_enable+0x6f/0xa4() (Not tainted)
Hardware name: HP EliteBook 2530p
BIOS failed to restore BARs for integrated graphics device; fixing...
Modules linked in: aes_i586 aes_generic vfat fat rfcomm sco bridge stp llc bnep l2cap sunrpc coretemp ipv6 cpufreq_ondemand acpi_cpufreq fuse dm_multipath uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec mmc_block snd_hwdep firewire_ohci snd_seq firewire_core snd_seq_device sdhci_pci snd_pcm sdhci btusb crc_itu_t mmc_core bluetooth ricoh_mmc snd_timer snd soundcore snd_page_alloc iTCO_wdt iTCO_vendor_support arc4 ecb joydev uvcvideo videodev e1000e tpm_infineon v4l1_compat wmi iwlagn iwlcore hp_accel lis3lv02d serio_raw input_polldev mac80211 cfg80211 rfkill pata_acpi ata_generic i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
Pid: 3024, comm: pm-suspend Not tainted 2.6.31.6-151.fc12.i686 #1
Call Trace:
 [<c0436d8b>] warn_slowpath_common+0x70/0x87
 [<c05b8a08>] ? cantiga_gfx_bar_enable+0x6f/0xa4
 [<c0436de0>] warn_slowpath_fmt+0x29/0x2c
 [<c05b8a08>] cantiga_gfx_bar_enable+0x6f/0xa4
 [<c05b8abd>] iommu_resume+0x80/0x126
 [<c06282e7>] __sysdev_resume+0x19/0xb0
 [<c062842e>] sysdev_resume+0xb0/0x11b
 [<c045cec6>] suspend_devices_and_enter+0x10e/0x184
 [<c045d009>] enter_state+0xcd/0x119
 [<c045c7e4>] state_store+0x98/0xac
 [<c045c74c>] ? state_store+0x0/0xac
 [<c059505d>] kobj_attr_store+0x16/0x22
 [<c0501625>] sysfs_write_file+0xc1/0xec
 [<c0501564>] ? sysfs_write_file+0x0/0xec
 [<c04c07d4>] vfs_write+0x85/0xe4
 [<c04c08d1>] sys_write+0x40/0x62
 [<c040363c>] syscall_call+0x7/0xb
---[ end trace 5a79984d5796fe95 ]---
CPU0: Thermal monitoring handled by SMI
Extended CMOS year: 2000
Enabling non-boot CPUs ...
SMP alternatives: switching to SMP code
Booting processor 1 APIC 0x1 ip 0x6000
(In reply to comment #28)
> WARNING: at drivers/pci/intel-iommu.c:3098 cantiga_gfx_bar_enable+0x6f/0xa4()
> (Not tainted)
> Hardware name: HP EliteBook 2530p
> BIOS failed to restore BARs for integrated graphics device; fixing...
That's just confirming that it's noticed and fixed the problem. Your BIOS is broken, but we cope. It's reminding you to report the fault to your vendor and demand a fixed BIOS, and ensuring that these problems end up in kerneloops.org where we can see how prevalent they are (and which companies are responsible for them). Thanks.
This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.