Starting with kernel 2.6.32 (up to 2.6.32.10-90), the system can no longer start X. After the normal boot sequence, at the point where X should start, the computer restarts (as if the reset button had been pressed) and the BIOS displays its splash screen. The BIOS is then so confused that it cannot initiate POST and boot: while(1) { after a few seconds of splash screen, the screen blinks and the splash screen reappears; } It looks like the GPU is left in such a bad state that even the BIOS can no longer reset it. I have to power the computer off completely to be able to boot again.

In /var/log, the files Xorg.0.log and dmesg are created but are empty. The file messages contains the lines

Mar 30 10:33:52 localhost kernel: imklog 4.4.2, log source = /proc/kmsg started.
Mar 30 10:33:52 localhost rsyslogd: [origin software="rsyslogd" swVersion="4.4.2" x-pid="1218" x-info="http://www.rsyslog.com"] (re)start
Mar 30 10:33:52 localhost kernel: Initializing cgroup subsys cpuset
Mar 30 10:33:52 localhost kernel: Initializing cgroup subsys cpu
Mar 30 10:33:52 localhost kernel: Linux version 2.6.32.10-90.fc12.x86_64 (mockbuild.fedoraproject.org) (gcc version 4.4.3

(Notice how the last line is truncated.)

Everything works well up to kernel-2.6.31.12-174.2.22.

The system is an HP Z400 in x86_64 mode, with an up-to-date F12.
xorg-x11-drv-nouveau-0.0.15-21.20091105gite1c2efd.fc12.x86_64
xorg-x11-server-Xorg-1.7.6-1.fc12.x86_64

0f:00.0 VGA compatible controller: nVidia Corporation Device 0659 (rev a1) (prog-if 00 [VGA controller])
    Subsystem: nVidia Corporation Device 063a
    Physical Slot: 2
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 24
    Region 0: Memory at e2000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Region 3: Memory at e0000000 (64-bit, non-prefetchable) [size=32M]
    Region 5: I/O ports at e000 [size=128]
    Expansion ROM at <unassigned> [disabled]
    Capabilities: [60] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [78] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <1us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [b4] Vendor Specific Information: Len=14 <?>
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntrySize=0
        Arb: Fixed- WRR32- WRR64- WRR128- 100ns- - - onfig- TableOffset=0
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256- Fixed- RR32-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status: NegoPending- InProgress-
    Capabilities: [128 v1] Power Budgeting <?>
    Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Kernel driver in use: nouveau
    Kernel modules: nouveau, nvidiafb

The report might be related to bugs 571741, 571058, 577431 but I am not sure as the symptoms are not the same (I never reach a login screen. By the way, I use kdm.)
Created attachment 406011
dmesg of faulty kernel

I just want to confirm that the bug is still present in kernel-2.6.32.11-99.fc12.x86_64, with exactly the same symptoms: it works correctly in runlevel 3 (see attached dmesg) but reboots violently when X is started. When this happens, the BIOS is unable to restart; only a cold boot recovers the machine. Is there anything I can do to help corner this bug?

Thanks,
Today I tried to upgrade my F12 with the following packages from F13:

kernel-2.6.33.1-24.fc13.x86_64.rpm
libdrm-2.4.19-1.fc13.x86_64.rpm
udev-151-7.fc13.x86_64.rpm
xorg-x11-drv-nouveau-0.0.16-2.20100218git2964702.fc13.x86_64.rpm
xorg-x11-server-common-1.7.99.902-2.20100319.fc13.x86_64.rpm
xorg-x11-server-Xorg-1.7.99.902-2.20100319.fc13.x86_64.rpm
xorg-x11-drv-fbdev-0.4.1-3.fc13.x86_64.rpm
linux-firmware-20100106-4.fc14.noarch
grubby-7.0.15-1.fc14.x86_64

(The last is fc14 because I also tried rawhide.)

The results are the same as above: the computer reboots violently into a sorry state when X is started. I also tried a vanilla 2.6.33.3 kernel with the F12 packages: same problem.

Does that mean I am stuck forever with F12 and kernel 2.6.31? What should I do?
I have just tried to boot the x86_64 LiveCD (KDE) for Fedora 13 beta. The same problem occurs: when starting X, the computer reboots into a state where the BIOS itself is helpless.

I'd like to stress that HP has been, for four years, the official supplier of computers for France's most important research organization (CNRS), and it would really be a pity if Fedora could not be installed on their computers.

What happened between kernels 2.6.31 and 2.6.32?

As the problem is present in Fedora 13 beta, I'll try to add it to the Fedora 13 target.
I upgraded my F12 today to kernel-2.6.32.12-115.fc12.x86_64, and the problem is still there. Back to 2.6.31.

So, to sum up: the bug appeared in an F12 update and is still present in F13 beta.

I have no idea how to debug this.
(In reply to comment #4)
> I upgraded my f12 today to kernel-2.6.32.12-115.fc12.x86_64, and the problem is
> still here. Back to 2.6.31.
>
> So to sum up; the bug appeared in a F12 update, and is still present in
> F13beta.
>
> I have no idea how to debug this.

I have the same problem (and I bought my Z400 through the same procurement contract: French EPST).

When I force the kernel option intel_iommu=off, the workstation works in graphical mode, with either the nouveau or the kmod-nvidia driver, on kernel-2.6.32.12-115.fc12.x86_64.

Regards,
Guillaume
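For anyone who wants to check whether the intel_iommu=off workaround is actually in effect on a running system, here is a minimal Python sketch. It only inspects a kernel command line string (on Linux, the contents of /proc/cmdline). The function names iommu_disabled and read_cmdline are made up for this illustration, and the rule that the last on/off setting wins is an assumption about how repeated options are handled, not something confirmed in this report.

```python
from pathlib import Path


def iommu_disabled(cmdline: str) -> bool:
    """Return True if the command line turns the Intel IOMMU off.

    Assumption: when intel_iommu= appears several times, the last
    explicit "on" or "off" wins. Other comma-separated sub-options
    (for example "igfx_off") are ignored here.
    """
    disabled = False
    for token in cmdline.split():
        if not token.startswith("intel_iommu="):
            continue
        for option in token.split("=", 1)[1].split(","):
            if option == "off":
                disabled = True
            elif option == "on":
                disabled = False
    return disabled


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    return Path(path).read_text()
```

On a machine booted with the switch, iommu_disabled(read_cmdline()) should return True; without it, False.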
I confirm that with the switch I can boot and use kernel 2.6.32.12-115.

For information, in the dmesg output I see the following with both kernels (2.6.31 without the switch and 2.6.32 with the switch):

DMAR: Host address width 39
DMAR: DRHD base: 0x000000fed90000 flags: 0x1
IOMMU fed90000: ver 1:0 cap c90780106f0462 ecap f02076
DMAR: RMRR base: 0x000000cefd0000 end: 0x000000cefd0fff
DMAR: RMRR base: 0x000000cefd1000 end: 0x000000cefd1fff
DMAR: RMRR base: 0x000000cefd2000 end: 0x000000cefd2fff
DMAR: RMRR base: 0x000000cefd3000 end: 0x000000cefd3fff
DMAR: RMRR base: 0x000000cefd4000 end: 0x000000cefd4fff
DMAR: RMRR base: 0x000000cefd5000 end: 0x000000cefd5fff
DMAR: RMRR base: 0x000000cefd6000 end: 0x000000cefd6fff
DMAR: RMRR base: 0x000000cefd7000 end: 0x000000cefd7fff
DMAR: ATSR flags: 0x0

With the old kernel (2.6.31) without the switch, I also have:

IOMMU 0xfed90000: using Queued invalidation
IOMMU: hardware identity mapping for device 0000:0f:00.0
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:1a.2 [0xcefd7000 -0xcefd8000]
IOMMU: Setting identity map for device 0000:00:1a.1 [0xcefd6000 - 0xcefd7000]
IOMMU: Setting identity map for device 0000:00:1a.0 [0xcefd5000 - 0xcefd6000]
IOMMU: Setting identity map for device 0000:00:1d.2 [0xcefd4000 - 0xcefd5000]
IOMMU: Setting identity map for device 0000:00:1d.1 [0xcefd3000 - 0xcefd4000]
IOMMU: Setting identity map for device 0000:00:1d.0 [0xcefd2000 - 0xcefd3000]
IOMMU: Setting identity map for device 0000:00:1a.7 [0xcefd1000 - 0xcefd2000]
IOMMU: Setting identity map for device 0000:00:1d.7 [0xcefd0000 - 0xcefd1000]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0x1000000]
PCI-DMA: Intel(R) Virtualization Technology for Directed I/O

I am not sure what all of these mean. I notice that the end addresses differ by one between the two sections: each RMRR range ends at ...fff, while the corresponding identity map ends at ...000. The troublesome video card is 0f:00.0.
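The difference of one between the two sections is probably just two conventions for writing the same region: the RMRR lines appear to give the address of the last byte (inclusive end), while the identity-map lines appear to give the first address past the region (exclusive end). That reading is an assumption, not something the logs state. A small Python sketch to check it against lines like the ones above; the helper names are hypothetical:

```python
import re

# Two lines copied from the dmesg output quoted above.
SAMPLE = """\
DMAR: RMRR base: 0x000000cefd0000 end: 0x000000cefd0fff
IOMMU: Setting identity map for device 0000:00:1d.7 [0xcefd0000 - 0xcefd1000]
"""

RMRR_RE = re.compile(r"RMRR base: (0x[0-9a-fA-F]+) end: (0x[0-9a-fA-F]+)")
IDMAP_RE = re.compile(
    r"Setting identity map for device \S+ "
    r"\[(0x[0-9a-fA-F]+)\s*-\s*(0x[0-9a-fA-F]+)\]"
)


def rmrr_ranges(log):
    """Return sorted (base, end) pairs from the 'RMRR base' lines."""
    return sorted((int(b, 16), int(e, 16)) for b, e in RMRR_RE.findall(log))


def identity_maps(log):
    """Return sorted (start, end) pairs from the 'identity map' lines."""
    return sorted((int(s, 16), int(e, 16)) for s, e in IDMAP_RE.findall(log))


def rmrr_covered(log):
    """True if every RMRR range, with its end made exclusive (end + 1),
    matches one of the identity maps exactly."""
    maps = set(identity_maps(log))
    return all((base, end + 1) in maps for base, end in rmrr_ranges(log))
```

If rmrr_covered returns True for the full dmesg output, the two sections describe the same regions and the off-by-one is only a notation difference.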
I'm seeing this on an HP Z600, though I'm not using the nouveau driver. I had been applying updates but had not rebooted in a while; on the first reboot I hit this problem. Booting without the GUI works without problems. Manually starting X causes a segfault and a reboot.

Kernel version: 2.6.32.12-115.fc12.x86_64

In dmesg I see:

nvidia: module license 'NVIDIA' taints kernel.
Disabling lock debugging due to kernel taint

A few lines down:

nvidia 0000:0f:00.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
nvidia 0000:0f:0.0: setting latency timer to 64
vgaarb: device changed decodes: PCI:0000:0f:0.0,olddecodes=io+mem,decodes=none: owns=io+mem

A few more lines down:

NVRM: loading NVIDIA UNIX x86_64 Kernel Module 195.36.24 Thu Apr 22 19:10:14 PDT 2010
nvidia-config-d[1429]: segfault at 7f8f2c000000 ip 00000039a9a79d3c sp 00007fff72a0b3c8 error 4 in libc-2.11.2.so[39a9a00000+170000]

I tried intel_iommu=off: I still get the nvidia segfault, but the system comes up in init 5.

I also tried installing from the Fedora-13-x86_64-DVD (downloaded 6-9-2010); it rebooted when it tried to bring up the GUI for the installer. I didn't try that with intel_iommu=off (I didn't want to upgrade unless I had to).
The bug has disappeared with 2.6.33.5-124.fc13.x86_64; I can now boot without intel_iommu=off. I suspect that this bug is another manifestation of bug 561267, for which a fix went into the aforementioned kernel.