Bug 605432
Summary: | [NVaa] Xorg hangs with message: [mi] EQ overflowing. The server is probably stuck in an infinite loop | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | fedoraproject |
Component: | libdrm | Assignee: | Ben Skeggs <bskeggs> |
Status: | CLOSED DUPLICATE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | low | | |
Version: | 13 | CC: | ajax, amessina, jsrhbz, mcepl, noldi, steve, vengmd |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | | |
Hardware: | i686 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2011-03-06 23:39:30 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: | | | |
Description
fedoraproject
2010-06-17 22:16:00 UTC
Thanks for the bug report. We have reviewed the information you have provided above, and there is some additional information we require that will be helpful in our diagnosis of this issue.

Please add drm.debug=0x04 to the kernel command line, restart computer, and attach
* your X server config file (/etc/X11/xorg.conf, if available),
* X server log file (/var/log/Xorg.*.log)
* output of the dmesg command, and
* system log (/var/log/messages)
to the bug report as individual uncompressed file attachments using the bugzilla file attachment link above.

We will review this issue again once you've had a chance to attach this information. Thanks in advance.

* your X server config file (/etc/X11/xorg.conf, if available),

There is no such file (/etc/X11/xorg.conf). Just one for the keyboard:

[root@localhost ~]# ls /etc/X11/{X*,x*};find /etc/X11/xorg.conf.d/
/etc/X11/Xmodmap
/etc/X11/Xresources
/etc/X11/xinit:
Xclients Xclients.d xinitrc xinitrc-common xinitrc.d xinput.d xinputrc Xsession
/etc/X11/xorg.conf.d:
00-system-setup-keyboard.conf
/etc/X11/xorg.conf.d/
/etc/X11/xorg.conf.d/00-system-setup-keyboard.conf
[root@localhost ~]# cat /etc/X11/xorg.conf.d/00-system-setup-keyboard.conf
# This file is autogenerated by system-setup-keyboard. Any
# modifications will be lost.
Section "InputClass"
Identifier "system-setup-keyboard"
MatchIsKeyboard "on"
Option "XkbModel" "abnt2"
Option "XkbLayout" "br"
# Option "XkbVariant" "(null)"
Option "XkbOptions" "terminate:ctrl_alt_bksp,"
EndSection
[root@localhost ~]#

Created attachment 426249 [details]
Xorg log file
I forgot to show this: [root@localhost ~]# cat /etc/rc.local #!/bin/sh # # This script will be executed *after* all the other init scripts. # You can put your own initialization stuff in here if you don't # want to do the full Sys V style init stuff. touch /var/lock/subsys/local amixer -c 0 sset 'Master Front' 80% echo 0x3ff > /sys/module/nouveau/parameters/reg_debug echo 1 > /sys/module/drm/parameters/debug echo 1 > /proc/sys/kernel/sysrq With those parameters, I got that response on dmesg (last line is near the moment when the system hangs): [root@localhost ~]# Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a3c: 0x0004089c Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a40: 
0x01000000 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a44: 0x00080880 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a48: 0x85000000 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a4c: 0x00002200 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a50: 0x00040080 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a54: 0x00000000 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a58: PUSH! Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a58: 0x00080c80 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a5c: 0x05000000 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a60: 0x00000000 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a64: 0x00040c9c Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a68: 0x00000000 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a6c: 0x00040080 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a70: 0x00000000 Jun 17 11:34:34 localhost kernel: [drm] nouveau 0000:02:00.0: Ch-1/0x00000a74: PUSH! 
Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:34 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:35 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 Jun 17 11:34:35 localhost kernel: [drm:drm_ioctl], pid=1850, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1 ESC]0;root@localhost:~^G[root@localhost ~]# I've changed /etc/rc.local to this: [root@localhost ~]# cat /etc/rc.local #!/bin/sh # # This script will be executed *after* all the other init scripts. # You can put your own initialization stuff in here if you don't # want to do the full Sys V style init stuff. 
touch /var/lock/subsys/local
amixer -c 0 sset 'Master Front' 80%
#echo 0x3ff > /sys/module/nouveau/parameters/reg_debug
echo 0x04 > /sys/module/drm/parameters/debug
echo 1 > /proc/sys/kernel/sysrq
[root@localhost ~]#

I will clean the logs and do it all over again. I've redirected the kernel messages to another file, so look for dmesg stuff in kerneldmesg.

[root@localhost ~]# grep dmesg /etc/rsyslog.conf
kern.* /var/log/kerneldmesg

I'm attaching /var/log/messages too, and a script session with the Xorg log and the last output from dmesg after X crashed. If there is anything that I can do to help fix the bug, please let me know. Thanks for helping.

Created attachment 426356 [details]
log from dmesg
Created attachment 426357 [details]
var log messages (except dmesg)
Created attachment 426361 [details]
Script session with the end of dmesg, and Xorg.0.log
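For anyone reading the drm.debug output above: the cmd= value in the [drm:drm_ioctl] lines is an ordinary Linux ioctl request number, and it can be split into its fields by hand. The following is only a minimal sketch, assuming the generic _IOC bit layout used on x86 (8 bits nr, 8 bits type, 14 bits size, 2 bits direction); decode_ioctl is a hypothetical helper name, not something provided by libdrm.

```python
def decode_ioctl(cmd: int) -> dict:
    """Split a Linux ioctl request number into its _IOC fields.

    Assumption: generic layout as used on x86, i.e. bits 0-7 nr,
    bits 8-15 type, bits 16-29 argument size, bits 30-31 direction.
    """
    directions = {0: "none", 1: "write", 2: "read", 3: "read/write"}
    return {
        "nr": cmd & 0xFF,                     # ioctl number within the type
        "type": chr((cmd >> 8) & 0xFF),       # magic character of the subsystem
        "size": (cmd >> 16) & 0x3FFF,         # size of the argument struct in bytes
        "dir": directions[(cmd >> 30) & 0x3],
    }


if __name__ == "__main__":
    # cmd value copied from the [drm:drm_ioctl] lines in this report
    fields = decode_ioctl(0xC01C64A3)
    print(f"nr=0x{fields['nr']:02x} type={fields['type']!r} "
          f"size={fields['size']} dir={fields['dir']}")
```

For cmd=0xc01c64a3 this yields nr 0xa3 (the same value the kernel prints in the nr= field), type 'd', a 28-byte argument and direction read/write.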
I'm adding a "me too" to this bug. It occurs on both a Dell Precision M4500 laptop with an Nvidia Quadro FX880M and a Dell Optiplex desktop with an Nvidia GeForce GT 220 video card. The desktop is i386, the laptop is x86_64.

The most recent lockup on the desktop occurred when arriving for work in the morning: log in, open browser, scroll down a page, machine locks up. I'm attaching dmesg and Xorg.log.0*; nothing shows in /var/log/messages between 1am (DHCP renewal) and the machine being restarted following the crash.

Desktop only at this point:

Name : xorg-x11-drv-nouveau Relocations: (not relocatable)
Version : 0.0.16 Vendor: Fedora Project
Release : 7.20100423git13c1043.fc13 Build Date: Sat 26 Jun 2010 11:31:19 AM EST
Name : glibc Relocations: (not relocatable)
Version : 2.12 Vendor: Fedora Project
Release : 3 Build Date: Tue 06 Jul 2010 11:59:44 PM EST
Name : xorg-x11-server-Xorg Relocations: (not relocatable)
Version : 1.8.2 Vendor: Fedora Project
Release : 2.fc13 Build Date: Tue 20 Jul 2010 12:12:20 PM EST

$ uname -a
Linux sjw-dt.apl 2.6.33.6-147.fc13.i686.PAE #1 SMP Tue Jul 6 22:24:44 UTC 2010 i686 i686 i386 GNU/Linux

Created attachment 435430 [details]
clean boot Xorg log
Created attachment 435431 [details]
dmesg
Created attachment 435432 [details]
Xorg.log with crash data
Hi, it looks that I'm having exactly the same problem on my new HP 8540w laptop with an nVidia graphics option. 01:00.0 VGA compatible controller: nVidia Corporation GT216 [Quadro FX 880M] (rev a2) (prog-if 00 [VGA controller]) Subsystem: Hewlett-Packard Company Device 1521 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 0: Memory at d2000000 (32-bit, non-prefetchable) [size=16M] Region 1: Memory at c0000000 (64-bit, prefetchable) [size=256M] Region 3: Memory at d0000000 (64-bit, prefetchable) [size=32M] Region 5: I/O ports at 5000 [size=128] Expansion ROM at d3080000 [disabled] [size=512K] Capabilities: [60] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [78] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 256 bytes DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <256ns, L1 <4us ClockPM+ Surprise- LLActRep- BwNot- LnkCtl: ASPM L0s L1 Enabled; RCB 128 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Not Supported, TimeoutDis+ DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, 
Selectable De-emphasis: -6dB Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB Capabilities: [b4] Vendor Specific Information: Len=14 <?> Capabilities: [100 v1] Virtual Channel Caps: LPEVC=0 RefClk=100ns PATEntrySize=0 Arb: Fixed- WRR32- WRR64- WRR128- 100ns- - - onfig- TableOffset=0 Ctrl: ArbSelect=Fixed Status: InProgress- VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256- Fixed- RR32- Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff Status: NegoPending- InProgress- Capabilities: [128 v1] Power Budgeting <?> Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?> Kernel driver in use: nouveau Kernel modules: nouveau, nvidiafb Excerpt from the /var/log/Xorg.0.log.old file: [ 2662.352] (II) NOUVEAU(0): Modeline "640x350"x59.8 17.50 640 664 720 800 350 353 363 366 -hsync +vsync (21.9 kHz) [ 2662.451] (II) NOUVEAU(0): EDID for output DP-1 [ 2662.549] (II) NOUVEAU(0): EDID for output DP-2 [ 2662.552] (II) NOUVEAU(0): EDID for output eDP-1 [ 2662.650] (II) NOUVEAU(0): EDID for output DP-3 [ 2662.748] (II) NOUVEAU(0): EDID for output VGA-1 [ 3792.850] [mi] EQ overflowing. The server is probably stuck in an infinite loop. 
[ 3792.852] Backtrace:
[ 3792.852] 0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x45cef8]
[ 3792.852] 1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x49a4c4]
[ 3792.852] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x46f0e4]
[ 3792.852] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7fa921ef6000+0x3dbf) [0x7fa921ef9dbf]
[ 3792.853] 4: /usr/bin/Xorg (0x400000+0x6d747) [0x46d747]
[ 3792.853] 5: /usr/bin/Xorg (0x400000+0x11ccf3) [0x51ccf3]
[ 3792.853] 6: /lib64/libc.so.6 (0x7fa925d85000+0x32a20) [0x7fa925db7a20]
[ 3792.853] 7: /lib64/libc.so.6 (ioctl+0x7) [0x7fa925e5e5a7]
[ 3792.853] 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x7fa9243d6388]
[ 3792.853] 9: /usr/lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x7fa9243d660b]
[ 3792.853] 10: /usr/lib64/libdrm_nouveau.so.1 (0x7fa923d9a000+0x2dfd) [0x7fa923d9cdfd]
[ 3792.853] 11: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfe) [0x7fa923d9cfee]
[ 3792.853] 12: /usr/lib64/libdrm_nouveau.so.1 (0x7fa923d9a000+0x207a) [0x7fa923d9c07a]
[ 3792.853] 13: /usr/lib64/libdrm_nouveau.so.1 (nouveau_pushbuf_flush+0x190) [0x7fa923d9c450]
[ 3792.853] 14: /usr/lib64/xorg/modules/libexa.so (0x7fa923128000+0x9655) [0x7fa923131655]
[ 3792.853] 15: /usr/lib64/xorg/modules/libexa.so (0x7fa923128000+0xa1fa) [0x7fa9231321fa]
[ 3792.853] 16: /usr/bin/Xorg (0x400000+0xde91b) [0x4de91b]
[ 3792.853] 17: /usr/lib64/xorg/modules/libexa.so (0x7fa923128000+0xb470) [0x7fa923133470]
[ 3792.853] 18: /usr/bin/Xorg (0x400000+0xde32a) [0x4de32a]
[ 3792.853] 19: /usr/bin/Xorg (0x400000+0xd2bce) [0x4d2bce]
[ 3792.854] 20: /usr/bin/Xorg (0x400000+0x3619c) [0x43619c]
[ 3792.854] 21: /usr/bin/Xorg (0x400000+0x2189a) [0x42189a]
[ 3792.854] 22: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7fa925da3c5d]
[ 3792.854] 23: /usr/bin/Xorg (0x400000+0x21449) [0x421449]

Is there anything I can add to help solve this bug?

Thanks & Regards, Arnold

I forgot to mention before, but if I disable video acceleration, the bug doesn't trigger.
I've added the parameter nouveau.noaccel=1 in the grub configuration file. I've also noticed that it "fixed" some random drawing errors, which I had spotted while using the audio player amarok. Since I use GNOME as a desktop, maybe it is something related to the drawing of KDE applications, but I'm really uncertain about this.

Hi again, I had to switch to the closed-source driver from nVidia in order to get a stable, working system. Please ping me if I can be of any help with my hardware in fixing this bug (or these bugs); I can test your findings on my box in order to get a stable "nouveau" release on the latest hardware.

Best Regards, Arnold

I've been running both desktop and laptop with the 'nouveau.noaccel=1' option added to grub.conf for ~14 days, and the system has been more stable. It will still freeze every now and again, but only for 1-2 seconds, and then it recovers. One annoying side effect of this on the desktop (which has dual screens) is that the mouse pointer will only draw on the screen it was on when the freeze happened. The pointer does not draw on the other screen, but I can still select things if I'm patient enough to click blindly all over the screen. The mouse will continue to draw inside a VNC or RDP session on the affected screen.

Hi, I've also reverted back to the nouveau driver and added the "nouveau.noaccel=1" boot option (I had misspelled it at my first attempt, which is why it didn't work right away). It looks like I now have a stable system, at least on the display side. Does anybody have an idea how things like this usually evolve? Will the upstream code that eventually flows down to the release fix things like this automatically over time? Will this driver support acceleration eventually?

Best Regards, Arnold

Been getting this recently since I upgraded from F12 to F13 (F12 was fine, with the latest updates) - about once or twice a day.
I briefly tried the noaccel option, but it makes the whole desktop annoyingly slower (which suggests that this kernel, also the latest F13 update, doesn't have this option on by default).

kernel-2.6.34.7-56.fc13.x86_64
xorg-x11-drv-nouveau-0.0.16-8.20100423git13c1043.fc13.x86_64
xorg-x11-server-Xorg-1.8.2-4.fc13.x86_64

Backtrace from a previous crash:

#0 0x00000030008329a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x0000003000834185 in abort () at abort.c:92
#2 0x000000000045db1e in OsAbort () at utils.c:1335
#3 0x000000000045ac1d in FatalError (f=0x7fb5f96af3b1 "Detected GPU lockup\n") at log.c:555
#4 0x00007fb5f9695d9d in NVLockedUp (pScrn=0x26551e0) at nv_dma.c:37
#5 NVSync (pScrn=0x26551e0) at nv_dma.c:66
#6 0x00007fb5f969620c in NVLeaveVT (scrnIndex=<value optimized out>, flags=<value optimized out>) at nv_driver.c:350
#7 0x000000000046d2b3 in AbortDDX () at xf86Init.c:1284
#8 0x000000000045a40d in AbortServer () at log.c:425
#9 0x000000000045ac60 in FatalError (f=0x575048 "Caught signal %d (%s). Server aborting\n") at log.c:553
#10 0x000000000046355e in OsSigHandler (signo=11, sip=<value optimized out>, unused=<value optimized out>) at osinit.c:156
#11 <signal handler called>
#12 0x00000030008d95d7 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#13 0x0000003013803388 in drmIoctl (fd=10, request=1074291842, arg=0x7fff5ece3890) at xf86drm.c:184
#14 0x000000301380360b in drmCommandWrite (fd=<value optimized out>, drmCommandIndex=<value optimized out>, data=<value optimized out>, size=<value optimized out>) at xf86drm.c:2361
#15 0x00007fb5f944ddfd in nouveau_bo_wait (bo=0x2677500, cpu_write=<value optimized out>, no_wait=<value optimized out>, no_block=<value optimized out>) at nouveau_bo.c:385
#16 0x00007fb5f944dfee in nouveau_bo_map_range (bo=0x2677500, delta=0, size=<value optimized out>, flags=8) at nouveau_bo.c:428
#17 0x00007fb5f968f498 in NVAccelUploadM2MF (pdpix=0x4776450, x=0, y=0, w=<value optimized out>, h=4, src=0x52805c0 "\226\377\377\377\377\377\226", src_pitch=8) at nouveau_exa.c:197
#18 nouveau_exa_upload_to_screen (pdpix=0x4776450, x=0, y=0, w=<value optimized out>, h=4, src=0x52805c0 "\226\377\377\377\377\377\226", src_pitch=8) at nouveau_exa.c:469
#19 0x00007fb5f901a23f in exaCopyDirty (migrate=<value optimized out>, pValidDst=0x46df800, pValidSrc=0x46df7f0, transfer=0x7fb5f968f2b0 <nouveau_exa_upload_to_screen>, fallback_index=0, sync=0) at exa_migration_classic.c:220
#20 0x00007fb5f901c494 in exaDoMigration_mixed (pixmaps=<value optimized out>, npixmaps=3, can_accel=<value optimized out>) at exa_migration_mixed.c:113
#21 0x00007fb5f9022117 in exaTryDriverComposite (op=3 '\003', pSrc=0x38d9e90, pMask=0x4648f90, pDst=0x4731150, xSrc=4, ySrc=964, xMask=0, yMask=0, xDst=<value optimized out>, yDst=<value optimized out>, width=7, height=4) at exa_render.c:735
#22 0x00007fb5f90230b2 in exaComposite (op=3 '\003', pSrc=0x38d9e90, pMask=0x4648f90, pDst=0x4731150, xSrc=4, ySrc=964, xMask=0, yMask=0, xDst=4, yDst=964, width=7, height=4) at exa_render.c:1034
#23 0x00000000004d0d80 in damageComposite (op=3 '\003', pSrc=0x38d9e90, pMask=0x4648f90, pDst=0x4731150, xSrc=<value optimized out>, ySrc=<value optimized out>, xMask=0, yMask=0, xDst=4, yDst=964, width=7, height=4) at damage.c:643
#24 0x00007fb5f9d614ef in vncHooksComposite (op=3 '\003', pSrc=0x38d9e90, pMask=0x4648f90, pDst=0x4731150, xSrc=4, ySrc=964, xMask=0, yMask=0, xDst=4, yDst=964, width=7, height=4) at vncHooks.cc:534
#25 0x00007fb5f9021d08 in exaTrapezoids (op=3 '\003', pSrc=0x38d9e90, pDst=0x4731150, maskFormat=0x2677958, xSrc=4, ySrc=964, ntrap=0, traps=0x52f80c8) at exa_render.c:1183
#26 0x00000000004c9077 in ProcRenderTrapezoids (client=0x2ec4460) at render.c:780
#27 0x000000000042dbdc in Dispatch () at dispatch.c:439
#28 0x000000000042189a in main (argc=<value optimized out>, argv=0x7fff5ece4188, envp=<value optimized out>) at main.c:286

(Frame #12 was where it was sat in the same SIGALRM/ioctl loop others have reported, until I manually sent it SIGSEGV to get it to dump core.)

I see the same EQ overflowing message and backtrace in the Xorg.0.log file, and at about the time it wedged I see in /var/log/messages the single drm message:

kernel: [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1

I didn't have the drm debugging turned up at the time this happened; I now have so will see if any more output appears there the next time I get this.

(Once I've killed off the Xorg process it also leaves the VT hung, so that when it restarts automatically it does so on the next higher available VT and so on.)

Nope, running with drm.debug=0x04, the only thing that appears in the dmesg output at the time of the lockup is that PFIFO_DMA_PUSHER line, and that is the only time it appears.

(I was running kernel-2.6.32.21-168.fc12.x86_64 under F12 without this problem.)
Card is:

0f:00.0 VGA compatible controller [0300]: nVidia Corporation G72 [GeForce 7300 SE/7200 GS] [10de:01d3] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Giga-byte Technology Device [1458:3470]
Physical Slot: 2
Flags: bus master, fast devsel, latency 0, IRQ 24
Memory at e0000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at e1000000 (64-bit, non-prefetchable) [size=16M]
Expansion ROM at <unassigned> [disabled]
Capabilities: [60] Power Management version 2
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [128] Power Budgeting <?>
Kernel driver in use: nouveau
Kernel modules: nouveau, nvidiafb

Just got something slightly different on this one: dmesg shows the following:

[drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1
[drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DATA_ERROR, nStatus: BAD_ARGUMENT
[drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x1840 Data 0x00000000:0x000a00a7

Same backtrace as before.

(Is the problem understood at all? Or is there any extra information that would track it down? I could probably go as far as gdbing the wedged X server and digging around, *if* I knew what I was looking for, and assuming there is anything diagnostic present in user space rather than the driver itself.)

Created attachment 455952 [details]
dmesg output
After upgrading to kernel-2.6.34.7-61.fc13.x86_64 on Monday I haven't seen the lockup - at a previous rate of 1-3 per day I think that's likely gone.
What I am seeing now is occasional font corruption, often a single glyph gets trashed wherever that one character/font combination appears on the desktop, for a few minutes before recovering.
dmesg shows a flurry of new output, PGRAPH_ERROR and PFIFO_DMA_PUSHER, attached.
(It's my impression that instances of corruption coincide with the appearance of new dmesg messages.)
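To check the impression that corruption coincides with new kernel messages, the nouveau error lines can be tallied from a saved dmesg or /var/log/messages dump. This is only a rough sketch: the regular expression is an assumption based on the message format quoted in this report, and tally_nouveau_errors is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as
#   [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DATA_ERROR, ...
# The format is assumed from the excerpts quoted in this bug report.
NOUVEAU_ERR = re.compile(
    r"\[drm\] nouveau (\S+): (PFIFO_DMA_PUSHER|PGRAPH_ERROR)\b"
)


def tally_nouveau_errors(lines):
    """Count PFIFO_DMA_PUSHER / PGRAPH_ERROR messages per PCI device."""
    counts = Counter()
    for line in lines:
        match = NOUVEAU_ERR.search(line)
        if match:
            device, kind = match.groups()
            counts[(device, kind)] += 1
    return counts


# Sample lines copied from the dmesg excerpt earlier in this report.
SAMPLE = """\
[drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1
[drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DATA_ERROR, nStatus: BAD_ARGUMENT
[drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x1840 Data 0x00000000:0x000a00a7
""".splitlines()

if __name__ == "__main__":
    for (device, kind), n in sorted(tally_nouveau_errors(SAMPLE).items()):
        print(device, kind, n)
```

Running the same function over a full log before and after a glitch would show whether the counts actually move when the corruption appears.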
Ah, that's good to know. The kernel you're using has better GPU error handling implemented, so it probably is indeed the same bug, just with less severe consequences now.

Just had another hang (after 3 weeks continuous running), which looks a little different now:

[2082340.099] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[2082340.128] Backtrace:
[2082340.441] 0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x460d18]
[2082340.441] 1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x45a0a4]
[2082340.441] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x473f84]
[2082340.441] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f77b2f73000+0x3dbf) [0x7f77b2f76dbf]
[2082340.441] 4: /usr/bin/Xorg (0x400000+0x664f7) [0x4664f7]
[2082340.441] 5: /usr/bin/Xorg (0x400000+0x112263) [0x512263]
[2082340.451] 6: /lib64/libc.so.6 (0x7f77b6edb000+0x32a20) [0x7f77b6f0da20]
[2082340.451] 7: /lib64/libc.so.6 (ioctl+0x7) [0x7f77b6fb45b7]
[2082340.451] 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x3013803388]
[2082340.451] 9: /usr/lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x301380360b]
[2082340.451] 10: /usr/lib64/libdrm_nouveau.so.1 (0x7f77b5900000+0x2dfd) [0x7f77b5902dfd]
[2082340.452] 11: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfe) [0x7f77b5902fee]
[2082340.452] 12: /usr/lib64/libdrm_nouveau.so.1 (0x7f77b5900000+0x207a) [0x7f77b590207a]
[2082340.452] 13: /usr/lib64/libdrm_nouveau.so.1 (nouveau_pushbuf_flush+0x190) [0x7f77b5902450]
[2082340.452] 14: /usr/lib64/xorg/modules/libexa.so (0x7f77b54ca000+0x9655) [0x7f77b54d3655]
[2082340.452] 15: /usr/lib64/xorg/modules/libexa.so (0x7f77b54ca000+0xa1fa) [0x7f77b54d41fa]
[2082340.452] 16: /usr/bin/Xorg (0x400000+0xd169b) [0x4d169b]
[2082340.452] 17: /usr/lib64/xorg/modules/extensions/libvnc.so (0x7f77b61a5000+0x379c5) [0x7f77b61dc9c5]
[2082340.452] 18: /usr/lib64/xorg/modules/libexa.so (0x7f77b54ca000+0xb470) [0x7f77b54d5470]
[2082340.452] 19: /usr/bin/Xorg (0x400000+0xd10aa) [0x4d10aa]
[2082340.452] 20: /usr/bin/Xorg (0x400000+0xc775e) [0x4c775e]
[2082340.452] 21: /usr/bin/Xorg (0x400000+0x2dbdc) [0x42dbdc]
[2082340.452] 22: /usr/bin/Xorg (0x400000+0x2189a) [0x42189a]
[2082340.452] 23: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7f77b6ef9c5d]
[2082340.452] 24: /usr/bin/Xorg (0x400000+0x21449) [0x421449]

dmesg reports:

Nov 18 12:47:11 kernel: [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x1f340000 Put 0x0001f978 State 0xc002780c Push 0x00000000
Nov 18 12:47:21 kernel: [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x3e0ccccc Put 0x00016db0 State 0xc0022054 Push 0x00000000
Nov 18 12:47:21 kernel: [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x00026044 Put 0x000175e0 State 0x20022054 Push 0x00000000
Nov 18 12:47:21 kernel: nouveau_ratelimit: 12 callbacks suppressed
Nov 18 12:47:21 kernel: [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: METHOD_CNT, nStatus: INVALID_STATE BAD_ARGUMENT PROTECTION_FAULT
Nov 18 12:47:21 kernel: [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/3 Class 0x004a Mthd 0x0c70 Data 0x00000000:0x00606c70
Nov 18 12:47:21 kernel: [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: METHOD_CNT, nStatus: INVALID_STATE BAD_ARGUMENT PROTECTION_FAULT
Nov 18 12:47:21 kernel: [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/3 Class 0x004a Mthd 0x0c74 Data 0x00000000:0x00606c70
[the above two lines repeat another 18 times]
Nov 18 12:47:21 kernel: [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x001f0b10 Put 0x00017ba0 State 0x404a6070 Push 0x00000000

In particular, I don't recall having seen the ratelimit line before.

These problems (old hangs, the new glitches, and this new hang) subjectively seem to occur during vertical scrolling. Mostly this is paging down in firefox, this hang happened while a large build was spewing a lot of output in a gnome-terminal window, which I've also seen before.
The terminal window (normally black-on-white) went entirely black and the X server wedged (mouse movement but no other activity, including VT switching possible. Requires ssh from outside to kill the X server, which then restarts and recovers.)

Although grepping /var/log/messages* shows 7 other instances of the ratelimit line, all after the kernel upgrade, none from before. The earliest instance had a massive 1189 callbacks suppressed (working backwards the others are 12,4,13,75,140,62,6,1189). This appears far less frequently than the glitches are happening though.

(In reply to comment #24)
> In particular, I don't recall having seen the ratelimit line before.
> These problems (old hangs, the new glitches, and this new hang) subjectively
> seem to occur during vertical scrolling. Mostly this is paging down in firefox,
> this hang happened while a large build was spewing a lot of output in a
> gnome-terminal window, which I've also seen before. The terminal window
> (normally black-on-white) went entirely black and the X server wedged (mouse
> movement but no other activity, including VT switching possible. Requires ssh
> from outside to kill the X server, which then restarts and recovers.)

I can confirm that I mostly see and can intermittently reproduce this issue with vertical scrolling in firefox. It occurs whether I use the Page Down key or use the scroll wheel on the mouse (though it seems to happen more with the mouse wheel).

Just got another hang after 24 hours. On the basis that this may indicate progressive damage within the kernel driver worsening the situation, I have rebooted rather than just restarted the X server. Hopefully this will give me another 3 weeks without lockups.

This was during a lot of text output in gnome-terminal again. Tying this together with the font corruption I think this may be happening:

Presumably the font server is rendering fonts internally then copying them to off-screen video memory, then applications (whether themselves or by asking the X server, or the font server, I don't know the exact mechanics) can blit between that buffer and the screen on-demand. (My metacity config isn't set for compositing, compositing_manager is unchecked in gconf-editor, so it would be between off-screen and screen, I think.)

So the fact that individual glyphs get corrupted throughout the screen indicates the font server is failing to copy into the off-screen texture correctly. This persists for a short while, until the glyph is either evicted or otherwise expired from the cache I assume.

The corruption tends to happen during vertical scrolling (whether firefox or a terminal), but I don't think it's the scrolling itself that is the problem. I don't think firefox caches the view itself, so will be rendering each glyph anew as they are scrolled into view. Similarly with the terminal. This will involve a huge number of very small blits from off-screen to on-screen, presumably in a heavily optimised path so they can all be queued up in rapid succession. This process probably (given the corruption) involves a lot of font server glyph cache replacement too, so there will be a lot of traffic both in and out of the off-screen buffer. So I think that is the trigger.

Apparently not. After an hour the machine hung again with *massive* corruption in the terminal window. (All black-and-white artifacts, not generic random-colour/noise graphics corruption, looks like lots of over-drawn glyphs and some blocks overdrawn with horizontal stripes.) This time it hard locked: no mouse pointer, and no network access. (Same kernel as before, 2.6.34.7-61.) No details of the event survived in any log files.

Closing as duplicate of bug 465884 for reasons explained there. Please file a bug for each separate issue here (crashes etc.); the message "EQ overflowing ..." is so generic that it actually doesn't mean much.

*** This bug has been marked as a duplicate of bug 465884 ***
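Since the "EQ overflowing" message alone says little, what distinguishes one report from another is the backtrace that follows it. The sketch below pulls those frames out of an Xorg log so that separate reports can be compared. It is illustrative only: the frame pattern is an assumption based on the log excerpts in this report, eq_overflow_frames and first_symbol_in are hypothetical helper names, and the sample is abridged from the backtrace quoted earlier (frames 2-6, 9-10 and 12-23 omitted).

```python
import re

# One backtrace frame, e.g.
#   [ 3792.853] 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x7fa9243d6388]
# The format is assumed from the Xorg.0.log excerpts in this report.
FRAME = re.compile(
    r"\[\s*\d+\.\d+\]\s+(\d+): (\S+) \((.*?)\) \[(0x[0-9a-f]+)\]"
)


def eq_overflow_frames(log_text):
    """Return (index, object, location) for the first EQ-overflow backtrace."""
    frames = []
    seen_marker = False
    for line in log_text.splitlines():
        if not seen_marker:
            seen_marker = "[mi] EQ overflowing" in line
            continue
        match = FRAME.search(line)
        if match:
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
        elif frames:
            break  # first non-frame line after the backtrace ends it
    return frames


def first_symbol_in(frames, needle):
    """First frame in an object matching `needle` that has a symbol name."""
    for index, obj, location in frames:
        if needle in obj and not location.startswith("0x"):
            return index, location
    return None


SAMPLE = """\
[ 3792.850] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 3792.852] Backtrace:
[ 3792.852] 0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x45cef8]
[ 3792.852] 1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x49a4c4]
[ 3792.853] 7: /lib64/libc.so.6 (ioctl+0x7) [0x7fa925e5e5a7]
[ 3792.853] 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x7fa9243d6388]
[ 3792.853] 11: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfe) [0x7fa923d9cfee]
"""

if __name__ == "__main__":
    frames = eq_overflow_frames(SAMPLE)
    print(first_symbol_in(frames, "libdrm_nouveau"))
```

Comparing the first named driver frame across reports is one possible way to judge whether two "EQ overflowing" hangs are likely to be the same issue before filing them together.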