Bug 698733 - [NV46] various nouveau lockups [@ nouveau_bo_wait]
Summary: [NV46] various nouveau lockups [@ nouveau_bo_wait]
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nouveau
Version: 14
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Ben Skeggs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: [cat:crash]
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-04-21 16:05 UTC by John Sullivan
Modified: 2018-04-11 14:44 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-08-16 12:45:18 UTC
Type: ---


Attachments (Terms of Use)
xorg.conf (2.21 KB, text/plain)
2011-05-11 10:40 UTC, John Sullivan
no flags Details
Xorg.log file (127.01 KB, text/plain)
2011-05-11 10:44 UTC, John Sullivan
no flags Details
current dmesg output (108.16 KB, text/plain)
2011-05-11 10:58 UTC, John Sullivan
no flags Details
Xorg.log file of another lockup (244.45 KB, text/plain)
2011-05-17 13:22 UTC, John Sullivan
no flags Details
associated dmesg output (122.66 KB, text/plain)
2011-05-17 13:22 UTC, John Sullivan
no flags Details
screen corruption (4.44 KB, image/png)
2011-05-17 13:24 UTC, John Sullivan
no flags Details
Xorg.log file from screen corruption (116.91 KB, text/plain)
2011-05-17 13:25 UTC, John Sullivan
no flags Details
dmesg output from screen corruption (87.86 KB, text/plain)
2011-05-17 13:27 UTC, John Sullivan
no flags Details

Description John Sullivan 2011-04-21 16:05:42 UTC
Re-raising as per bug 605432.

I have just upgraded from up-to-date F13, to up-to-date F14. I upgraded from F12 to F13 last October.

Every day or two, X hangs. There is a scale of badness:
  mouse pointer active everything else frozen (ssh/kill -9 X recovers)
  mouse pointer frozen (ssh/kill -9 X recovers)
  machine pings but sshd inaccessible
  machine doesn't ping
  machine spontaneously reboots

F12 didn't show this behaviour, F13 and F14 do.

AFAICT all or almost all occurrences have been during vertical scrolling of a gnome-terminal (heavy console output such as compiles or catting logfiles etc), or vertical scrolling in a firefox window (often just small movements).

Often, /var/messages/Xorg.0.log contains the backtrace:

[ 83275.462] 0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x4a0908]
[ 83275.462] 1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x49fe04]
[ 83275.462] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x47c904]
[ 83275.462] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f4b069b2000+0x453f) [0x7f4b069b653f]
[ 83275.462] 4: /usr/bin/Xorg (0x400000+0x6a5f7) [0x46a5f7]
[ 83275.462] 5: /usr/bin/Xorg (0x400000+0x119103) [0x519103]
[ 83275.463] 6: /lib64/libc.so.6 (0x3000800000+0x33140) [0x3000833140]
[ 83275.463] 7: /lib64/libc.so.6 (ioctl+0x7) [0x30008d8997]
[ 83275.463] 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x300e403648]
[ 83275.463] 9: /usr/lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x300e4057ab]
[ 83275.463] 10: /usr/lib64/libdrm_nouveau.so.1 (0x7f4b099ba000+0x2ded) [0x7f4b099bcded]
[ 83275.463] 11: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfe) [0x7f4b099bd3fe]
[ 83275.463] 12: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f4b09bff000+0x566e) [0x7f4b09c0466e]
[ 83275.463] 13: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0x5fc7) [0x7f4b09584fc7]
[ 83275.463] 14: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0x877a) [0x7f4b0958777a]
[ 83275.463] 15: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0xf75f) [0x7f4b0958e75f]
[ 83275.463] 16: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0x102de) [0x7f4b0958f2de]
[ 83275.463] 17: /usr/bin/Xorg (0x400000+0xd6993) [0x4d6993]
[ 83275.463] 18: /usr/lib64/xorg/modules/extensions/libvnc.so (0x7f4b0a2a5000+0x326a8) [0x7f4b0a2d76a8]
[ 83275.463] 19: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0x11388) [0x7f4b09590388]
[ 83275.463] 20: /usr/bin/Xorg (0x400000+0xd1d11) [0x4d1d11]
[ 83275.463] 21: /usr/bin/Xorg (0x400000+0x2d521) [0x42d521]
[ 83275.463] 22: /usr/bin/Xorg (0x400000+0x2157e) [0x42157e]
[ 83275.463] 23: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x300081ee5d]
[ 83275.463] 24: /usr/bin/Xorg (0x400000+0x21129) [0x421129]

Hangs are often accompanied by [drm] error messages via dmesg and /var/log/messages. Sometimes a flurry of such messages occurs without triggering a hang. The last crash produced:

Apr 21 13:09:39 host kernel: [83049.597063] [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x0002ed08 Put 0x0001e0f0 State 0x2002d8c8 Push 0x00000000

Shortly before, these failed to cause a hang:

Apr 21 12:27:23 host kernel: [80520.552098] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DMA_WIDTH_B, nStatus: INVALID_STATE
Apr 21 12:27:23 host kernel: [80520.552103] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x1a00 Data 0x0001a129:0x02c88000
Apr 21 12:27:23 host kernel: [80520.552112] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DMA_WIDTH_B, nStatus: INVALID_STATE
Apr 21 12:27:23 host kernel: [80520.552116] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x1a04 Data 0x0001a129:0x0001a129
Apr 21 12:27:23 host kernel: [80520.552125] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DMA_WIDTH_B, nStatus: INVALID_STATE
Apr 21 12:27:23 host kernel: [80520.552129] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x1a08 Data 0x80000000:0x00040404
Apr 21 12:27:23 host kernel: [80520.552138] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DMA_WIDTH_B, nStatus: INVALID_STATE
Apr 21 12:27:23 host kernel: [80520.552142] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x1a0c Data 0x80000000:0x80000000
Apr 21 12:27:23 host kernel: [80520.552150] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DMA_WIDTH_B, nStatus: INVALID_STATE
Apr 21 12:27:23 host kernel: [80520.552154] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x1a10 Data 0x01013fd6:0x000002ff
Apr 21 12:27:23 host kernel: [80520.552163] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DMA_WIDTH_B, nStatus: INVALID_STATE
Apr 21 12:27:23 host kernel: [80520.552167] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x1a14 Data 0x01013fd6:0x01013fd6
Apr 21 12:27:23 host kernel: [80520.552175] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DMA_WIDTH_B, nStatus: INVALID_STATE
Apr 21 12:27:23 host kernel: [80520.552179] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x1a18 Data 0x00000000:0x04000140
Apr 21 12:27:23 host kernel: [80520.552188] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DMA_WIDTH_B, nStatus: INVALID_STATE
Apr 21 12:27:23 host kernel: [80520.552192] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x1a1c Data 0x00000000:0x00000000
Apr 21 12:27:23 host kernel: [80520.552201] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DMA_WIDTH_B, nStatus: INVALID_STATE
Apr 21 12:27:23 host kernel: [80520.552204] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x1840 Data 0x00000000:0x00100400
Apr 21 12:27:23 host kernel: [80520.552213] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DMA_WIDTH_B, nStatus: INVALID_STATE
Apr 21 12:27:23 host kernel: [80520.552217] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x1ff4 Data 0x0000c001:0x0000c001
Apr 21 12:27:23 host kernel: [80520.552226] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DMA_WIDTH_B, nStatus: INVALID_STATE
Apr 21 12:27:23 host kernel: [80520.552229] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x08e4 Data 0x00000000:0x0091a0c1
Apr 21 12:27:23 host kernel: [80520.552238] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: DMA_WIDTH_B, nStatus: INVALID_STATE
Apr 21 12:27:23 host kernel: [80520.552242] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/5 Class 0x4497 Mthd 0x1d60 Data 0x00000000:0x02000000
Apr 21 12:27:23 host kernel: [80520.552266] [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x00022044 Put 0x000157b8 State 0x20022054 Push 0x00000000
Apr 21 12:27:23 host kernel: [80520.554141] [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x0002281c Put 0x000158a8 State 0x80022054 Push 0x00000000

Often, these flurries were accompanied with major window (the scrolling window either doesn't draw or gets huge black areas)/font (individual glyphs are corrupt wherever they appear throughout the desktop) corruption under F13. This appears to be much less under F14, but still present. I notice that the changelog for the latest F14 kernel shows a potentially relevant change to nouveau betwee it and the latest F13 kernel.

Currently running versions:

kernel-2.6.35.12-88.fc14.x86_64
xorg-x11-server-Xorg-1.9.5-1.fc14.x86_64
xorg-x11-drv-nouveau-0.0.16-11.20100826git065576d.fc14.x86_64
libdrm-2.4.22-1.fc14.x86_64

Graphics card is:

0f:00.0 VGA compatible controller [0300]: nVidia Corporation G72 [GeForce 7300 SE/7200 GS] [10de:01d3] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Giga-byte Technology Device [1458:3470]
        Physical Slot: 2
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 24
        Region 0: Memory at e0000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at e1000000 (64-bit, non-prefetchable) [size=16M]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [60] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [78] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <256ns, L1 <4us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <256ns, L1 <4us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [128 v1] Power Budgeting <?>
        Kernel driver in use: nouveau
        Kernel modules: nouveau, nvidiafb


The kernel reports:

Apr 14 15:30:03 orthrus kernel: [drm] nouveau 0000:0f:00.0: Detected an NV40 generation card (0x246300b1)
Apr 14 15:30:03 orthrus kernel: [drm] nouveau 0000:0f:00.0: Attempting to load BIOS image from PRAMIN
Apr 14 15:30:03 orthrus kernel: [drm] nouveau 0000:0f:00.0: ... appears to be valid
Apr 14 15:30:03 orthrus kernel: [drm] nouveau 0000:0f:00.0: BIT BIOS found
Apr 14 15:30:03 orthrus kernel: [drm] nouveau 0000:0f:00.0: Bios version 05.72.22.81

Xorg reports:

[  1948.874] (--) NOUVEAU(0): Chipset: "NVIDIA NV46"


Can aid debugging, build instrumented kernels etc., if told where to look.

Comment 1 Matěj Cepl 2011-05-10 21:08:14 UTC
Thanks for the bug report.  We have reviewed the information you have provided above, and there is some additional information we require that will be helpful in our diagnosis of this issue.

Please add drm.debug=0x04 to the kernel command line, restart computer, and attach

* your X server config file (/etc/X11/xorg.conf, if available),
* complete X server log file (/var/log/Xorg.*.log)
* complete output of the dmesg command, and
* system log (/var/log/messages)

to the bug report as individual uncompressed file attachments using the bugzilla file attachment link above.

We will review this issue again once you've had a chance to attach this information.

Thanks in advance.

Comment 2 John Sullivan 2011-05-11 10:39:28 UTC
The above output is *with* drm.debug=0x04.

Comment 3 John Sullivan 2011-05-11 10:40:25 UTC
Created attachment 498251 [details]
xorg.conf

Comment 4 John Sullivan 2011-05-11 10:44:07 UTC
Created attachment 498253 [details]
Xorg.log file

Comment 5 John Sullivan 2011-05-11 10:58:14 UTC
Created attachment 498255 [details]
current dmesg output

As can be seen there are a lot of drm related messages. (Haven't had a lock-up for a while but there's been plenty of screen corruption.)

Don't appear to have any recorded dmesg output from the last boot sequence - will try and capture it next time I reboot.

Comment 6 John Sullivan 2011-05-11 11:01:29 UTC
Is there anything in particular you're looking for in /var/log/messages?

I can't see any extra output around the time of problems that doesn't appear in the dmesg output (and there's an awful lot of unrelated but potentially sensitive local networking/authentication crud which I'm not happy about posting.)

Comment 7 Matěj Cepl 2011-05-14 17:53:41 UTC
(In reply to comment #6)
> Is there anything in particular you're looking for in /var/log/messages?

It is kind of difficult. In the ideal situation (like in your case) /var/log/messages and dmesg output contain the similar information. However, we are talking about crashing systems here, so this duplicity can serve as backup.

Comment 8 Matěj Cepl 2011-05-14 18:04:27 UTC
Backtrace:
[ 83275.462] 0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x4a0908]
[ 83275.462] 1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x49fe04]
[ 83275.462] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x47c904]
[ 83275.462] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f4b069b2000+0x453f) [0x7f4b069b653f]
[ 83275.462] 4: /usr/bin/Xorg (0x400000+0x6a5f7) [0x46a5f7]
[ 83275.462] 5: /usr/bin/Xorg (0x400000+0x119103) [0x519103]
[ 83275.463] 6: /lib64/libc.so.6 (0x3000800000+0x33140) [0x3000833140]
[ 83275.463] 7: /lib64/libc.so.6 (ioctl+0x7) [0x30008d8997]
[ 83275.463] 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x300e403648]
[ 83275.463] 9: /usr/lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x300e4057ab]
[ 83275.463] 10: /usr/lib64/libdrm_nouveau.so.1 (0x7f4b099ba000+0x2ded) [0x7f4b099bcded]
[ 83275.463] 11: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfe) [0x7f4b099bd3fe]
[ 83275.463] 12: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f4b09bff000+0x566e) [0x7f4b09c0466e]
[ 83275.463] 13: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0x5fc7) [0x7f4b09584fc7]
[ 83275.463] 14: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0x877a) [0x7f4b0958777a]
[ 83275.463] 15: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0xf75f) [0x7f4b0958e75f]
[ 83275.463] 16: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0x102de) [0x7f4b0958f2de]
[ 83275.463] 17: /usr/bin/Xorg (0x400000+0xd6993) [0x4d6993]
[ 83275.463] 18: /usr/lib64/xorg/modules/extensions/libvnc.so (0x7f4b0a2a5000+0x326a8) [0x7f4b0a2d76a8]
[ 83275.463] 19: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0x11388) [0x7f4b09590388]
[ 83275.463] 20: /usr/bin/Xorg (0x400000+0xd1d11) [0x4d1d11]
[ 83275.463] 21: /usr/bin/Xorg (0x400000+0x2d521) [0x42d521]
[ 83275.463] 22: /usr/bin/Xorg (0x400000+0x2157e) [0x42157e]
[ 83275.463] 23: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x300081ee5d]
[ 83275.463] 24: /usr/bin/Xorg (0x400000+0x21129) [0x421129]

Comment 9 John Sullivan 2011-05-17 13:22:26 UTC
Created attachment 499349 [details]
Xorg.log file of another lockup

Locked up again just now.

Comment 10 John Sullivan 2011-05-17 13:22:56 UTC
Created attachment 499350 [details]
associated dmesg output

Comment 11 John Sullivan 2011-05-17 13:24:45 UTC
Created attachment 499352 [details]
screen corruption

Shortly after recovering from the lockup, while restarting my firefox session, got a good example of screen font corruption. Note how all the "s", "n" and "u" suffer the same way.

Comment 12 John Sullivan 2011-05-17 13:25:54 UTC
Created attachment 499353 [details]
Xorg.log file from screen corruption

As far as I can see, there is no additional output to Xorg.log related to this.

Comment 13 John Sullivan 2011-05-17 13:27:11 UTC
Created attachment 499354 [details]
dmesg output from screen corruption

The dmesg output is more interesting however (and a little different from the dmesg output from the lockup.)

Comment 14 Matěj Cepl 2011-06-16 22:08:43 UTC
(In reply to comment #0)
> Often, /var/messages/Xorg.0.log contains the backtrace:

Annotated:

Frame 2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x47c904]
/usr/src/debug/xorg-server-1.9.5/hw/xfree86/common/xf86Xinput.c:1053
xf86PostMotionEventP
Frame 4: /usr/bin/Xorg (0x400000+0x6a5f7) [0x46a5f7]
/usr/src/debug/xorg-server-1.9.5/hw/xfree86/common/xf86Events.c:300
xf86SigioReadInput
Frame 5: /usr/bin/Xorg (0x400000+0x119103) [0x519103]
/usr/src/debug/xorg-server-1.9.5/hw/xfree86/os-support/linux/../shared/sigio.c:115
xf86SIGIO
Frame 13: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0x5fc7) [0x7f4b09584fc7]
/usr/src/debug/xorg-server-1.9.5/exa/exa_migration_classic.c:220
exaCopyDirty
Frame 14: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0x877a) [0x7f4b0958777a]
/usr/src/debug/xorg-server-1.9.5/exa/exa_migration_mixed.c:115
exaDoMigration_mixed
Frame 15: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0xf75f) [0x7f4b0958e75f]
/usr/src/debug/xorg-server-1.9.5/exa/exa_render.c:734
exaTryDriverComposite
Frame 16: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0x102de) [0x7f4b0958f2de]
/usr/src/debug/xorg-server-1.9.5/exa/exa_render.c:1035
exaComposite
Frame 17: /usr/bin/Xorg (0x400000+0xd6993) [0x4d6993]
/usr/src/debug/xorg-server-1.9.5/miext/damage/damage.c:652
damageComposite
Frame 19: /usr/lib64/xorg/modules/libexa.so (0x7f4b0957f000+0x11388) [0x7f4b09590388]
/usr/src/debug/xorg-server-1.9.5/exa/exa_render.c:1185
exaTrapezoids
Frame 20: /usr/bin/Xorg (0x400000+0xd1d11) [0x4d1d11]
/usr/src/debug/xorg-server-1.9.5/render/render.c:778
ProcRenderTrapezoids
Frame 21: /usr/bin/Xorg (0x400000+0x2d521) [0x42d521]
/usr/src/debug/xorg-server-1.9.5/dix/dispatch.c:432
Dispatch
Frame 22: /usr/bin/Xorg (0x400000+0x2157e) [0x42157e]
/usr/src/debug/xorg-server-1.9.5/dix/main.c:291
main
bradford:tmp $

Comment 15 John Sullivan 2011-06-17 00:08:27 UTC
I think the significant frame is 7: we're stuck in an ioctl call that never returns (presumably because the kernel side has "lost" the command we're waiting for completion on.)

Anything after (higher) than that is just the periodic signal handler which does input event handling.

In terms of what happens just before, I don't know if there's any consistency in the particular libexa function involved (next time it happens I'll try to get gdb on the running server to find out), but the libdrm_nouveau bo_map_range call does seem fairly consistent.

Comment 16 John Sullivan 2011-06-20 15:29:11 UTC
Another, with gdb-supplied backtrace.

(System boot time Tue Jun 14 13:35:14 BST 2011)

Xorg.0.log output:

[524437.923] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[524437.923] 
Backtrace:
[524437.931] 0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x4a0908]
[524437.931] 1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x49fe04]
[524437.931] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x47c904]
[524437.931] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7ffc85c14000+0x453f) [0x7ffc85c1853f]
[524437.931] 4: /usr/bin/Xorg (0x400000+0x6a5f7) [0x46a5f7]
[524437.931] 5: /usr/bin/Xorg (0x400000+0x119103) [0x519103]
[524437.931] 6: /lib64/libc.so.6 (0x3000800000+0x33140) [0x3000833140]
[524437.931] 7: /lib64/libc.so.6 (ioctl+0x7) [0x30008d8997]
[524437.931] 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x300e403648]
[524437.931] 9: /usr/lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x300e4057ab]
[524437.931] 10: /usr/lib64/libdrm_nouveau.so.1 (0x7ffc88cb8000+0x2ded) [0x7ffc88cbaded]
[524437.931] 11: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfe) [0x7ffc88cbb3fe]
[524437.931] 12: /usr/lib64/libdrm_nouveau.so.1 (0x7ffc88cb8000+0x1f2d) [0x7ffc88cb9f2d]
[524437.931] 13: /usr/lib64/libdrm_nouveau.so.1 (nouveau_pushbuf_flush+0x1af) [0x7ffc88cba4ff]
[524437.931] 14: /usr/lib64/xorg/modules/libexa.so (0x7ffc8887d000+0x9105) [0x7ffc88886105]
[524437.931] 15: /usr/lib64/xorg/modules/libexa.so (0x7ffc8887d000+0xbcd1) [0x7ffc88888cd1]
[524437.931] 16: /usr/bin/Xorg (0x400000+0xd7537) [0x4d7537]
[524437.931] 17: /usr/lib64/xorg/modules/extensions/libvnc.so (0x7ffc895a3000+0x34859) [0x7ffc895d7859]
[524437.931] 18: /usr/lib64/xorg/modules/libexa.so (0x7ffc8887d000+0xd328) [0x7ffc8888a328]
[524437.931] 19: /usr/bin/Xorg (0x400000+0xd6c35) [0x4d6c35]
[524437.931] 20: /usr/bin/Xorg (0x400000+0xd058d) [0x4d058d]
[524437.931] 21: /usr/bin/Xorg (0x400000+0x2d521) [0x42d521]
[524437.931] 22: /usr/bin/Xorg (0x400000+0x2157e) [0x42157e]
[524437.931] 23: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x300081ee5d]
[524437.931] 24: /usr/bin/Xorg (0x400000+0x21129) [0x421129]

gdb version:

#0  0x00000030008d8997 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x000000300e403648 in drmIoctl (fd=10, request=1074291842, arg=0x7fffa020efd0) at xf86drm.c:184
#2  0x000000300e4057ab in drmCommandWrite (fd=<value optimized out>, drmCommandIndex=<value optimized out>, data=<value optimized out>, size=<value optimized out>) at xf86drm.c:2381
#3  0x00007ffc88cbaded in nouveau_bo_wait (bo=0x2358d70, cpu_write=<value optimized out>, no_wait=<value optimized out>, no_block=<value optimized out>) at nouveau_bo.c:385
#4  0x00007ffc88cbb3fe in nouveau_bo_map_range (bo=0x2358d70, delta=0, size=<value optimized out>, flags=8) at nouveau_bo.c:428
#5  0x00007ffc88cb9f2d in nouveau_pushbuf_space (chan=0x2355b20, min=<value optimized out>) at nouveau_pushbuf.c:53
#6  0x00007ffc88cba4ff in nouveau_pushbuf_flush (chan=0x2355b20, min=0) at nouveau_pushbuf.c:273
#7  0x00007ffc88886105 in exaFillRegionSolid (pDrawable=0x4180050, pRegion=0x38d8480, pixel=0, planemask=4294967295, alu=3, clientClipType=<value optimized out>) at exa_accel.c:1038
#8  0x00007ffc88888cd1 in exaPolyFillRect (pDrawable=0x4180050, pGC=0x23b8dd0, nrect=1, prect=0x7fffa0210c10) at exa_accel.c:817
#9  0x00000000004d7537 in damagePolyFillRect (pDrawable=0x4180050, pGC=0x23b8dd0, nRects=1, pRects=0x7fffa0210c10) at damage.c:1400
#10 0x00007ffc895d7859 in ?? () from /usr/lib64/xorg/modules/extensions/libvnc.so
#11 0x00007ffc8888a328 in exaGlyphs (op=3 '\003', pSrc=0x449f540, pDst=0x4050a20, maskFormat=0x235ac78, xSrc=604, ySrc=119, nlist=1, list=0x7fffa02115f0, glyphs=0x7fffa0210df0)
    at exa_glyphs.c:785
#12 0x00000000004d6c35 in damageGlyphs (op=3 '\003', pSrc=0x449f540, pDst=0x4050a20, maskFormat=0x235ac78, xSrc=604, ySrc=119, nlist=1, list=0x7fffa02115f0, glyphs=0x7fffa0210df0)
    at damage.c:718
#13 0x00000000004d058d in ProcRenderCompositeGlyphs (client=0x290b530) at render.c:1435
#14 0x000000000042d521 in Dispatch () at dispatch.c:432
#15 0x000000000042157e in main (argc=7, argv=<value optimized out>, envp=<value optimized out>) at main.c:291

dmesg output:

[523512.932865] [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x1f800000 Put 0x0001ca60 State 0xc0020001 Push 0x00000000
[523512.933806] [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x00090070 Put 0x0001cba0 State 0x20022054 Push 0x00000000
[523512.934510] [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x000225e0 Put 0x0001ccb0 State 0x80022054 Push 0x00000000
[523512.935346] [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x00023f3c Put 0x0001cdf0 State 0x20022054 Push 0x00000000

Note that the timestamps (relative to system boot time) on the dmesg lines indicates they were generated at 15:00:26, but they only show up in /var/log/messages at 15:15:43, 7 seconds before the Xorg.log backtrace is logged. (The lockup itself occurred around 15:15 and was apparently functioning fine in the period 15:00-15:15. Extra symptoms: top showed Xorg using 70-100% CPU, "kill -9" on it appears to have recovered the machine.)

Comment 17 John Sullivan 2011-06-21 15:47:55 UTC
Twice so far today. Same thing applies with the dmesg timestamp but I've tracked that down and it's a red herring: printk() uses cpu_clock(), presumably for efficiency, but /proc/uptime uses do_posix_clock_monotonic_gettime() which provides a mostly accurate second-since-boot count. Thus the printk() timestamps tend to drift from actual uptime over time.

1) Scrolling text output in a gnome-terminal

[599099.925474] [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x000267c8 Put 0x00020c98 State 0x20022054 Push 0x00000000
[599101.833988] [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x0002ad64 Put 0x000192d0 State 0x2002a408 Push 0x00000000

#0  0x00000030008d8997 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x000000300e403648 in drmIoctl (fd=10, request=1074291842, arg=0x7fff1a211560) at xf86drm.c:184
#2  0x000000300e4057ab in drmCommandWrite (fd=<value optimized out>, drmCommandIndex=<value optimized out>, data=<value optimized out>, 
    size=<value optimized out>) at xf86drm.c:2381
#3  0x00007fc0ae854ded in nouveau_bo_wait (bo=0xfa9ab0, cpu_write=<value optimized out>, no_wait=<value optimized out>, 
    no_block=<value optimized out>) at nouveau_bo.c:385
#4  0x00007fc0ae8553fe in nouveau_bo_map_range (bo=0xfa9ab0, delta=0, size=<value optimized out>, flags=8) at nouveau_bo.c:428
#5  0x00007fc0aea9c66e in NVAccelUploadM2MF (pdpix=0x2875920, x=0, y=0, w=<value optimized out>, h=4, src=0x261ca60 "\226\377\377\377\377\377\226", 
    src_pitch=8) at nouveau_exa.c:190
#6  nouveau_exa_upload_to_screen (pdpix=0x2875920, x=0, y=0, w=<value optimized out>, h=4, src=0x261ca60 "\226\377\377\377\377\377\226", src_pitch=8)
    at nouveau_exa.c:462
#7  0x00007fc0ae41cfc7 in exaCopyDirty (migrate=<value optimized out>, pValidDst=0x28759c8, pValidSrc=0x28759b8, 
    transfer=0x7fc0aea9c4a0 <nouveau_exa_upload_to_screen>, fallback_index=0, sync=0) at exa_migration_classic.c:220
#8  0x00007fc0ae41f77a in exaDoMigration_mixed (pixmaps=<value optimized out>, npixmaps=3, can_accel=<value optimized out>)
    at exa_migration_mixed.c:113
#9  0x00007fc0ae42675f in exaTryDriverComposite (op=3 '\003', pSrc=0x154dfc0, pMask=0x2763a60, pDst=0x25f60a0, xSrc=4, ySrc=848, xMask=0, yMask=0, 
    xDst=<value optimized out>, yDst=<value optimized out>, width=7, height=4) at exa_render.c:734
#10 0x00007fc0ae4272de in exaComposite (op=3 '\003', pSrc=0x154dfc0, pMask=0x2763a60, pDst=0x25f60a0, xSrc=4, ySrc=848, xMask=0, yMask=0, xDst=4, 
    yDst=848, width=7, height=4) at exa_render.c:1033
#11 0x00000000004d6993 in damageComposite (op=3 '\003', pSrc=0x154dfc0, pMask=0x2763a60, pDst=0x25f60a0, xSrc=4, ySrc=848, xMask=0, yMask=0, xDst=4, 
    yDst=848, width=7, height=4) at damage.c:640
#12 0x00007fc0af16f6d8 in vncHooksComposite (op=3 '\003', pSrc=0x154dfc0, pMask=0x2763a60, pDst=0x25f60a0, xSrc=4, ySrc=848, xMask=0, yMask=0, 
    xDst=4, yDst=848, width=7, height=4) at vncHooks.cc:554
#13 0x00007fc0ae428388 in exaTrapezoids (op=3 '\003', pSrc=0x154dfc0, pDst=0x25f60a0, maskFormat=<value optimized out>, xSrc=4, ySrc=848, ntrap=0, 
    traps=0x7fc0aa3fdeec) at exa_render.c:1181
#14 0x00000000004d1d11 in ProcRenderTrapezoids (client=0x1512d00) at render.c:778
#15 0x000000000042d521 in Dispatch () at dispatch.c:432
#16 0x000000000042157e in main (argc=6, argv=<value optimized out>, envp=<value optimized out>) at main.c:291

2) Panning Google Maps in Firefox

[613164.741465] [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x00010010 Put 0x0001ae38 State 0x80020000 Push 0x00000000
[613164.743261] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - nSource: METHOD_CNT, nStatus: INVALID_STATE
[613164.743267] [drm] nouveau 0000:0f:00.0: PGRAPH_ERROR - Ch 1/2 Class 0x004a Mthd 0x0404 Data 0x00000000:0x001b0009
[613164.922394] [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 1 Get 0x04000000 Put 0x0001b708 State 0xc0020000 Push 0x00000000

#0  0x00000030008d8997 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x000000300e403648 in drmIoctl (fd=10, request=1074291842, arg=0x7fff8ddb5cc0) at xf86drm.c:184
#2  0x000000300e4057ab in drmCommandWrite (fd=<value optimized out>, drmCommandIndex=<value optimized out>, data=<value optimized out>, size=<value optimized out>) at xf86drm.c:2381
#3  0x00007f333b6c2ded in nouveau_bo_wait (bo=0xcf3ab0, cpu_write=<value optimized out>, no_wait=<value optimized out>, no_block=<value optimized out>) at nouveau_bo.c:385
#4  0x00007f333b6c33fe in nouveau_bo_map_range (bo=0xcf3ab0, delta=0, size=<value optimized out>, flags=8) at nouveau_bo.c:428
#5  0x00007f333b90a66e in NVAccelUploadM2MF (pdpix=0x198c0d0, x=0, y=0, w=<value optimized out>, h=5, 
    src=0x198d0a0 "K\344\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\344J", src_pitch=76) at nouveau_exa.c:190
#6  nouveau_exa_upload_to_screen (pdpix=0x198c0d0, x=0, y=0, w=<value optimized out>, h=5, 
    src=0x198d0a0 "K\344\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\344J", src_pitch=76) at nouveau_exa.c:462
#7  0x00007f333b28afc7 in exaCopyDirty (migrate=<value optimized out>, pValidDst=0x198c178, pValidSrc=0x198c168, transfer=0x7f333b90a4a0 <nouveau_exa_upload_to_screen>, fallback_index=0, sync=0)
    at exa_migration_classic.c:220
#8  0x00007f333b28d77a in exaDoMigration_mixed (pixmaps=<value optimized out>, npixmaps=3, can_accel=<value optimized out>) at exa_migration_mixed.c:113
#9  0x00007f333b29475f in exaTryDriverComposite (op=3 '\003', pSrc=0x198ed90, pMask=0x1989530, pDst=0x1987750, xSrc=1125, ySrc=89, xMask=0, yMask=0, xDst=<value optimized out>, 
    yDst=<value optimized out>, width=73, height=5) at exa_render.c:734
#10 0x00007f333b2952de in exaComposite (op=3 '\003', pSrc=0x198ed90, pMask=0x1989530, pDst=0x1987750, xSrc=1125, ySrc=89, xMask=0, yMask=0, xDst=1125, yDst=89, width=73, height=5)
    at exa_render.c:1033
#11 0x00000000004d6993 in damageComposite (op=3 '\003', pSrc=0x198ed90, pMask=0x1989530, pDst=0x1987750, xSrc=1125, ySrc=89, xMask=0, yMask=0, xDst=1125, yDst=89, width=73, height=5) at damage.c:640
#12 0x00007f333bfdd6d8 in vncHooksComposite (op=3 '\003', pSrc=0x198ed90, pMask=0x1989530, pDst=0x1987750, xSrc=1125, ySrc=89, xMask=0, yMask=0, xDst=1125, yDst=89, width=73, height=5)
    at vncHooks.cc:554
#13 0x00007f333b296388 in exaTrapezoids (op=3 '\003', pSrc=0x198ed90, pDst=0x1987750, maskFormat=<value optimized out>, xSrc=1127, ySrc=89, ntrap=0, traps=0x2395068) at exa_render.c:1181
#14 0x00000000004d1d11 in ProcRenderTrapezoids (client=0x12b4440) at render.c:778
#15 0x000000000042d521 in Dispatch () at dispatch.c:432
#16 0x000000000042157e in main (argc=6, argv=<value optimized out>, envp=<value optimized out>) at main.c:291

Comment 18 John Sullivan 2011-09-12 12:16:23 UTC
Update: behaviour under latest F14 was a flurry of 20-odd [drm] messages every couple of hours, and on average every couple of days one of these would lock up the X server.

After updating (in-place using preupgrade) to F15 I've been running without visible problems for 23 days now (running kernel-2.6.40.3-0.fc15.x86_64). Between first boot and about 30 minutes after boot there were a bunch of exciting log messages including quite a few of these:

Aug 19 19:18:45  kernel: [  573.546987] DRHD: handling fault status reg 2
Aug 19 19:18:45  kernel: [  573.546996] DMAR:[DMA Read] Request device [0f:00.0] fault addr 0 
Aug 19 19:18:45  kernel: [  573.546997] DMAR:[fault reason 06] PTE Read access is not set
Aug 19 19:19:08  kernel: [  596.835796] DRHD: handling fault status reg 102
Aug 19 19:19:08  kernel: [  596.835805] DMAR:[DMA Read] Request device [0f:00.0] fault addr 0 
Aug 19 19:19:08  kernel: [  596.835807] DMAR:[fault reason 06] PTE Read access is not set

(0f:00.0 is the nVidia card, status reg goes 2, 102, 202...702)

About 30 minutes after first boot there was a bunch of 40 or so "PGRAPH - ERROR" messages:

Aug 19 19:42:11  kernel: [ 1977.414109] [drm] nouveau 0000:0f:00.0: PGRAPH - ERROR nsource: DATA_ERROR nstatus: BAD_ARGUMENT
Aug 19 19:42:11  kernel: [ 1977.414116] [drm] nouveau 0000:0f:00.0: PGRAPH - ch 1 (0x000d4000) subc 2 class 0x009f mthd 0x0184 data 0x00004000

(At this time the system had an all-default X/gnome package set.)

Then over the next two days (while I was experimenting with custom metacity builds I suspect, though I couldn't say the one caused the other) I got precisely two sets of:

Aug 20 16:31:39  kernel: [76814.328924] [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 3 Get 0x011c5000 Put 0x011c50a0 State 0x80000000 (err: INVALID_CMD) Push 0x00000000
Aug 20 16:31:39  kernel: [76814.328936] DRHD: handling fault status reg 302
Aug 20 16:31:39  kernel: [76814.328941] DMAR:[DMA Read] Request device [0f:00.0] fault addr 0 
Aug 20 16:31:39  kernel: [76814.328942] DMAR:[fault reason 06] PTE Read access is not set

Aug 22 10:16:30  kernel: [226842.539535] [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 3 Get 0x011c5000 Put 0x011c50a0 State 0x80000000 (err: INVALID_CMD) Push 0x00000000
Aug 22 10:16:30  kernel: [226842.539553] DRHD: handling fault status reg 402
Aug 22 10:16:30  kernel: [226842.539559] DMAR:[DMA Read] Request device [0f:00.0] fault addr 0 
Aug 22 10:16:30  kernel: [226842.539560] DMAR:[fault reason 06] PTE Read access is not set

And apart from that it's been quiet since. No other [drm] messages, no lockups. Given all the above happened in the current uptime, I'm not sure I'd say all problems have been fixed but whatever's still going on it appears to be a lot less fatal than before.

Comment 19 Fedora End Of Life 2012-08-16 12:45:21 UTC
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping


Note You need to log in before you can comment on or make changes to this bug.