1205985 – nouveau hang DMA_PUSHER MEM_FAULT unknown intr

Summary: nouveau hang DMA_PUSHER MEM_FAULT unknown intr

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	22
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Ben Skeggs
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-03-26 04:07 UTC by Trevor Cordes
Modified:	2016-07-19 19:06 UTC
CC List:	12 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2016-07-19 19:06:37 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
kernel log (3.79 KB, text/plain)	2015-03-26 15:00 UTC, Kamil Dudka	no flags
kernel log - another crash (19.21 KB, text/plain)	2015-03-26 18:19 UTC, Kamil Dudka	no flags
/v/l/messages of the nouveau hang (13.73 KB, text/plain)	2015-03-29 17:45 UTC, Trevor Cordes	no flags
another /v/l/messages when it hung (then recovered) (20.98 KB, text/plain)	2015-04-07 03:52 UTC, Trevor Cordes	no flags
/v/l/messages of the nouveau hang (28.51 KB, text/plain)	2015-04-18 19:36 UTC, Trevor Cordes	no flags
[PATCH] drm/nouveau: hold mutex when calling nouveau_abi16_fini() (955 bytes, patch)	2015-07-15 17:15 UTC, Kamil Dudka	no flags
crash with attachment #1052398 applied (21.86 KB, text/plain)	2015-07-16 12:05 UTC, Kamil Dudka	no flags
[PATCH v2] nouveau: use locks to prevent list corruption (2.62 KB, patch)	2015-07-16 12:10 UTC, Kamil Dudka	kdudka: review?
jounald log (-b -1) (553.96 KB, text/x-vhdl)	2015-12-09 16:19 UTC, Pavel Zhukov	no flags
Xorg log (148.20 KB, text/plain)	2015-12-09 16:21 UTC, Pavel Zhukov	no flags

Description Trevor Cordes 2015-03-26 04:07:54 UTC

Description of problem:
Out of the blue I just got a screen hang/freeze.  Mouse would still move, but keyboard was (mostly) dead (numlock press didn't toggle light).  Video was uncorrupted, and no artifacts.  System was still alive (periodic sounds I have programmed came out of the speakers), so it was just a graphics freeze.  Couldn't switch virtual terminal.  Alt-SysRq *did* work, and was able to sync and reboot.

Only 2 logs showed up:
Mar 25 22:17:51 pog kernel: [1375376.666000] nouveau E[   PFIFO][0000:01:00.0] DMA_PUSHER - ch 4 [firefox[32205]] get 0x0036f000 put 0x0036f090 state 0xc0000000 (err: MEM_FAULT) push 0x00000000
Mar 25 22:17:51 pog kernel: [1375376.666000] nouveau W[   PFIFO][0000:01:00.0] unknown intr 0x00010000, ch 4
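For anyone comparing logs across reports, the DMA_PUSHER line above can be picked apart mechanically. A minimal Python sketch, assuming the message layout seen in this report; the regex and the function name parse_dma_pusher are made up for illustration and are not part of nouveau or any existing tool:

```python
import re

# Pattern written against the DMA_PUSHER lines quoted in this bug; other
# kernel versions may format the message differently (assumption).
DMA_PUSHER_RE = re.compile(
    r"DMA_PUSHER - ch (?P<ch>\d+) \[(?P<proc>[^\[\]]+)\[(?P<pid>\d+)\]\] "
    r"get (?P<get>0x[0-9a-f]+) put (?P<put>0x[0-9a-f]+) "
    r"state (?P<state>0x[0-9a-f]+) \(err: (?P<err>\w+)\)"
)

def parse_dma_pusher(line):
    """Return a dict of fields from a DMA_PUSHER log line, or None."""
    m = DMA_PUSHER_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("ch", "pid"):
        fields[key] = int(fields[key])
    for key in ("get", "put", "state"):
        fields[key] = int(fields[key], 16)
    return fields

line = ("Mar 25 22:17:51 pog kernel: [1375376.666000] nouveau E[   PFIFO]"
        "[0000:01:00.0] DMA_PUSHER - ch 4 [firefox[32205]] get 0x0036f000 "
        "put 0x0036f090 state 0xc0000000 (err: MEM_FAULT) push 0x00000000")
info = parse_dma_pusher(line)
# Process name, PID, error type, and the distance between get and put.
print(info["proc"], info["pid"], info["err"], hex(info["put"] - info["get"]))
```

Running it over a whole log makes it easy to see which process and channel each fault belongs to.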

I had just sat back down at the computer after 3 hours absence, unlocked xscreensaver, switched virtual workspace and clicked a window to focus, probably Firefox.  That's when it froze.

I haven't had a nouveau or other video problem in many years (3+).  System has been rock solid.  Most recently, it ran without problems from Mar 10 on 3.18.8-201.fc21 until this hang.

I know this will be difficult to debug unless it starts happening a lot.  I am opening this bug in case anyone else sees it and can add info.  And to jog my memory so I can update it if it happens again.


Version-Release number of selected component (if applicable):
xorg-x11-drv-nouveau-1.0.11-1.fc21.x86_64
kernel-3.18.8-201.fc21.x86_64


How reproducible:
Not.  Just happened once so far.


Steps to Reproduce:
1. Use X
2. unlock screensaver and focus a window

Actual results:
freeze of video, rest of OS was running fine.

Expected results:
no video freeze.


Additional info:
01:00.0 VGA compatible controller: NVIDIA Corporation G73 [GeForce 7600 GS] (rev a1)

Comment 1 Kamil Dudka 2015-03-26 14:59:43 UTC

This just happened to me for the second time today.  It must be a bug triggered by a recent update.  In my case, it was with kernel-3.19.1-201.fc21.x86_64.  I tried to boot kernel-3.18.9-200.fc21.x86_64 instead, will observe if the bug goes away.

The graphics card in my case is:

0f:00.0 VGA compatible controller: NVIDIA Corporation NV44 [Quadro NVS 285] (rev a1)

After the desktop freeze, I was still able to connect to the host over SSH, but I was not able to kill the X server.

I will attach the relevant contents of kernel log...

Comment 2 Kamil Dudka 2015-03-26 15:00:27 UTC

Created attachment 1006863
kernel log

Comment 3 Kamil Dudka 2015-03-26 18:19:27 UTC

Created attachment 1006962
kernel log - another crash

Now it happened with kernel-3.18.9-200.fc21.x86_64, too.

I suspect the bug is triggered by Firefox...

Comment 4 Trevor Cordes 2015-03-26 20:10:40 UTC

Hi, thanks for adding input.  I had it freeze on 3.18.8, so try earlier than that.  I ran 3.18.5-201.fc21 for over a month with no freeze, so perhaps somewhere between 3.18.5 and 3.18.8 the problem started.  I am trying 3.19.1 now, but if the crash happens again I'll start rolling back the updates.

Also, just 3 days before the bug started, firefox was updated from firefox-36.0-1.fc21.x86_64 to firefox-36.0.3-1.fc21.x86_64.

More importantly, though, wouldn't any nouveau bug be in xorg-x11-drv-nouveau?  Or is most of the code in the kernel?  Anyhow, my xorg-x11-drv-nouveau hasn't been updated since Jan 2, so it probably isn't any changes in there causing it.  If that's the case, then perhaps the component of this bug should be changed to kernel?  I'm unsure.

If you figure out a way to trigger this bug more quickly / on demand, let us know, as if we can do that then perhaps I could bisect the kernel.

Comment 5 Kamil Dudka 2015-03-27 10:55:54 UTC

I updated to firefox-36.0.3-1.fc21.x86_64 on March 25 and since then I have been observing these crashes.  I believe that a change in user space revealed some bug in the driver because an unprivileged user space program should not be able to cause list corruption in kernel space.  I have disabled HW acceleration in Firefox, will see whether the problem recurs or not.

Comment 6 Trevor Cordes 2015-03-28 03:34:40 UTC

I am updating now to the Firefox 36.0.4-1.fc21 that just came out.  We'll see if that helps.  In any case, like you said, userspace should not cause crashes of a driver.  Interesting idea to disable HW accel, I guess we'll see.  This one will be very hard to solve unless more people get bit by it and start complaining.

Comment 7 Trevor Cordes 2015-03-29 17:43:44 UTC

Just hit me again.  Exact same thing: I came out of xscreensaver, typed maybe one key in a terminal (enter), then clicked a firefox window and the whole X server hung.  This time, instead of rebooting, I waited as I contemplated my SysRq sequence and the system recovered about 2 minutes later!  All of a sudden X was working again.  So if you see this, try waiting it out before rebooting and you may regain control.

Comment 8 Trevor Cordes 2015-03-29 17:45:10 UTC

Created attachment 1008036
/v/l/messages of the nouveau hang

Comment 9 Trevor Cordes 2015-04-07 03:50:49 UTC

Just happened again.  It recovered again after a ~60s complete screen freeze (except mouse).  Same errors in /var/log/messages.  Similar trigger (xscreensaver unlock followed by work in a gnome-terminal that updates a firefox page).

Will attach log.

Comment 10 Trevor Cordes 2015-04-07 03:52:50 UTC

Created attachment 1011610
another /v/l/messages when it hung (then recovered)

Comment 11 Trevor Cordes 2015-04-07 04:15:19 UTC

Interesting, I just had X completely freeze on me, and not recover, exactly 6 minutes after the above recovered-hang.  I guess this time the hang caused enough instability that it took the rest of the system down.  During that 6 minutes I had updated the bug report (above, at 23:50) then went back to what I was doing, using a script in a gnome-terminal to remote control firefox.  It updated 2 or 3 web pages and then froze.

This was a hard freeze.  The mouse cursor disappeared completely.  And this time Alt-SysRq did NOT work.  I could not ping/ssh the box from another box.  It was dead.  Had to hit reset button.

/var/log/messages has no useful info at all, just gibberish ^@^@^@... between the crash and the reset.

Oh well, at least now I'm in 3.19.3-200.fc21 instead of the 3.19.1-201 I was running when it crashed.

Comment 12 Kamil Dudka 2015-04-07 10:07:18 UTC

(In reply to Trevor Cordes from comment #6)
> Interesting idea to disable HW accel, I guess we'll see.

Nope, unchecking the "Use hardware acceleration when available" button in Firefox preferences did not mitigate this bug.  I am still experiencing kernel memory corruption a few times a day on my workstation.

Comment 13 Trevor Cordes 2015-04-16 07:49:47 UTC

Interesting.  I just had it hit again.  A temporary freeze this time, about 30s.  Mouse cursor still moved.  I know exactly what I was doing when it hit:  watching mythtv recorded video for a long time (no other interactive activity), then I clicked the firefox launcher on the xfce panel.  It froze immediately when I clicked, before any firefox drew on the screen (you'd never even know I had clicked on it, but I had).  It froze the playing mythtv video.

Also strange this time is that /var/log/messages has just a couple of nouveau entries, with no general kernel warnings or oops:

Apr 16 02:33:47 pog kernel: [790156.423132] nouveau E[   PFIFO][0000:01:00.0] DMA_PUSHER - ch 4 [firefox[13117]] get 0x00761000 put 0x00761090 state 0xc0000000 (err: MEM_FAULT) push 0x00000000
Apr 16 02:33:47 pog kernel: [790156.423138] nouveau W[   PFIFO][0000:01:00.0] unknown intr 0x00010000, ch 4
Apr 16 02:34:02 pog kernel: [790171.433008] nouveau E[firefox[13117]] failed to idle channel 0xcccc0000 [firefox[13117]]
Apr 16 02:34:17 pog kernel: [790186.433007] nouveau E[firefox[13117]] failed to idle channel 0xcccc0000 [firefox[13117]]
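The bracketed kernel timestamps make the spacing of these messages easy to check: in this excerpt each "failed to idle channel" line follows the previous message by roughly 15 seconds, which is consistent with a freeze of about 30s. A small Python sketch of that check; the helper name kernel_gaps is made up for illustration, and the sample is limited to the last three lines quoted above:

```python
import re

# Matches the "[seconds.microseconds]" kernel timestamp in a syslog line.
TS_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")

def kernel_gaps(lines):
    """Return gaps in seconds between consecutive kernel timestamps."""
    stamps = [float(m.group(1)) for m in map(TS_RE.search, lines) if m]
    return [round(b - a, 2) for a, b in zip(stamps, stamps[1:])]

log = [
    "Apr 16 02:33:47 pog kernel: [790156.423138] nouveau W[   PFIFO][0000:01:00.0] unknown intr 0x00010000, ch 4",
    "Apr 16 02:34:02 pog kernel: [790171.433008] nouveau E[firefox[13117]] failed to idle channel 0xcccc0000 [firefox[13117]]",
    "Apr 16 02:34:17 pog kernel: [790186.433007] nouveau E[firefox[13117]] failed to idle channel 0xcccc0000 [firefox[13117]]",
]
print(kernel_gaps(log))
```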

And that's it.  No other log related to this incident.  Very strange.  Perhaps the 3.19.3-200.fc21 kernel I was in changed something?  Or maybe I got "lucky".  In any case, I rebooted (normally) shortly thereafter to mitigate any "aftershock" crash like last time.

Comment 14 Trevor Cordes 2015-04-18 19:35:35 UTC

Just had it hit again.  It's becoming more frequent, it seems.  Similar sequence: was in screensaver, unlocked it, did a couple of terminal commands, did one that remote reloads a firefox page, page redrew then X froze, no cursor showing or movement this time.  However, system was still alive and responded to alt-sysrq so I was able to clean reboot.

This time it left behind several extensive crash backtraces in the logs.  Attaching.

Comment 15 Trevor Cordes 2015-04-18 19:36:36 UTC

Created attachment 1015937
/v/l/messages of the nouveau hang

Comment 16 Trevor Cordes 2015-04-21 20:01:00 UTC

And again (same situation: screensaver/firefox). This time I just got the DMA_PUSHER/firefox 2 errors and a frozen screen (with mouse cursor working). It didn't unfreeze when I waited. System was accessible with ssh and I tried a few things: I kill -9'd the Xorg ps and lightdm and they respawned and my screen came back and I was able to log in normally. Without reboot! So that's something to try.

I decided to check out the hardware, and remembered the fan on my video card was dead, but it's been dead for years. I have large 120mm fans blowing on it from nearby so I wasn't worried. However, the VC's heat sink was blazing hot, way hotter than I would like. So the fans weren't cooling it enough (cables got in the way). So I rigged a 60mm good quality fan right on the VC heatsink (after taking off the old dead fan). It's staying much cooler now. We'll see if that helps.

I thought since not many people are having this problem, maybe it's hardware. It is an old VC. Kamil, maybe you can check how hot your card is getting (dead fans, etc). Perhaps the newest versions of nouveau/firefox are doing some heavier work in the card, causing it to get hotter than it used to. Just a theory for now, we'll see.

I did try putting a newer, better card into the system, but my system wouldn't POST with it in (yes, I did have all the extra power plugged in). I have seen cases where older (mine is D975X) systems don't like newer (2.0+) high-power cards. It may be hard for me to find a replacement as I'll need a compatible 2-DVI (or 2-digital, but not 1 dig+1 ana) card of a compatible vintage.

Comment 17 Trevor Cordes 2015-04-21 22:07:33 UTC

Argh, just happened again.  No screensaver this time, just trying to open a new firefox window.  Freeze recovered on its own after about 60s.  Nouveau logs showed the usual DMA_PUSHER and "failed to idle channel" messages.  I checked the hardware this time right while it was happening and the video card heatsink was as cool as a cucumber.  So heat level doesn't seem to be the problem.

I'm going to actively seek out a replacement nvidia card to try, as this frequency of crashing is making life unbearable, having to restart all my work each time.  I'm not sure why the frequency has gone up so much lately.

Current versions:
xorg-x11-drv-nouveau-1.0.11-1.fc21.x86_64
kernel-3.19.3-200.fc21.x86_64
firefox-37.0.1-1.fc21.x86_64

Comment 18 Trevor Cordes 2015-04-22 04:30:49 UTC

May be the same as bug #1177355.  That bug gave me a good idea, I should be adding my Xorg.log to my bugzilla posts.

Here's what my latest crash shows:
(EE) [mi] EQ overflowing.  Additional events will be discarded until existing events are processed.
(EE)
(EE) Backtrace:
(EE) 0: /usr/libexec/Xorg.bin (mieqEnqueue+0x24b) [0x5796db]
(EE) 1: /usr/libexec/Xorg.bin (QueuePointerEvents+0x52) [0x450af2]
(EE) 2: /usr/lib64/xorg/modules/input/evdev_drv.so (_init+0x2f0f) [0x7f738eb9091f]
(EE) 3: /usr/lib64/xorg/modules/input/evdev_drv.so (_init+0x3655) [0x7f738eb91be5]
(EE) 4: /usr/libexec/Xorg.bin (DPMSSupported+0xe8) [0x4774c8]
(EE) 5: /usr/libexec/Xorg.bin (xf86SerialModemClearBits+0x277) [0x4a1ec7]
(EE) 6: /lib64/libc.so.6 (__restore_rt+0x0) [0x7f7398b4c95f]
(EE) 7: /lib64/libc.so.6 (ioctl+0x7) [0x7f7398c0e407]
(EE) 8: /lib64/libdrm.so.2 (drmIoctl+0x28) [0x7f7399ef5b18]
(EE) 9: /lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x7f7399ef867b]
(EE) 10: /lib64/libdrm_nouveau.so.2 (nouveau_bo_wait+0x8c) [0x7f7393a8a76c]
(EE) 11: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (_init+0x2e73) [0x7f7393c9a1c3]
(EE) 12: /usr/lib64/xorg/modules/libexa.so (exaMoveOutPixmap+0x21f) [0x7f73938748cf]
(EE) 13: /usr/lib64/xorg/modules/libexa.so (exaMoveOutPixmap+0x3867) [0x7f739387b697]
(EE) 14: /usr/lib64/xorg/modules/libexa.so (exaEnableDisableFBAccess+0x493b) [0x7f7393885ffb]
(EE) 15: /usr/lib64/xorg/modules/libexa.so (exaEnableDisableFBAccess+0x1690) [0x7f739387fde0]
(EE) 16: /usr/libexec/Xorg.bin (DamageRegionAppend+0x541) [0x51f0b1]
(EE) 17: /usr/libexec/Xorg.bin (AddTraps+0x4154) [0x518954]
(EE) 18: /usr/libexec/Xorg.bin (SendErrorToClient+0x2f7) [0x4391b7]
(EE) 19: /usr/libexec/Xorg.bin (remove_fs_handlers+0x416) [0x43d316]
(EE) 20: /lib64/libc.so.6 (__libc_start_main+0xf0) [0x7f7398b37fe0]
(EE) 21: /usr/libexec/Xorg.bin (_start+0x29) [0x4276ee]
(EE) 22: ? (?+0x29) [0x29]

Comment 19 Kamil Dudka 2015-04-22 08:20:13 UTC

I am pretty sure this is a software bug.  My graphics card (NVIDIA Corporation NV44 [Quadro NVS 285] (rev a1), according to lspci) is cooled passively and its temperature is low (approx. 40 °C from outside).  Also the freeze does not happen while rendering a complex 3D scene.  It usually happens when I try to create a new tab in Firefox after a few minutes of idle state.

Yesterday I updated to Fedora 22 in the hope that it will fix/mitigate this bug.  This updated KDE to 5.x version, which uses an inherently accelerated graphic toolkit, and it completely prevented all windows from being rendered.  I again suspect the nouveau driver here.

I had to put:

    Option "NoAccel" "true"

... to the Device section in my xorg.conf to make my desktop somewhat usable again.  I have not seen any kernel crash or memory corruption since then, will report if the issue recurs.

Also by using software rendering, my desktop has unacceptable latency when switching windows etc.  So I am going to try the proprietary NVidia drivers to check whether they will work any better.

Comment 20 Trevor Cordes 2015-04-25 05:51:07 UTC

Right after my last report (3 days ago), I switched to using the nvidia binary drivers (akmods from rpmfusion) and boom, the bug disappeared. Haven't had a single hiccup for 3 days. And before I changed over, the crashing was starting to happen every couple of hours. Completely impossible to get any work done.

So (unless it crashes on me in the next days/weeks) that means my hardware is fine (so the fan thing was unnecessary, but probably a good idea anyhow). This is 100% a bug in either kernel, nouveau and/or firefox. Well, even if firefox triggers it, it's not firefox's fault since it's userland and userland should never crash a kernel video card driver.

Looking back at the dates I reported in this bug of each crash, and thinking back to each time the crash got more frequent, it looks like it is firefox that is making things progressively worse. The first crash was shortly after I installed firefox-36.0.3-1.fc21.x86_64. firefox-36.0.4-1.fc21.x86_64 didn't seem to alter anything. firefox-37.0-2.fc21.x86_64 didn't either. firefox-37.0.1-1.fc21.x86_64 seemed to be when things started getting really bad.

This could be a red herring though and perhaps the kernel is more telling. Let's see, I rebooted into 3.19.1-201.fc21 (from 3.18.8) Mar 25 and the next day I get my first crash. That is interesting. But then I was in 3.19.3-200.fc21 from Apr 6 onwards. And the crashing got way more frequent Apr 16 on.

Maybe it's kernel 3.19 + firefox-36.0.3 that started it, with newer firefoxes being worse. (Or maybe kernel 3.18.9 as Kamil reported, with 3.18.8 being the last good one.)

I guess we'll wait for more "me toos" to narrow the scope and increase the interest. And wait to see if 4.0 really fixes it or not. In the meantime, it's nvidia binary for me.

Comment 21 Andreas M. Kirchwitz 2015-05-12 22:19:49 UTC

That looks pretty much the same as what I've been experiencing for about two or three months now.

In the beginning, the computer just froze when it happened. Usually while running Firefox or SeaMonkey. X11 session didn't update graphics, mouse dead, keyboard dead. No remote login, no Magic SysRq, no syslog messages. All dead.

After some time it changed a little so I could remotely login (to trigger a reboot) or use the Magic SysRq (REISUB).

Now for some weeks the bug changed again. The computer doesn't freeze but for about 30 seconds one CPU core is at 100%. Still related to Firefox. Sometimes it happens after Firefox was used for a while, but for some days (?) it happens quite often on a freshly booted system at first start of Firefox. While the CPU core is at 100%, Firefox is basically dead. After about 30 seconds, everything seems to be fine, Firefox (no restart) also seems to work fine. But usually I do a reboot at this point because I no longer trust the internal state of the system.

When it happens, syslog messages are similar but not exactly the same:


May 11 03:50:56 myhost kernel: nouveau E[   PFIFO][0000:00:05.0] DMA_PUSHER - ch 4 [firefox[20151]] get 0x0036a000 put 0x0036a090 state 0x80000000 (err: INVALID_CMD) push 0x00000000
May 11 03:50:57 myhost kernel: nouveau E[   PFIFO][0000:00:05.0] DMA_PUSHER - ch 4 [firefox[20151]] get 0x20000000 put 0x0036a0b8 state 0xc0000000 (err: MEM_FAULT) push 0x00000000
May 11 03:51:12 myhost kernel: nouveau E[firefox[20151]] failed to idle channel 0xcccc0000 [firefox[20151]]
May 11 03:51:27 myhost kernel: nouveau E[firefox[20151]] failed to idle channel 0xcccc0000 [firefox[20151]]


May 12 22:52:09 myhost kernel: nouveau E[   PFIFO][0000:00:05.0] DMA_PUSHER - ch 3 [firefox[26771]] get 0x20000000 put 0x0026a0b8 state 0xc0020000 (err: MEM_FAULT) push 0x00000000
May 12 22:52:24 myhost kernel: nouveau E[firefox[26771]] failed to idle channel 0xcccc0000 [firefox[26771]]
May 12 22:52:24 myhost kernel: nouveau E[   PFIFO][0000:00:05.0] DMA_PUSHER - ch 3 [firefox[26771]] get 0x20000000 put 0x0026a0c0 state 0xc0020000 (err: MEM_FAULT) push 0x00000000
May 12 22:52:39 myhost kernel: nouveau E[firefox[26771]] failed to idle channel 0xcccc0000 [firefox[26771]]


Graphics "card" is on-board graphics (lspci -v):

00:05.0 VGA compatible controller: NVIDIA Corporation C51PV [GeForce 6150] (rev a2) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. A8N-VM CSM
        Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 16
        Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at fc000000 (64-bit, non-prefetchable) [size=16M]
        [virtual] Expansion ROM at 77f00000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Kernel driver in use: nouveau
        Kernel modules: nouveau

Worked rock-solid all the years before!

Kernel is running with option "nouveau.config=NvMSI=0", otherwise the system would crash immediately. It's a known nouveau bug introduced in kernel 3.13 (don't think it has ever been fixed). That's also the reason why Fedora 21 cannot be installed on any machine with GeForce 6150 (unless you add that kernel option). But that's a different story.

In another ticket (for F22) there was a nouveau problem that was related to libdrm 2.4.60. Tried downgrading to 2.4.59 but that didn't fix the problem.

Comment 22 Kamil Dudka 2015-05-13 09:09:07 UTC

(In reply to Andreas M. Kirchwitz from comment #21)
> That looks pretty much the same as what I've been experiencing for about two
> or three months now.
> 
> In the beginning, the computer just froze when it happened. Usually while
> running Firefox or SeaMonkey. X11 session didn't update graphics, mouse
> dead, keyboard dead. No remote login, no Magic SysRq, no syslog messages.
> All dead.
> 
> After some time it changed a little so I could remotely login (to trigger a
> reboot) or use the Magic SysRq (REISUB).
> 
> Now for some weeks the bug changed again. The computer doesn't freeze but
> for about 30 seconds one CPU core is at 100%. Still related to Firefox.
> Sometimes it happens after Firefox was used for a while, but for some days
> (?) it happens quite often on a freshly booted system at first start of
> Firefox. While the CPU core is at 100%, Firefox is basically dead. After
> about 30 seconds, everything seems to be fine, Firefox (no restart) also
> seems to work fine. But usually I do a reboot at this point because I no
> longer trust the internal state of the system.

The bug progressed similarly on my box, too.  I upgraded to Fedora 22 in the meantime but it did not fix the bug.  Sometimes the system freezes for a couple of seconds.  Slightly less frequently the GUI freezes completely but I am still able to log in via SSH.  Then I am able to check logs etc., but an attempt to reboot cleanly freezes the system anyway and I have to reboot via SysRq.

> In another ticket (for F22) there was a nouveau problem that was related to
> libdrm 2.4.60. Tried downgrading to 2.4.59 but that didn't fix the problem.

I already tried to downgrade kernel, libdrm, firefox to no avail.

I wanted to try the proprietary drivers from RPM Fusion but did not find them for Fedora 22 and did not have enough time/energy to build them myself, will give it another try later.

Comment 23 Andreas M. Kirchwitz 2015-06-09 17:33:32 UTC

I can confirm that this bug still exists on Fedora 22 (64 bit).

BTW: I've learned that "nouveau.config=NvMSI=0" is no longer necessary because kernel 3.18.4 no longer automatically enables MSI for cards not supporting it properly. But that's a different story...

Comment 24 Kamil Dudka 2015-07-15 17:15:57 UTC

Created attachment 1052398
[PATCH] drm/nouveau: hold mutex when calling nouveau_abi16_fini()

I believe the attached patch prevents the list_del corruption captured in attachment #1006962, attachment #1008036, attachment #1011610, and attachment #1015937.

Comment 25 Kamil Dudka 2015-07-16 12:05:10 UTC

Created attachment 1052666
crash with attachment #1052398 applied

(In reply to Kamil Dudka from comment #24)
> Created attachment 1052398
> [PATCH] drm/nouveau: hold mutex when calling nouveau_abi16_fini()

This appears to be insufficient.  The attached crash happened to me even though the patch was applied.  I will attach an extended version of the patch.

Comment 26 Kamil Dudka 2015-07-16 12:10:09 UTC

Created attachment 1052669
[PATCH v2] nouveau: use locks to prevent list corruption

Comment 27 Kamil Dudka 2015-07-30 08:02:47 UTC

(In reply to Kamil Dudka from comment #26)
> Created attachment 1052669
> [PATCH v2] nouveau: use locks to prevent list corruption

The patch looks solid.  I have not seen the list corruption in the last two weeks.  Without the patch it happened to me several times a day.  Could you please consider this patch for inclusion?
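For readers unfamiliar with this class of bug, the general idea named in the patch titles (take a lock around operations on a shared list) can be shown with a user-space analogy. This is not the patch itself (see attachment 1052669 for that) and does not reflect nouveau's actual data structures; the class and names below are hypothetical, in Python:

```python
import threading

class ChannelList:
    """Toy stand-in for a list shared between threads (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()   # plays the role of the mutex
        self._items = []

    def add(self, chan):
        with self._lock:
            self._items.append(chan)

    def remove(self, chan):
        # The membership check and the removal must happen as one step.
        # Without the lock, two threads could both pass the check and
        # then both try to unlink the same entry.
        with self._lock:
            if chan in self._items:
                self._items.remove(chan)
                return True
            return False

chans = ChannelList()
chans.add("ch4")
results = []
threads = [threading.Thread(target=lambda: results.append(chans.remove("ch4")))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Exactly one thread should report that it removed the entry.
print(results.count(True), "successful removal out of", len(results))
```

With the lock held, only one of the competing removals ever touches the entry; the others see that it is already gone.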

Comment 28 Ben Skeggs 2015-08-03 00:58:49 UTC

(In reply to Kamil Dudka from comment #27)
> (In reply to Kamil Dudka from comment #26)
> > Created attachment 1052669
> > [PATCH v2] nouveau: use locks to prevent list corruption
> 
> The patch looks solid.  I have not seen the list corruption in the last two
> weeks.  Without the patch it happened to me several times a day.  Could you
> please consider this patch for inclusion?

I've actually already picked up the patch and sent it for inclusion in 4.2.  I'll be marking these, and a couple of others, for backporting to 4.1-stable too.

Thank you :)

Comment 29 Kamil Dudka 2015-08-03 08:17:44 UTC

Perfect.  Thank you for taking care of it, Ben!

Comment 30 Fedora End Of Life 2015-11-04 10:57:52 UTC

This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 21 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 31 Andreas M. Kirchwitz 2015-11-09 01:29:10 UTC

On my PC I haven't seen this nouveau bug for some time now. *fingerscrossed*

Because it didn't happen every day it's hard to say when it actually stopped, but it feels like the bug had already gone in one of the later 4.1.x kernels. And I can't remember it at all for the 4.2.x kernels. For a couple of days now I've been running Fedora 23, which comes with the 4.2.x kernel series anyway.

Hopefully the problem has gone for other users too.

Comment 32 Josh Boyer 2015-11-09 14:20:59 UTC

Thank you.

Comment 33 Pavel Zhukov 2015-12-09 16:19:03 UTC

Reopening.

I started hitting the problem a while ago without any visible reason. X hangs on the screensaver. The system itself is alive and it's possible to log in via ssh and reboot the machine. Killing the X server works _sometimes_; killing the parent bash process always works, but then it's not possible to start the X server again.

Dec 09 16:45:29 pzhukov-workstation.usersys.redhat.com kernel: nouveau E[   PFIFO][0000:01:00.0] DMA_PUSHER - ch 6 [topblock[5076]] get 0x002f8000 put 0x002f8010 state 0x80000054 (err: INVALID_CMD) push 0x00000000
Dec 09 16:45:29 pzhukov-workstation.usersys.redhat.com kernel: nouveau E[   PFIFO][0000:01:00.0] DMA_PUSHER - ch 6 [topblock[5076]] get 0x002f8010 put 0x002f8084 state 0x80000000 (err: INVALID_CMD) push 0x00000000
Dec 09 16:47:39 pzhukov-workstation.usersys.redhat.com kernel: nouveau E[   PFIFO][0000:01:00.0] DMA_PUSHER - ch 6 [topblock[5076]] get 0x002f8084 put 0x002f8094 state 0x80000000 (err: INVALID_CMD) push 0x00000000
Dec 09 16:50:01 pzhukov-workstation.usersys.redhat.com systemd[1]: Starting system activity accounting tool...
Dec 09 16:50:01 pzhukov-workstation.usersys.redhat.com systemd[1]: Started system activity accounting tool.
Dec 09 16:50:01 pzhukov-workstation.usersys.redhat.com audit[1]: <audit-1130> pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 09 16:50:01 pzhukov-workstation.usersys.redhat.com audit[1]: <audit-1131> pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 09 17:00:01 pzhukov-workstation.usersys.redhat.com systemd[1]: Starting system activity accounting tool...
Dec 09 17:00:01 pzhukov-workstation.usersys.redhat.com systemd[1]: Started system activity accounting tool.
Dec 09 17:00:01 pzhukov-workstation.usersys.redhat.com audit[1]: <audit-1130> pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 09 17:00:01 pzhukov-workstation.usersys.redhat.com audit[1]: <audit-1131> pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 09 17:01:01 pzhukov-workstation.usersys.redhat.com CROND[5645]: (root) CMD (run-parts /etc/cron.hourly)
Dec 09 17:01:01 pzhukov-workstation.usersys.redhat.com run-parts[5648]: (/etc/cron.hourly) starting 0anacron
Dec 09 17:01:01 pzhukov-workstation.usersys.redhat.com run-parts[5654]: (/etc/cron.hourly) finished 0anacron
Dec 09 17:02:23 pzhukov-workstation.usersys.redhat.com kernel: sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(V) show-blocked-tasks(w) dump-ftrace-buffer(z)

Comment 34 Pavel Zhukov 2015-12-09 16:19:59 UTC

Created attachment 1103976
jounald log (-b -1)

Comment 35 Pavel Zhukov 2015-12-09 16:21:02 UTC

Created attachment 1103977
Xorg log

Comment 36 Pavel Zhukov 2015-12-09 16:22:36 UTC

xorg-x11-drv-nouveau-1.0.11-2.fc22.x86_64
kernel-4.2.6-200.fc22.x86_64

Comment 37 Trevor Cordes 2015-12-09 23:49:14 UTC

Could it be a new bug?  I haven't seen this bug since upgrading to Fedora 22 1+ months ago and switching back to nouveau.  Does it happen often or rarely?

Maybe Andreas could confirm he still doesn't see the bug.

Might be a new issue worth opening a separate bug for.

My versions are exactly the same as yours.  This bug used to hit me every day until they fixed it.

Comment 38 Andreas M. Kirchwitz 2015-12-10 00:26:12 UTC

Luckily, still haven't been hit by that nasty bug so far.

It may be worth mentioning that my Nvidia chipset (NV4E; NV40 family) doesn't work well with any kind of hardware acceleration. It is turned off automatically but if applications enforce it to be used, they will crash sooner or later, and sometimes even the whole system goes mad.

Comment 39 Trevor Cordes 2015-12-10 05:59:29 UTC

I have:
NVIDIA Corporation G73 [GeForce 7600 GS]

I never turned off (or on) h/w accel, so I don't know if it's even on for mine.  However, 2D performance is super fast on my 2 x 1600x1200 monitors, and the very little 3D I do (screensavers mostly) seems fast, so I assume it's working.  Not sure how I would find out otherwise?

I don't use a 3D / compiz type desktop, just simple XFCE with sawfish wm.  Anyhow, since it's been at least a month with no crashes on my system, I'm pretty sure the bug as it hit me, is fixed.  Doesn't mean a similar one can't hit someone else!  But it's always nice to get a "me too" out there somewhere first to ensure it's not a h/w problem.

Comment 40 Pavel Zhukov 2015-12-11 09:07:17 UTC

Worked around the problem by disabling all 3D screensavers in the xscreensaver config.  However, this configuration worked fine until last month.
I've opened a new bug, https://bugzilla.redhat.com/show_bug.cgi?id=1290402, which may be related but looks quite different.

Comment 41 Fedora End Of Life 2016-07-19 19:06:37 UTC

Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
