Bug 743608

Summary: laptop freezing hard
Product: [Fedora] Fedora Reporter: kevin martin <ktmdms>
Component: kernel Assignee: Ben Skeggs <bskeggs>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 19 CC: airlied, ajax, bskeggs, bulk, collura, gansalmon, itamar, jforbes, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-10-08 17:47:28 UTC Type: ---
Attachments:
Description Flags
Xorg log from when freeze occurred.
none
messages log from when freeze occurred.
none
lxdm log from when freeze occurred (using lxde).
none
kern messages log from time of freeze.
none
message compilation when noaccel was turned off on boot.
none
message compilation when vram_pushbuf was turned off at boot (with noaccel turned on).
none
compilation log with no nouveau.noaccel and no vram_pushbuf kernel flags.
none
compilation log with vram_pushbuf kernel flag but no nouveau.noaccel flag.
none
compilation log with acceleration and no vram_pushbuf using kernel rc9.git0.1
none
compilation log with acceleration but this time with vram_pushbuf=1 using kernel rc9.git0.1
none
acceleration on, nouveau.vram_pushbuf=1, nouveau.vram_notify=1, drm.debug off, X still comes up black and white squares.
none

Description kevin martin 2011-10-05 14:06:34 UTC
Created attachment 526501 [details]
Xorg log from when freeze occurred.

Description of problem: Laptop randomly freezes hard (no mouse, keyboard, network connectivity).  Must hold power button to power down and then restart.


Version-Release number of selected component (if applicable):
xorg-x11-drv-nouveau-0.0.16-27.20110720gitb806e3f.fc17.x86_64
kernel 3.1.0-0.rc8.git0.0.fc17.x86_64


How reproducible:
happens daily, sometimes multiple times.


Steps to Reproduce:
1. Boot the system.
2. Do "stuff" (nothing in particular appears to cause the actual freeze).
3. Watch the system freeze.
  
Actual results:
system freezes.


Expected results:
usable X system.

Additional info: running with kernel boot flags:

nouveau.noaccel=1 vram_pushbuf=1 drm.debug=14 log_buf_len=16M

Without the first two flags I can't get a usable X screen; the second two will hopefully provide enough debug output to help figure this out.

Comment 1 kevin martin 2011-10-05 14:07:52 UTC
Created attachment 526502 [details]
messages log from when freeze occurred.

Comment 2 kevin martin 2011-10-05 14:08:39 UTC
Created attachment 526503 [details]
lxdm log from when freeze occurred (using lxde).

Comment 3 kevin martin 2011-10-05 14:09:17 UTC
Created attachment 526504 [details]
kern messages log from time of freeze.

Comment 4 kevin martin 2011-10-05 14:13:03 UTC
FWIW, this has been happening for some time now (not just the latest nouveau driver).  It's just in the last week or so that I picked up how to start running nouveau with the kernel boot time flags so I could actually use it (was running the proprietary nVidia driver previously; it also caused hangs) and to try to get some debugging info.

Comment 5 Ben Skeggs 2011-10-05 22:43:19 UTC
Given that you're using nouveau in noaccel mode (meaning it's doing pretty much nothing after it lights up your screen), and that you also saw these hangs while using the NVIDIA binary driver, I strongly suspect your issue is elsewhere.

Reassigning to the kernel, but it might be worth unloading all the kernel modules you can manage and seeing if you can narrow it down to one of those causing the hangs?

That said, I expect you're running nouveau in noaccel mode because it fails completely without?  I have an idea of the first place to look in your case (we badly misdetect your video memory); can you either email me personally or open another bug for that?
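
As a rough aid for the module-unloading suggestion above, here is a minimal Python sketch that reads /proc/modules and lists the modules that no other loaded module depends on, which are the natural first candidates to try unloading. The function names are hypothetical and not from this report. A module listed here may still be held by a device or process (non-zero reference count), so treat the output as a starting point only.

```python
from pathlib import Path


def parse_proc_modules(text):
    """Parse /proc/modules text into (name, refcount, users) tuples.

    Each line has the form: name size refcount users state address
    where users is "-" or a comma-terminated list such as "nouveau,ttm,".
    """
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, refcount, used_by = fields[0], int(fields[2]), fields[3]
        users = [] if used_by == "-" else [u for u in used_by.split(",") if u]
        modules.append((name, refcount, users))
    return modules


def unload_candidates(text):
    """Return names of modules that no other loaded module lists as a user."""
    return [name for name, _refcount, users in parse_proc_modules(text) if not users]


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():
        for name in unload_candidates(proc_modules.read_text()):
            print(name)
```

Unloading the candidates one at a time (and rebooting between longer test runs) keeps each trial attributable to a single module.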

Comment 6 kevin martin 2011-10-06 15:35:04 UTC
I've now rebooted with noaccel turned off and with vram_pushbuf turned off (individually).  With noaccel turned off, I got a garbled, unusable X display.  I've attached log.log (a compilation of /var/log/messages|dmesg.log|Xorg.0.log|lxdm.log) as supporting information for that run.  When I turned off vram_pushbuf (with noaccel turned on) I got a usable X display, but of course noaccel is on.  I've attached log2.log, which is the same compilation of /var/log logs covering the time frame of that run.
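
For anyone wanting to reproduce a combined log like the log.log described above, here is a minimal Python sketch. It is not the method actually used for the attachments; the function name and the banner format are made up for illustration, and every path except /var/log/messages is an assumption about where those logs live on a given system.

```python
from pathlib import Path


def compile_logs(paths, out_path):
    """Concatenate log files into out_path, each under a banner naming its source.

    Files that cannot be read are noted in the output instead of raising,
    so one missing log does not lose the rest. Returns the paths included.
    """
    included = []
    with open(out_path, "w", encoding="utf-8") as out:
        for path in map(Path, paths):
            out.write(f"===== {path} =====\n")
            try:
                text = path.read_text(encoding="utf-8", errors="replace")
            except OSError as exc:
                out.write(f"(not readable: {exc})\n\n")
                continue
            out.write(text)
            if not text.endswith("\n"):
                out.write("\n")
            out.write("\n")
            included.append(str(path))
    return included


if __name__ == "__main__":
    # Assumed locations; adjust to wherever these logs are on your system.
    logs = [
        "/var/log/messages",
        "dmesg.log",
        "/var/log/Xorg.0.log",
        "/var/log/lxdm.log",
    ]
    print(compile_logs(logs, "log.log"))
```

Reading /var/log/messages normally requires root, which is why unreadable files are reported inline rather than treated as fatal.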

I find it interesting to note that the listed, "supported" nVidia cards (in Xorg.0.log) don't include specifically the card I'm using (it's an NV50 card and/or an NVa3 card).  The driver apparently supports:


[    31.313] (II) NOUVEAU driver for NVIDIA chipset families :
[    31.313] 	RIVA TNT        (NV04)
[    31.314] 	RIVA TNT2       (NV05)
[    31.314] 	GeForce 256     (NV10)
[    31.315] 	GeForce 2       (NV11, NV15)
[    31.315] 	GeForce 4MX     (NV17, NV18)
[    31.316] 	GeForce 3       (NV20)
[    31.316] 	GeForce 4Ti     (NV25, NV28)
[    31.317] 	GeForce FX      (NV3x)
[    31.317] 	GeForce 6       (NV4x)
[    31.318] 	GeForce 7       (G7x)
[    31.318] 	GeForce 8       (G8x)
[    31.319] 	GeForce GTX 200 (NVA0)
[    31.319] 	GeForce GTX 400 (NVC0)

Unfortunately, neither NV5x nor NVA3 is in the list.  I also find it peculiar that I'm running on a 3.1 kernel but X has only been compiled up to a 2.6 kernel.  Don't know if that would have any relevance but if X needs/uses kernel headers and such, why wouldn't it be relevant?

Comment 7 kevin martin 2011-10-06 15:36:56 UTC
Created attachment 526728 [details]
message compilation when noaccel was turned off on boot.

Comment 8 kevin martin 2011-10-06 15:37:35 UTC
Created attachment 526730 [details]
message compilation when vram_pushbuf was turned off at boot (with noaccel turned on).

Comment 9 Ben Skeggs 2011-10-07 06:26:40 UTC
This corruption issue should probably have been addressed in a new bug, but, anyway :)  The list is confusing, but don't worry, NVA3 is supported and NV5x is in the list as G8x.

Also, with noaccel on, vram_pushbuf has *zero* effect.  By garbled and unusable, do you mean just visual issues?  The GPU doesn't hang too?

I've started building a kernel[1] with a patch which *might* be of some help; it'll address nouveau complaining about your memory controller configuration at the very least.

I'll be leaving the office probably before it finishes building, but if you're around before it's finished, give it an hour or so and check again.

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=3411776

Comment 10 kevin martin 2011-10-07 13:52:41 UTC
When noaccel is off (so acceleration is on) I can access the machine via ssh but can't use X at all.  The screen is essentially a mix of black and white blocks at that point.  I'll install the new kernel, turn acceleration back on (and leave vram_pushbuf off) and let you know what happens.

Comment 11 kevin martin 2011-10-07 14:41:27 UTC
Created attachment 526904 [details]
compilation log with no nouveau.noaccel and no vram_pushbuf kernel flags.

With acceleration on and no vram_pushbuf statement, X is still unusable.  I get a screen with black and white blocks and I can't ctrl-alt to any other VTs, nor ctrl-alt-backspace to try to restart the X server.

Comment 12 kevin martin 2011-10-07 14:42:39 UTC
Created attachment 526905 [details]
compilation log with vram_pushbuf kernel flag but no nouveau.noaccel flag.

With acceleration on and the vram_pushbuf flag set to 1, X is still unusable.  I get a screen with black and white blocks and I can't ctrl-alt to any other VTs, nor ctrl-alt-backspace to try to restart the X server.

Comment 13 Ben Skeggs 2011-10-07 15:24:37 UTC
Those logs aren't from the kernel above; it should be kernel-3.1.0-0.rc9.git0.1, and the logs are from git0.0.

Don't worry about using drm.debug at all either, I have the useful info from there now.

Comment 14 kevin martin 2011-10-07 22:54:05 UTC
I yum updated this morning and that's the kernel I got.  yum update tonight still doesn't pick up a 0.1 version.

Comment 15 Ben Skeggs 2011-10-07 23:36:51 UTC
The end of comment 9 on this bug has a link to a kernel build with test patches in it; these won't appear in updates yet.

Comment 16 kevin martin 2011-10-10 15:00:44 UTC
Created attachment 527252 [details]
compilation log with acceleration and no vram_pushbuf using kernel rc9.git0.1

Same result with this kernel as far as having unusable X with acceleration on and vram_pushbuf not set at boot time.  Still a black and white screen, no keyboard recognition (can't switch into any other VT's).

Comment 17 kevin martin 2011-10-10 15:02:21 UTC
Created attachment 527253 [details]
compilation log with acceleration but this time with vram_pushbuf=1 using kernel rc9.git0.1

Same result with this kernel as far as having unusable X with acceleration on and vram_pushbuf=1 set at boot time.  Still a black and white screen, no keyboard recognition (can't switch into any other VT's).  Have fallen back to nouveau.noaccel=1 again to get back to a usable system.

Comment 18 kevin martin 2011-10-10 15:03:32 UTC
Are there any other debug switches I can turn on at boot time that would be helpful?  I missed the comment about turning off drm.debug.  I'll do that for future attempts.

Comment 19 Ben Skeggs 2011-10-10 23:08:53 UTC
Your GPU is still locking up right after init for some reason.  Btw, the flag is nouveau.vram_pushbuf=1 (with the nouveau. prefix); if that's failing you'll probably want nouveau.vram_notify=1 as well.
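
To illustrate the prefix point above: module parameters given on the kernel command line need the module.parameter form, so a bare vram_pushbuf=1 is not handed to nouveau. The following minimal Python sketch flags and rewrites such options. The function names are hypothetical, and the parameter set contains only the three nouveau options mentioned in this bug, not the driver's full list.

```python
# Only the nouveau options named in this bug report; not an exhaustive list.
NOUVEAU_PARAMS = {"noaccel", "vram_pushbuf", "vram_notify"}


def unprefixed_nouveau_params(cmdline):
    """Return options that name a nouveau parameter but lack the 'nouveau.' prefix."""
    return [
        token
        for token in cmdline.split()
        if token.split("=", 1)[0] in NOUVEAU_PARAMS
    ]


def fix_cmdline(cmdline):
    """Return cmdline with the 'nouveau.' prefix added where it is missing."""
    fixed = []
    for token in cmdline.split():
        key = token.split("=", 1)[0]
        fixed.append(f"nouveau.{token}" if key in NOUVEAU_PARAMS else token)
    return " ".join(fixed)


if __name__ == "__main__":
    # The boot flags quoted in the original description of this bug.
    flags = "nouveau.noaccel=1 vram_pushbuf=1 drm.debug=14 log_buf_len=16M"
    print(unprefixed_nouveau_params(flags))
    print(fix_cmdline(flags))
```

The same check can be run against the contents of /proc/cmdline to see what the running kernel was actually booted with.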

Comment 20 kevin martin 2011-10-11 13:58:34 UTC
Created attachment 527446 [details]
acceleration on, nouveau.vram_pushbuf=1, nouveau.vram_notify=1, drm.debug off, X still comes up black and white squares.

Comment 21 kevin martin 2011-10-13 18:18:26 UTC
Wondering at this point if I should be trying the nvidia driver again, as I'm not sure that this bug is being worked on at this point.  Don't know if the nvidia driver will be any better and would rather not have to go the proprietary route, but I would like to have X acceleration if possible and don't know how else to proceed at this juncture.

Comment 22 kevin martin 2011-10-20 16:21:40 UTC
You'll be pleased to know the rc9.git0.3 does the same thing as rc9.git0.1.  black and white screen with acceleration turned on, works with acceleration turned off.  And still the GPU lockup occurs with acceleration turned on.  Is there any way to debug why the gpu would be locking up?

Comment 23 Fedora End Of Life 2013-04-03 14:05:52 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 24 Justin M. Forbes 2013-04-05 19:55:44 UTC
Is this still an issue with the 3.9 kernels in F19?

Comment 25 kevin martin 2013-04-08 01:41:43 UTC
yes.  3.9.0-0.rc5.git3.1.fc20.x86_64

It's happened multiple times in the last 4 days.  Machine hangs, can't login externally (not even a ping works), must power down by holding the power button until shutdown and then power back up again.

Comment 26 Josh Boyer 2013-09-18 20:37:17 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 27 Josh Boyer 2013-10-08 17:47:28 UTC
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 28 Johan Vervloet 2013-10-09 18:39:09 UTC
Sorry, I missed this. The problem still occurs. I am running kernel 3.11.3-201.fc19.x86_64. So I guess the problem can be reopened.