Bug 604931 - 2.6.34+ kernels don't boot with rv280 in kms mode when aperture < 256 MB
Status: CLOSED CANTFIX
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-ati
Version: 23
Hardware: All Linux
Priority: medium
Severity: high
Assigned To: Jerome Glisse
QA Contact: Fedora Extras Quality Assurance
Keywords: Patch, Reopened, Triaged
Reported: 2010-06-16 22:47 EDT by Bruno Wolff III
Modified: 2016-04-05 17:02 EDT
Doc Type: Bug Fix
Last Closed: 2016-04-05 17:02:11 EDT

Attachments
dmesg from nomodeset boot (30.42 KB, text/plain)
2010-06-16 22:47 EDT, Bruno Wolff III
Xorg.0.log from a nomodeset boot (47.10 KB, text/plain)
2010-06-16 22:48 EDT, Bruno Wolff III


External Trackers
Tracker: Linux Kernel, ID: 16515 (priority, status, and summary not set; never updated)

Description Bruno Wolff III 2010-06-16 22:47:40 EDT
Created attachment 424643
dmesg from nomodeset boot

Description of problem:
I have been testing some of the 2.6.34 kernels on a few systems, and one with an rv280 (and broken EDID) won't boot unless I use the nomodeset parameter. I have another system with an rv530 that appears to work OK, and another with an nv28 whose monitor also doesn't do EDID properly, and that one works too.

Version-Release number of selected component (if applicable):
I tested using both xorg-x11-drv-ati-6.13.1-0.20100519git428125c09.fc14.i686 and xorg-x11-drv-ati-6.13.0-2.fc13.i686, and on several kernels including 2.6.33.5-128.fc13.i686.PAE. This machine has dual processors (though the other machines that work also have more than one processor).
At about the point where the mode switch normally occurs when booting in text mode, the screen gets filled with what looks like the same text line repeated over and over, except that instead of recognizable text there are odd bit patterns.
I have tried to enter the LUKS password for the file systems blind, but the system does not appear to boot. I am going to attach Xorg.0.log and dmesg from a nomodeset boot as that might give some hints.

How reproducible:
Boots always fail at about the point where the mode switch would occur if no special kernel parameters are used. (I boot in text mode.) Using the nomodeset parameter results in successful booting.

Comment 1 Bruno Wolff III 2010-06-16 22:48:33 EDT
Created attachment 424646
Xorg.0.log from a nomodeset boot
Comment 2 Kyle McMartin 2010-06-17 03:29:58 EDT
Dave, pretty sure all the commits from 2.6.35-rc3 made it in, any thoughts?
Comment 3 Bruno Wolff III 2010-06-22 03:20:20 EDT
I'm still seeing this with kernel-PAE-2.6.34-45.fc14.i686.
Comment 4 Bruno Wolff III 2010-07-11 12:36:26 EDT
I saw the problem with both kernel-PAE-2.6.34.1-9.fc13.i686 and kernel-PAE-2.6.35-0.31.rc4.git4.fc14.i686.
Comment 5 Chuck Ebbert 2010-08-04 09:16:28 EDT
Is this still failing with the latest 2.6.34 and 2.6.35 kernels?
Comment 6 Bruno Wolff III 2010-08-05 03:36:17 EDT
About a week ago I saw it happen with a rawhide live image. I am on vacation now and won't be able to retest for about a week.
Comment 7 Chuck Ebbert 2010-08-09 01:37:26 EDT
How is the display connected? If it's DVI, can you try using a VGA connection?
Comment 8 Bruno Wolff III 2010-08-11 11:35:26 EDT
It is connected to the VGA port on the card.
I am still seeing the problem with 2.6.34.3-37.fc13.i686.PAE. With nomodeset things work; without it, the system appears to crash at the point where mode switching would normally happen early in the boot process.
Comment 9 Bruno Wolff III 2010-08-31 23:38:34 EDT
I tried out kernel-PAE-2.6.36-0.12.rc3.git0.fc15.i686 on my F13 system and booting with KMS failed the same way.
Comment 10 Chuck Ebbert 2010-09-14 19:09:06 EDT
First-bad-commit:

commit 6b8b1786a8c29ce6e32298b93ac8d4a18a2b11c4
Author: Jerome Glisse <jglisse@redhat.com>
Date:   Wed Apr 7 10:21:31 2010 +0000

    drm/radeon/kms: enable use of unmappable VRAM V2
Comment 11 Bruno Wolff III 2010-09-14 21:46:49 EDT
Thanks for tracking that down!
Comment 12 Bruno Wolff III 2010-11-15 09:40:10 EST
I am moving this to F14 now as I have upgraded the machine and will use nomodeset instead of kms. The problem still exists with kernel-PAE-2.6.35.8-55.fc14.i686.
Comment 13 Bruno Wolff III 2010-11-16 10:55:51 EST
This is more for other people trying to find workarounds than for actually pinpointing the bug.
When I was using nomodeset with 2.6.34 and 2.6.35 kernels, I would start seeing processes hang after a while (notably nut). Running in level 3 avoided this.
I took a look at 2.6.36 (only) on an F14 system and I saw what appeared to be the same issue. After also upgrading libdrm, mesa and xorg stuff, nomodeset seems to be working OK so far. (Normally I'd have seen issues by this point.) So it looks like nomodeset plus rawhide is the best option, as running the 2.6.33 kernel at this point is pretty dangerous.
Comment 14 Phil V 2010-11-18 00:18:29 EST
Bruno, you wrote: "After also upgrading libdrm, mesa and xorg stuff, nomodeset
seems to be working OK so far. it looks like nomodeset and rawhide seems to be the best option." 

(1) I think you mean nomodeset as grub.conf parameter?

(2) Which components are you using the rawhide version of?
Is it exactly these components: kernel, libdrm, mesa and xorg stuff?

Thank you!
Comment 15 Bruno Wolff III 2010-11-18 07:29:26 EST
nomodeset is a kernel parameter added in grub.conf.
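
For anyone wanting to confirm that a running system was actually booted with nomodeset, a minimal Python sketch follows. It assumes the usual Linux /proc/cmdline file, which holds the command line of the running kernel; the function names has_kernel_param and running_with_nomodeset are made up for this illustration.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, name: str) -> bool:
    """Return True if `name` appears as a bare flag or as `name=value`."""
    for token in cmdline.split():
        # Compare only the part before any '=' so "name=value" also matches.
        key = token.split("=", 1)[0]
        if key == name:
            return True
    return False


def running_with_nomodeset(path: str = "/proc/cmdline") -> bool:
    """Check the command line of the currently running kernel (Linux only)."""
    return has_kernel_param(Path(path).read_text(), "nomodeset")
```

Calling running_with_nomodeset() on the affected machine should return True only when the parameter from grub.conf was actually passed to the kernel at boot.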

At the time I was able to update just kernel stuff, mesa stuff, libdrm stuff and xorg stuff without forcing updates of lots of other stuff. The rv280 runs slower with nomodeset than it did without it (with 2.6.33 kernels), but seems to work correctly. Since then I have updated completely to rawhide, which is exposing some other issues that aren't video card related. So unless you want to deal with that, you can try to just update the stuff above (and I expect it still won't pull in a bunch of other stuff).

For the update I enabled the rawhide repo and did something like:
yum update kernel'*' mesa'*' libdrm'*' xorg'*'
You'll get prompted before the update if you don't like what's being updated.
Comment 16 Phil V 2010-11-18 18:06:24 EST
Bruno, in your updated configuration can you activate a second monitor in non-mirror mode?
Comment 17 Phil V 2010-11-18 18:08:41 EST
In the meantime, I'm taking Bruno's red pill

to stay in Wonderland and see how deep the rabbit-hole goes.
Comment 18 Bruno Wolff III 2010-11-19 00:08:21 EST
I can't conveniently test using my card with multiple monitors.
I would suggest upgrading the minimum amount of stuff. While rawhide works in general there are some issues that you might run into.
Comment 19 Phil V 2010-11-19 11:01:08 EST
The F14 release notes warn us that they are going to obsolete the nomodeset boot option so it's a bug if nomodeset works but we need it to work.

Bruno, Should we change the title to 

"2.6.34 - 2.6.36 kernels don't boot with rv280 in kms mode"

I needed nomodeset to install F14 from DVD, and the .36 kernel install kept it in its grub.conf line.


By the way, second monitor doesn't work even with the rawhide components.
Comment 20 Bruno Wolff III 2010-11-20 13:06:25 EST
After running with nomodeset for a while, I am seeing slow downs. Rendering isn't as fast as it was with KMS even right after boot, and things get worse from there. Workspace switching in particular is very slow. If I stay within the same app and workspace things aren't too bad. That could be due to some other change in rawhide though.
I am in the process of rebuilding 2.6.36-5 with part of the unmappable memory patch (the r100.c part) reverted. It's a slow process on the hardware I have, but I should be able to test it soon.
From what I see of the patch though, I find it hard to believe the real error is in that patch. I did track down one possibility with a remap issue, but found that patch is in Fedora (and I think upstream) so it turned out to be a dead end.
But even if reverting part of the patch works as a workaround that's better than what we have today. (I think it might cost me half my GPU memory.)
Comment 21 Bruno Wolff III 2010-11-21 10:46:14 EST
I tested undoing the patch in r100.c and the system can now boot in KMS mode.
The screen I normally see displayed when things stop flashed briefly when the mode changed, but then text appeared again and the boot continued. X didn't work so well though. The background parts of the screen weren't working very well. Things displayed on top would appear, but things that were supposed to go away wouldn't. I had seen something at least a little like this on my other rawhide system using an nv28, so it may be an unrelated bug.
So it looks like something is broken with the handling of unmappable memory and trying to use it causes a problem on at least rv280s.
Comment 22 Bruno Wolff III 2010-11-22 23:32:32 EST
Tonight things are working better in KMS mode (with my custom kernel). I am not sure what changed. The background is black instead of my custom image. I think that might be an unrelated issue though. So far window switching is fast and other things appear to be working normally.
Comment 23 Bruno Wolff III 2010-11-23 00:22:20 EST
After a bit more time the long slow downs started occurring again. It feels like a memory pressure issue rather than video, though. I also saw the trail of old windows issue appear again, so there seems to be some sort of intermittent problem there.
Comment 24 Matěj Cepl 2010-11-23 19:03:56 EST
(In reply to comment #23)
> After a bit more time the long slow downs started occurring again. It feels
> like a memory pressure issue rather than video, though. I also saw the trail of
> old windows issue appear again, so there seems to be some sort of intermittent
> problem there.

And there is nothing noteworthy or extraordinary about your hardware, right?
Comment 25 Bruno Wolff III 2010-11-23 19:34:59 EST
It's old. The monitor doesn't do EDID. I have a Digium TDM400P attached, but I see the same issues when I boot without having drivers for the card. (I have to rebuild them myself every kernel update.) I am running with luks encryption on top of raid 1.
Comment 26 Kyle McMartin 2010-11-23 21:38:47 EST
Can you try the latest 2.6.37-rc3 kernel here:
http://repos.fedorapeople.org/repos/kyle/kernel/fedora-kernel.repo
and follow up upstream? (There's no radeon/drm changes there, so it will be reproducing it with the latest DRM code...)

Sorry, but I don't have the skill or hardware to debug this further, so the best bet is for us to try and get assistance from upstream.
Comment 27 Bruno Wolff III 2010-11-24 02:21:12 EST
Related to the slowness to switch workspaces, top shows 0 swap being used and I have the following vm settings:
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
vm.swappiness=1
vm.vfs_cache_pressure=50
I was going to try to not use the last two and see if that made a difference.
Because I was having boot problems I skipped a lot of kernels and don't know if the slow window switching happened at the same time or not.
I'll try out the upstream kernel.
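
As background on why the overcommit settings were a plausible suspect: with vm.overcommit_memory = 2 the kernel refuses new allocations once committed memory would exceed swap plus overcommit_ratio percent of RAM. The Python sketch below parses the settings quoted above and works out that limit. The RAM and swap sizes in the usage note are invented for illustration (the report does not state them), huge pages are ignored, and the function names are hypothetical.

```python
def parse_sysctl(text: str) -> dict:
    """Parse 'key = value' lines in sysctl.conf style into a dict of ints."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings


def commit_limit_mb(ram_mb: int, swap_mb: int, overcommit_ratio: int) -> int:
    """Approximate commit limit in strict mode (vm.overcommit_memory = 2).

    Simplified: swap + ram * ratio / 100, ignoring huge pages.
    """
    return swap_mb + ram_mb * overcommit_ratio // 100
```

For a hypothetical machine with 2048 MB of RAM and 4096 MB of swap, commit_limit_mb(2048, 4096, 50) gives 5120 MB. On a running system the kernel's own figures appear as CommitLimit and Committed_AS in /proc/meminfo.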
Comment 28 Bruno Wolff III 2010-11-24 02:43:29 EST
As a side note, the way you have named kernels in your repo, kernel-PAE-2.6.37-0.rc3.git0.1.fc15.i686 isn't the one that gets installed by default.
Comment 29 Kyle McMartin 2010-11-24 23:29:21 EST
Thanks, fixed that today.
Comment 30 Bruno Wolff III 2010-11-25 23:59:11 EST
I have a partial report. When I tried kernel-PAE-2.6.37-0.rc3.git0.1.fc15.i686 it crashed on boot in KMS. I patched it to do the partial revert of the commit where things went south and it boots. I still have the background not being redrawn issue (which has a separate bug report), had a couple of crashes that are probably new bugs, but didn't have enough info to report them. I also stopped using the overcommit setting of 2 and didn't have the dahdi driver rebuilt yet. This may have fixed the slow downs I was seeing, though there might have been some other fix that went in that did that. The released version of dahdi won't build so it will take me a bit to try out trunk and see if the problem returns. It's getting late here now, so I probably won't retest the overcommit setting until tomorrow as well.
Comment 31 Bruno Wolff III 2010-11-26 00:06:11 EST
Well I just triggered the slow down issue by trying to start nut (ups). So that appears to not be caused by dahdi nor the overcommit settings. Might not even be kernel related as there have been dbus issues reported.
Comment 32 Bruno Wolff III 2010-11-26 22:40:51 EST
I have some more data. I had the aperture set to 128MB in the BIOS. When I changed this setting to 256MB, I got a successful boot with an unmodified version of Kyle's kernel. (So far I have only done one reboot.) I believe the card has 256 MB of total memory, but that only half of that is accessible by the OS. But I could be confused about that, and maybe it's 128 MB each for the two subdevices.
I looked at that because I suspected that AGP might be the difference between my rv530 that works and my rv280 that doesn't, and that making the aperture larger or smaller than the amount of memory might work around some incorrect test.
I don't know why this ended up working though.
Comment 33 Bruno Wolff III 2010-11-27 11:22:59 EST
In commit 6b8b1786a8c29ce6e32298b93ac8d4a18a2b11c4 does support of unmappable vram also apply to memory outside of the AGP aperture? If not, then a check for AGP mode should be done for those cards that support AGP and the size used should be reduced to the aperture size in those cases. This seems like a plausible cause of the bug, but I'm not sure.
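
To make the suggestion concrete, the check being proposed might look roughly like the Python sketch below. This is only an illustration of the idea in the comment above, not radeon driver code: the function name, its parameters, and the choice to clamp with min() are all hypothetical, and nothing in this report establishes that such a clamp is the correct fix.

```python
MB = 1024 * 1024


def usable_vram_bytes(vram_bytes: int, aperture_bytes: int, is_agp: bool) -> int:
    """Hypothetical clamp: on AGP, use no more VRAM than the aperture size."""
    if is_agp:
        return min(vram_bytes, aperture_bytes)
    # Non-AGP cards are left alone in this sketch.
    return vram_bytes
```

With the values from this report, a 256 MB card behind a 128 MB aperture would be limited to 128 MB under this sketch, while the same card with a 256 MB aperture would keep all 256 MB.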
Comment 34 Bruno Wolff III 2010-11-29 09:54:20 EST
The following bug might be related, but the fix isn't all there is, as I tested a 2.6.37-rc3 kernel and still saw my problem. This is in 2.6.36.1-10 which will be available very shortly. I'll retest it to make sure that my issue still appears if I change the aperture size.
http://bugs.freedesktop.org/show_bug.cgi?id=28402
This bug includes a link to the following commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b7d8cce5b558e0c0aa6898c9865356481598b46d
The above bug does indicate that there can be issues caused by incorrectly sized apertures, so there may be another path there when the BIOS setting is less than the full size of the device.
Comment 35 Bruno Wolff III 2010-11-29 12:19:22 EST
I tested 2.6.36.1-10 which included an aperture fix, but booting still fails with the aperture set to 128 MB in the bios and succeeds when it is set to 256 MB.

There are some pending fixes in http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=shortlog;h=refs/heads/drm-radeon-testing which I'll take a look at and see if I can test.
Comment 36 Bruno Wolff III 2011-01-04 01:40:46 EST
I retested this with kernel-PAE-2.6.37-0.rc8.git3.1.fc15.i686 and am still seeing the problem when I also set the aperture to 128MB instead of 256MB.
Comment 37 Bruno Wolff III 2012-01-23 03:27:29 EST
This is still happening with kernel-PAE-3.3.0-0.rc1.git0.3.fc17.i686.
Comment 38 Fedora End Of Life 2012-08-16 12:47:14 EDT
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 39 Phil V 2012-09-25 20:54:04 EDT
Any news or hope of repair?
Comment 40 Bruno Wolff III 2012-09-26 08:49:17 EDT
When I was having just that problem I went into the BIOS and set the aperture to 256 MB (matching the memory on the video card) and the problem didn't affect things.
However, another AGP problem exists and to solve that one I have disabled AGP with a kernel parameter.
Comment 41 Phil V 2013-03-07 00:21:23 EST
You mean, you have disabled your AGP video card completely?
Or something else?
Comment 42 Bruno Wolff III 2013-03-07 09:04:40 EST
No, I just disabled access to the card using AGP. PCI is used instead.
Comment 43 Fedora End Of Life 2013-07-04 02:42:07 EDT
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 44 Fedora End Of Life 2013-08-01 14:18:05 EDT
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
Comment 45 Bruno Wolff III 2013-09-21 04:11:29 EDT
I retested this with kernel 3.12.0-0.rc0.git22.2.fc21.i686+PAE (after changing the aperture to 128 in the BIOS), and at about the point where the screen resolution would normally change, the screen changed to several columns of purple and gray bars on a mostly black background. So it looks like there is still a problem when the aperture is not set to 256 (which matches the amount of memory on the graphics card).
Comment 46 Jan Kurik 2015-07-15 11:19:26 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could affect also pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23
Comment 47 Bruno Wolff III 2016-04-05 17:02:11 EDT
The machine that had this issue has died. Also I had found some processor errata notes that might have been related to this problem. It had a cache line conflict problem that could hang a cpu. The chip didn't have updatable microcode so it wasn't fixable.
