Bug 480393
Description
Adam Williamson
2009-01-16 20:04:03 UTC
Update: this is actually probably just X.org auto-detection, now I look into it some more. X just blindly uses nv for *any* NVIDIA card. From hw/xfree86/common/xf86AutoConfig.c in xorg-server:

    case 0x10de:
    case 0x12d2:
        driverList[0] = "nv";
        break;

0x10de is NVIDIA's vendor code. So, there should probably be an exception for this card (and some others that don't work with 'nv'...), if the bug is not fixed in nv itself. I'll try and do a patch.

I have attached a patch to the fd.o report which causes X.org auto-detection to use vesa rather than nv for a range of devices which have clear fd.o bug reports showing that they completely fail to work with the nv driver. It would probably be a good idea to add that patch to Fedora's X server package.

Thanks for the bug report. We have reviewed the information you have provided above, and there is some additional information we require that will be helpful in our diagnosis of this issue. Please attach your X server config file (/etc/X11/xorg.conf, if available) and X server log file (/var/log/Xorg.*.log) to the bug report as individual uncompressed file attachments using the bugzilla file attachment link below. Could you try nouveau? Does it work for you? Could we get /var/log/Xorg.0.log from this attempt as well, please? We will review this issue again once you've had a chance to attach this information. Thanks in advance.

I don't think nouveau works - I've read a bug report for the other type of 9400 GT (there are two '9400 GT' PCI IDs) where the nouveau folks acknowledged that it doesn't work, and said they were waiting for it to work in nv before looking at it. But I'll double-check later.

There's no xorg.conf in the reported case, it's the installer. I'll have to change some things to test it again in a booted system and see if there's actually a useful Xorg.0.log. It really hangs the system entirely, it is not possible to switch to a console to get the logs. I'll try and do that shortly.
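As an illustration of the kind of exception proposed in the description, here is a minimal standalone sketch in C. It is not the patch attached to the fd.o report and not the actual xorg-server code: the function names `pick_driver` and `nv_is_known_broken`, the macro names, and the table `nv_broken_devices` are hypothetical, and the single device ID in the table is a placeholder rather than a real entry. Only the two vendor IDs and the "nv" / "vesa" driver names come from the discussion above.

```c
#include <stddef.h>
#include <stdint.h>

/* Vendor IDs taken from the snippet quoted in the description. */
#define PCI_VENDOR_NVIDIA 0x10de
#define PCI_VENDOR_0X12D2 0x12d2 /* also mapped to "nv" in the quoted snippet */

/*
 * HYPOTHETICAL table of device IDs known not to work with nv.
 * 0xffff is a placeholder, not a real device ID; a real list would be
 * built from the device IDs named in the fd.o bug reports.
 */
static const uint16_t nv_broken_devices[] = {
    0xffff,
};

/* Return 1 if device_id is listed as failing with nv, 0 otherwise. */
static int nv_is_known_broken(uint16_t device_id)
{
    size_t i;
    size_t n = sizeof(nv_broken_devices) / sizeof(nv_broken_devices[0]);

    for (i = 0; i < n; i++) {
        if (nv_broken_devices[i] == device_id)
            return 1;
    }
    return 0;
}

/*
 * Choose a driver name for a PCI vendor/device pair.
 * Returns NULL for vendors this sketch does not handle.
 */
static const char *pick_driver(uint16_t vendor_id, uint16_t device_id)
{
    switch (vendor_id) {
    case PCI_VENDOR_NVIDIA:
    case PCI_VENDOR_0X12D2:
        /* Fall back to vesa for devices on the known-broken list. */
        return nv_is_known_broken(device_id) ? "vesa" : "nv";
    default:
        return NULL;
    }
}
```

With this sketch, `pick_driver(0x10de, id)` yields "vesa" only for IDs present in the table and "nv" for every other device from the two vendors, so cards that do work with nv keep their current behaviour.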
Remember, the point of this issue is not mainly to fix the bug in nv. I've filed a bug on freedesktop.org for that. The point is that the fact that it doesn't work with nv causes the installation to fail, because nv is automatically used for the cards. What I'd recommend is the installer use vesa, not nv, for this and other cards like it which do not work with nv. If the installer just uses X.org auto-detection, that can be accomplished via the patch I attached to http://bugs.freedesktop.org/show_bug.cgi?id=19619 .

Actually, it might make more sense to re-assign the bug to the X server package.

(In reply to comment #5)
> Actually, it might make more sense to re-assign the bug to the X server
> package.

No, it wouldn't -- this is most likely an issue related to the -nv driver (yeah, I know that you claim it is a problem with the X server picking a bad driver; bear with me, we need to separate bugs into some reasonable boxes and this seems to fit here best). Could we please get those logs (either /var/log/Xorg.0.log or /var/log/anaconda.xlog for the issues during installation)? Thank you

Well, obviously, yes, there's a bug in nv. But the point is that if nv isn't fixed, it shouldn't be used...and I suspect getting nv fixed is not going to be straightforward. See my fd.o bugs; there are eight fd.o bugs on hardware that does not work at all with the nv driver, most of which have been open for several months. I'll see if I can get some logs.

Ben is looking into the nv bug. I checked Xorg.0.log: when trying to load X with the nv driver, it's a 0-byte file - it obviously hangs very early in the process, I guess. Same whether booting to init level 5 or booting to init level 3 and doing 'startx'. I try it, it hangs the system, then I reboot to init 3, and /var/log/Xorg.0.log is a 0-byte file with the appropriate timestamp. Ben's asked me to do an mmio-trace to see how the proprietary driver (which works) initializes the card, I'll try to get around to that.
The mmiotrace of the proprietary driver starting up, that Ben requested from me via email, can be found at: http://www.happyassassin.net/extras/mydump.txt.lzma

I reported results with nouveau to Ben via email, but basically - it doesn't work, but in a different way. :) Attempting to start X with nouveau causes a kernel oops. I took pictures of each page of the oops, those can be found here: http://www.happyassassin.net/extras/9400gt-nouveau-trace.tar.gz

I sent xorg.conf to Ben via email, but there's nothing particularly interesting in it, it's just the standard one generated by livna-config-display. I think that satisfies all information requested.

(In reply to comment #9)
> I think that satisfies all information requested.

It does, passing to Ben, but please next time, keep all information here so that we have a record.

With the latest Rawhide kernel (with a claimed fix for NVIDIA 9 series support) and nouveau, it still fails, but in a different way. There's no kernel panic, but - like with nv - the system hangs at a black screen, cannot switch to console or ssh in. X obviously gets some way into the loading process, as there's a non-zero Xorg.0.log with interesting stuff in it. I'll attach it.

Could someone change the component to nouveau, btw? Since Ben's preferred angle of attack is to make nouveau work and start using it by default.

Created attachment 331878 [details]
Xorg.0.log from 2009/02/13 Rawhide / nouveau attempt
Attempt to start X.org with kernel 2.6.29-0.112.rc4.git3.fc11.x86_64 and nouveau 0.0.12-3.20090206git945f0cb.fc11.x86_64 .
The GF9 fixes that went into the kernel module fix nouveau's graphics engine setup. The display engine is hanging here, and I assume that is the case in nv too; it's hard to say, however, since nv isn't overly helpful with saying why it's locked up :)

Is this card one of those cards that nv works on after you've started X at least once with the binary driver after power on?

Ben: yes, indeed it is.

Okay. It's a little bit of a long shot, but let's give this a try. There's a modified git tree of airlied's radeontool program at git://anongit.freedesktop.org/~darktama/radeontool (g80-pdisplay branch). Are you able to build that, and run it (./radeontool regs) to grab some register dumps?

1. after a cold boot, with nv attempting and failing to run on the card
2. while running the binary driver
3. while running nv successfully after the binary driver

Hopefully we'll be able to pinpoint something by comparing those dumps.

"1. after a cold boot, with nv attempting and failing to run on the card"

This one I can't do, I don't think, as the system crashes as soon as I try to run X in this case. It's not just an X crash, I can't get out with ctrl-alt-F1 or anything. Even the magic keys do nothing. But I can run a trace from a completely cold boot, I guess.

"2. while running the binary driver 3. while running nv successfully after the binary driver"

These ones I'll do ASAP.

Thanks. Ah, that's a shame. Comparing 1 and 3 would be the most useful :) Not to worry, we'll see what we can do!

I guess I could see if I can start the trace running and dump the output into some file via tee or something, then try to bring up X, and see if there's anything useful in the file when I reboot after it inevitably hangs...well, I'll play around a bit and see if I can get anything useful. :)

Very sorry for the spam, this is a quick test comment for a Bugzappers team greasemonkey script (if all goes well, it should appear with a signature).

I'm about to do the testing Ben requested.

OK, some traces. I could not, I'm afraid, manage to get traces from the cases where nv or nouveau fail. The system is too far gone for me to be able to get 'em. Attachments follow.

-- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers

Created attachment 332286 [details]
radeontool dump from completely clean boot, init level 3, no module loaded
This is the dump from a completely clean boot to init 3 with no 'nvidia' module loaded.
Created attachment 332287 [details]
radeontool dump from completely clean boot, init level 3, after loading 'nvidia' module
This is the dump from a completely clean boot, init level 3, after loading the 'nvidia' module but before starting X.
Created attachment 332289 [details]
radeontool dump from completely clean boot, init level 3, after loading 'nvidia' module then X
This is the dump from a clean boot to init 3, after loading the 'nvidia' module and then starting X with the nvidia driver. X is running successfully at this point.
Created attachment 332291 [details]
radeontool dump from completely clean boot, init level 3, starting X with nvidia then switching to nv
This is the dump after booting clean to init level 3, starting X with the nvidia driver (successfully), immediately logging out, and then starting X with the nv driver (whereupon it works, as previously described).
Still the case with the Test Day live CD. Note that if I boot with nouveau.modeset=1 , modesetting actually works - I get a graphical boot sequence - but the bug still happens; as soon as X kicks in, the system freezes (not at a black screen this time, the graphical boot screen freezes solid).

Oh, that's interesting to know, I wouldn't have expected kms to be able to program a mode if the 2d driver can't manage! Can you by any chance ssh in and get /var/log/Xorg.0.log and dmesg output after the freeze?

As I already said the last two times you asked ;), it's really *frozen*. It's not just that X is hung or something - the system is dead. No access.

Right. I thought circumstances may have changed, the drm is managing to do a better job than the 2d driver did, so it was worth a try :)

Hey Adam, sorry it's been a while without an update. Can I grab your /var/log/nv*.rom file please :)

umm, /var/run/nv*.rom I mean! If that doesn't exist (nouveau should create it), /var/run/video.rom (nv creates this) will be fine.

I'll test again with the latest nouveau and nv later today, and give you those files if they show up.

nv and nouveau both still hang the system on startup. Here's /var/run/nv*.rom (just one file) from starting up nouveau after nvidia has already run (which, as noted earlier, succeeds).

Created attachment 342032 [details]
requested file
Some updates from IRC: Ben had the bright idea that this was related to the dual displays. Indeed, booting with only one display connected (either one) works.

There are two displays, one connected to the VGA output, one to the DVI output (the card has VGA, DVI and S-Video outputs). Both are 20" Acers, slightly different models (VGA is an AL2016W, DVI is an Acer X203W). Booting with nouveau with either one connected alone works. Then connecting the second display and doing 'xrandr' works in both cases: the second display shows correctly in the xrandr output, though it is not currently activated. Then doing 'xrandr --auto', the result varies depending on which monitor was connected first.

If I connect VGA first, boot, then connect DVI, do 'xrandr' then 'xrandr --auto', the system hangs in a similar way to how it does when booting with both displays connected.

If I connect DVI first, boot, then connect VGA, do 'xrandr' then 'xrandr --auto', it works as expected; I get clone displays, both behaving correctly. If I *then* use xrandr to try and set the DVI output right-of the VGA output, it succeeds, but VGA starts behaving oddly; the cursor behaves correctly on it, but nothing else is updated. Whatever I do that should change what is displayed on VGA, it stays exactly the same as at the point where the xrandr command ran.

http://www.happyassassin.net/extras/Xorg.0.log-nouveau is a log from booting with an experimental nouveau build provided by Ben, with DVI connected. I then connect VGA, run 'xrandr', run 'xrandr --auto', then try to set VGA left-of DVI, at which point it behaves rather oddly, with the cursor only updating for short periods between long periods where it's stuck.

This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

If you grab the kernel build from http://koji.fedoraproject.org/koji/buildinfo?buildID=112059 and add "nouveau.modeset=1 nouveau.uscript=1" to your boot options, this chipset should work now (mine does at least!). Can you confirm?

Not until I'm back in Canada (July 14th). If you don't see something around then, give me a ping :). Thanks for your work on this!

I'm back now, but - the test system is on Rawhide now. I will assume the current Rawhide kernel contains the necessary fixes and test with that in a second. Please let me know if it doesn't :)

Fails with current Rawhide, I'm afraid.

From talking on IRC I can CLOSED,RAWHIDE this?

Changing component to nouveau, this is CANTFIX for nv.

Yes, it works now, we can close. Works in current Rawhide out of the box (no special kernel parameters required).

James, not sure why you just CCed yourself on this bug, but as noted in my last comment it's been fixed for ages, for me. Still works in current F13 and Rawhide.

I CCed myself on the wrong bug and didn't want to make more noise by removing myself... the bug I was experiencing was Bug 532711, which is fixed by the latest (as of yesterday) kernel build in koji.