Bug 601790
Summary: | RHEL6 DVD Install regression between nightly 20100603 and 20100607 | | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Zachary Amsden <zamsden>
Component: | xorg-x11-drv-nouveau | Assignee: | Ben Skeggs <bskeggs>
Status: | CLOSED CURRENTRELEASE | QA Contact: | desktop-bugs <desktop-bugs>
Severity: | urgent | Docs Contact: |
Priority: | low
Version: | 6.0 | CC: | jbastian, notting, syeghiay, vbenes
Target Milestone: | rc | Keywords: | Triaged
Target Release: | ---
Hardware: | x86_64
OS: | Linux
Whiteboard: |
Fixed In Version: | xorg-x11-drv-nouveau-0.0.16-8.20100423git13c1043.el6 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2010-11-10 21:56:24 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Description
Zachary Amsden
2010-06-08 15:52:15 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

Would it be possible for you to recover the output of "dmesg" and /var/log/Xorg.0.log over ssh, so there's a clue as to what may have gone wrong?

I actually tried. No attempt to switch consoles was successful. It's possible that with a hacked initrd I could get network access, but even so, it's difficult to figure out how to activate it at exactly that point... I think the best bet here is a binary search; 20100603 and 20100607 are close enough together for a search to be practical.

Well, I've booted the relevant packages on a few of my own machines and they work fine, so I need more info to know what could possibly have gone wrong. If you plug into a wired network, can you get access?

I am plugged into a wired network, but this happens so early in the install that I don't think it's active yet. Being unable to change consoles after the crash / hang doesn't help either. I'm downloading the 20100605 DVD now to see if I can pinpoint the regression a bit more closely, but there are other things I can try:
1) interrupt the install early and get the network going (although an IRQ lockout seems to be happening, since the keyboard doesn't work);
2) force memory to be detected as lower than usual, resulting in a text-mode install, then boot with a full install and active networking.
Did any suspicious changes go into X or the drivers recently? The regression is a 100% boolean pass / fail just by switching between these two DVDs.

There's one recent nouveau change that went into kernel -32, but I can't think of any way it could have caused this particular problem.

Okay, I was able to sneak in a command (dhclient eth0) before the crash, and I definitely saw the X cursor before everything went blank.
Keyboard LEDs are still responsive. However, since nothing on the box is set up to accept incoming connections, networking is still useless. It does respond to pings, though.

Switching to console 2, I can get beeps when hitting Ctrl-G, so perhaps I can blind-type some commands. So far I haven't been able to ssh out successfully; is ssh available in the installer shell?

So at least console switching works, but VGA restore to text mode is definitely broken. What happens if you boot with nouveau.noaccel=1 in your boot options?

Okay, this sucks. I spent 30 minutes typing commands blindly and got ssh to connect to port 22 (watching a packet dump), but for some reason the client disconnects. There's no useful information on the server, so it must be a client-side failure, and of course I can't see the error message.

nouveau.noaccel=1 works.

Okay, this is really weird then; there have been no nouveau changes *at all* that affect that area recently. Can I see your dmesg output from that boot regardless? Oh, and which kernel and xorg-x11-drv-nouveau versions are on each of the DVD images?

Created attachment 422740
dmesg output
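The binary search proposed earlier in the thread can be sketched as follows. This is a minimal illustration, not a tool anyone in this bug used: it assumes one nightly image per day and that `is_good` (a hypothetical callback) stands for booting that image by hand and checking whether Anaconda's X comes up. With 20100603 known good and 20100607 known bad, the first midpoint it probes is 20100605, the same DVD the reporter chose to download next.

```python
from datetime import date, timedelta
from typing import Callable, List


def first_bad_nightly(nightlies: List[date], is_good: Callable[[date], bool]) -> date:
    """Return the earliest nightly for which is_good() returns False.

    Assumes `nightlies` is sorted oldest to newest, nightlies[0] is known good,
    nightlies[-1] is known bad, and the result flips from good to bad only once.
    """
    lo, hi = 0, len(nightlies) - 1  # invariant: nightlies[lo] good, nightlies[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(nightlies[mid]):
            lo = mid
        else:
            hi = mid
    return nightlies[hi]


if __name__ == "__main__":
    # Hypothetical: one nightly DVD per day from 20100603 (good) to 20100607 (bad).
    candidates = [date(2010, 6, 3) + timedelta(days=i) for i in range(5)]
    tested = []

    def simulated_boot(d: date) -> bool:
        # Stand-in for "boot this DVD and see whether the installer's X starts".
        tested.append(d)
        return d < date(2010, 6, 5)

    print(first_bad_nightly(candidates, simulated_boot), tested)
```

Each probe costs a full DVD download and boot, so halving the interval each time keeps the number of test installs to roughly log2 of the number of nightlies in the window.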
kernel-2.6.32-33.el6.x86_64.rpm
xorg-x11-drivers-7.3-13.2.el6.x86_64.rpm
xorg-x11-drv-nouveau-0.0.16-6.20100423git13c1043.el6.x86_64.rpm
xorg-x11-server-Xorg-1.7.7-4.el6.x86_64.rpm

Strange to see a git tag in the nouveau package version... did an old version get pulled in somehow?

Nope, nouveau doesn't technically have a "release" upstream yet, so there are no version numbers; I used the git tag of the commit the package was based on instead. How about the working install image? One other thought: if you do an install, do you still see the problem on the installed system? It may be easier to track down the exact cause that way.

I can't do an install; the installer now wants to wipe out my drive, which I can't accept. See bug 602497. BTW, I figured out what went wrong with scp / ssh: it asks "Are you sure you want to connect" the first time, so I must answer "yes" before the password. I could blindly scp files off /tmp or /var/log in the failed case if that would help...

Yes, that could potentially be *very* helpful. If the GPU hung or something, it *should* at least have reported the hang to the driver. /var/log/Xorg.0.log (and Xorg.0.log.old too, if it exists) and /var/log/messages are the most useful.

Believe it or not, that worked. Taking a diff of /tmp/X.log, I see this in the failed version:
(II) AIGLX error: dlopen of /usr/lib64/dri/nouveau_dri.so failed (/usr/lib64/dri/nouveau_dri.so: cannot open shared object file: No such file or directory)
(II) AIGLX: reverting to software rendering
Is it just a missing file?

Nope, that's not an error at all. It's just saying there's no 3D driver available. What did you have to do to make it work?

Created attachment 422741
dmesg from failed X
Created attachment 422742
/tmp/X.log from failed X server
Created attachment 422743
/tmp/X.log from nouveau.noaccel=1 (working X server)
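The "diff of /tmp/X.log" step in this thread can be approximated with Python's difflib. A minimal sketch, assuming both logs are available as text; `normalize` and `only_in_failing` are hypothetical helper names, and the timestamp-stripping regex is an assumption about the log format (it does nothing if lines carry no bracketed timestamp). As the thread shows, a line that appears only in the failing log, like the AIGLX dlopen message, is a lead to check rather than necessarily the cause.

```python
import difflib
import re
from typing import List

# Assumption: some Xorg versions prefix log lines with a "[  seconds] " uptime
# stamp; strip it if present so identical messages compare equal.
_STAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(log_text: str) -> List[str]:
    """Split a log into lines, dropping any leading timestamp and trailing spaces."""
    return [_STAMP.sub("", line).rstrip() for line in log_text.splitlines()]


def only_in_failing(working_log: str, failing_log: str) -> List[str]:
    """Return lines that ndiff marks as present only in the failing log."""
    return [
        line[2:]  # drop ndiff's two-character "+ " prefix
        for line in difflib.ndiff(normalize(working_log), normalize(failing_log))
        if line.startswith("+ ")
    ]
```

Typical use would be to read the noaccel (working) and failing X.log attachments into strings and print `only_in_failing(working, failing)` line by line.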
Oh, I meant the blind switch to VT-2 and the scp trick worked... FWIW, the failure case uses GPU channel 2, opens and closes it twice, and fails immediately after the second close. What it would have done next is:
(==) NOUVEAU(0): DPMS enabled
BTW, the card has two ports, and if I'm reading it right, the output looks like it fails while initializing the second screen (I have no display connected to it, however).

Hmm, can I see your X log from the working install DVD too, please? I'm not really sure why the channel gets closed and reopened within a single X invocation, but that gives me something to test nouveau against, to see whether that case actually works.

Okay, major change discovered: the X server went from 1.7.6 (working) to 1.7.7 (fails). Uploading dmesg and X.log from the 0603 DVD.

Created attachment 423035
dmesg 0603 (working)
Created attachment 423037
X.log 0603
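Finding "what it would have done next" amounts to aligning the failing log against a working one and reading the first working-log line after the last stretch both share. A rough heuristic sketch using `difflib.SequenceMatcher`; the helper names and the timestamp regex are assumptions, and repeated lines (such as the channel being opened and closed twice) can make the alignment land somewhere other than intended, so the answer still needs a human look.

```python
import difflib
import re
from typing import List, Optional

# Assumption: strip an optional "[  seconds] " prefix so timestamps don't break matching.
_STAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def _lines(log_text: str) -> List[str]:
    return [_STAMP.sub("", line).rstrip() for line in log_text.splitlines()]


def next_expected_line(working_log: str, failing_log: str) -> Optional[str]:
    """Return the working-log line that follows the last stretch both logs share.

    Roughly: "what the failing server would have printed next". Returns None
    if the logs share nothing or the shared stretch runs to the end.
    """
    work, fail = _lines(working_log), _lines(failing_log)
    matcher = difflib.SequenceMatcher(None, work, fail, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel; ignore it.
    blocks = [m for m in matcher.get_matching_blocks() if m.size > 0]
    if not blocks:
        return None
    last = blocks[-1]
    after = last.a + last.size  # index in the working log just past the match
    return work[after] if after < len(work) else None
```

`autojunk=False` keeps SequenceMatcher from treating frequently repeated log lines as junk, which its default heuristic can do on inputs of 200 lines or more.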
*** Bug 602760 has been marked as a duplicate of this bug. ***

Okay. At some point between those two nightlies, something changed that causes an X server regeneration to happen. I've tracked down a bug in the X server that nouveau somehow triggers, which looks like the likely candidate for what you're seeing.

Let me know if there's anything else I can do to help debug / test this. I found a workaround for the install issue and went ahead with a new install from 20100607 using the VESA driver.

Thanks, I'll push a new server build if/when I get the acks for this bug. Once it makes it into a nightly, it'd be great if you could ack/nack that it's fixed :)

Just an update: the server fix was a side effect, and shouldn't have happened. The correct fix will be in xorg-x11-drv-nouveau.

We have the same HW as the reproducer has... will test soon whether we can reproduce.

s/reproducer/reporter :)

I've built xorg-x11-drv-nouveau-0.0.16-8.20100423git13c1043.el6 now with what I hope is the fix for this. Once it appears in a nightly, it'd be great if you could ack/nack the fix :)

It's far easier for me to install the package; downloading a full DVD takes about 4 hours here. Unfortunately I had no choice but to do a full install, which resulted in me finding this bug, but now that I'm installed I'd gladly just switch X drivers. BTW, the framebuffer seemed to work fine with the nouveau kernel driver; I got fancy graphics of a spinning wheel during the install, whee! But this didn't jibe well with installing X with basic video: the VESA driver complained that the nouveau driver had taken its resources, so X11 refused to start. I fixed that, but I don't know whether normal users would be able to:
mv /lib/modules/tab/and/search/to/find/the/path/of/nouveau /lib/nouveau-old.ko
dracut -o nouveau -f /boot/initramfs-xxx $(uname -r)
Feel free to cc me on the bugfix; I used to hack video drivers for fun in another life.

I can reproduce this problem too, on a system with a Quadro FX 570 card.
I can download the ISO images in about an hour, so I'll test this next week when it appears in a nightly build.

I just tested the RHEL 6.0 20100622.n.0 nightly (which includes nouveau 0.0.16-8.20100423git13c1043.el6), and the Anaconda GUI started correctly.

I've just tested using the 6.0 20100630.n.0 nightly on an NVS 290 (Dell Precision T5400). Everything works as expected -> VERIFIED

Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
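The nightly checks in this thread come down to whether a given build is at least the Fixed In Version, xorg-x11-drv-nouveau-0.0.16-8.20100423git13c1043.el6. The sketch below is a simplified, hypothetical comparison in the spirit of rpm's version ordering (numeric segments compared as integers, numeric segments treated as newer than alphabetic ones). It assumes plain name-version-release strings, ignores epochs, architectures and the `~`/`^` operators, and is not a reimplementation of rpm's own comparison, which remains the authority.

```python
import re
from typing import List, Tuple


def _segments(s: str) -> List[str]:
    # Keep runs of digits and runs of letters; any other character is a separator.
    return re.findall(r"[0-9]+|[A-Za-z]+", s)


def vercmp(a: str, b: str) -> int:
    """Simplified rpm-style comparison; returns -1, 0 or 1."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return -1 if int(x) < int(y) else 1
        elif x.isdigit():
            return 1  # a numeric segment counts as newer than an alphabetic one
        elif y.isdigit():
            return -1
        elif x != y:
            return -1 if x < y else 1
    # Every shared segment matched: more segments means newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def split_nvr(nvr: str) -> Tuple[str, str, str]:
    """Split 'name-version-release' (no epoch, no arch) into its parts."""
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


def has_fix(installed: str, fixed_in: str) -> bool:
    """True if `installed` is the same package as `fixed_in`, at that build or newer."""
    name, ver, rel = split_nvr(installed)
    fname, fver, frel = split_nvr(fixed_in)
    if name != fname:
        raise ValueError(f"comparing different packages: {name} vs {fname}")
    result = vercmp(ver, fver)
    if result == 0:
        result = vercmp(rel, frel)
    return result >= 0
```

For example, the -6 build listed from the failing DVD, xorg-x11-drv-nouveau-0.0.16-6.20100423git13c1043.el6, compares as older than the -8 build, so `has_fix` returns False for it.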