Bug 492658

Summary: Nouveau Test Day: X server crashes with multihead on 9800M GTS [10de:062c]
Product: [Fedora] Fedora
Reporter: Stefan Becker <chemobejk>
Component: xorg-x11-drv-nouveau
Assignee: Ben Skeggs <bskeggs>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: 11
CC: airlied, ajax, awilliam, bskeggs
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
URL: http://www.smolts.org/client/show/pub_0f1188c8-b2ed-454d-8eb2-e821375598b6
Fixed In Version: 2.6.29.5-191.fc11
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2009-09-12 04:05:55 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
Attachments (flags: none on every attachment):
 - X server logfile
 - No KMS (1/3): X server log
 - No KMS (2/3): dmesg log
 - No KMS (3/3): gdb log
 - KMS (1/3): X server log
 - KMS (2/3): dmesg log
 - KMS (3/3): gdb log
 - No KMS: X server log
 - No KMS: X server log
 - KMS: dmesg log
 - NV (1/3): no multihead X server log
 - NV (2/3): multihead X server log
 - NV (3/3): dmesg log
 - No KMS (20090501 1/3): X server log (LVDS only)
 - No KMS (20090501 2/3): X server log (LVDS & DVI)
 - No KMS (20090501 3/3): dmesg log
 - KMS (1/4): no uscript, X Server log
 - KMS (2/4): no uscript, dmesg log
 - KMS (3/4): uscript=1, X Server log
 - KMS (4/4): uscript=1, dmesg log
 - KMS multihead logs 29-Jun-2009
 - KMS multihead logs kernel-2.6.29.5-208.fc11
 - small test app
 - source to the above test app

Description Stefan Becker 2009-03-27 17:28:38 EDT
Created attachment 337066 [details]
X server logfile

Description of problem:

When I connect my TV to the DVI Output on the laptop the X server crashes:

(II) NOUVEAU(0): I2C device "DVI-1:ddc2" registered at address 0xA0.
(II) NOUVEAU(0): Detected a Digital output on VGA-4

Backtrace:
0: X(xorg_backtrace+0x26) [0x4e9976]
1: X(xf86SigHandler+0x6f) [0x47ddaf]
2: /lib64/libc.so.6 [0x3054a332f0]
3: /usr/lib64/xorg/modules/drivers//nouveau_drv.so [0x7f61305a1473]
4: X(xf86ProbeOutputModes+0x1a3) [0x4a53e3]
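
The frame of interest in the backtrace above is the one inside the driver module, which carries no symbol name (hence the later request for debuginfo and gdb). As a rough illustration only, assuming the log format shown above, frames like these could be picked out with a small Python sketch. The names `parse_backtrace` and `driver_frames` are made up for this example and are not part of any Xorg tooling.

```python
import re

# Backtrace lines copied from the X server log quoted above.
SAMPLE_BACKTRACE = """\
0: X(xorg_backtrace+0x26) [0x4e9976]
1: X(xf86SigHandler+0x6f) [0x47ddaf]
2: /lib64/libc.so.6 [0x3054a332f0]
3: /usr/lib64/xorg/modules/drivers//nouveau_drv.so [0x7f61305a1473]
4: X(xf86ProbeOutputModes+0x1a3) [0x4a53e3]
"""

# "<index>: <object>(<symbol>) [<address>]", where "(<symbol>)" is optional.
FRAME_RE = re.compile(
    r"^\s*(?P<index>\d+):\s+(?P<object>[^\s(\[]+)"
    r"(?:\((?P<symbol>[^)]*)\))?\s+\[(?P<address>0x[0-9a-fA-F]+)\]\s*$"
)


def parse_backtrace(text):
    """Return one dict per backtrace frame found in text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append({
                "index": int(match.group("index")),
                "object": match.group("object"),
                "symbol": match.group("symbol"),  # None when no symbol was printed
                "address": int(match.group("address"), 16),
            })
    return frames


def driver_frames(frames):
    """Frames located in an X driver module (file name ending in _drv.so)."""
    return [f for f in frames if f["object"].endswith("_drv.so")]


if __name__ == "__main__":
    for frame in driver_frames(parse_backtrace(SAMPLE_BACKTRACE)):
        print(frame["index"], frame["object"], hex(frame["address"]))
```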

KMS was not in use.

Version-Release number of selected component (if applicable):
kernel-2.6.29-0.258.2.3.rc8.git2.fc11.x86_64
xorg-x11-drv-nouveau-0.0.12-10.20090310git8f9a580.fc11.x86_64
xorg-x11-server-Xorg-1.6.0-13.fc11.x86_64

How reproducible:
always

Smolt profile for the test machine attached as URL.
Comment 1 Ben Skeggs 2009-04-09 01:25:21 EDT
Do you still see this problem with the latest updates?
Comment 2 Stefan Becker 2009-04-09 01:35:50 EDT
I updated this morning to:

kernel-2.6.29.1-54.fc11.x86_64
xorg-x11-drv-nouveau-0.0.12-25.20090408gitd8545e6.fc11.x86_64

Then I booted up with "nomodeset 3" and ran xinit as normal user with the second display connected. The X server still crashes.

I didn't retry with KMS, but the last time I tried, connecting the second display stopped the boot.
Comment 3 Ben Skeggs 2009-04-09 01:48:41 EDT
Ok.  I'd wager that kms has the same problem as the 2D driver does, and the crash is causing a kernel oops there.

Do you have a separate machine you can ssh in from?  If so, are you able to enable the debug repositories, install xorg-x11-drv-nouveau-debuginfo, run X (over ssh) with "gdb --args /usr/bin/Xorg -ac :0", and post the backtrace?
Comment 4 Stefan Becker 2009-04-09 14:33:18 EDT
Created attachment 338971 [details]
No KMS (1/3): X server log
Comment 5 Stefan Becker 2009-04-09 14:33:57 EDT
Created attachment 338972 [details]
No KMS (2/3): dmesg log
Comment 6 Stefan Becker 2009-04-09 14:35:02 EDT
Created attachment 338973 [details]
No KMS (3/3): gdb log

I took the liberty of entering some "print xyz" at the crash location.
Comment 7 Stefan Becker 2009-04-09 14:35:37 EDT
Created attachment 338974 [details]
KMS (1/3): X server log
Comment 8 Stefan Becker 2009-04-09 14:45:02 EDT
Created attachment 338976 [details]
KMS (2/3): dmesg log
Comment 9 Stefan Becker 2009-04-09 14:47:08 EDT
Created attachment 338977 [details]
KMS (3/3): gdb log

Strangely enough, the X server doesn't seem to crash with KMS and multihead. It doesn't show any picture, though, and the VT can't be restored afterwards, i.e. you have to reboot.

Version information:

kernel-2.6.29.1-54.fc11.x86_64
xorg-x11-drv-nouveau-0.0.12-25.20090408gitd8545e6.fc11.x86_64
Comment 10 Stefan Becker 2009-04-21 11:52:42 EDT
Created attachment 340559 [details]
No KMS: X server log

Retried with latest versions:

kernel-2.6.29.1-100.fc11.x86_64
xorg-x11-drv-nouveau-0.0.12-29.20090417gitfa2f111.fc11.x86_64

Booted with "nomodeset": X server still crashes
Comment 11 Stefan Becker 2009-04-21 11:54:43 EDT
Created attachment 340562 [details]
No KMS: X server log

This time with correct type...
Comment 12 Stefan Becker 2009-04-21 11:57:39 EDT
Created attachment 340563 [details]
KMS: dmesg log

Machine not rebooted, but enabled mode setting by hand with:

  $ modprobe -r nouveau
  $ modprobe drm debug=1
  $ modprobe nouveau modeset=1

dmesg shows a kernel bug.
Comment 13 Ben Skeggs 2009-04-23 23:19:18 EDT
There's something very odd here: a digital output is being detected on a VGA connector.  I'm curious, how does nv behave on your card?
Comment 14 Stefan Becker 2009-04-24 14:36:25 EDT
Created attachment 341235 [details]
NV (1/3): no multihead X server log

I added /etc/X11/xorg.conf with a "nv" Device section, booted the machine with "nomodeset 3" and did a "modprobe -r nouveau".

Then I logged in as root and ran "xinit" (only LVDS connected)
Comment 15 Stefan Becker 2009-04-24 14:39:39 EDT
Created attachment 341236 [details]
NV (2/3): multihead X server log

Same session: ran "xinit" (LVDS & DVI connected)

Differences:

 - diff -u of the X server logs:
    * Samsung TV correctly recognized
    * X server stuck before "RandR 1.2 enabled..." message
 - X server stuck in busy loop, needs to be killed with -9
 - Blank picture on LVDS
 - No signal on DVI detected by TV
 - after kill VT is not working
Comment 16 Stefan Becker 2009-04-24 14:40:47 EDT
Created attachment 341237 [details]
NV (3/3): dmesg log

dmesg log from the whole session.
Comment 17 Stefan Becker 2009-05-01 03:29:54 EDT
Created attachment 342051 [details]
No KMS (20090501 1/3): X server log (LVDS only)

Retried with latest driver:

   kernel-2.6.29.1-111.fc11.x86_64
   xorg-x11-drv-nouveau-0.0.12-32.20090501gitf69b34a.fc11.x86_64

The good news is that the crash is gone and that both LVDS and the TV connected to the DVI are correctly recognized. But there is still no output when both are connected :-( After a long while the X server stops with an error message from the driver.
Comment 18 Stefan Becker 2009-05-01 03:31:08 EDT
Created attachment 342052 [details]
No KMS (20090501 2/3): X server log (LVDS & DVI)

X server log when both outputs are connected.
Comment 19 Stefan Becker 2009-05-01 03:32:03 EDT
Created attachment 342053 [details]
No KMS (20090501 3/3): dmesg log

dmesg log from the same session.

BTW: kernel started with "nomodeset 3".
Comment 20 Ben Skeggs 2009-05-01 03:37:39 EDT
When you boot the machine, which display do the boot messages appear on?
Comment 21 Stefan Becker 2009-05-01 04:20:28 EDT
With both outputs connected at boot time:

KMS (kernel: nouveau.modeset=1): crashes immediately in nv50 output function. No surprise here, as the KMS code hasn't been updated yet with the latest nv50 fixes :-)

No KMS (kernel: nomodeset): boot before X is shown on VGA text console on LVDS. When X server starts LVDS goes blank. TV connected to DVI shows "no signal" the whole time.
Comment 22 Adam Williamson 2009-05-01 13:05:25 EDT
ben, this sounds rather like mine - are they the same bug?
Comment 23 Stefan Becker 2009-05-02 14:15:25 EDT
Retried KMS and LVDS+DVI connected with latest kernel:

  kernel-2.6.29.2-123.fc11.x86_64

Kernel still crashes immediately at boot. Partial oops written down from screen:

<top of screen>
   nouveau_load+0x3c5/0x3da [nouveau]
   drm_get_dev+0x3b5/0x49f [drm]
   <uninteresting>
   nouveau_pci_probe+0x15/0x17 [nouveau]
...
Code: 00 c9 c3 55 48 89 e5 41 55 41 ...
...
RIP: nv50_connector_mode_valid+0x28/0xb4 [nouveau]
Comment 24 Ben Skeggs 2009-05-04 00:20:37 EDT
Hopefully -124 will solve that oops. I expect you'll still hit hangs like the ones you see in the DDX, however.

If you do, and you're feeling brave, booting with "nouveau.modeset=1 nouveau.uscript=1" may *possibly* help.  nouveau.uscript=1 enables some experimental code to run output scripts that exist in the VBIOS image.  I'll likely add a similar option to the DDX if this helps your case, the code's already there, just disabled.

Adam, it's possible it's the same bug, but I suspect more likely this is the SOR2 issue I thought yours was before I discovered I couldn't read properly :P
Comment 25 Stefan Becker 2009-05-04 14:25:24 EDT
Created attachment 342343 [details]
KMS (1/4): no uscript, X Server log

Retried with:

 xorg-x11-drv-nouveau-0.0.12-33.20090501gitf69b34a.fc11.x86_64
 kernel-2.6.29.2-126.fc11.x86_64

and LVDS & DVI connected with KMS. The kernel no longer crashes, but also it doesn't recognize the connected TV. Output on LVDS works OK. xrandr --query shows DVI as disconnected.
Comment 26 Stefan Becker 2009-05-04 14:26:03 EDT
Created attachment 342344 [details]
KMS (2/4): no uscript, dmesg log
Comment 27 Stefan Becker 2009-05-04 14:26:52 EDT
Created attachment 342345 [details]
KMS (3/4): uscript=1, X Server log
Comment 28 Stefan Becker 2009-05-04 14:29:06 EDT
Created attachment 342346 [details]
KMS (4/4): uscript=1, dmesg log

So in short: on my system I don't see any difference between "no uscript" and "nouveau.uscript=1" on the kernel boot command line
Comment 29 Stefan Becker 2009-05-04 17:32:48 EDT
One thing changed at least: the DVI connector is again incorrectly set up as VGA-0 (with or without uscript=1):

$ xrandr --query
xrandr: Output VGA-0 is not disconnected but has no modes
Screen 0: minimum 320 x 200, current 1680 x 1050, maximum 8192 x 8192
LVDS-0 connected 1680x1050+0+0 (normal left inverted right x axis y axis) 331mm x 207mm
   1680x1050      59.9*+
VGA-0 connected (normal left inverted right x axis y axis)
DVI-D-0 disconnected (normal left inverted right x axis y axis)
DVI-D-1 disconnected (normal left inverted right x axis y axis)

VGA-0 has no modes, although the modes from the TV are recognized from the EDID. I tried --newmode and then --addmode with xrandr, but there was no picture on the TV, and the X server got really confused and slooooowww until I rebooted.
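
The symptom above (an output reported as connected but with an empty mode list) can be spotted mechanically. The following is only a sketch, assuming the `xrandr --query` text layout shown in this comment. `parse_xrandr` and `connected_without_modes` are illustrative names, not part of xrandr.

```python
import re

# xrandr --query output as quoted in the comment above.
SAMPLE_XRANDR = """\
xrandr: Output VGA-0 is not disconnected but has no modes
Screen 0: minimum 320 x 200, current 1680 x 1050, maximum 8192 x 8192
LVDS-0 connected 1680x1050+0+0 (normal left inverted right x axis y axis) 331mm x 207mm
   1680x1050      59.9*+
VGA-0 connected (normal left inverted right x axis y axis)
DVI-D-0 disconnected (normal left inverted right x axis y axis)
DVI-D-1 disconnected (normal left inverted right x axis y axis)
"""

# Output header lines start at column 0: "<name> connected ..." or "<name> disconnected ...".
OUTPUT_RE = re.compile(r"^(?P<name>\S+) (?P<state>connected|disconnected)\b")
# Mode lines are indented and start with "<width>x<height>".
MODE_RE = re.compile(r"^\s+(?P<mode>\d+x\d+)\s")


def parse_xrandr(text):
    """Map each output name to its connection state and list of modes."""
    outputs = {}
    current = None
    for line in text.splitlines():
        header = OUTPUT_RE.match(line)
        if header:
            current = header.group("name")
            outputs[current] = {
                "connected": header.group("state") == "connected",
                "modes": [],
            }
            continue
        mode = MODE_RE.match(line)
        if mode and current is not None:
            outputs[current]["modes"].append(mode.group("mode"))
    return outputs


def connected_without_modes(outputs):
    """Names of outputs that claim a connection but offer no modes."""
    return [name for name, info in outputs.items()
            if info["connected"] and not info["modes"]]


if __name__ == "__main__":
    print(connected_without_modes(parse_xrandr(SAMPLE_XRANDR)))
```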
Comment 30 Bug Zapper 2009-06-09 08:45:26 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 31 Fedora Update System 2009-06-17 07:53:11 EDT
kernel-2.6.29.5-191.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/kernel-2.6.29.5-191.fc11
Comment 32 Stefan Becker 2009-06-17 17:00:25 EDT
Based on testing with 

  kernel-2.6.29.5-186.fc11.x86_64

(-191 doesn't have any additional changes in that area) for the KMS path and similar changes coming in

  xorg-x11-drv-nouveau-0.0.12-39.20090528git0c17b87.fc11.x86_64

for the non-KMS path:

 - multihead with LVDS + VGA (via DVI-I) works now (KMS and non-KMS)

   * In the KMS case it works correctly out of the box without xorg.conf, i.e. both connected displays are set up with the correct preferred resolution and the screen virtual size is the larger of both resolutions

   * in non-KMS case you need to use xorg.conf if you want the X server to start up with the right resolutions

 - multihead still broken for DVI (via DVI-I) and HDMI

 - DPMS is broken when KMS is used (Ben guesses some other KMS change broke that)

Good progress. But despite what the kernel update note says, this bug is not completely fixed yet.
Comment 33 Fedora Update System 2009-06-19 09:43:46 EDT
kernel-2.6.29.5-191.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update kernel'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-6768
Comment 34 Stefan Becker 2009-06-19 09:52:08 EDT
See comment #32 :-)
Comment 35 Fedora Update System 2009-06-24 15:22:21 EDT
kernel-2.6.29.5-191.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 36 Stefan Becker 2009-06-24 15:36:13 EDT
See comment #32
Comment 37 Ben Skeggs 2009-06-29 05:07:30 EDT
http://koji.fedoraproject.org/koji/buildinfo?buildID=112059 may (finally!) help with the issue.  You will need to enable nouveau.uscript=1 in addition to nouveau.modeset=1.
Comment 38 Stefan Becker 2009-06-29 13:13:16 EDT
Created attachment 349824 [details]
KMS multihead logs 29-Jun-2009

Well, some progress, but I suppose not the big leap we hoped for. LVDS + VGA (from DVI-I) now works perfectly under KMS, both displays are initialized with the correct resolution for plymouth and X. So I can now keep the VGA cable permanently connected to the laptop.

But DVI (from DVI-I) and HDMI still don't work :-( The X server gets stuck in an endless loop occupying one CPU core with 100%, LVDS display goes blank and the connected TV shows "No signal".

I've attached the xrandr --query --verbose, dmesg & Xorg logs from the three combinations. I started up the machine with KMS into run level 3, logged in as root and started "xinit". From the dmesg log it looks like the KMS code tries to do the right things but it only receives timeouts when writing to one register.

Current installed versions:

xorg-x11-server-Xorg-1.6.1.901-2.fc11.x86_64
xorg-x11-drv-nouveau-0.0.12-40.20090528git0c17b87.fc11.x86_64
libdrm-2.4.6-7.fc11.x86_64
kernel-2.6.29.5-201.fc11.x86_64

Kernel command line:
ro root=/dev/mapper/VolGroup00-root rhgb quiet fastboot nouveau.modeset=1 nouveau.uscript=1
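
Since several results in this report depend on which nouveau options were on the kernel command line, a quick check of the booted configuration can help when comparing logs. This is a minimal sketch that only splits the command-line string. `module_params` is a hypothetical helper name, and it does not consult the kernel in any way.

```python
# Kernel command line as quoted in the comment above.
SAMPLE_CMDLINE = (
    "ro root=/dev/mapper/VolGroup00-root rhgb quiet fastboot "
    "nouveau.modeset=1 nouveau.uscript=1"
)


def module_params(cmdline, module):
    """Collect "<module>.<name>=<value>" options from a kernel command line."""
    prefix = module + "."
    params = {}
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            name, value = token[len(prefix):].split("=", 1)
            params[name] = value
    return params


def has_flag(cmdline, flag):
    """True if a bare flag such as "nomodeset" appears as its own token."""
    return flag in cmdline.split()


if __name__ == "__main__":
    print(module_params(SAMPLE_CMDLINE, "nouveau"))
    print(has_flag(SAMPLE_CMDLINE, "nomodeset"))
```

On a running system the string would normally be read from /proc/cmdline instead of the sample constant.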
Comment 39 Stefan Becker 2009-06-30 23:19:42 EDT
Created attachment 350051 [details]
KMS multihead logs kernel-2.6.29.5-208.fc11

Just tested the latest F11 kernel-2.6.29.5-208.fc11. I only updated the kernel, not the X11 server or nouveau driver since the last test. Another step in the right direction, I feel we're getting close now:

 - kernel boots without hanging now also when DVI (via DVI-I) or HDMI are connected

 - with DVI both displays are set up with the correct resolution and enabled, i.e. you can see plymouth & VT.

 - with HDMI only LVDS is enabled, with correct resolution

 - X server still hangs after "xinit"

 - after xinit process has been kill -9'ed the VT shows up again, so at least the modesetting still works

New logs from DVI & HDMI KMS multihead attached.
Comment 40 Ben Skeggs 2009-07-01 01:35:08 EDT
Is the display you're connecting via DVI and HDMI the same display?  If so, could you try letting nouveau set the mode with HDMI connected, and then swap to DVI and see if nouveau programmed that instead?

According to your BIOS, HDMI and DVI share the same SOR (digital encoder).. something we've never encountered as of yet..

I'm assuming you also can't use the DVI and HDMI outputs at the same time, even with the binary driver?
Comment 41 Stefan Becker 2009-07-01 14:59:53 EDT
Yes, it's the same TV. Both DVI and HDMI cable are connected to different HDMI inputs on the TV.

I started up the laptop with HDMI connected and when Plymouth stopped for the LUKS password I switched to the DVI cable. Voilà, you were right, the KMS code had set up the DVI-D part of the DVI-I connector with the right resolution.

I've never tried the nvidia driver, but I doubt that it can drive them separately. I never intended to use both DVI and HDMI at the same time so I never tried. Should I?
Comment 42 Ben Skeggs 2009-07-02 03:16:41 EDT
Hmm, it might be interesting to know if it's possible.  To my current understanding of the GPU, it wouldn't be, but, I've also never seen a configuration like this, so you never know.

It would be useful however to see what NVIDIA does when initialising the DVI and HDMI outputs, to know exactly what's expected of us there.

If you don't mind, would you be able to install the kernel-debug package, and the NVIDIA binary driver to get some traces for me?  It's not the most straight-forward process, but (from a cold boot is always better):

1. Download the binary driver from ftp://download.nvidia.com/XFree86/Linux-x86_64/185.19/NVIDIA-Linux-x86_64-185.19-pkg0.run
2. Run ./NVIDIA-Linux-x86_64-185.19-pkg0.run, to install the binary driver (you can uninstall again by running it with --uninstall)
3. Make sure nvidia.ko is not still loaded, rmmod it if it is
4. mount -t debugfs debugfs /sys/kernel/debug
5. echo 20000 > /sys/kernel/debug/tracing/buffer_size_kb
6. echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
7. cat /sys/kernel/debug/tracking/current_tracer > kmmio.log
8. (on another console) modprobe nvidia
9. startx (assuming you've let the binary driver installer, or manually edited xorg.conf to load the binary driver)
10. bzip2 -9 the log and mail it to mmio.dumps@gmail.com

And could you repeat the process for whichever connector you didn't use in the first run?

Hopefully the instructions are accurate, there's additional info at http://nouveau.freedesktop.org/wiki/MmioTrace if you like.

Thanks!
Comment 43 Stefan Becker 2009-07-02 03:22:08 EDT
Is it OK to install the nvidia RPMs from rpmfusion-non-free instead? I don't trust the Nvidia installer.
Comment 44 Ben Skeggs 2009-07-02 09:25:25 EDT
That'd be fine so long as they provide a build for the -debug kernel (which has mmiotrace included), last I looked they didn't however.
Comment 45 Stefan Becker 2009-07-02 15:04:15 EDT
The binary driver identifies LVDS as DFP-0, DVI-D as DFP-1 and HDMI as DFP-2. It lets you set up DVI-D & HDMI with TwinView but when you try to apply it the setup fails. I guess then the driver realizes that DFP-1 and DFP-2 share the same digital encoder circuit.
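
The constraint guessed at here (two connectors routed through one digital encoder cannot be driven at the same time) can be written down as a simple lookup. This is a toy model only: the encoder labels are hypothetical, and putting LVDS on its own encoder is an assumption based on LVDS working alongside DVI earlier in this report. It does not describe how nouveau or the binary driver actually decides.

```python
from itertools import combinations

# Hypothetical encoder assignment. The thread only suggests that DVI-D and HDMI
# share one digital encoder (SOR); the labels themselves are invented.
ENCODER_OF = {
    "LVDS": "encoder-A",   # assumed separate, since LVDS + DVI worked together
    "DVI-D": "encoder-B",
    "HDMI": "encoder-B",
}


def encoder_conflicts(requested, encoder_of=ENCODER_OF):
    """Return pairs of requested connectors that map to the same encoder."""
    return [(a, b) for a, b in combinations(requested, 2)
            if encoder_of[a] == encoder_of[b]]


if __name__ == "__main__":
    print(encoder_conflicts(["LVDS", "DVI-D"]))   # no shared encoder in this model
    print(encoder_conflicts(["DVI-D", "HDMI"]))   # the TwinView case that failed
```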

> 7. cat /sys/kernel/debug/tracking/current_tracer > kmmio.log

I guess you meant "/sys/kernel/debug/tracing/trace_pipe". I sent two log files for DVI-D and HDMI.
Comment 46 Ben Skeggs 2009-07-06 23:30:20 EDT
Created attachment 350727 [details]
small test app
Comment 47 Ben Skeggs 2009-07-06 23:30:53 EDT
Created attachment 350728 [details]
source to the above test app
Comment 48 Ben Skeggs 2009-07-06 23:34:27 EDT
I've attached a small test app to try a couple of things, building entire test kernels takes far too long :)  The source is also there if you prefer to not run random binaries you find on the net (gcc -lpciaccess swapconn.c -o swapconn)..

Can you boot nouveau with HDMI connected, and firstly try "./swapconn dcmod", if still nothing on screen give "./swapconn tryswap" a go too.  If still nothing, a nice clean boot and "./swapconn tryswap" on its own too.

If any of the above work, the test app should be able to switch back and forth between them.  But, this is all guesses still really..
Comment 49 Stefan Becker 2009-07-06 23:53:36 EDT
Tried with kernel 2.6.29.5-208.fc11.x86_64 and boot commandline:

ro root=... rhgb quiet fastboot nouveau.modeset=1 nouveau.uscript=1 3

First boot:

"dcmod"
IO@0xf5000000/0x01000000
0x00000001/0x00000001 0x00000000/0x00000000 0x00000001 0x00000000
0x00000102
0x00000202
0x80000681
0x80000081

"swap"
IO@0xf5000000/0x01000000
0x00000001/0x00000001 0x00000000/0x00000000 0x00000001 0x00000000

Second boot:
"swap"
IO@0xf5000000/0x01000000
0x00000001/0x00000001 0x00000000/0x00000000 0x00000001 0x00000000

No reaction on the HDMI connection :-(
Comment 50 Stefan Becker 2009-07-07 00:10:46 EDT
Sorry, my bad. Changed the source code for the "user can't follow instruction" case and now got this:

First boot:
"dcmod"
IO@0xf5000000/0x01000000
0x00000001/0x00000001 0x00000000/0x00000000 0x00000001 0x00000000
0x00000102
0x00000202
0x00000681
0x80000081

"tryswap"
IO@0xf5000000/0x01000000
0x00000001/0x00000001 0x00000000/0x00000000 0x00000001 0x00000000
unexpected config

Second boot:
"tryswap"
IO@0xf5000000/0x01000000
0x00000001/0x00000001 0x00000000/0x00000000 0x00000001 0x00000000
unexpected config
Comment 51 Ben Skeggs 2009-07-07 00:21:43 EDT
Hmm, for the dcmod case, do you see anything from the nouveau kernel module appear in dmesg?  Probably not useful, more out of curiosity.

For the second, sorry, that was an oversight on my part :)  Was going from binary driver traces, and we never touch those regs in the first place!  Can you add a "case 0:" above the "case 4:" line ?
Comment 52 Stefan Becker 2009-07-07 00:40:01 EDT
Tried again with case 0: added:

dcmod:
IO@0xf5000000/0x01000000
0x00000001/0x00000001 0x00000000/0x00000000 0x00000001 0x00000000
0x00000102
0x00000202
0x00000681
0x80000081

tryswap (twice)
IO@0xf5000000/0x01000000
0x00000001/0x00000001 0x00000000/0x00000000 0x00000001 0x00000000
swapped to 8
0x00000001/0x00000001 0x00000008/0x00000000 0x00000001 0x00000000
IO@0xf5000000/0x01000000
0x00000001/0x00000001 0x00000008/0x00000000 0x00000001 0x00000000
swapped to 4
0x00000001/0x00000001 0x00000004/0x00000000 0x00000001 0x00000000

reboot, tryswap (twice)
IO@0xf5000000/0x01000000
0x00000001/0x00000001 0x00000000/0x00000000 0x00000001 0x00000000
swapped to 8
0x00000001/0x00000001 0x00000008/0x00000000 0x00000001 0x00000000
IO@0xf5000000/0x01000000
0x00000001/0x00000001 0x00000008/0x00000000 0x00000001 0x00000000
swapped to 4
0x00000001/0x00000001 0x00000004/0x00000000 0x00000001 0x00000000
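
To make the before/after dumps above easier to compare, the hex words on each line can be diffed position by position. This sketch only compares the printed values. What each word means is not documented in this report, so no register names are assumed, and `parse_words` and `changed_words` are illustrative names.

```python
import re


def parse_words(line):
    """Split a dump line on "/" and whitespace and parse each hex word."""
    return [int(token, 16) for token in re.split(r"[/\s]+", line.strip()) if token]


def changed_words(before, after):
    """Return (position, old, new) for every word that differs between two lines."""
    old_words, new_words = parse_words(before), parse_words(after)
    if len(old_words) != len(new_words):
        raise ValueError("dump lines have different word counts")
    return [(i, old, new)
            for i, (old, new) in enumerate(zip(old_words, new_words))
            if old != new]


if __name__ == "__main__":
    # The two lines around the first "swapped to 8" message above.
    before = "0x00000001/0x00000001 0x00000000/0x00000000 0x00000001 0x00000000"
    after = "0x00000001/0x00000001 0x00000008/0x00000000 0x00000001 0x00000000"
    for position, old, new in changed_words(before, after):
        print(f"word {position}: {old:#010x} -> {new:#010x}")
```

For the dumps in this comment, only the third word changes between runs (0, then 8, then 4).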

I could see no changes in dmesg after these commands had been executed.

Sorry, have to run to work now. Next test will have to wait for the afternoon...
Comment 53 Stefan Becker 2009-09-12 04:05:55 EDT
The story continues into F12. Since this is not going to be fixed in F11 anyway and the test system will move to F12, we can close this one as a duplicate.

*** This bug has been marked as a duplicate of bug 522587 ***