Bug 509410 - Black screen when trying to run X by executing startx command after init 3. [NEEDINFO]
Black screen when trying to run X by executing startx command after init 3.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xorg-x11-drv-qxl (Show other bugs)
5.4
All Linux
high Severity high
: rc
: ---
Assigned To: Søren Sandmann Pedersen
yeylon@redhat.com
VDI
: Rebase
: 510189 (view as bug list)
Depends On:
Blocks: 5.5TechNotes-Updates 549408
  Show dependency treegraph
 
Reported: 2009-07-02 12:42 EDT by Leonid Natapov
Modified: 2016-04-18 02:21 EDT (History)
11 users (show)

See Also:
Fixed In Version:
Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text:
As mentioned, this rebase is not a real rebase because the alternative is to ship precisely the same code as a patch.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-03-30 03:58:16 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
ykaul: needinfo? (sandmann)


Attachments (Terms of Use)

  None (edit)
Description Leonid Natapov 2009-07-02 12:42:46 EDT
How to reproduce:
1.Connect to VM.
2.Start X by running init 5.
3.Kill X by running init 3.
4.Start X by startx command.

You'll get black screen.
Here is log output:

(II) resource ranges after preInit:
        [0] -1  0       0x00100000 - 0x3fffffff (0x3ff00000) MX[B]E(B)
        [1] -1  0       0x000f0000 - 0x000fffff (0x10000) MX[B]
        [2] -1  0       0x000c0000 - 0x000effff (0x30000) MX[B]
        [3] -1  0       0x00000000 - 0x0009ffff (0xa0000) MX[B]
        [4] -1  0       0xc1000000 - 0xc100ffff (0x10000) MX[B]
        [5] -1  0       0xc8002000 - 0xc8003fff (0x2000) MX[B](B)
        [6] -1  0       0xc8000000 - 0xc8000fff (0x1000) MX[B](B)
        [7] -1  0       0xc4000000 - 0xc7ffffff (0x4000000) MX[B](B)
        [8] 0   0       0x000a0000 - 0x000affff (0x10000) MS[B]
        [9] 0   0       0x000b0000 - 0x000b7fff (0x8000) MS[B]
        [10] 0  0       0x000b8000 - 0x000bffff (0x8000) MS[B]
        [11] -1 0       0x0000ffff - 0x0000ffff (0x1) IX[B]
        [12] -1 0       0x00000000 - 0x000000ff (0x100) IX[B]
        [13] -1 0       0x0000c800 - 0x0000c8ff (0x100) IX[B]
        [14] -1 0       0x0000c400 - 0x0000c7ff (0x400) IX[B]
        [15] -1 0       0x0000c160 - 0x0000c17f (0x20) IX[B]
        [16] -1 0       0x0000c140 - 0x0000c14f (0x10) IX[B]
        [17] -1 0       0x0000c120 - 0x0000c13f (0x20) IX[B]
        [18] -1 0       0x0000c100 - 0x0000c10f (0x10) IX[B]
        [19] -1 0       0x0000c000 - 0x0000c0ff (0x100) IX[B]
        [20] -1 0       0x0000c150 - 0x0000c157 (0x8) IX[B](B)
        [21] 0  0       0x000003b0 - 0x000003bb (0xc) IS[B]
        [22] 0  0       0x000003c0 - 0x000003df (0x20) IS[B]
(WW) qxl(0): Failed to set up write-combining range (0xc4000000,0x4000000)
(II) qxl(0): ram at 0x2b266df3d000; vram at 0x2b2671f3d000; rom at 0x2b2671f3e000
(II) qxl(0): Setting mode 9 (800 x 600) (800 x 600) 0x15ebd980
Comment 1 Leonid Natapov 2009-07-06 12:02:59 EDT
On the server side I have the following print outs in stdio file:
ioport_write: unexpected port 0x0 in vga mode
ioport_write: unexpected port 0x1 in vga mode
Comment 3 Søren Sandmann Pedersen 2009-07-15 01:59:40 EDT
I think what's going on is this:

- When you run init 3, the device is getting reset because something is 
  touching a VGA register

- X doesn't realize this, so it continues to put things in the command buffer

- The device switches to VGA mode, but there are now unprocessed commands in the buffer

- There is nothing the driver can do at this point because both commands are not being processed and both RESET and SET_MODE asserts that the buffer is empty.
Comment 4 Søren Sandmann Pedersen 2009-07-15 03:33:52 EDT
1. The something that touches a VGA port is likely vgacon in the kernel
2. This causes the device to reset to VGA mode and clear the command buffer
3. Then X submits some commands (which is what produces those messages)
4. Then X exits
5. When X starts again, it waits for the command ring to go idle before setting the mode
6. But the device is in VGA mode, so commands are not processed => deadlock.

I tried fixing this by explicitly calling reset before setting the mode, but that causes an assert because there is stuff in the command buffer.

At this point, the simplest fix is to:

  (a) have qxl_reset() not assert that the command buffer is empty
  (b) add a reset to the qxl driver at startup.

If (a) is done, I can fix (b).

I think the assert in qxl_reset() should be removed anyway. It is simply incorrect because qxl_detach() only clears the buffer when the device is not in VGA mode, the whole point of 'reset' is to get the device out of a broken state.

Since the only way to get the device out of VGA mode is to set a mode (and that asserts too), basically the QXL device completely stops working if you submit a command in VGA mode.

So reassigning to kvm.

Admittedly, QXL driver does need proper VT handling, but I'm not sure that can happen for 5.4.
Comment 5 Yaniv Kamay 2009-07-15 08:14:45 EDT
(In reply to comment #4)
> 1. The something that touches a VGA port is likely vgacon in the kernel
> 2. This causes the device to reset to VGA mode and clear the command buffer
> 3. Then X submits some commands (which is what produces those messages)

How is it that two components control the same device? looks like a BUG.

> 4. Then X exits
> 5. When X starts again, it waits for the command ring to go idle before setting
> the mode
> 6. But the device is in VGA mode, so commands are not processed => deadlock.
> 

Why it is in VGA mode, if X/driver starts again it need to reset the device
and then call set mode.

> I tried fixing this by explicitly calling reset before setting the mode, but
> that causes an assert because there is stuff in the command buffer.
> 
> At this point, the simplest fix is to:
> 
>   (a) have qxl_reset() not assert that the command buffer is empty


The assert is there for detecting malfunction like we have here.


>   (b) add a reset to the qxl driver at startup.
> 
> If (a) is done, I can fix (b).
> 
> I think the assert in qxl_reset() should be removed anyway. It is simply
> incorrect because qxl_detach() only clears the buffer when the device is not in
> VGA mode, the whole point of 'reset' is to get the device out of a broken
> state.
> 
> Since the only way to get the device out of VGA mode is to set a mode (and that
> asserts too), basically the QXL device completely stops working if you submit a
> command in VGA mode.


Sending command in VGA mode is driver BUG.

> 
> So reassigning to kvm.
> 
> Admittedly, QXL driver does need proper VT handling, but I'm not sure that can
> happen for 5.4.
Comment 7 Søren Sandmann Pedersen 2009-07-15 12:07:27 EDT
> How is it that two components control the same device? looks like a BUG.

There is a mechanism that allows those two components to cooperate, but it is not implemented for the QXL X driver yet. This is what I said here:

    "Admittedly, QXL driver does need proper VT handling, but I'm not
     sure that can happen for 5.4."

> > Since the only way to get the device out of VGA mode is to set a mode (and that
> > asserts too), basically the QXL device completely stops working if you submit a
> > command in VGA mode.

> Sending command in VGA mode is driver BUG.

Yes, but the QXL device locking up to the point where the VM becomes unusable, is a QXL bug.
Comment 8 Yaniv Kamay 2009-07-15 13:33:21 EDT
(In reply to comment #7)
> > How is it that two components control the same device? looks like a BUG.
> 
> There is a mechanism that allows those two components to cooperate, but it is
> not implemented for the QXL X driver yet. This is what I said here:
> 
>     "Admittedly, QXL driver does need proper VT handling, but I'm not
>      sure that can happen for 5.4."
> 
> > > Since the only way to get the device out of VGA mode is to set a mode (and that
> > > asserts too), basically the QXL device completely stops working if you submit a
> > > command in VGA mode.
> 
> > Sending command in VGA mode is driver BUG.
> 
> Yes, but the QXL device locking up to the point where the VM becomes unusable,
> is a QXL bug.  

This is not QXL device bug it is driver bug. It is disallowed to use QXL port
or to use device memory while in VGA mode. You need to reset and set mode on resume from VT and you need to reset the device on driver init. As I said before the assert in QXL device id there for detecting malfunction so the fix is to fix the driver and not to remove the assert.

Reassigning to x11 QXL driver.
Comment 9 Søren Sandmann Pedersen 2009-07-15 16:16:55 EDT
*** Bug 510189 has been marked as a duplicate of this bug. ***
Comment 10 Søren Sandmann Pedersen 2009-07-16 09:25:35 EDT
I can't reset because there are invalid commands in the command buffer. The only way to fix this without removing the assert is to add full VT switch support. However, this is too big a change to put in a this point, so we will need to look at this for a later release.
Comment 11 Yaniv Kamay 2009-07-16 10:01:34 EDT
(In reply to comment #10)
> I can't reset because there are invalid commands in the command buffer. The
> only way to fix this without removing the assert is to add full VT switch
> support. However, this is too big a change to put in a this point, so we will
> need to look at this for a later release.  

Why can't you call qxl reset + qxl set mode on qxlLeaveVT
Comment 12 Yaniv Kamay 2009-07-16 10:34:56 EDT
(In reply to comment #11)
> (In reply to comment #10)
> > I can't reset because there are invalid commands in the command buffer. The
> > only way to fix this without removing the assert is to add full VT switch
> > support. However, this is too big a change to put in a this point, so we will
> > need to look at this for a later release.  
> 
> Why can't you call qxl reset + qxl set mode on qxlLeaveVT 

I assume qxlLeaveVT is called when you lose control and qxlEnterVT when you gain control, is it correct? If so then you need to reset the device on qxlEnterVT so you need to call reset before set mode in qxlSwitchMode.
Comment 13 Yaniv Kamay 2009-07-16 10:39:59 EDT
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > I can't reset because there are invalid commands in the command buffer. The
> > > only way to fix this without removing the assert is to add full VT switch
> > > support. However, this is too big a change to put in a this point, so we will
> > > need to look at this for a later release.  
> > 
> > Why can't you call qxl reset + qxl set mode on qxlLeaveVT 
> 
> I assume qxlLeaveVT is called when you lose control and qxlEnterVT when you
> gain control, is it correct? If so then you need to reset the device on
> qxlEnterVT so you need to call reset before set mode in qxlSwitchMode.  

Why do you call qxl_ring_wait_idle in qxlSwitchMode?
Comment 14 Søren Sandmann Pedersen 2009-07-16 12:14:46 EDT
Yes, implementing these calls is what is required. It is also necessary to disable command submission in qxlLeaveVT() and reneable it in qxlEnterVT().

> Why do you call qxl_ring_wait_idle in qxlSwitchMode?  

Because the SET_MODE ioport handler asserts if there are commands in the buffer.
Comment 15 Yaniv Kamay 2009-07-16 15:44:09 EDT
(In reply to comment #14)
> Yes, implementing these calls is what is required. It is also necessary to
> disable command submission in qxlLeaveVT() and reneable it in qxlEnterVT().

But is already implementing, so what is the problem?
Do you send commands between qxlLeaveVT and qxlEnterVT?

> 
> > Why do you call qxl_ring_wait_idle in qxlSwitchMode?  
> 
> Because the SET_MODE ioport handler asserts if there are commands in the
> buffer.  

I can't see how can it help. It only add risk for having deadlock.
Comment 16 Søren Sandmann Pedersen 2009-07-27 12:51:32 EDT
What happens is that the kernel, with no involvement from X, writes stuff to the VGA ports. This causes QXL to go to VGA mode, but the X driver doesn't know that this happened, so it continues to submit commands. This causes the QXL device to get into a state that there is no way out of.

There is a system in which the kernel first notifies X that it is about to switch away. In response, X has to do whatever it needs to do in preparation for this, then tell the kernel that it is now allowed to switch. Yes, it is dumb, but that's how it works in RHEL 5. So yes, commands will be submitted while in VGA mode; I agree this is a bug in the X driver, but it is not fixable for 5.4. We can fix it in 5.5.

Supporting this system for 5.4 is not going to happen, so if we want VT switching to work, we'll just have to disable that assert. It doesn't serve any useful purpose on a customer's system anyway, because it crashes the entire VM. It is only useful on a developer's system.

> > > Why do you call qxl_ring_wait_idle in qxlSwitchMode?  
> > 
> > Because the SET_MODE ioport handler asserts if there are commands in the
> > buffer.  
> 
> I can't see how can it help. It only add risk for having deadlock.  

qxl_set_mode() crashes the VM if there are commands in the buffer. 

qxl_reset() crashes the VM if there are commands in the buffer. 

What exactly, apart from waiting for idle, do you suggest doing if the user changes the resolution at runtime?
Comment 17 Yaniv Kamay 2009-07-28 13:56:15 EDT
(In reply to comment #16)

> Supporting this system for 5.4 is not going to happen, so if we want VT
> switching to work, we'll just have to disable that assert. It doesn't serve any

ASSERT is just the symptom. It is not allowed to use VGA and QXL mode simultaneously. Having two components control the same device without
coordination is also a BUG.

> useful purpose on a customer's system anyway, because it crashes the entire VM.
> It is only useful on a developer's system.
> 
> > > > Why do you call qxl_ring_wait_idle in qxlSwitchMode?  
> > > 
> > > Because the SET_MODE ioport handler asserts if there are commands in the
> > > buffer.  
> > 
> > I can't see how can it help. It only add risk for having deadlock.  
> 
> qxl_set_mode() crashes the VM if there are commands in the buffer. 
> 
> qxl_reset() crashes the VM if there are commands in the buffer. 
> 
> What exactly, apart from waiting for idle, do you suggest doing if the user
> changes the resolution at runtime?  

For what do you wait? changing resolution also reset the rings.
Comment 23 Søren Sandmann Pedersen 2009-11-06 17:11:31 EST
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
As mentioned, this rebase is not a real rebase because the alternative is to ship precisely the same code as a patch.
Comment 25 Yaniv Kaul 2009-11-08 02:28:35 EST
So, was anything fixed here? How should we test this (just re-run the test?)
Comment 26 Yaniv Kaul 2009-12-23 07:15:48 EST
Verifying - it doesn't happen anymore. I can launch X.
Tested with xorg-x11-drv-qxl-0.0.5-1.3.el5.bz549379.i386.rpm
Comment 28 errata-xmlrpc 2010-03-30 03:58:16 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0188.html

Note You need to log in before you can comment on or make changes to this bug.