Bug 509410

Summary: Black screen when trying to run X by executing startx command after init 3.
Product: Red Hat Enterprise Linux 5 Reporter: Leonid Natapov <lnatapov>
Component: xorg-x11-drv-qxlAssignee: Søren Sandmann Pedersen <sandmann>
Status: CLOSED ERRATA QA Contact: yeylon <yeylon>
Severity: high Docs Contact:
Priority: high    
Version: 5.4CC: clasohm, cpelland, kem, Rhev-m-bugs, riek, rlerch, sandmann, srevivo, virt-maint, ykamay, ykaul
Target Milestone: rcKeywords: Rebase
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: VDI
Fixed In Version: Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text:
As mentioned, this rebase is not a real rebase because the alternative is to ship precisely the same code as a patch.
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-03-30 07:58:16 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 541103, 549408    

Description Leonid Natapov 2009-07-02 16:42:46 UTC
How to reproduce:
1.Connect to VM.
2.Start X by running init 5.
3.Kill X by running init 3.
4.Start X by startx command.

You'll get black screen.
Here is log output:

(II) resource ranges after preInit:
        [0] -1  0       0x00100000 - 0x3fffffff (0x3ff00000) MX[B]E(B)
        [1] -1  0       0x000f0000 - 0x000fffff (0x10000) MX[B]
        [2] -1  0       0x000c0000 - 0x000effff (0x30000) MX[B]
        [3] -1  0       0x00000000 - 0x0009ffff (0xa0000) MX[B]
        [4] -1  0       0xc1000000 - 0xc100ffff (0x10000) MX[B]
        [5] -1  0       0xc8002000 - 0xc8003fff (0x2000) MX[B](B)
        [6] -1  0       0xc8000000 - 0xc8000fff (0x1000) MX[B](B)
        [7] -1  0       0xc4000000 - 0xc7ffffff (0x4000000) MX[B](B)
        [8] 0   0       0x000a0000 - 0x000affff (0x10000) MS[B]
        [9] 0   0       0x000b0000 - 0x000b7fff (0x8000) MS[B]
        [10] 0  0       0x000b8000 - 0x000bffff (0x8000) MS[B]
        [11] -1 0       0x0000ffff - 0x0000ffff (0x1) IX[B]
        [12] -1 0       0x00000000 - 0x000000ff (0x100) IX[B]
        [13] -1 0       0x0000c800 - 0x0000c8ff (0x100) IX[B]
        [14] -1 0       0x0000c400 - 0x0000c7ff (0x400) IX[B]
        [15] -1 0       0x0000c160 - 0x0000c17f (0x20) IX[B]
        [16] -1 0       0x0000c140 - 0x0000c14f (0x10) IX[B]
        [17] -1 0       0x0000c120 - 0x0000c13f (0x20) IX[B]
        [18] -1 0       0x0000c100 - 0x0000c10f (0x10) IX[B]
        [19] -1 0       0x0000c000 - 0x0000c0ff (0x100) IX[B]
        [20] -1 0       0x0000c150 - 0x0000c157 (0x8) IX[B](B)
        [21] 0  0       0x000003b0 - 0x000003bb (0xc) IS[B]
        [22] 0  0       0x000003c0 - 0x000003df (0x20) IS[B]
(WW) qxl(0): Failed to set up write-combining range (0xc4000000,0x4000000)
(II) qxl(0): ram at 0x2b266df3d000; vram at 0x2b2671f3d000; rom at 0x2b2671f3e000
(II) qxl(0): Setting mode 9 (800 x 600) (800 x 600) 0x15ebd980

Comment 1 Leonid Natapov 2009-07-06 16:02:59 UTC
On the server side I have the following print outs in stdio file:
ioport_write: unexpected port 0x0 in vga mode
ioport_write: unexpected port 0x1 in vga mode

Comment 3 Søren Sandmann Pedersen 2009-07-15 05:59:40 UTC
I think what's going on is this:

- When you run init 3, the device is getting reset because something is 
  touching a VGA register

- X doesn't realize this, so it continues to put things in the command buffer

- The device switches to VGA mode, but there are now unprocessed commands in the buffer

- There is nothing the driver can do at this point because both commands are not being processed and both RESET and SET_MODE asserts that the buffer is empty.

Comment 4 Søren Sandmann Pedersen 2009-07-15 07:33:52 UTC
1. The something that touches a VGA port is likely vgacon in the kernel
2. This causes the device to reset to VGA mode and clear the command buffer
3. Then X submits some commands (which is what produces those messages)
4. Then X exits
5. When X starts again, it waits for the command ring to go idle before setting the mode
6. But the device is in VGA mode, so commands are not processed => deadlock.

I tried fixing this by explicitly calling reset before setting the mode, but that causes an assert because there is stuff in the command buffer.

At this point, the simplest fix is to:

  (a) have qxl_reset() not assert that the command buffer is empty
  (b) add a reset to the qxl driver at startup.

If (a) is done, I can fix (b).

I think the assert in qxl_reset() should be removed anyway. It is simply incorrect because qxl_detach() only clears the buffer when the device is not in VGA mode, the whole point of 'reset' is to get the device out of a broken state.

Since the only way to get the device out of VGA mode is to set a mode (and that asserts too), basically the QXL device completely stops working if you submit a command in VGA mode.

So reassigning to kvm.

Admittedly, QXL driver does need proper VT handling, but I'm not sure that can happen for 5.4.

Comment 5 Yaniv Kamay 2009-07-15 12:14:45 UTC
(In reply to comment #4)
> 1. The something that touches a VGA port is likely vgacon in the kernel
> 2. This causes the device to reset to VGA mode and clear the command buffer
> 3. Then X submits some commands (which is what produces those messages)

How is it that two components control the same device? looks like a BUG.

> 4. Then X exits
> 5. When X starts again, it waits for the command ring to go idle before setting
> the mode
> 6. But the device is in VGA mode, so commands are not processed => deadlock.
> 

Why it is in VGA mode, if X/driver starts again it need to reset the device
and then call set mode.

> I tried fixing this by explicitly calling reset before setting the mode, but
> that causes an assert because there is stuff in the command buffer.
> 
> At this point, the simplest fix is to:
> 
>   (a) have qxl_reset() not assert that the command buffer is empty


The assert is there for detecting malfunction like we have here.


>   (b) add a reset to the qxl driver at startup.
> 
> If (a) is done, I can fix (b).
> 
> I think the assert in qxl_reset() should be removed anyway. It is simply
> incorrect because qxl_detach() only clears the buffer when the device is not in
> VGA mode, the whole point of 'reset' is to get the device out of a broken
> state.
> 
> Since the only way to get the device out of VGA mode is to set a mode (and that
> asserts too), basically the QXL device completely stops working if you submit a
> command in VGA mode.


Sending command in VGA mode is driver BUG.

> 
> So reassigning to kvm.
> 
> Admittedly, QXL driver does need proper VT handling, but I'm not sure that can
> happen for 5.4.

Comment 7 Søren Sandmann Pedersen 2009-07-15 16:07:27 UTC
> How is it that two components control the same device? looks like a BUG.

There is a mechanism that allows those two components to cooperate, but it is not implemented for the QXL X driver yet. This is what I said here:

    "Admittedly, QXL driver does need proper VT handling, but I'm not
     sure that can happen for 5.4."

> > Since the only way to get the device out of VGA mode is to set a mode (and that
> > asserts too), basically the QXL device completely stops working if you submit a
> > command in VGA mode.

> Sending command in VGA mode is driver BUG.

Yes, but the QXL device locking up to the point where the VM becomes unusable, is a QXL bug.

Comment 8 Yaniv Kamay 2009-07-15 17:33:21 UTC
(In reply to comment #7)
> > How is it that two components control the same device? looks like a BUG.
> 
> There is a mechanism that allows those two components to cooperate, but it is
> not implemented for the QXL X driver yet. This is what I said here:
> 
>     "Admittedly, QXL driver does need proper VT handling, but I'm not
>      sure that can happen for 5.4."
> 
> > > Since the only way to get the device out of VGA mode is to set a mode (and that
> > > asserts too), basically the QXL device completely stops working if you submit a
> > > command in VGA mode.
> 
> > Sending command in VGA mode is driver BUG.
> 
> Yes, but the QXL device locking up to the point where the VM becomes unusable,
> is a QXL bug.  

This is not QXL device bug it is driver bug. It is disallowed to use QXL port
or to use device memory while in VGA mode. You need to reset and set mode on resume from VT and you need to reset the device on driver init. As I said before the assert in QXL device id there for detecting malfunction so the fix is to fix the driver and not to remove the assert.

Reassigning to x11 QXL driver.

Comment 9 Søren Sandmann Pedersen 2009-07-15 20:16:55 UTC
*** Bug 510189 has been marked as a duplicate of this bug. ***

Comment 10 Søren Sandmann Pedersen 2009-07-16 13:25:35 UTC
I can't reset because there are invalid commands in the command buffer. The only way to fix this without removing the assert is to add full VT switch support. However, this is too big a change to put in a this point, so we will need to look at this for a later release.

Comment 11 Yaniv Kamay 2009-07-16 14:01:34 UTC
(In reply to comment #10)
> I can't reset because there are invalid commands in the command buffer. The
> only way to fix this without removing the assert is to add full VT switch
> support. However, this is too big a change to put in a this point, so we will
> need to look at this for a later release.  

Why can't you call qxl reset + qxl set mode on qxlLeaveVT

Comment 12 Yaniv Kamay 2009-07-16 14:34:56 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > I can't reset because there are invalid commands in the command buffer. The
> > only way to fix this without removing the assert is to add full VT switch
> > support. However, this is too big a change to put in a this point, so we will
> > need to look at this for a later release.  
> 
> Why can't you call qxl reset + qxl set mode on qxlLeaveVT 

I assume qxlLeaveVT is called when you lose control and qxlEnterVT when you gain control, is it correct? If so then you need to reset the device on qxlEnterVT so you need to call reset before set mode in qxlSwitchMode.

Comment 13 Yaniv Kamay 2009-07-16 14:39:59 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > I can't reset because there are invalid commands in the command buffer. The
> > > only way to fix this without removing the assert is to add full VT switch
> > > support. However, this is too big a change to put in a this point, so we will
> > > need to look at this for a later release.  
> > 
> > Why can't you call qxl reset + qxl set mode on qxlLeaveVT 
> 
> I assume qxlLeaveVT is called when you lose control and qxlEnterVT when you
> gain control, is it correct? If so then you need to reset the device on
> qxlEnterVT so you need to call reset before set mode in qxlSwitchMode.  

Why do you call qxl_ring_wait_idle in qxlSwitchMode?

Comment 14 Søren Sandmann Pedersen 2009-07-16 16:14:46 UTC
Yes, implementing these calls is what is required. It is also necessary to disable command submission in qxlLeaveVT() and reneable it in qxlEnterVT().

> Why do you call qxl_ring_wait_idle in qxlSwitchMode?  

Because the SET_MODE ioport handler asserts if there are commands in the buffer.

Comment 15 Yaniv Kamay 2009-07-16 19:44:09 UTC
(In reply to comment #14)
> Yes, implementing these calls is what is required. It is also necessary to
> disable command submission in qxlLeaveVT() and reneable it in qxlEnterVT().

But is already implementing, so what is the problem?
Do you send commands between qxlLeaveVT and qxlEnterVT?

> 
> > Why do you call qxl_ring_wait_idle in qxlSwitchMode?  
> 
> Because the SET_MODE ioport handler asserts if there are commands in the
> buffer.  

I can't see how can it help. It only add risk for having deadlock.

Comment 16 Søren Sandmann Pedersen 2009-07-27 16:51:32 UTC
What happens is that the kernel, with no involvement from X, writes stuff to the VGA ports. This causes QXL to go to VGA mode, but the X driver doesn't know that this happened, so it continues to submit commands. This causes the QXL device to get into a state that there is no way out of.

There is a system in which the kernel first notifies X that it is about to switch away. In response, X has to do whatever it needs to do in preparation for this, then tell the kernel that it is now allowed to switch. Yes, it is dumb, but that's how it works in RHEL 5. So yes, commands will be submitted while in VGA mode; I agree this is a bug in the X driver, but it is not fixable for 5.4. We can fix it in 5.5.

Supporting this system for 5.4 is not going to happen, so if we want VT switching to work, we'll just have to disable that assert. It doesn't serve any useful purpose on a customer's system anyway, because it crashes the entire VM. It is only useful on a developer's system.

> > > Why do you call qxl_ring_wait_idle in qxlSwitchMode?  
> > 
> > Because the SET_MODE ioport handler asserts if there are commands in the
> > buffer.  
> 
> I can't see how can it help. It only add risk for having deadlock.  

qxl_set_mode() crashes the VM if there are commands in the buffer. 

qxl_reset() crashes the VM if there are commands in the buffer. 

What exactly, apart from waiting for idle, do you suggest doing if the user changes the resolution at runtime?

Comment 17 Yaniv Kamay 2009-07-28 17:56:15 UTC
(In reply to comment #16)

> Supporting this system for 5.4 is not going to happen, so if we want VT
> switching to work, we'll just have to disable that assert. It doesn't serve any

ASSERT is just the symptom. It is not allowed to use VGA and QXL mode simultaneously. Having two components control the same device without
coordination is also a BUG.

> useful purpose on a customer's system anyway, because it crashes the entire VM.
> It is only useful on a developer's system.
> 
> > > > Why do you call qxl_ring_wait_idle in qxlSwitchMode?  
> > > 
> > > Because the SET_MODE ioport handler asserts if there are commands in the
> > > buffer.  
> > 
> > I can't see how can it help. It only add risk for having deadlock.  
> 
> qxl_set_mode() crashes the VM if there are commands in the buffer. 
> 
> qxl_reset() crashes the VM if there are commands in the buffer. 
> 
> What exactly, apart from waiting for idle, do you suggest doing if the user
> changes the resolution at runtime?  

For what do you wait? changing resolution also reset the rings.

Comment 23 Søren Sandmann Pedersen 2009-11-06 22:11:31 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
As mentioned, this rebase is not a real rebase because the alternative is to ship precisely the same code as a patch.

Comment 25 Yaniv Kaul 2009-11-08 07:28:35 UTC
So, was anything fixed here? How should we test this (just re-run the test?)

Comment 26 Yaniv Kaul 2009-12-23 12:15:48 UTC
Verifying - it doesn't happen anymore. I can launch X.
Tested with xorg-x11-drv-qxl-0.0.5-1.3.el5.bz549379.i386.rpm

Comment 28 errata-xmlrpc 2010-03-30 07:58:16 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0188.html

Comment 29 Red Hat Bugzilla 2023-09-14 01:17:08 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days