How to reproduce: 1.Connect to VM. 2.Start X by running init 5. 3.Kill X by running init 3. 4.Start X by startx command. You'll get black screen. Here is log output: (II) resource ranges after preInit: [0] -1 0 0x00100000 - 0x3fffffff (0x3ff00000) MX[B]E(B) [1] -1 0 0x000f0000 - 0x000fffff (0x10000) MX[B] [2] -1 0 0x000c0000 - 0x000effff (0x30000) MX[B] [3] -1 0 0x00000000 - 0x0009ffff (0xa0000) MX[B] [4] -1 0 0xc1000000 - 0xc100ffff (0x10000) MX[B] [5] -1 0 0xc8002000 - 0xc8003fff (0x2000) MX[B](B) [6] -1 0 0xc8000000 - 0xc8000fff (0x1000) MX[B](B) [7] -1 0 0xc4000000 - 0xc7ffffff (0x4000000) MX[B](B) [8] 0 0 0x000a0000 - 0x000affff (0x10000) MS[B] [9] 0 0 0x000b0000 - 0x000b7fff (0x8000) MS[B] [10] 0 0 0x000b8000 - 0x000bffff (0x8000) MS[B] [11] -1 0 0x0000ffff - 0x0000ffff (0x1) IX[B] [12] -1 0 0x00000000 - 0x000000ff (0x100) IX[B] [13] -1 0 0x0000c800 - 0x0000c8ff (0x100) IX[B] [14] -1 0 0x0000c400 - 0x0000c7ff (0x400) IX[B] [15] -1 0 0x0000c160 - 0x0000c17f (0x20) IX[B] [16] -1 0 0x0000c140 - 0x0000c14f (0x10) IX[B] [17] -1 0 0x0000c120 - 0x0000c13f (0x20) IX[B] [18] -1 0 0x0000c100 - 0x0000c10f (0x10) IX[B] [19] -1 0 0x0000c000 - 0x0000c0ff (0x100) IX[B] [20] -1 0 0x0000c150 - 0x0000c157 (0x8) IX[B](B) [21] 0 0 0x000003b0 - 0x000003bb (0xc) IS[B] [22] 0 0 0x000003c0 - 0x000003df (0x20) IS[B] (WW) qxl(0): Failed to set up write-combining range (0xc4000000,0x4000000) (II) qxl(0): ram at 0x2b266df3d000; vram at 0x2b2671f3d000; rom at 0x2b2671f3e000 (II) qxl(0): Setting mode 9 (800 x 600) (800 x 600) 0x15ebd980
On the server side I have the following print outs in stdio file: ioport_write: unexpected port 0x0 in vga mode ioport_write: unexpected port 0x1 in vga mode
I think what's going on is this: - When you run init 3, the device is getting reset because something is touching a VGA register - X doesn't realize this, so it continues to put things in the command buffer - The device switches to VGA mode, but there are now unprocessed commands in the buffer - There is nothing the driver can do at this point because both commands are not being processed and both RESET and SET_MODE asserts that the buffer is empty.
1. The something that touches a VGA port is likely vgacon in the kernel 2. This causes the device to reset to VGA mode and clear the command buffer 3. Then X submits some commands (which is what produces those messages) 4. Then X exits 5. When X starts again, it waits for the command ring to go idle before setting the mode 6. But the device is in VGA mode, so commands are not processed => deadlock. I tried fixing this by explicitly calling reset before setting the mode, but that causes an assert because there is stuff in the command buffer. At this point, the simplest fix is to: (a) have qxl_reset() not assert that the command buffer is empty (b) add a reset to the qxl driver at startup. If (a) is done, I can fix (b). I think the assert in qxl_reset() should be removed anyway. It is simply incorrect because qxl_detach() only clears the buffer when the device is not in VGA mode, the whole point of 'reset' is to get the device out of a broken state. Since the only way to get the device out of VGA mode is to set a mode (and that asserts too), basically the QXL device completely stops working if you submit a command in VGA mode. So reassigning to kvm. Admittedly, QXL driver does need proper VT handling, but I'm not sure that can happen for 5.4.
(In reply to comment #4) > 1. The something that touches a VGA port is likely vgacon in the kernel > 2. This causes the device to reset to VGA mode and clear the command buffer > 3. Then X submits some commands (which is what produces those messages) How is it that two components control the same device? looks like a BUG. > 4. Then X exits > 5. When X starts again, it waits for the command ring to go idle before setting > the mode > 6. But the device is in VGA mode, so commands are not processed => deadlock. > Why it is in VGA mode, if X/driver starts again it need to reset the device and then call set mode. > I tried fixing this by explicitly calling reset before setting the mode, but > that causes an assert because there is stuff in the command buffer. > > At this point, the simplest fix is to: > > (a) have qxl_reset() not assert that the command buffer is empty The assert is there for detecting malfunction like we have here. > (b) add a reset to the qxl driver at startup. > > If (a) is done, I can fix (b). > > I think the assert in qxl_reset() should be removed anyway. It is simply > incorrect because qxl_detach() only clears the buffer when the device is not in > VGA mode, the whole point of 'reset' is to get the device out of a broken > state. > > Since the only way to get the device out of VGA mode is to set a mode (and that > asserts too), basically the QXL device completely stops working if you submit a > command in VGA mode. Sending command in VGA mode is driver BUG. > > So reassigning to kvm. > > Admittedly, QXL driver does need proper VT handling, but I'm not sure that can > happen for 5.4.
> How is it that two components control the same device? looks like a BUG. There is a mechanism that allows those two components to cooperate, but it is not implemented for the QXL X driver yet. This is what I said here: "Admittedly, QXL driver does need proper VT handling, but I'm not sure that can happen for 5.4." > > Since the only way to get the device out of VGA mode is to set a mode (and that > > asserts too), basically the QXL device completely stops working if you submit a > > command in VGA mode. > Sending command in VGA mode is driver BUG. Yes, but the QXL device locking up to the point where the VM becomes unusable, is a QXL bug.
(In reply to comment #7) > > How is it that two components control the same device? looks like a BUG. > > There is a mechanism that allows those two components to cooperate, but it is > not implemented for the QXL X driver yet. This is what I said here: > > "Admittedly, QXL driver does need proper VT handling, but I'm not > sure that can happen for 5.4." > > > > Since the only way to get the device out of VGA mode is to set a mode (and that > > > asserts too), basically the QXL device completely stops working if you submit a > > > command in VGA mode. > > > Sending command in VGA mode is driver BUG. > > Yes, but the QXL device locking up to the point where the VM becomes unusable, > is a QXL bug. This is not QXL device bug it is driver bug. It is disallowed to use QXL port or to use device memory while in VGA mode. You need to reset and set mode on resume from VT and you need to reset the device on driver init. As I said before the assert in QXL device id there for detecting malfunction so the fix is to fix the driver and not to remove the assert. Reassigning to x11 QXL driver.
*** Bug 510189 has been marked as a duplicate of this bug. ***
I can't reset because there are invalid commands in the command buffer. The only way to fix this without removing the assert is to add full VT switch support. However, this is too big a change to put in a this point, so we will need to look at this for a later release.
(In reply to comment #10) > I can't reset because there are invalid commands in the command buffer. The > only way to fix this without removing the assert is to add full VT switch > support. However, this is too big a change to put in a this point, so we will > need to look at this for a later release. Why can't you call qxl reset + qxl set mode on qxlLeaveVT
(In reply to comment #11) > (In reply to comment #10) > > I can't reset because there are invalid commands in the command buffer. The > > only way to fix this without removing the assert is to add full VT switch > > support. However, this is too big a change to put in a this point, so we will > > need to look at this for a later release. > > Why can't you call qxl reset + qxl set mode on qxlLeaveVT I assume qxlLeaveVT is called when you lose control and qxlEnterVT when you gain control, is it correct? If so then you need to reset the device on qxlEnterVT so you need to call reset before set mode in qxlSwitchMode.
(In reply to comment #12) > (In reply to comment #11) > > (In reply to comment #10) > > > I can't reset because there are invalid commands in the command buffer. The > > > only way to fix this without removing the assert is to add full VT switch > > > support. However, this is too big a change to put in a this point, so we will > > > need to look at this for a later release. > > > > Why can't you call qxl reset + qxl set mode on qxlLeaveVT > > I assume qxlLeaveVT is called when you lose control and qxlEnterVT when you > gain control, is it correct? If so then you need to reset the device on > qxlEnterVT so you need to call reset before set mode in qxlSwitchMode. Why do you call qxl_ring_wait_idle in qxlSwitchMode?
Yes, implementing these calls is what is required. It is also necessary to disable command submission in qxlLeaveVT() and reneable it in qxlEnterVT(). > Why do you call qxl_ring_wait_idle in qxlSwitchMode? Because the SET_MODE ioport handler asserts if there are commands in the buffer.
(In reply to comment #14) > Yes, implementing these calls is what is required. It is also necessary to > disable command submission in qxlLeaveVT() and reneable it in qxlEnterVT(). But is already implementing, so what is the problem? Do you send commands between qxlLeaveVT and qxlEnterVT? > > > Why do you call qxl_ring_wait_idle in qxlSwitchMode? > > Because the SET_MODE ioport handler asserts if there are commands in the > buffer. I can't see how can it help. It only add risk for having deadlock.
What happens is that the kernel, with no involvement from X, writes stuff to the VGA ports. This causes QXL to go to VGA mode, but the X driver doesn't know that this happened, so it continues to submit commands. This causes the QXL device to get into a state that there is no way out of. There is a system in which the kernel first notifies X that it is about to switch away. In response, X has to do whatever it needs to do in preparation for this, then tell the kernel that it is now allowed to switch. Yes, it is dumb, but that's how it works in RHEL 5. So yes, commands will be submitted while in VGA mode; I agree this is a bug in the X driver, but it is not fixable for 5.4. We can fix it in 5.5. Supporting this system for 5.4 is not going to happen, so if we want VT switching to work, we'll just have to disable that assert. It doesn't serve any useful purpose on a customer's system anyway, because it crashes the entire VM. It is only useful on a developer's system. > > > Why do you call qxl_ring_wait_idle in qxlSwitchMode? > > > > Because the SET_MODE ioport handler asserts if there are commands in the > > buffer. > > I can't see how can it help. It only add risk for having deadlock. qxl_set_mode() crashes the VM if there are commands in the buffer. qxl_reset() crashes the VM if there are commands in the buffer. What exactly, apart from waiting for idle, do you suggest doing if the user changes the resolution at runtime?
(In reply to comment #16) > Supporting this system for 5.4 is not going to happen, so if we want VT > switching to work, we'll just have to disable that assert. It doesn't serve any ASSERT is just the symptom. It is not allowed to use VGA and QXL mode simultaneously. Having two components control the same device without coordination is also a BUG. > useful purpose on a customer's system anyway, because it crashes the entire VM. > It is only useful on a developer's system. > > > > > Why do you call qxl_ring_wait_idle in qxlSwitchMode? > > > > > > Because the SET_MODE ioport handler asserts if there are commands in the > > > buffer. > > > > I can't see how can it help. It only add risk for having deadlock. > > qxl_set_mode() crashes the VM if there are commands in the buffer. > > qxl_reset() crashes the VM if there are commands in the buffer. > > What exactly, apart from waiting for idle, do you suggest doing if the user > changes the resolution at runtime? For what do you wait? changing resolution also reset the rings.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: As mentioned, this rebase is not a real rebase because the alternative is to ship precisely the same code as a patch.
So, was anything fixed here? How should we test this (just re-run the test?)
Verifying - it doesn't happen anymore. I can launch X. Tested with xorg-x11-drv-qxl-0.0.5-1.3.el5.bz549379.i386.rpm
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2010-0188.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days