Bug 466163 - [Stratus 5.5 bug] fbdev's use of shadow is broken resulting in very sluggish CopyArea
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xorg-x11-drv-fbdev
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 5.5
Assigned To: Adam Jackson
QA Contact: OtherQA
Depends On:
Blocks: 533941
Reported: 2008-10-08 15:50 EDT by Charlotte Richardson
Modified: 2010-06-16 12:18 EDT (History)

Doc Type: Bug Fix
Last Closed: 2010-03-30 04:08:16 EDT

Attachments
Fix shadowfb usage in fbdev (6.39 KB, patch)
2009-05-08 17:40 EDT, Benjamin Herrenschmidt
Description Charlotte Richardson 2008-10-08 15:50:17 EDT
Description of problem:
The framebuffer driver (fbdev) shows dreadful CopyArea performance in the Xserver version 7.1.1 that shipped with RHEL5.2 as compared to version 6.8.2 which shipped on RHEL4U7.

As you know, we use only framebuffer drivers because we must support hotplugging. We have been getting complaints about the very-sluggish performance on RHEL5.2 because it makes the whole system appear to be running very slowly. It is slow enough that it is actively hard to use.

Version-Release number of selected component (if applicable):

How reproducible: 100%

Steps to Reproduce:
If your video chip only has a framebuffer driver (for example, an Asiliant 69000), you do not need to do anything to have X use the framebuffer driver, and the problem will immediately be obvious, so skip to step 8. Otherwise...

1. If the kernel framebuffer driver for your system's video chip is not shipped built by default, build it from the kernel sources. For example, if the video chip is an ATI radeon, build radeonfb.ko.
2. Copy it into place under /lib/modules/
3. depmod
4. modprobe the framebuffer driver (e.g. radeonfb)
5. init 3
6. In /etc/X11/xorg.conf, change the Driver line from whatever it is (e.g. "radeon" or "ati") to "fbdev" to cause X to use the framebuffer driver.
7. init 5
8. X will show badly degraded bitblt (CopyArea) performance relative to the same setup running on RHEL4U7. This is immediately obvious if you drag a window around the screen with the mouse.

For testing purposes, I wrote a small X application that draws a big square window and then, when any key is pressed, draws a square in the upper left corner of that window. The next keypress writes a coded pixel value outside of this square (to bracket the data of interest in the PCI analyzer trace), does a CopyArea call to copy the square down and to the right into a nonoverlapping area at the bottom right corner of the big window, and then writes another coded pixel value (again for the analyzer) outside the area of interest. The app runs with XSynchronize set so that the operations occur in the requested order.

Actual results:

Using the Xserver from RHEL5.2, the CopyArea operation is obviously degraded in performance, and you can easily see that the area is being copied from bottom to top (as would be done in case the source and destination overlapped, though in this case they do not). The PCI analyzer trace shows that this is exactly what is happening. For each pixel involved in the CopyArea, the trace shows the source address on the bus followed by the original pixel data (reading from the framebuffer is a very slow operation, which is what degrades the performance), then the destination address, and finally the pixel being written out (which is fast).

This happens in spite of efforts on the part of fbdev to use the miextension shadow. That extension was completely rewritten between Xserver version 6.8.2 and 7.1.1, which broke fbdev's use of it in the version of fbdev (0.3.0) picked up for RHEL5.2.
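
To illustrate the copy order described above, here is a minimal C sketch of a CopyArea-style blit that walks the rows bottom to top when the destination lies below the source. It is not the X server's code: the helper name `copy_area`, the 32-bit pixel format, and the single flat surface are all assumptions made for the example. When `surf` points directly at device memory, every row copy has to read pixels back from the framebuffer, which is the slow part seen in the analyzer trace.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an X server function): copy a w x h block of
 * 32-bit pixels within one surface. 'stride' is the surface width in
 * pixels. The row order is chosen so that an overlapping source row is
 * always read before it can be overwritten. */
static void copy_area(uint32_t *surf, size_t stride,
                      size_t sx, size_t sy,
                      size_t dx, size_t dy,
                      size_t w, size_t h)
{
    if (dy > sy) {
        /* Destination is below the source: walk rows bottom to top. */
        for (size_t row = h; row-- > 0; )
            memmove(&surf[(dy + row) * stride + dx],
                    &surf[(sy + row) * stride + sx],
                    w * sizeof(uint32_t));
    } else {
        /* Destination is at or above the source: walk rows top to bottom. */
        for (size_t row = 0; row < h; row++)
            memmove(&surf[(dy + row) * stride + dx],
                    &surf[(sy + row) * stride + sx],
                    w * sizeof(uint32_t));
    }
}
```

Calling `copy_area(surf, 8, 0, 0, 2, 2, 4, 4)` on an 8-pixel-wide surface copies the 4x4 block at the origin two pixels down and to the right, and the overlapping rows still arrive intact because of the bottom-up order.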

Expected results:

On the Xserver on RHEL4U7, window dragging is quick, and this CopyArea operation in the test application is almost instantaneous. The PCI analyzer trace showed that the data is only being written to the framebuffer, not being read from it, and it is written in ascending order (from top to bottom), not the order the drawing is actually being done in.

This is because the fbdev X video driver code uses the shadow miextension: drawing goes to a main-memory copy of the screen, and the damage mechanism keeps track of the areas that were drawn into. A BlockHandler then outputs those areas to the real framebuffer in ascending order. That's the code that was changed in shadow, and fbdev's use of it was broken in fbdev version 0.3.0.
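
The shadow-plus-damage scheme described above can be sketched in C roughly as follows. This is a simplified model, not the shadow or damage code from the X server: the names (`struct shadow_fb`, `shadow_flush`, and so on), the fixed 64x48 size, and the single bounding box standing in for a real damage region are all assumptions for illustration. The point it shows is that drawing and copying only touch main memory, and the device buffer is only ever written, top to bottom, at flush time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { FB_W = 64, FB_H = 48 };   /* assumed screen size for the sketch */

struct shadow_fb {
    uint32_t shadow[FB_W * FB_H]; /* main-memory copy that drawing targets */
    uint32_t *real;               /* stand-in for the slow device framebuffer */
    int dirty;                    /* nonzero while damage is pending */
    size_t x1, y1, x2, y2;        /* damage bounding box, x2/y2 exclusive */
    unsigned long fb_writes;      /* pixels written to 'real' so far */
};

static void shadow_init(struct shadow_fb *s, uint32_t *real)
{
    memset(s, 0, sizeof *s);
    s->real = real;
}

/* Grow the pending damage box to include the given rectangle. */
static void shadow_damage(struct shadow_fb *s,
                          size_t x, size_t y, size_t w, size_t h)
{
    if (!s->dirty) {
        s->x1 = x; s->y1 = y; s->x2 = x + w; s->y2 = y + h;
        s->dirty = 1;
        return;
    }
    if (x < s->x1) s->x1 = x;
    if (y < s->y1) s->y1 = y;
    if (x + w > s->x2) s->x2 = x + w;
    if (y + h > s->y2) s->y2 = y + h;
}

/* Draw a solid rectangle into the shadow copy only. */
static void shadow_fill(struct shadow_fb *s, size_t x, size_t y,
                        size_t w, size_t h, uint32_t pixel)
{
    for (size_t row = y; row < y + h; row++)
        for (size_t col = x; col < x + w; col++)
            s->shadow[row * FB_W + col] = pixel;
    shadow_damage(s, x, y, w, h);
}

/* CopyArea within the shadow copy; source and destination rectangles
 * must not overlap in this sketch. The device buffer is never read. */
static void shadow_copy_area(struct shadow_fb *s,
                             size_t sx, size_t sy, size_t dx, size_t dy,
                             size_t w, size_t h)
{
    for (size_t row = 0; row < h; row++)
        memcpy(&s->shadow[(dy + row) * FB_W + dx],
               &s->shadow[(sy + row) * FB_W + sx],
               w * sizeof(uint32_t));
    shadow_damage(s, dx, dy, w, h);
}

/* Plays the role of the BlockHandler: write the damaged box to the
 * device in ascending row order, then clear the pending damage. */
static void shadow_flush(struct shadow_fb *s)
{
    if (!s->dirty)
        return;
    for (size_t row = s->y1; row < s->y2; row++) {
        size_t off = row * FB_W + s->x1;
        size_t n = s->x2 - s->x1;
        memcpy(&s->real[off], &s->shadow[off], n * sizeof(uint32_t));
        s->fb_writes += n;
    }
    s->dirty = 0;
}
```

A caller would run `shadow_init`, do any number of `shadow_fill` and `shadow_copy_area` operations, and then call `shadow_flush` once. A single bounding box can write more pixels than were actually drawn; a real damage region tracks the drawn rectangles more precisely.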

Additional info:

This is a bug in xf86-video-fbdev-0.3.0 which has been fixed in version 0.3.1 in the X.org archives upstream. I tested with version 0.4.0 (which has a couple of additional unrelated fixes also) and verified that the problem is fixed (both visually and confirming it with the PCI analyzer). The upstream version of fbdev behaves the same way as the version from RHEL4U7 and Xserver 6.8.2.

Please update to the upstream version of fbdev.
Comment 1 Adam Jackson 2008-10-09 14:34:27 EDT
Adding to 5.4 radar.
Comment 6 RHEL Product and Program Management 2009-02-03 18:14:43 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update release.
Comment 9 Andrius Benokraitis 2009-02-17 13:08:10 EST
Question for Stratus: How important is this for you all in RHEL 5.4? Currently this package is not slated to be updated in RHEL 5.4 and therefore needs substantial business justification to proceed.
Comment 10 Jim Paradis 2009-02-17 14:31:33 EST
Chas is on vacation, so don't take this as Stratus' official position: my opinion is that this is a fairly important issue, since the seriously sluggish interactive performance can create a negative perception of our fault-tolerant platform.
Comment 12 Chas Horvath 2009-02-24 16:35:34 EST
This issue has caused pain for Stratus customers per the report below, and it makes our product look bad even though the functional impact is minimal. While it would be desirable to get the fix in 5.4, Stratus can wait until 5.5 for this fix:

(1) What Customer(s) are affected by this?

       Customer/Distributor: Panasonic, Japan

(2) What is the impact/Customer view of this problem?

    Slower graphics windows drawing performance;
    Although it is not a serious issue, customer complained  as it is obvious on
    the console.

(3) What if any impact there will be if this is not fixed in the RHEL 5.4 release?

     Apart from slower graphics display performance on console  no further
     problem reported.

(4) Any other useful info you wish to add.

     There is a workaround, and hence the customer can manage for the time
     being, but the customer expects such issues to be fixed in a forthcoming
     release such as RHEL 5.4, or at least 5.5.
Comment 14 Andrius Benokraitis 2009-03-05 14:01:11 EST
Thanks for the comments Chas - we are unfortunately out of time and out of resources for this anyways. Deferring to 5.5.
Comment 15 Benjamin Herrenschmidt 2009-05-06 21:17:13 EDT
This is also hurting all POWER machines since fbdev is all that is supported on them
Comment 16 Charlotte Richardson 2009-05-07 09:34:49 EDT
At this point we are planning on replacing the xorg-x11-drv-fbdev rpm for our customers with one patched to the upstream (X.Org) version 0.4.0 (probably) to get rid of the perceived performance issue for our new platform, since there have been several complaints about it on earlier platforms (running RHEL5.2 and RHEL5.3). I expect we will also do so for the earlier platforms. It is easier to do now with X11R7 because all of the components of X are split out to separate RPMs (OK, it is a nuisance if you are trying to build a debug version of all of X, but it is useful if you want to replace a small piece like fbdev without having to build everything else).

Comment 17 Benjamin Herrenschmidt 2009-05-07 20:06:00 EDT
I've locally done a patch applying most of the upstream changes from 0.4.0, except some build-system related bits, on top of the RHEL5.3 variant of fbdev, and verified that it fixes the problem here on POWER. I'll attach a patch shortly.
Comment 18 Charlotte Richardson 2009-05-08 09:31:10 EDT
Hi, Benjamin -

Thanks! It was in our last night's build for the first time, and I was about to go out to the lab and make sure it installed OK before doing so, but since you're already on top of it, just put your patch file here.

I think the interaction with shadow is actually fixed in 0.3.1 or thereabouts, but I didn't try that version of fbdev. I tried 0.4.1 and 0.4.0 (which is what we went with).

Comment 19 Benjamin Herrenschmidt 2009-05-08 17:38:57 EDT
Here's the patch I've applied. Among other things, there's still a difference from upstream around the call to xf86SetDepthBpp() due to an explicit patch from Red Hat, which could use some explanation, so I didn't touch that.
Comment 20 Benjamin Herrenschmidt 2009-05-08 17:40:12 EDT
Created attachment 343180 [details]
Fix shadowfb usage in fbdev
Comment 21 IBM Bug Proxy 2009-05-13 12:01:05 EDT
------- Comment From mreed10@us.ibm.com 2009-05-13 10:56 EDT-------
---Problem Description---
The graphics performance when using the RHEL 5 desktop is completely unusable when using the fbdev driver.   The fbdev driver is the only working driver on RHEL 5 for Power.  When I try to move a window across the desktop, it takes an unusually long time to redraw on another area of the screen.
Contact Information = mreed10@us.ibm.com

---Additional Hardware Info---
Matrox GXT145 card was used

---uname output---
uname -a Linux devhv4e-phantom-lp4.austin.ibm.com 2.6.18-128.el5 #1 SMP Wed Dec 17 11:58:

Machine Type = P520

A debugger is not configured

---Steps to Reproduce---
Type startx at the command line

---Kernel - Drivers Component Data---
Stack trace output:

Oops output:

System Dump Info:
The system is not configured to capture a system dump.

*Additional Instructions for mreed10@us.ibm.com:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach sysctl -a output to the bug.
Comment 23 Chris Ward 2009-10-13 09:59:01 EDT
@IBM, @kernel.crashing.org, @Stratus

We would like to confirm that there is commitment to test 
for the resolution of this issue once we have an updated test 
build ready. 

Please post a confirmation to this bugzilla before Oct 16th, 2009, 
including the contact information for testing engineers.
Comment 24 Charlotte Richardson 2009-10-13 14:26:37 EDT
We can test it here at Stratus. (Jim Paradis may be able to test it onsite in Westford also.) We'd be real happy to get rid of this one as it makes affected systems look very sluggish. Email me at charlotte.richardson@stratus.com.

Comment 26 Adam Jackson 2009-11-23 12:00:17 EST
Built xorg-x11-drv-fbdev-0.3.0-3.el5

Comment 28 Chris Ward 2010-02-11 05:25:12 EST
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.
Comment 29 Charlotte Richardson 2010-02-12 11:51:05 EST
Stratus confirms that going from xorg-x11-drv-fbdev-0.3.0-2 in RHEL5.4 to xorg-x11-drv-fbdev-0.3.0-3 in RHEL5.5-Beta gets rid of this performance issue (good riddance to it!). The performance of CopyArea for framebuffer drivers is back to normal, which is most visible when you drag windows. Thanks!
Comment 32 errata-xmlrpc 2010-03-30 04:08:16 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

Comment 33 IBM Bug Proxy 2010-06-16 12:18:16 EDT
------- Comment From edpollar@linux.ibm.com 2010-06-16 11:58 EDT-------
reassigning qa....
