Bug 83472 - Compiling inside a gnome-terminal is much slower than with a vanilla kernel
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Raw Hide
Classification: Retired
Component: vte
Version: 1.0
Hardware: athlon
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Ray Strode [halfline]
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 81999 84298
Depends On:
Blocks: 79578
Reported: 2003-02-04 18:35 UTC by Pekka Pietikäinen
Modified: 2005-10-31 22:00 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2004-06-15 20:19:58 UTC
Embargoed:


Attachments
VERY quick and dirty read coalescing patch (1.01 KB, patch)
2003-02-14 01:23 UTC, Matt Wilson
reduce invalidates (452 bytes, patch)
2003-02-17 09:16 UTC, Brian Ryner

Description Pekka Pietikäinen 2003-02-04 18:35:31 UTC
I recently noticed that compiles in X were a lot slower than in text mode.
For example, xine-lib takes 7 mins in text mode (or a xterm), but 10:30 with
a gnome-terminal in 2.4.20-2.33.

Fair enough, gnome-terminal might be a pig when it comes to performance, but I
did the same test with a vanilla 2.4.21pre4 and it only took 9:15 and it
does feel faster too when using it. The machine is an Athlon XP 2000, so the
terminal really is the bottleneck...

Giving X less priority seems to "fix" this problem. In a similar test (catting a
very large text file), renicing it to +10 made performance just about identical
(25 secs); with the default it was 30 secs and -10 made it 45 secs.

Comment 1 Ingo Molnar 2003-02-10 17:19:23 UTC
after lots of debugging on the kernel/scheduler side, this turned out to be an
XFree86 or gnome-terminal problem.

whenever the 'slow motion gnome-terminal' problem happens, X's CPU usage shoots
up, spending roughly 40 msecs of CPU time per line rendered on a 2.2 GHz P4
equipped with a Radeon 8500, which is _way_ too much overhead.

the easiest way to trigger this phenomenon was to type 'w' in gnome-terminal,
the slow-motion effect happens almost all of the time. (while eg. 'ls' sometimes
scrolls fast, sometimes slow.)

timing differences caused by scheduler changes might have triggered this race -
but in any case neither xterm nor konsole (which is an anti-aliased console)
show this problem.


Comment 2 Ingo Molnar 2003-02-13 20:40:10 UTC
I've done some more debugging. The short story:

90% of the CPU overhead is in the /usr/X11R6/lib/modules/libfb.a module, function
fbCompositeSolidMask_nx8x0888, relative offset 0x4a0, using
XFree86-4.2.99.901-20030211.0. It's the movzwl (%ebx),%edi bulk-memory operation
that gets interrupted.

the long story: i attached to X via gdb while gnome-terminal was showing the
'slow motion' bug. In the majority of the cases the backtrace looked exactly
like this:

 (gdb) bt
 #0  0x0868f89b in ?? ()
 #1  0x08690ec0 in ?? ()
 #2  0x086de4dc in ?? ()
 #3  0x08193ad1 in miSpriteComposite ()
 #4  0x086ffea2 in ?? ()
 #5  0x0815f431 in CompositePicture ()
 #6  0x0815b070 in miGlyphs ()
 #7  0x086dec87 in ?? ()
 #8  0x08193cd2 in miSpriteGlyphs ()
 #9  0x0815f4c0 in CompositeGlyphs ()
 #10 0x081618be in ProcRenderCompositeGlyphs ()
 #11 0x081627cd in ProcRenderDispatch ()
 #12 0x080b95b2 in Dispatch ()
 #13 0x080cc050 in main ()
 #14 0x420154a0 in __libc_start_main () from /lib/tls/libc.so.6

ie. X is in compositing. Note that it's almost always the instruction at
0x0868f89b that is interrupted. During X's "high overhead" load-spikes, this is
what is visible in strace:

 11008 1045164346.569097 ioctl(7, 0x6444, 0) = 0 <0.000211>
 11008 1045164346.577201 --- SIGALRM (Alarm clock) @ 0 (0) ---
 11008 1045164346.577281 sigreturn()     = ? (mask now []) <0.000009>
 11008 1045164346.596760 --- SIGALRM (Alarm clock) @ 0 (0) ---
 11008 1045164346.596810 sigreturn()     = ? (mask now []) <0.000008>
 11008 1045164346.609278 read(21,"\0\0\5\2\262\0\256\0\263\0\254\0\264\0)\1)\1)\

the timestamps show a 40 msec overhead (!!!), which is purely spent burning
user-space cycles - no kernel activity during this time. (apparently only some
housekeeping signals.)

i matched the disassembly of the interrupted function against the disassembly of
X modules, and found the following match:

00000900 <fbCompositeSolidMask_nx8x0888> function, at absolute offset 0xda0.
(relative offset 0x4a0.)

does anyone have any idea why X spends such a huge amount of time in this
function? I think something else is causing this function to be called too
frequently - eg. to composite the whole gnome-terminal screen with its
background (even if it's plain white).


Comment 3 Matt Wilson 2003-02-14 01:08:46 UTC
The real problem here is that vte gets woken up with much much less data from
the child than it used to.  So vte handles TONS more small reads from the child.
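
The idea of coalescing such small reads can be sketched roughly as below. This
is only an illustration, not the attached patch and not vte's actual code: the
helper name coalesce_read and its timeout_ms parameter are hypothetical, and
the only real interfaces assumed are POSIX poll() and read().

```c
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from vte): drain a file descriptor into buf,
 * waiting up to timeout_ms before each read for data to arrive, so that
 * many small writes from the child are handled as one batch.
 * Returns the number of bytes collected, or -1 if read() fails before
 * any data was collected.
 */
ssize_t coalesce_read(int fd, char *buf, size_t cap, int timeout_ms)
{
    size_t total = 0;

    while (total < cap) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int ready = poll(&pfd, 1, timeout_ms);

        if (ready <= 0)
            break;                      /* timeout or error: stop batching */

        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0)
            return total > 0 ? (ssize_t)total : -1;
        if (n == 0)
            break;                      /* end of file */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

The caller would then process the whole batch once instead of repainting after
every tiny read; a longer timeout trades latency for fewer repaints.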


Comment 4 Matt Wilson 2003-02-14 01:23:32 UTC
Created attachment 90074
VERY quick and dirty read coalescing patch

hide me from nalin, for surely he will kill me if he reads this

Comment 5 Matt Wilson 2003-02-14 06:30:37 UTC
xterm can show the interaction as well, but not to the same extent.


Comment 6 Havoc Pennington 2003-02-14 16:20:34 UTC
Note also that the compositing is exactly what Owen says is slow in the X server;
we should be able to speed up that code substantially (though in this case
the problem seems to be compositing too often, so we should fix that before we
fix the compositing to simply be faster).


Comment 7 Havoc Pennington 2003-02-14 17:06:30 UTC
*** Bug 84298 has been marked as a duplicate of this bug. ***

Comment 8 Ingo Molnar 2003-02-14 18:40:58 UTC
Even assuming very bad compositing algorithms in X, written in C, on my box the
numbers add up to 28 _thousand_ cycles (13 usecs) per pixel overhead, for every
new line being scrolled in. That number is hard to believe, so there must be
something else going on as well.
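
As a quick check of the cycles-to-time conversion quoted above, assuming the
2.2 GHz clock of the P4 mentioned in comment 1 (the helper name is
hypothetical and exists only for this illustration):

```c
/*
 * Hypothetical helper: convert a cycle count to microseconds for a CPU
 * running at clock_ghz gigahertz. One GHz is 1000 cycles per microsecond.
 */
double cycles_to_usecs(double cycles, double clock_ghz)
{
    return cycles / (clock_ghz * 1000.0);
}
```

Calling cycles_to_usecs(28000.0, 2.2) gives a little under 12.73 usecs, which
rounds to the 13 usecs quoted.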

it is clearly proven by multiple experiments (top statistics, gdb interruption)
that 95% of the overhead is in X's compositing functions.

I have done a few more experiments. The results prove the following surprising
theory: the rendering done by X, during scrolling gnome-terminal text, is not
only slow, but also is done for the whole area of the window - which, at least
to me, looks patently incorrect. After all, when scrolling line by line, the
only new piece of area that should be rendered is the new line being scrolled in
at the bottom of the window. The rest should just be blitted up, or at least
blitted in from pre-existing buffers, on my hardware.

by measuring the actual execution time of worst-case 'w' scrolling, one can get
a good estimation of scrolling overhead. I've used 5 different terminal heights
for testing, keeping the width of the terminal (and all other settings) constant:

 100x30: 0.75 secs
 100x25: 0.64 secs
 100x20: 0.51 secs
 100x15: 0.37 secs
 100x10: 0.26 secs

it's very visible that an almost perfect line can be fitted over these numbers -
the CPU overhead of scrolling the same amount of output is a linear 0.25 secs
per 10 lines of _terminal height_. Ie. the constant part of the window (which in
theory should just be blitted up), is probably completely re-rendered from scratch.
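
The straight-line claim can be checked with an ordinary least-squares fit over
the five measurements. The sketch below is only an illustration; fit_line and
the array names are made up for this example, and the data points are the ones
from the table above.

```c
#include <stddef.h>

/* Ordinary least-squares fit y = slope * x + intercept over n points. */
void fit_line(const double *x, const double *y, size_t n,
              double *slope, double *intercept)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;

    for (size_t i = 0; i < n; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    *slope = ((double)n * sxy - sx * sy) / ((double)n * sxx - sx * sx);
    *intercept = (sy - *slope * sx) / (double)n;
}

/* Terminal heights (rows) and measured times (seconds) from the table. */
const double heights[] = { 30, 25, 20, 15, 10 };
const double seconds[] = { 0.75, 0.64, 0.51, 0.37, 0.26 };
```

Fitting these points gives a slope of about 0.025 secs per row of terminal
height (0.25 secs per 10 rows) and an intercept of about 0.006 secs, i.e.
almost no height-independent cost.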

(The background of the terminal is constant white and the 'Monospace 13' font
was used. Font antialiasing is turned on.)

So, for whatever reason, the compositing is not only done for the updated
rectangle in the new line, but is also done for the whole window area. This also
explains the high overhead (40 msecs on an 80x25 console) per line scrolled.

I agree that reducing the overhead of compositing is important as well (eg.
transparent selection on the root window background is woefully slow on this 2.2
GHz P4 & Radeon 8500), but for the important case of terminal scrolling,
something else is going on as well, that causes the full window area to be
composited by X, causing incredible overhead in turn.


Comment 9 Nalin Dahyabhai 2003-02-15 00:19:39 UTC
*** Bug 81999 has been marked as a duplicate of this bug. ***

Comment 10 Matt Wilson 2003-02-16 19:11:36 UTC
please test with vte packages at:

http://people.redhat.com/msw/vte

thanks nalin for the non-sick-hack fix.


Comment 11 Pekka Pietikäinen 2003-02-16 19:38:37 UTC
Works perfectly for me, faster than xterm now ;)

Comment 12 Stephen John Smoogen 2003-02-16 23:57:20 UTC
Agreed it is faster than xterm with Xft turned on.. so it's all down to Xft code
for speed differences.

Thanks Nalin and Matt.

Comment 13 Brian Ryner 2003-02-17 02:45:03 UTC
With vte-0.10.20, when I compile Mozilla, I still see cpu usage by X about 4x
higher in gnome-terminal than with xterm.  I don't think this patch completely
fixes the problem.


Comment 14 Havoc Pennington 2003-02-17 02:59:14 UTC
Are you comparing to xterm -fa? (i.e. xterm with the same font as your 
g-t?)

g-t is still expected to use more CPU due to the different font system, that's 
an X issue rather than a g-t issue.

Comment 15 Brian Ryner 2003-02-17 03:20:01 UTC
I tried setting both g-t and xterm to monospace 12 (xterm -fa monospace -fs 12,
and I verified visually that the fonts were the same) and got the same result as
before -- xterm is much faster.


Comment 16 Havoc Pennington 2003-02-17 03:50:20 UTC
How are you timing it?

People have been using tests such as "time ls -l /dev"
or the like. Note that you need to do it once first to eliminate cache effects, 
before you do the two timed versions.
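
The warm-up-then-measure procedure could be scripted along these lines. This
is a sketch under assumptions: time_command_warm is a hypothetical helper, it
relies on POSIX clock_gettime() and the standard system(), and the command's
output has to go to the terminal being tested for the number to mean anything.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper: run cmd once untimed to warm the caches, then run
 * it again and return the wall-clock time of the second run in seconds.
 * Returns -1.0 if the command could not be started or the clock failed.
 */
double time_command_warm(const char *cmd)
{
    struct timespec start, end;

    if (system(cmd) == -1)              /* warm-up run, result discarded */
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    if (system(cmd) == -1)              /* timed run */
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

For example, time_command_warm("ls -l /dev") run from inside each terminal
would give comparable figures for the test described above.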

xterm -fa is broken on my home machine, but using old-school xterm I get
18 seconds to ls -l /dev in xterm, and 3 seconds to ls -l /dev in latest vte.



Comment 17 Brian Ryner 2003-02-17 04:19:35 UTC
On that particular test, vte is _way_ faster for me too (17s vs. 1m 7s).  But,
it's significantly slower compiling Mozilla.  Here's what I timed:

Setup:
cd mozilla
./configure
make export   (to get actual work out of the way, to reduce noise)

Then in xterm and gnome-terminal:
make export

Results (with both terminals using the same font):
xterm: 1m 45s
gnome-terminal: 2m 37s  (~50% slower)


Comment 18 Stephen John Smoogen 2003-02-17 05:07:16 UTC
I can confirm.. ls -l is slower in xterm, but compiling is much slower in
gnome-terminal on a single cpu box. I think it is something to do with process
switching between X, gnome-terminal, make, cc and cpp. The ls, while it takes a
lot of data, spits it out at the X server in one swoop... cc does lots of changes.

Comment 19 Matt Wilson 2003-02-17 05:15:16 UTC
try tweaking VTE_COALESCE_TIMEOUT higher in src/vte.c in the vte package.


Comment 20 Brian Ryner 2003-02-17 06:15:50 UTC
One thing I noticed is that in vte_terminal_scroll_region, I'm hitting the case
where it invalidates the entire window, rather than the (presumably much faster)
case where it just calls gdk_window_scroll():

        /* We only do this if we're scrolling the entire window. */
        if (!terminal->pvt->bg_transparent &&
            (terminal->pvt->bg_image == NULL) &&
            (row == 0) &&
            (count == terminal->row_count)) {
                widget = GTK_WIDGET(terminal);
                gdk_window_scroll(widget->window,
                                  0, delta * terminal->char_height);
                repaint = FALSE;
        }

(gdb) p row
$25 = 4196

I'm not using an image background or a transparent background, and count ==
terminal->row_count (24); the only reason it doesn't hit this is that row != 0.
I'm not that familiar with this code, but maybe vte_terminal_handle_scroll
should be passing in 0 instead of screen->scroll_delta, or
vte_terminal_scroll_region should be comparing to terminal->adjustment->value
instead of 0?


Comment 21 Brian Ryner 2003-02-17 06:24:26 UTC
wow. yeah.  changing vte_terminal_handle_scroll to pass 0 instead of
screen->scroll_delta to vte_terminal_scroll_region gave me a time of 1m 38s
(faster than xterm) on the above test with mozilla.


Comment 22 Brian Ryner 2003-02-17 09:16:52 UTC
Created attachment 90124
reduce invalidates

Comment 23 Ingo Molnar 2003-02-17 09:25:46 UTC
This invalidation bug certainly explains my measurements - the full terminal
window had to be recalculated and thus a dependency on window-height was
introduced. (the other bug(s) discussed here do not have such characteristics.)

Comment 24 Nalin Dahyabhai 2003-02-20 03:57:43 UTC
Good catch!  Because the row passed to vte_terminal_scroll_region is actually
the row in the whole buffer, not just the visible row number, the "row" should
be compared to the scroll delta (terminal->pvt->screen->scroll_delta) instead of
0.  Otherwise the patch is dead-on.  This change will show up in 0.10.21 and later.
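
The difference between the two comparisons can be modelled in isolation. The
sketch below is a simplified, hypothetical stand-in for the real check in
vte_terminal_scroll_region: the struct and function names are invented for
this example, and only the fields mentioned in this thread are modelled.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the terminal state used by the
 * fast-path test. `row` passed to the checks below is an index into the
 * whole scrollback buffer, so the first visible row equals scroll_delta.
 */
struct scroll_state {
    bool bg_transparent;
    bool has_bg_image;
    long row_count;      /* number of visible rows */
    long scroll_delta;   /* buffer row shown at the top of the window */
};

/* Original test: compares row against 0, so it fails whenever
 * scroll_delta is nonzero and the whole window gets invalidated. */
bool can_blit_old(const struct scroll_state *s, long row, long count)
{
    return !s->bg_transparent && !s->has_bg_image &&
           row == 0 && count == s->row_count;
}

/* Corrected test: compares row against the scroll delta instead. */
bool can_blit_fixed(const struct scroll_state *s, long row, long count)
{
    return !s->bg_transparent && !s->has_bg_image &&
           row == s->scroll_delta && count == s->row_count;
}
```

With the values from the gdb session in comment 20 (row 4196, count 24, and
scroll_delta assumed equal to row), the old test rejects the fast path while
the corrected one accepts it.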

Comment 25 Brian Ryner 2003-03-27 07:50:52 UTC
This fix appears to NOT be present in vte-0.10.25-1 (Shrike)!  It's back to
scrolling really slowly.  How did this get lost?


Comment 26 Bill Nottingham 2003-07-28 21:53:27 UTC
Closing some bugs that have been in MODIFIED for a while. Please reopen if the
problem persists.

Comment 27 Pekka Pietikäinen 2004-05-03 07:29:37 UTC
Removing the private bug bit from this bug, as there might
be relevant information here for 

http://bugzilla.gnome.org/show_bug.cgi?id=137864

There certainly are performance problems left, but I suppose
that's an upstream thing, so I'm not reopening the bug.


Comment 28 Colin Murtaugh 2004-05-03 12:26:46 UTC
This seems not to have been fixed (please correct me if I'm mistaken).
Looking at vte.c from CVS (5/03/04), I still see
'screen->scroll_delta' instead of '0' on line 10611:

10608         screen->scroll_delta = adj;
10609         if (dy != 0) {
10610                 vte_terminal_match_contents_clear(terminal);
10611                 vte_terminal_scroll_region(terminal, screen->scroll_delta,
10612                                            terminal->row_count, -dy);
10613                 vte_terminal_emit_text_scrolled(terminal, dy);
10614                 vte_terminal_emit_contents_changed(terminal);
10615         }


Comment 29 Pekka Pietikäinen 2004-05-03 17:57:48 UTC
Reopening the bug in case there really has been a regression, 
with the latest src.rpm 0.11.10-5.1 

-               vte_terminal_scroll_region(terminal, screen->scroll_delta,
+               vte_terminal_scroll_region(terminal, 0,
                                           terminal->row_count, -dy);

and the same with s/0/terminal->pvt->screen->scroll_delta/

did not show any measurable benefit in some very basic benchmarks I
just did, however.

Comment 30 Ray Strode [halfline] 2004-06-15 20:19:58 UTC
Note that I think the fix was applied in vte_terminal_scroll_region as
 
        if (!terminal->pvt->bg_transparent &&
            (terminal->pvt->bg_pixbuf == NULL) &&
            (terminal->pvt->bg_file == NULL) &&
-           (row == 0) &&
+           (row == terminal->pvt->screen->scroll_delta) &&
            (count == terminal->row_count) &&
            (terminal->pvt->scroll_lock_count == 0)) {
                height = terminal->char_height;
                gdk_window_scroll((GTK_WIDGET(terminal))->window,
                                  0, delta * height);
   
I just tried compiling a program in xterm and in gnome-terminal
without any significant difference in speed.  I also tried Ingo's w
test and I got nearly constant numbers for varying terminal heights. 
I'm closing the bug, but if there is still a problem, feel free to
reopen it.

