I recently noticed that compiles in X were a lot slower than in text mode. For example, xine-lib takes 7 mins in text mode (or an xterm), but 10:30 with a gnome-terminal in 2.4.20-2.33. Fair enough, gnome-terminal might be a pig when it comes to performance, but I did the same test with a vanilla 2.4.21pre4 and it only took 9:15, and it does feel faster too when using it. The machine is an Athlon XP 2000, so the terminal really is the bottleneck... Giving X less priority seems to "fix" this problem: in a similar test (catting a very large text file), renicing it to +10 made performance just about identical (25 secs); with the default it was 30 secs and -10 made it 45 secs.
after lots of debugging on the kernel/scheduler side, this turned out to be an XFree86 or gnome-terminal problem. whenever the 'slow motion gnome-terminal' problem happens, X's CPU usage shoots up, spending roughly 40 msecs of CPU time per line rendered on a 2.2 GHz P4 equipped with a Radeon 8500, which is _way_ too much overhead. the easiest way to trigger this phenomenon was to type 'w' in gnome-terminal; the slow-motion effect then happens almost all of the time. (while eg. 'ls' sometimes scrolls fast, sometimes slow.) timing differences caused by scheduler changes might have triggered this race - but in any case neither xterm nor kconsole (which is an anti-aliased console) show this problem.
I've done some more debugging. The short story: 90% of the CPU overhead is in the /usr/X11R6/lib/modules/libfb.a module, function fbCompositeSolidMask_nx8x0888, relative offset 0x4a0, using XFree86-4.2.99.901-20030211.0. It's the movzwl (%ebx),%edi bulk-memory operation that gets interrupted.

the long story: i attached to X via gdb while gnome-terminal was showing the 'slow motion' bug. In the majority of the cases the backtrace looked exactly like this:

    (gdb) bt
    #0  0x0868f89b in ?? ()
    #1  0x08690ec0 in ?? ()
    #2  0x086de4dc in ?? ()
    #3  0x08193ad1 in miSpriteComposite ()
    #4  0x086ffea2 in ?? ()
    #5  0x0815f431 in CompositePicture ()
    #6  0x0815b070 in miGlyphs ()
    #7  0x086dec87 in ?? ()
    #8  0x08193cd2 in miSpriteGlyphs ()
    #9  0x0815f4c0 in CompositeGlyphs ()
    #10 0x081618be in ProcRenderCompositeGlyphs ()
    #11 0x081627cd in ProcRenderDispatch ()
    #12 0x080b95b2 in Dispatch ()
    #13 0x080cc050 in main ()
    #14 0x420154a0 in __libc_start_main () from /lib/tls/libc.so.6

ie. X is in compositing. Note that it's almost always the instruction at 0x0868f89b that is interrupted. During X's "high overhead" load-spikes, this is what is visible in strace:

    11008 1045164346.569097 ioctl(7, 0x6444, 0) = 0 <0.000211>
    11008 1045164346.577201 --- SIGALRM (Alarm clock) @ 0 (0) ---
    11008 1045164346.577281 sigreturn() = ? (mask now []) <0.000009>
    11008 1045164346.596760 --- SIGALRM (Alarm clock) @ 0 (0) ---
    11008 1045164346.596810 sigreturn() = ? (mask now []) <0.000008>
    11008 1045164346.609278 read(21,"\0\0\5\2\262\0\256\0\263\0\254\0\264\0)\1)\1)\

the timestamps show a 40 msec overhead (!!!) between the ioctl() and the read(), which is purely spent burning user-space cycles - no kernel activity during this time. (only some signals, apparently for housekeeping.)

i matched the disassembly of the interrupted function against the disassembly of X modules, and found the following match: the 00000900 <fbCompositeSolidMask_nx8x0888> function, at absolute offset 0xda0. (relative offset 0x4a0.)
does anyone have any idea why X spends such a huge amount of time in this function? I think something else is causing this function to be called too frequently - eg. to composite the whole gnome-terminal screen with its background (even if it's plain white).
The real problem here is that vte gets woken up with much much less data from the child than it used to. So vte handles TONS more small reads from the child.
Created attachment 90074: VERY quick and dirty read coalescing patch

hide me from nalin, for surely he will kill me if he reads this
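For readers who have not seen read coalescing before, here is a minimal sketch of the general idea. It is not the attached patch and not VTE code: the helper name coalesce_read and its timeout_ms parameter are hypothetical, and it assumes a plain POSIX file descriptor polled with poll(). The point is only that the reader waits briefly for more data after each small read, so that many tiny chunks from the child are handed on (and rendered) as one batch.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only (not from vte).
 * Blocks until the first chunk arrives, then keeps collecting further
 * chunks for as long as new data shows up within timeout_ms, or until
 * the buffer is full. Returns the number of bytes gathered.
 */
static ssize_t coalesce_read(int fd, char *buf, size_t cap, int timeout_ms)
{
    size_t used = 0;

    assert(buf != NULL && cap > 0);

    while (used < cap) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        /* Wait indefinitely for the first chunk, only briefly for more. */
        int ready = poll(&pfd, 1, used == 0 ? -1 : timeout_ms);
        if (ready <= 0)
            break;          /* timeout or error: hand over what we have */

        ssize_t n = read(fd, buf + used, cap - used);
        if (n <= 0)
            break;          /* EOF or read error */

        used += (size_t)n;
    }
    return (ssize_t)used;
}
```

The trade-off in a scheme like this is latency versus batching: a longer timeout merges more small reads into one update, but delays output by up to that timeout after the last chunk.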
xterm can show the interaction as well, but not to the same extent.
Note also that the compositing is exactly what Owen says is slow in the X server; we should be able to speed up that code substantially (though in this case the problem seems to be compositing too often, so we should fix that before we fix the compositing to simply be faster).
*** Bug 84298 has been marked as a duplicate of this bug. ***
Even assuming very bad compositing algorithms in X, written in C, on my box the numbers add up to 28 _thousand_ cycles (13 usecs) per pixel overhead, for every new line being scrolled in. That number is hard to believe, so there must be something else going on as well. it is clearly proven by multiple experiments (top statistics, gdb interruption) that 95% of the overhead is in X's compositing functions.

I have done a few more experiments. The results prove the following surprising theory: the rendering done by X, during scrolling gnome-terminal text, is not only slow, but also is done for the whole area of the window - which, at least to me, looks patently incorrect. After all, when scrolling line by line, the only new piece of area that should be rendered is the new line being scrolled in at the bottom of the window. The rest should just be blitted up, or at least blitted in from pre-existing buffers, on my hardware.

by measuring the actual execution time of worst-case 'w' scrolling, one can get a good estimate of scrolling overhead. I've used 5 different terminal heights for testing, keeping the width of the terminal (and all other settings) constant:

    100x30: 0.75 secs
    100x25: 0.64 secs
    100x20: 0.51 secs
    100x15: 0.37 secs
    100x10: 0.26 secs

it's very visible that an almost perfect line can be fitted over these numbers - the CPU overhead of scrolling the same amount of output is a linear 0.25 secs per 10 lines of _terminal height_. Ie. the constant part of the window (which in theory should just be blitted up) is probably completely re-rendered from scratch. (The background of the terminal is constant white and the 'Monospace 13' font was used. Font antialiasing is turned on.)

So, for whatever reason, the compositing is not only done for the updated rectangle in the new line, but is also done for the whole window area. This also explains the high overhead (40 msecs on an 80x25 console) per line scrolled.
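The "almost perfect line" claim can be checked with an ordinary least-squares fit over the five measurements quoted above. The sketch below is only an illustration: the struct and function names are made up for this comment, and the arrays simply restate the reported timings. Working it through, the slope comes out at about 0.025 secs per row of terminal height (i.e. 0.25 secs per 10 rows) with an intercept close to zero, which is what you would expect if nearly all of the cost scales with window height.

```c
#include <assert.h>
#include <stddef.h>

/* Reported 'w' scrolling timings: terminal height in rows vs. seconds. */
static const double term_rows[] = { 30.0, 25.0, 20.0, 15.0, 10.0 };
static const double term_secs[] = { 0.75, 0.64, 0.51, 0.37, 0.26 };

/* Result of fitting y = slope * x + intercept. */
struct line_fit {
    double slope;
    double intercept;
};

/* Ordinary least-squares fit over n (x, y) points; illustrative helper. */
static struct line_fit fit_line(const double *x, const double *y, size_t n)
{
    double sum_x = 0.0, sum_y = 0.0, sxx = 0.0, sxy = 0.0;
    struct line_fit fit;

    assert(n >= 2);

    for (size_t i = 0; i < n; i++) {
        sum_x += x[i];
        sum_y += y[i];
    }
    double mean_x = sum_x / (double)n;
    double mean_y = sum_y / (double)n;

    for (size_t i = 0; i < n; i++) {
        double dx = x[i] - mean_x;
        sxx += dx * dx;
        sxy += dx * (y[i] - mean_y);
    }
    assert(sxx > 0.0);      /* the x values must not all be identical */

    fit.slope = sxy / sxx;
    fit.intercept = mean_y - fit.slope * mean_x;
    return fit;
}

/* Fit the timings quoted in this comment. */
static struct line_fit fit_reported_timings(void)
{
    return fit_line(term_rows, term_secs,
                    sizeof term_rows / sizeof term_rows[0]);
}
```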
I agree that reducing the overhead of compositing is important as well (eg. transparent selection on the root window background is woefully slow on this 2.2 GHz P4 & Radeon 8500), but for the important case of terminal scrolling, something else is going on as well, that causes the full window area to be composited by X, causing incredible overhead in turn.
*** Bug 81999 has been marked as a duplicate of this bug. ***
please test with vte packages at: http://people.redhat.com/msw/vte thanks nalin for the non-sick-hack fix.
Works perfectly for me, faster than xterm now ;)
Agreed, it is faster than xterm with Xft turned on, so it's all down to the Xft code for speed differences. Thanks Nalin and Matt.
With vte-0.10.20, when I compile Mozilla, I still see cpu usage by X about 4x higher in gnome-terminal than with xterm. I don't think this patch completely fixes the problem.
Are you comparing to xterm -fa? (i.e. xterm with the same font as your g-t?) g-t is still expected to use more CPU due to the different font system, that's an X issue rather than a g-t issue.
I tried setting both g-t and xterm to monospace 12 (xterm -fa monospace -fs 12, and I verified visually that the fonts were the same) and got the same result as before -- xterm is much faster.
How are you timing it? People have been using tests such as "time ls -l /dev" or the like. Note that you need to do it once first to eliminate cache effects, before you do the two timed versions. xterm -fa is broken on my home machine, but using old-school xterm I get 18 seconds to ls -l /dev in xterm, and 3 seconds to ls -l /dev in latest vte.
On that particular test, vte is _way_ faster for me too (17s vs. 1m 7s). But, it's significantly slower compiling Mozilla. Here's what I timed:

Setup:

    cd mozilla
    ./configure
    make export    (to get actual work out of the way, to reduce noise)

Then in xterm and gnome-terminal:

    make export

Results (with both terminals using the same font):

    xterm:          1m 45s
    gnome-terminal: 2m 37s (~50% slower)
I can confirm: ls -l is slower in xterm, but compiling is much slower in gnome-terminal on a single-CPU box. I think it has something to do with process switching between X, gnome-terminal, make, cc and cpp. ls, while it produces a lot of data, spits it out at the X server in one swoop; cc does lots of small changes.
try tweaking VTE_COALESCE_TIMEOUT higher in src/vte.c in the vte package.
One thing I noticed is that in vte_terminal_scroll_region, I'm hitting the case where it invalidates the entire window, rather than the (presumably much faster) case where it just calls gdk_window_scroll():

    /* We only do this if we're scrolling the entire window. */
    if (!terminal->pvt->bg_transparent &&
        (terminal->pvt->bg_image == NULL) &&
        (row == 0) &&
        (count == terminal->row_count)) {
            widget = GTK_WIDGET(terminal);
            gdk_window_scroll(widget->window,
                              0, delta * terminal->char_height);
            repaint = FALSE;
    }

    (gdb) p row
    $25 = 4196

I'm not using an image background or a transparent background, and count == terminal->row_count (24); the only reason it doesn't hit this is that row != 0. I'm not that familiar with this code, but maybe vte_terminal_handle_scroll should be passing in 0 instead of screen->scroll_delta, or vte_terminal_scroll_region should be comparing to terminal->adjustment->value instead of 0?
wow. yeah. changing vte_terminal_handle_scroll to pass 0 instead of screen->scroll_delta to vte_terminal_scroll_region gave me a time of 1m 38s (faster than xterm) on the above test with mozilla.
Created attachment 90124: reduce invalidates
This invalidation bug certainly explains my measurements - the full terminal window had to be recalculated and thus a dependency on window-height was introduced. (the other bug(s) discussed here do not have such characteristics.)
Good catch! Because the row passed to vte_terminal_scroll_region is actually the row in the whole buffer, not just the visible row number, the "row" should be compared to the scroll delta (terminal->pvt->screen->scroll_delta) instead of 0. Otherwise the patch is dead-on. This change will show up in 0.10.21 and later.
This fix appears to NOT be present in vte-0.10.25-1 (Shrike)! It's back to scrolling really slowly. How did this get lost?
Closing some bugs that have been in MODIFIED for a while. Please reopen if the problem persists.
Removing the private bug bit from this bug, as there might be relevant information here for http://bugzilla.gnome.org/show_bug.cgi?id=137864

There certainly are performance problems left, but I suppose that's an upstream thing, so I'm not reopening the bug.
This seems not to have been fixed (please correct me if I'm mistaken). Looking at vte.c from CVS (5/03/04), I still see 'screen->scroll_delta' instead of '0' on line 10611:

    10608         screen->scroll_delta = adj;
    10609         if (dy != 0) {
    10610                 vte_terminal_match_contents_clear(terminal);
    10611                 vte_terminal_scroll_region(terminal, screen->scroll_delta,
    10612                                            terminal->row_count, -dy);
    10613                 vte_terminal_emit_text_scrolled(terminal, dy);
    10614                 vte_terminal_emit_contents_changed(terminal);
    10615         }
Reopening the bug in case there really has been a regression. With the latest src.rpm 0.11.10-5.1, the change

    -    vte_terminal_scroll_region(terminal, screen->scroll_delta,
    +    vte_terminal_scroll_region(terminal, 0,
                                    terminal->row_count, -dy);

and the same with s/0/terminal->pvt->screen->scroll_delta/ did not show any measurable benefit in some very basic benchmarks I just did, however.
Note that I think the fix was applied in vte_terminal_scroll_region as

      if (!terminal->pvt->bg_transparent &&
          (terminal->pvt->bg_pixbuf == NULL) &&
          (terminal->pvt->bg_file == NULL) &&
    -     (row == 0) &&
    +     (row == terminal->pvt->screen->scroll_delta) &&
          (count == terminal->row_count) &&
          (terminal->pvt->scroll_lock_count == 0)) {
              height = terminal->char_height;
              gdk_window_scroll((GTK_WIDGET(terminal))->window,
                                0, delta * height);

I just tried compiling a program in xterm and in gnome-terminal without any significant difference in speed. I also tried Ingo's w test and I got nearly constant numbers for varying terminal heights. I'm closing the bug, but if there is still a problem, feel free to reopen it.