Bug 1204242

Summary: gtk spinner makes installation 3 times longer in a VM - implement a temporary workaround
Product: [Fedora] Fedora Reporter: Kamil Páral <kparal>
Component: anacondaAssignee: Vladimír Slávik <vslavik>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 32CC: alan, awilliam, bugzilla.redhat.com, ccecchi, g.kaviyarasu, jonathan, jsedlak, jskladan, j, mclasen, mkolman, otaylor, samuel-rhbugs, vanmeeuwen+fedora, vponcova, vslavik
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: anaconda-32.16-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-06-10 13:32:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
updates.img to disable spinner animation
none
Updates image to make the spinner repaint at 4 FPS
none
Updates image to make the spinner repaint at 12 FPS none

Description Kamil Páral 2015-03-20 16:43:20 UTC
Description of problem:
Since the introduction of a gtk spinner during installation in Fedora 21, we in QA have been experiencing considerably longer installation times and extremely high VM host load. I have done some measurements lately, after several people complained on the test list:
https://lists.fedoraproject.org/pipermail/test/2015-March/thread.html#125284

These are two Server install times with spinner animated:
16m 57s
15m 6s
and these are the same install times with spinner not animated:
4m 59s
5m 9s
(source https://lists.fedoraproject.org/pipermail/test/2015-March/125369.html)

Please note that this only affects the boot.iso/netinst environment, not Workstation Lives. The full GNOME stack detects VM environment and disables all gtk animations to prevent these issues. But it does not happen in netinst environment.

Recently, we have started leveraging OpenQA for automated anaconda installations. I think we're now close to 20 different installs executed automatically on each test compose, and we will be adding many more. It makes a big difference for us, whether the installation phase takes 5 or 15 minutes.

This is my observation of VM guest utilization during the install phase. I used 'sshd' option to connect to netinst and watch the CPU usage using `top` throughout installation.
* When the spinner is animated, Xorg consumes 50% CPU.
* If I close the VM window, it drops to 40%.
* If I open the window again, but switch the VM to TTY2 (text mode), it drops to 25%.
* If I go back to TTY6 (graphical mode) and open the root password spoke, which makes the spinner disappear temporarily, the usage goes to 0% (!). 
* When I close the root password spoke and the spinner appears again, it again consumes 50%.
* During certain periods of post-installation phase (especially to the end), the Xorg usage climbs up to 90% and holds steady for a minute or two.
The changes are immediate and reproducible (at least here).
( https://lists.fedoraproject.org/pipermail/test/2015-March/125371.html )

Furthermore, for manual testing, these VM guest issues have an unusually high impact on the VM host as well. On standard corporate laptops, we're no longer able to run two VM installations in parallel. Even a single VM install can hog the host CPU so much that it makes any other work impossible. And I have quite decent hardware:
https://lists.fedoraproject.org/pipermail/test/2015-March/125325.html

I believe this is a combination of several issues:
1. Gtk spinner is heavily unoptimized and consumes an excessively large portion of CPU. I reported a bug here:
https://bugzilla.gnome.org/show_bug.cgi?id=746535
2. The issue seems to be worse either in VM environment, or in anaconda-custom netinst environment. Whatever the cause it, the spinner CPU usage seems to be higher in there than on bare metal.
3. QEMU/KVM/libvirt seems to be very inefficient when handling constant graphical updates from the VM, even when it is just a small section of the screen.
4. X+gnome-shell seem to be somewhat inefficient handling the updated images, they consume a lot host CPU just to update the image.

I can report bugs for all of these and hope it will get better in the future. But it will probably take a long time and we need to do something about it now. I'd like you to consider one of these things:

* Please disable gtk animations when running inside VM. One possible way is to include gnome-settings-daemon in netinst, which handles it. A second approach is to run `systemd-detect-virt` and if it detects a VM, then create /etc/gtk-3.0/settings.ini with this contents:
> [Settings]
> gtk-enable-animations=0
before starting anaconda gui.

* Alternatively, simply remove the gtk spinner. There's a progress bar and textual step description, the spinner is not completely necessary.

* You can also use the spinner from Evolution instead of gtk:
https://bugzilla.gnome.org/show_bug.cgi?id=732180
They experienced similar problems and decided to create their own spinner, which is performance friendly.

* Or maybe something else I haven't though of.

Thank you.


Version-Release number of selected component (if applicable):
F22 Beta TC2 Server/Workstation netinst
anaconda 22.20.4-1

How reproducible:
always

Steps to Reproduce:
1. run netinst with sshd boot option
2. ssh into that machine, run top
3. start installation and look how cpu usage changes when the spinner gets animated, and when you hide it by opening the root password dialog. The spinner hogs the system most to the end of the installation (the post-installation phase).

Comment 1 Kamil Páral 2015-03-20 16:45:45 UTC
Created attachment 1004562 [details]
updates.img to disable spinner animation

This is an updates.img to disable spinner animation. Please have a look at your guest+host CPU utilization, measure the installation time. Then try again with this updates image. It should be considerably faster and both host and guest CPU usage should be considerably lower.

Comment 2 Adam Williamson 2015-03-20 17:03:23 UTC
Note upstream now seems to be actively working on this:

https://bugzilla.gnome.org/show_bug.cgi?id=732199

so we may see a resolution there. It's also worth noting that GNOME has a thing where it disables animation if it doesn't think your hardware is up to it (not sure if it's just llvmpipe vs. hardware or if it's something more sophisticated), so on Workstation live installs, we don't have this problem. But anaconda doesn't have the same logic. I'm guessing it's in gnome-settings-daemon.

Comment 3 Matthias Clasen 2015-03-20 17:30:07 UTC
(In reply to awilliam from comment #2)
> Note upstream now seems to be actively working on this:
>
> https://bugzilla.gnome.org/show_bug.cgi?id=732199

Don't get your hopes up. This is not an easy thing to fix, it requires a considerable refactoring of the size allocation machinery in GTK+. Nobody is working on that at the moment.

Comment 4 David Shea 2015-04-17 17:15:23 UTC
Stealing evolution's spinner is pretty easy, turns out, but this sure is a dumb problem to have

Comment 5 David Shea 2015-04-29 17:33:42 UTC
Stealing evolution's spinner is too weird. It's a gtk widget causing the problem, so reassigning to gtk.

(In reply to Kamil Páral from comment #0)
> * Please disable gtk animations when running inside VM. One possible way is
> to include gnome-settings-daemon in netinst, which handles it. A second
> approach is to run `systemd-detect-virt` and if it detects a VM, then create
> /etc/gtk-3.0/settings.ini with this contents:
> > [Settings]
> > gtk-enable-animations=0
> before starting anaconda gui.

That disables all the stack and overlay animations too, though. We want those.

Comment 6 David Shea 2015-04-29 17:35:59 UTC
*** Bug 1201201 has been marked as a duplicate of this bug. ***

Comment 7 Kamil Páral 2015-04-30 08:50:59 UTC
Hi David, thanks for looking into this. There already is a bug reported against gtk3, I don't think it makes much sense to reassign this. I reported it specifically against anaconda to ask for some temporary solution until the gtk bug is fixed. If you decided to wait for gtk fix and not do anything about it in the meantime, let's close this as WONTFIX. However, if you decide to go this route, we can be pretty sure that this won't get fixed before F22 release.

Is the spinner so necessary? Or, are those other animations in a VM so necessary?

Comment 8 David Shea 2015-04-30 13:02:00 UTC
(In reply to Kamil Páral from comment #7)

> Is the spinner so necessary? 

Without it people will think anaconda has locked up while running the %post scripts, so kinda.

> Or, are those other animations in a VM so necessary?

No, but they're nice to have from a usability point of view, since it helps explain the non-linear hub/spoke relationship.

Comment 9 Adam Williamson 2015-05-01 00:13:35 UTC
"That disables all the stack and overlay animations too, though. We want those."

You already don't get them in Workstation live installs, because GNOME does this (disables animation) when hardware acceleration is not available.

Comment 10 David Shea 2015-05-01 11:52:55 UTC
(In reply to awilliam from comment #9)
> "That disables all the stack and overlay animations too, though. We want
> those."
> 
> You already don't get them in Workstation live installs, because GNOME does
> this (disables animation) when hardware acceleration is not available.

I know. I really wish they wouldn't!

Comment 11 David Shea 2015-11-04 21:56:44 UTC
*** Bug 1278189 has been marked as a duplicate of this bug. ***

Comment 12 Adam Williamson 2018-03-27 16:49:07 UTC
This is still a problem, and has just been rediscovered *three years later* by Chris Murphy:

https://lists.fedoraproject.org/archives/list/desktop%40lists.fedoraproject.org/message/XUWK4CV6WONUEHN2VMJVHMSDBR3NWSZS/

we really need to do something about this, it's fairly absurd that this has been sucking up resources and making VM installs slow *this whole time*.

Comment 13 Owen Taylor 2018-03-27 17:21:45 UTC
Problems with GtkSpinner will not cause gnome-shell to use 40x as much CPU as the GTK+ process (anaconda) - I think something else is going on here - I guess some interaction between the qxl driver and gnome-shell running (under wayland?)

Comment 14 Martin Kolman 2018-04-03 11:40:32 UTC
(In reply to Adam Williamson from comment #12)
> This is still a problem, and has just been rediscovered *three years later*
> by Chris Murphy:
> 
> https://lists.fedoraproject.org/archives/list/desktop%40lists.fedoraproject.
> org/message/XUWK4CV6WONUEHN2VMJVHMSDBR3NWSZS/
> 
> we really need to do something about this, it's fairly absurd that this has
> been sucking up resources and making VM installs slow *this whole time*.
Definitely - but is this really an Anaconda issue ? We are just using a spinner widget provided by GTK. 

Comment 7 mentions an upstream Gnome/GTK bug (https://bugzilla.gnome.org/show_bug.cgi?id=732199) that is marked as fixed. So maybe another upstream bug needs to be opened or this bug reassigned to GTK ?

Comment 15 Jan Kurik 2018-08-14 11:09:42 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 29 development cycle.
Changing version to '29'.

Comment 16 Ben Cotton 2019-10-31 19:13:22 UTC
This message is a reminder that Fedora 29 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '29'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 29 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 17 Kamil Páral 2019-11-01 15:41:28 UTC
For fun and profit, I retested this with F31 installation media on F30 VM host. For Server DVD, the updates.img from attachment 1004562 [details] was used to disable the spinner animation. For Workstation Live, I used "gsettings set org.gnome.desktop.interface enable-animations false" command.

Server DVD:
1 cpu, spinner disabled: 178 s
1 cpu, spinner enabled: 210 s  (+ 18%)
2 cpu, spinner disabled: 149 s
2 cpu, spinner enabled: 167 s  (+ 12%)

Workstation Live:
1 cpu, spinner disabled: 286 s
1 cpu, spinner enabled: 500 s  (+ 75%)
2 cpu, spinner disabled: 249 s
2 cpu, spinner enabled: 316 s  (+ 27%)

So, compared to comment 0, the penalty is still quite noticeable for Anaconda installer environment (DVD, netinst), but much lower than it used to be. However, it also start affecting Live images (GTK no longer skips animations in VMs), and especially with just a single allocated CPU, the penalty is very noticeable in day-to-day testing.

Comment 18 Vladimír Slávik 2019-11-28 16:31:28 UTC
Kamil, would you be open to repeating your test with some changes? I have appended the following line to anaconda-gtk.css:

spinner { animation-timing-function: steps(4); }

I suspect reducing the number of updates to 4 might make the overhead nearly go away... at a corresponding sacrifice of aesthetic value.

If that does help, perhaps you could also try and see what happens with 12? To me the spinner seems still acceptably beautiful at 12 FPS, which might be less than the GTK default "draw as often as possible". But the documentation seems to suggest 12 is some internal timing-related value too, so I'm not putting too much hope into that.

I'll attach update images against current Rawhide with the changes described above.

Comment 19 Vladimír Slávik 2019-11-28 16:33:40 UTC
Created attachment 1640431 [details]
Updates image to make the spinner repaint at 4 FPS

Comment 20 Vladimír Slávik 2019-11-28 16:34:18 UTC
Created attachment 1640433 [details]
Updates image to make the spinner repaint at 12 FPS

Comment 21 Kamil Páral 2019-11-29 10:08:03 UTC
Hi Vladimir, that's a great idea! 

Those updates.img's were probably created against the development version of anaconda? I get a traceback when I apply it against F31 Workstation Live:

[liveuser@localhost-live ~]$ liveinst --updates=https://bugzilla.redhat.com/attachment.cgi?id=1640431
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 60623  100 60623    0     0  54176      0  0:00:01  0:00:01 --:--:-- 54176
643 blocks
Starting installer, one moment...
Traceback (most recent call last):
  File "/sbin/anaconda", line 247, in <module>
    from pyanaconda import startup_utils
  File "/tmp/updates/pyanaconda/startup_utils.py", line 36, in <module>
    from pyanaconda import network
  File "/tmp/updates/pyanaconda/network.py", line 40, in <module>
    from pyanaconda.modules.common.constants.services import NETWORK, TIMEZONE
  File "/tmp/updates/pyanaconda/modules/common/constants/services.py", line 19, in <module>
    from pyanaconda.core.dbus import SystemBus, DBus
ModuleNotFoundError: No module named 'pyanaconda.core.dbus'


But no problem, I adjusted anaconda-gtk.css manually. I'm now on F31 host, so I had to retest even the default results. Here are my numbers:

Workstation Live:
1 cpu, spinner disabled: 320 s
1 cpu, spinner 4 fps: 334 s  (+ 4%)
1 cpu, spinner 12 fps: 366 s  (+ 14%)
1 cpu, spinner enabled: 565 s  (+ 77%)

The 12 fps spinner seems like a reasonable compromise that doesn't look bad and lowers the overhead substantially.

Comment 22 Vladimír Slávik 2019-11-29 13:42:42 UTC
Thank you for the tests, Kamil! I made a pull request, let's see what the rest of the team thinks: https://github.com/rhinstaller/anaconda/pull/2240


(Note to self: To play with the CSS interactively, do GTK_DEBUG=interactive glade and then fiddle with stuff. Spinner needs re-enabling to apply changes.)

Comment 23 Vendula Poncova 2019-12-03 11:34:24 UTC
We could further reduce the number of steps with a different icon. I have open a new bug for the adwaita-icon-theme (see the bug 1779130).

Comment 24 Vladimír Slávik 2019-12-06 17:11:07 UTC
Kamil, the current compromise that we agree could be merged immediately is making the animation go at 24 FPS. Would that help you? I've interpolated the numbers above which gives me an estimate of 128% for Workstation Live. (Somewhat curiously, the interpolation also predicts the unrestricted rate of repaints is about 65 Hz.)

As 24/65 should be more than 60% better, trying more gymnastics to gain a few more % is probably not worth it. That said, there are some more options to try in case you're bored. Or maybe let's talk about how you do the measurements?

----

For the sake of completeness...

First alternative is to make the animation cycle longer, but at the 12 FPS. This helps the visual jerkiness by making the movement less noticeable. This should give same numbers as the steps(12) variant.

spinner:checked { animation: spin 10s steps(120) infinite; }

In practice the timing seems to be susceptible to random slowdowns due to other processes randomly consuming more CPU, which gets exaggerated at low update rates, so it is randomly ugly and useless anyway.

Second possibility is to use an animation that stays put most of the time. It's debatable if this does save on the CPU or not, though. (I'd love to find out.) It would look something like this...

@keyframes anaconda_spin {
 0% { -gtk-icon-transform: none; }
 66% { -gtk-icon-transform: none; }
 100% { -gtk-icon-transform: rotate(180deg); }
}

spinner {
 -gtk-icon-source: -gtk-icontheme("emblem-synchronizing-symbolic");
}

spinner:checked {
 animation: anaconda_spin 5s ease-in-out infinite;
}

Comment 25 Vladimír Slávik 2019-12-09 10:36:37 UTC
I merged the pull request so the timing should be down to ca. +28% for Workstation Live.

Comment 26 Kamil Páral 2019-12-10 14:06:12 UTC
(In reply to Vladimír Slávik from comment #24)
> Or maybe let's talk about how you do the
> measurements?

I simply boot F31 Workstation Live in a VM, make any spinner adjustments if needed, then start anaconda and configure the installation. I measure the time using an external application from the moment I click Begin installation until the moment the installation is fully finished and a Finish installation button appears.

> the current compromise that we agree could be merged immediately is
> making the animation go at 24 FPS.

1 cpu, spinner disabled: 322 s
1 cpu, spinner 24 fps: 438 s  (+ 36%)

> First alternative is to make the animation cycle longer, but at the 12 FPS.
> This helps the visual jerkiness by making the movement less noticeable. This
> should give same numbers as the steps(12) variant.
> 
> spinner:checked { animation: spin 10s steps(120) infinite; }

1 cpu, spinner 12 fps slower: 372 s  (+ 16%)

> Second possibility is to use an animation that stays put most of the time.
> It's debatable if this does save on the CPU or not, though. (I'd love to
> find out.) It would look something like this...
> 
> @keyframes anaconda_spin {
>  0% { -gtk-icon-transform: none; }
>  66% { -gtk-icon-transform: none; }
>  100% { -gtk-icon-transform: rotate(180deg); }
> }
> 
> spinner {
>  -gtk-icon-source: -gtk-icontheme("emblem-synchronizing-symbolic");
> }
> 
> spinner:checked {
>  animation: anaconda_spin 5s ease-in-out infinite;
> }

1 cpu, spinner emblem-synchronizing-symbolic: 370 s  (+ 15%)


It seems that the spinner load is directly proportional to the number of animations over a time period. The most reasonable thing is to have a mostly static spinner that makes a short animation from time to time (like the emblem-synchronizing-symbolic above). There are "spinners" (progress indicators) out there which do exactly this. Too bad GTK goes a different way at the moment :-/

Thanks a lot for working on this. Even the 24 FPS spinner is still a considerable improvement for the installation times.

Comment 27 Vladimír Slávik 2019-12-13 18:56:02 UTC
Kamil, sorry to keep pestering you, but there is one final question. Do you consider this bug "closeable" based on the changes done?. If not, please state it explicitly. Obviously some of us are itching to close the bug ;-)

Comment 28 Kamil Páral 2019-12-16 17:56:03 UTC
Oh sure. Thank you very much for implementing a decent workaround for the issue!

Comment 29 Ben Cotton 2020-02-11 15:41:42 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle.
Changing version to 32.

Comment 30 Vladimír Slávik 2020-06-10 13:32:17 UTC
Released with F32.