Bug 1623547

Summary: anaconda too resource hungry
Product: [Fedora] Fedora Reporter: Dan Horák <dan>
Component: pygobject3Assignee: Colin Walters <walters>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 29CC: anaconda-maint-list, awilliam, bugproxy, dan, hannsj_uhl, icq, jbicha, john.j5live, jonathan, jstodola, kellin, kevin, pbrobinson, pwhalen, vanmeeuwen+fedora, vponcova, walters, wwoods
Target Milestone: ---   
Target Release: ---   
Hardware: s390x   
OS: Linux   
See Also: https://gitlab.gnome.org/GNOME/pygobject/issues/247
Whiteboard: AcceptedFreezeException
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-18 15:35:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 245418, 467765, 1517012    

Description Dan Horák 2018-08-29 15:15:07 UTC
Description of problem:
Looks like anaconda in recent F-29 nightly composes is runs out of memory. My guest has 2 GB RAM, 10 GB disk and after anaconda starts the system load gets over 16 and all of memory (+ swap) is exhausted (eg. ~6 python3 processes are running) and later killed by the kernel.

Version-Release number of selected component (if applicable):
Fedora-29-20180815.n.1
Fedora-29-20180827.n.0

How reproducible:
100%

Comment 1 Dan Horák 2018-08-29 15:16:08 UTC
F-28 installs without any issue on the guest.

Comment 2 Peter Robinson 2018-08-29 15:23:10 UTC
Seeing the same on ARM, the initrd uncompressed comes in around 1.5Gb

Comment 3 Paul Whalen 2018-08-29 15:26:57 UTC
I have also hit this testing on arm, the F29 container images seem to be failing as a result where the host has 2G RAM.

Comment 4 Dan Horák 2018-08-29 15:29:08 UTC
Fedora-29-20180827.n.0 install.img is 394 MB
Fedora 28 GA install.img is 412 MB

Doesn't zswap cause the problem?

Comment 5 Dan Horák 2018-08-31 09:12:40 UTC
screenshot of top from a started installation

top - 09:10:39 up 15 min,  1 user,  load average: 16.88, 14.45, 8.35
Tasks: 122 total,   3 running, 119 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 26.2 sy,  0.0 ni,  4.0 id, 67.9 wa,  0.4 hi,  0.0 si,  1.5 st
MiB Mem :   1997.3 total,     10.0 free,   1409.8 used,    577.6 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.     27.2 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                  
 2086 root      20   0  411820 170396   2764 D   0.0   8.3   0:14.28 python3 -m pyanaconda.modules.localization                                                                               
 2094 root      20   0  374884 132680   1876 D   0.0   6.5   0:12.24 python3 -m pyanaconda.modules.services                                                                                   
 2085 root      20   0  373612 131764   2224 D   0.0   6.4   0:12.98 python3 -m pyanaconda.modules.security                                                                                   
 2093 root      20   0  379624 130408   2780 D   0.2   6.4   0:11.78 python3 -m pyanaconda.modules.storage                                                                                    
 2077 root      20   0  378272 129556   2588 D   0.0   6.3   0:13.30 python3 -m pyanaconda.modules.network                                                                                    
 2084 root      20   0  363964 121832   2100 D   0.0   6.0   0:11.63 python3 -m pyanaconda.modules.timezone                                                                                   
 2083 root      20   0  359468 117232   1848 D   0.0   5.7   0:11.19 python3 -m pyanaconda.modules.users                                                                                      
 2082 root      20   0  345544 104360   2896 D   0.0   5.1   0:12.46 python3 -m pyanaconda.modules.payload                                                                                    
 2095 root      20   0  342688 101268   2816 D   0.2   5.0   0:12.04 python3 -m pyanaconda.dbus_addons.baz                                                                                    
 1926 root      20   0  360044  89196  24776 S   0.0   4.4   0:03.05 /usr/sbin/rsyslogd -n                                                                                                    
 2015 root      20   0  322360  56760   3056 D   0.2   2.8   0:03.55 /usr/bin/python3 /usr/sbin/anaconda                                                                                      
 2066 root      20   0  297620  55748   2284 D   0.2   2.7   0:08.82 python3 -m pyanaconda.modules.boss                                                                                       
 1735 root      20   0   69576  25996  24904 D   0.2   1.3   0:05.54 /usr/lib/systemd/systemd-journald                                                                                        
 1866 root      rt   0  280212  18036   8000 S   0.0   0.9   0:00.06 /sbin/multipathd -d -s                                                                                                   
    1 root      20   0  103212   7628   3108 D   0.2   0.4   0:06.48 /usr/lib/systemd/systemd --switched-root --system --deserialize 32                                                       
 1983 polkitd   20   0 1975044   6900   2404 S   0.0   0.3   0:00.10 /usr/lib/polkit-1/polkitd --no-debug                                                                                     
 1777 root      20   0   28552   6300   2836 S   0.0   0.3   0:00.14 /usr/lib/systemd/systemd-udevd                                                                                           
 2020 root      20   0  394288   5868   2960 S   0.0   0.3   0:00.10 /usr/libexec/udisks2/udisksd                                                                                             
 1927 root      20   0   17548   5032   2848 S   0.0   0.2   0:00.13 /usr/lib/systemd/systemd-logind                                                                                          
 1845 root      20   0  368384   4772   2480 S   0.0   0.2   0:00.06 /usr/sbin/lvmetad -f -t 3600                                                                                             
 1943 root      20   0  251120   4220   1624 S   0.0   0.2   0:00.46 /usr/sbin/NetworkManager --no-daemon                                                                                     
 2062 root      20   0   15580   3660   3016 S   0.0   0.2   0:00.02 dbus-daemon --print-address --syslog --config-file=/usr/share/anaconda/dbus/anaconda-bus.conf                            
 2141 root      20   0   11416   3660   2760 S   0.0   0.2   0:00.04 sshd: root [priv]                                                                                                        
 2014 root      20   0   11416   3464   2624 S   0.0   0.2   0:00.00 /usr/sbin/sshd -D -f /etc/ssh/sshd_config.anaconda                                                                       
 2168 root      20   0    7044   3456   2700 R   0.2   0.2   0:01.22 top                                                                                                                      
 1947 dbus      20   0   15872   3324   2424 D   0.0   0.2   0:00.36 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only                 
 2151 root      20   0   11764   3228   2284 D   0.2   0.2   0:00.82 sshd: root@pts/0                                                                                                         
 1923 root      20   0   14988   2892   2080 S   0.0   0.1   0:01.76 /sbin/rngd -f                                                                                                            
 2152 root      20   0    3788   2036   1612 S   0.0   0.1   0:00.00 -bash                                                                                                                    
 1937 root      20   0    3788   2032   1612 S   0.0   0.1   0:00.00 /bin/bash --login                                                                                                        
 2079 root      20   0  214132   1684   1444 S   0.0   0.1   0:00.00 /usr/bin/bash /usr/libexec/anaconda/start-module pyanaconda.modules.security                                             
 2081 root      20   0  214132   1684   1444 S   0.0   0.1   0:00.00 /usr/bin/bash /usr/libexec/anaconda/start-module pyanaconda.modules.payload                                              
 2091 root      20   0  214132   1684   1444 S   0.0   0.1   0:00.00 /usr/bin/bash /usr/libexec/anaconda/start-module pyanaconda.modules.services                                             
 2065 root      20   0  214132   1680   1444 S   0.0   0.1   0:00.00 /usr/bin/bash /usr/libexec/anaconda/start-module pyanaconda.modules.boss                                                 
 2070 root      20   0  214132   1680   1444 S   0.0   0.1   0:00.00 /usr/bin/bash /usr/libexec/anaconda/start-module pyanaconda.modules.timezone                                             
 2073 root      20   0  214132   1680   1444 S   0.0   0.1   0:00.00 /usr/bin/bash /usr/libexec/anaconda/start-module pyanaconda.modules.network                                              
 2074 root      20   0  214132   1680   1444 S   0.0   0.1   0:00.00 /usr/bin/bash /usr/libexec/anaconda/start-module pyanaconda.modules.localization                                         
 2080 root      20   0  214132   1680   1444 S   0.0   0.1   0:00.00 /usr/bin/bash /usr/libexec/anaconda/start-module pyanaconda.modules.users                                                
 2090 root      20   0  214132   1680   1444 S   0.0   0.1   0:00.00 /usr/bin/bash /usr/libexec/anaconda/start-module pyanaconda.modules.storage                                              
 2092 root      20   0  214132   1680   1444 S   0.0   0.1   0:00.00 /usr/bin/bash /usr/libexec/anaconda/start-module pyanaconda.dbus_addons.baz                                              
 2061 root      20   0    2200    296    168 S   0.0   0.0   0:00.00 /sbin/auditd                                                                                                             
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 [kthreadd]                                                                                                               
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 [rcu_gp]                                                                                                                 
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 [rcu_par_gp]                                                                                                             
    5 root      20   0       0      0      0 I   0.2   0.0   0:00.75 [kworker/0:0-events]                                                                                                     
    6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 [kworker/0:0H-kblockd]                                                                                                   
    7 root      20   0       0      0      0 I   0.0   0.0   0:00.58 [kworker/u128:0-events_unbound]                                                                                          
    8 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 [mm_percpu_wq]

Comment 6 Dan Horák 2018-08-31 10:07:42 UTC
No real change with 3GB, both were with inst.zram=off. Load reaching 20 is not sane ...

top - 10:05:00 up 20 min,  1 user,  load average: 19.37, 18.00, 11.84
Tasks: 126 total,   3 running, 123 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 26.0 sy,  0.0 ni, 13.1 id, 59.0 wa,  0.3 hi,  0.0 si,  1.5 st
MiB Mem :   3005.3 total,     13.9 free,   2367.4 used,    624.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.     24.1 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                  
 1410 root       0 -20       0      0      0 R  98.6   0.0  14:10.05 loop1                                                                                                                    
   50 root      20   0       0      0      0 S   3.8   0.0   0:33.11 kswapd0                                                                                                                  
 1405 root       0 -20       0      0      0 R   1.1   0.0   0:09.10 loop0                                       
...

Comment 7 Dan Horák 2018-08-31 10:09:25 UTC
The loop1 process (decompressing the install.img?) consuming 1 whole CPU is also not good ...

Comment 8 Dan Horák 2018-08-31 11:16:31 UTC
and no change with 4GB, I wonder if there is a kernel issue with the squashfs install.img

top - 11:09:24 up 38 min,  1 user,  load average: 19.75, 20.26, 17.65
Tasks: 123 total,   6 running, 116 sleeping,   0 stopped,   1 zombie
%Cpu(s):  0.0 us, 26.8 sy,  0.0 ni, 11.4 id, 59.7 wa,  0.4 hi,  0.0 si,  1.8 st
MiB Mem :   4013.3 total,     19.7 free,   3266.7 used,    727.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.     20.8 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                  
 1419 root       0 -20       0      0      0 R  97.0   0.0  28:23.56 loop1                                                                                                                    
   50 root      20   0       0      0      0 R   6.9   0.0   1:41.26 kswapd0                                                                                                                  
 1413 root       0 -20       0      0      0 R   1.1   0.0   0:19.78 loop0                                                                                                                    
 2334 root      20   0  535908  14988   2900 S   0.2   0.4   0:27.02 lvmdbusd

Comment 9 Dan Horák 2018-09-04 15:42:39 UTC
One hint from mkolman - the memory consumption of the python3 processes managing the anaconda panels is much higher then in other environments. The overloaded "loop1" process might be result of a close-to-OOM situation.

Comment 10 Vendula Poncova 2018-09-05 09:54:50 UTC
Hi, Anaconda runs in multiple processes since Fedora 28 and I don't think there was any major change in Fedora 29 that could cause this problem. Also it looks like only ARM and s390x are affected. I would reassign the bug to someone else, but I don't know to whom.

Comment 11 Dan Horák 2018-09-11 10:13:27 UTC
Seems we have a reproducer
- install f28 and update to f29 (I've used fedora-upgrade)
- dnf install anaconda
- truncate -s 4G /mnt/image
- anaconda <--vnc|--text> --image /mnt/image

--vnc needs metacity installed too

Comment 12 Dan Horák 2018-09-11 12:03:24 UTC
narrowed reproducer
- export DBUS_STARTER_ADDRESS="$DBUS_SESSION_BUS_ADDRESS"
- run "python3.7 -m pyanaconda.modules.storage"
- after a short while

DEBUG:anaconda.dbus.connection:Publishing an object at /org/fedoraproject/Anaconda/Modules/Storage/DiskInitialization.
Traceback (most recent call last):
  File "/usr/lib64/python3.7/site-packages/pyanaconda/modules/storage/storage.py", line 80, in publish
    kickstart_module.publish()
  File "/usr/lib64/python3.7/site-packages/pyanaconda/modules/storage/disk_initialization/initialization.py", line 65, in publish
    DiskInitializationInterface(self))
  File "/usr/lib64/python3.7/site-packages/pyanaconda/dbus/connection.py", line 99, in publish_object
    reg = self.connection.register_object(object_path, obj, None)
  File "/usr/lib/python3.7/site-packages/pydbus/registration.py", line 162, in register_object
    return ObjectRegistration(self, path, interfaces, wrapper, own_wrapper=True)
  File "/usr/lib/python3.7/site-packages/pydbus/registration.py", line 136, in __init__
    ids = [bus.con.register_object(path, interface, wrapper.call_method, None, None) for interface in interfaces]
  File "/usr/lib/python3.7/site-packages/pydbus/registration.py", line 136, in <listcomp>
    ids = [bus.con.register_object(path, interface, wrapper.call_method, None, None) for interface in interfaces]
gi.repository.GLib.Error: g-io-error-quark: An object is already exported for the interface org.fedoraproject.Anaconda.Modules.Storage.DiskInitialization at /org/fedoraproject/Anaconda/Modules/Storage/DiskInitialization (2)
DEBUG:anaconda.dbus.connection:Publishing an object at /org/fedoraproject/Anaconda/Modules/Storage/DiskInitialization.
Traceback (most recent call last):
  File "/usr/lib64/python3.7/site-packages/pyanaconda/modules/storage/storage.py", line 80, in publish

starts to appear and increasing memory consumption is visible in top output for python3 process

Comment 13 Vendula Poncova 2018-09-11 16:16:47 UTC
The problem is that the publish method is called repeatedly, but it should be called only once. We use GLib.idle_add to run the publish method once the main loop is started. It looks like the function behaves differently on s390x.

The following script should print "Running" and "Processing" and wait. It works that way on x86_64, but it keeps printing "Processing" on s390x. Same for Fedora 29 and Rawhide.

Reproducer:

from gi.repository import GLib
loop = GLib.MainLoop()

def f():
  print("Processing")
  return False

GLib.idle_add(f)

print("Running")
loop.run()

Tested on Fedora 29:
pygobject3-3.29.2-1.fc29

Tested on Rawhide:
pygobject3-3.30.0-1.fc30 

Reassigning to pygobject3.

Comment 14 Dan Horák 2018-09-11 16:51:15 UTC
If the following C code is equivalent to the python version, then it behaves correctly meaning the python bindings are buggy, not Glib. Compiled with "gcc -Wall -O2 `pkg-config --cflags glib-2.0` `pkg-config --libs glib-2.0` -g -o g g.c"

#include <glib.h>

gboolean f(gpointer data)
{
    g_print("Processing...\n");

    return FALSE;
}

int main(void)
{
    GMainLoop *loop = g_main_loop_new(NULL, FALSE);

    g_idle_add(f, NULL);

    g_print("Running...\n");

    g_main_loop_run(loop);

    return 0;
}

Comment 15 Dan Horák 2018-09-13 17:29:50 UTC
I'm getting closer - it's reproduceable with F-28 and the latest python3-gobject-3.30.0-1.fc30.s390x (built locally). Next step is bisect between pygobject 3.28 and 3.30

Comment 16 Dan Horák 2018-09-13 20:53:51 UTC
Bisected to https://gitlab.gnome.org/GNOME/pygobject/commit/25a5b066467523547d3b3c1947ae3722c3fc38d5 as the first bad commit, based on the result of the internal mainloop test.

Comment 17 Dan Horák 2018-09-13 21:01:21 UTC
And it's likely an endianness problem, because I get the issue on ppc64.

Comment 18 Dan Horák 2018-09-14 11:44:24 UTC
We are unblocked now - https://gitlab.gnome.org/GNOME/pygobject/issues/247#note_316214

Comment 19 Fedora Update System 2018-09-14 12:19:07 UTC
pygobject3-3.30.0-2.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-5685089f8b

Comment 20 Fedora Blocker Bugs Application 2018-09-14 12:52:07 UTC
Proposed as a Freeze Exception for 29-beta by Fedora user sharkcz using the blocker tracking app because:

 Anaconda crashes without the fix with OOM making it impossible to install F-29 on s390x.

Comment 21 Kevin Fenzi 2018-09-14 19:15:07 UTC
+1 FE

Comment 22 Paul Whalen 2018-09-14 19:37:09 UTC
+1 FE

Comment 23 Fedora Update System 2018-09-14 20:01:09 UTC
pygobject3-3.30.0-2.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-5685089f8b

Comment 24 Adam Williamson 2018-09-14 20:54:11 UTC
+1 FE, that's +3, setting accepted.

Comment 25 Fedora Update System 2018-09-15 23:09:38 UTC
pygobject3-3.30.0-2.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 26 Adam Williamson 2018-09-17 23:43:35 UTC
Can someone test that Beta RC3 worked OK on s390x, then we can close this? Thanks!

https://kojipkgs.fedoraproject.org/compose/29/Fedora-29-20180916.0/

Comment 27 Dan Horák 2018-09-18 07:20:39 UTC
There is no problem with running anaconda any more with the Beta RC3 compose.

Comment 28 Colin Walters 2018-09-18 12:55:46 UTC
FWIW, reverting that patch looks safe to me.

Comment 29 Dan Horák 2018-09-18 14:43:48 UTC
for the record - the revert went into upstream with https://gitlab.gnome.org/GNOME/pygobject/commit/aa6dc30a1eb25eea9587f8e96dad3caa12d02e37 (until there is a better solution)

Comment 30 Adam Williamson 2018-09-18 15:35:36 UTC
Thanks for the confirmation.