Bug 2068773 - systemd-oomd kills rpm-ostree when installing a package on Fedora IoT 36 on the Raspberry Pi Zero 2 W
Status: CLOSED DEFERRED
Alias: None
Product: Fedora
Classification: Fedora
Component: libdnf
Version: 36
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Peter Robinson
Whiteboard: RejectedBlocker
Blocks: ARMTracker
Reported: 2022-03-26 17:29 UTC by Jordan Williams
Modified: 2022-04-13 13:41 UTC (History)
Last Closed: 2022-04-13 13:41:09 UTC
Type: Bug

Description Jordan Williams 2022-03-26 17:29:51 UTC
Description of problem:

rpm-ostree crashes when attempting to install packages on Fedora IoT on a Raspberry Pi Zero 2 W.
It appears to be killed by systemd-oomd.

Version-Release number of selected component (if applicable): Fedora-IoT-36-20220326.0


How reproducible: 100%


Steps to Reproduce:
1. Flash an SD card with Fedora-IoT-36-20220326.0.aarch64.raw.xz using the Fedora `arm-image-installer` package.
The full command I used to create the SD card is provided here.

```
$ sudo arm-image-installer \
           --image="$HOME/Downloads/Fedora-IoT-36-20220326.0.aarch64.raw.xz" \
           --media=/dev/sda \
           --addconsole \
           --addkey="$HOME/.ssh/id_rsa.pub" \
           --norootpass \
           --relabel \
           --resizefs \
           --showboot \
           --target=rpi3 \
           -y
```

2. Insert SD Card in the Raspberry Pi Zero 2 W and boot.

3. Log in and attempt to install any package or packages with rpm-ostree.

```
$ rpm-ostree install fish
```

Actual results:

Package fails to install.

```
$ rpm-ostree install fish
Checking out tree 8d4a5a2... done
Enabled rpm-md repositories: updates fedora-cisco-openh264 updates-testing fedora
Importing rpm-md... done
error: Bus owner changed, aborting. This likely means the daemon crashed; check logs with `journalctl -xe`.
```

Expected results:

rpm-ostree should successfully overlay the package.

Additional info:

rpm-ostreed log output when failure occurs is provided below.

```
$ journalctl -b -u rpm-ostreed
Mar 26 11:56:46 zero2w-01.jwillikers.io systemd[1]: Starting rpm-ostreed.service - rpm-ostree System Management Daemon...
Mar 26 11:56:46 zero2w-01.jwillikers.io rpm-ostree[932]: Reading config file '/etc/rpm-ostreed.conf'
Mar 26 11:56:49 zero2w-01.jwillikers.io rpm-ostree[932]: In idle state; will auto-exit in 63 seconds
Mar 26 11:56:49 zero2w-01.jwillikers.io systemd[1]: Started rpm-ostreed.service - rpm-ostree System Management Daemon.
Mar 26 11:56:49 zero2w-01.jwillikers.io rpm-ostree[932]: Allowing active client :1.20 (uid 0)
Mar 26 11:56:49 zero2w-01.jwillikers.io rpm-ostree[932]: client(id:cli dbus:1.20 unit:session-1.scope uid:0) added; new total=1
Mar 26 11:56:49 zero2w-01.jwillikers.io rpm-ostree[932]: Locked sysroot
Mar 26 11:56:49 zero2w-01.jwillikers.io rpm-ostree[932]: Initiated txn PkgChange for client(id:cli dbus:1.20 unit:session-1.scope uid:0): /org/projectatomic/rpmostree1/fedora_iot
Mar 26 11:56:49 zero2w-01.jwillikers.io rpm-ostree[932]: Process [pid: 925 uid: 0 unit: session-1.scope] connected to transaction progress
Mar 26 11:56:50 zero2w-01.jwillikers.io rpm-ostree[932]: Librepo version: 1.14.2 with CURL_GLOBAL_ACK_EINTR support (libcurl/7.81.0 OpenSSL/3.0.2 zlib/1.2.11 brotli/1.0.9 libidn2/2.3.2 libpsl/0.21.1 (+libidn2/2.3.2) libssh/0.9.6/openssl/zlib nghttp2/1.46.0 OpenLDAP/2.6.1)
Mar 26 11:57:12 zero2w-01.jwillikers.io rpm-ostree[932]: Downloading: https://pkgs.tailscale.com/stable/fedora/repo.gpg
Mar 26 11:57:13 zero2w-01.jwillikers.io rpm-ostree[932]: Downloading: https://pkgs.tailscale.com/stable/fedora/aarch64/repodata/repomd.xml
Mar 26 11:57:14 zero2w-01.jwillikers.io rpm-ostree[932]: Downloading: https://pkgs.tailscale.com/stable/fedora/aarch64/repodata/repomd.xml.asc
Mar 26 11:57:15 zero2w-01.jwillikers.io rpm-ostree[932]: Downloading: https://pkgs.tailscale.com/stable/fedora/aarch64/repodata/fbc4d973f6685b79fc3235dc4d3973e30a6abd6dca9a37c48b8c814ace531c4f-filelists.xml.gz
Mar 26 11:57:15 zero2w-01.jwillikers.io rpm-ostree[932]: Downloading: https://pkgs.tailscale.com/stable/fedora/aarch64/repodata/5dcfb2273ed51ec0f857dd873fbf1fda0d46e7b43be7b8c11110b6e3aec525cf-primary.xml.gz
Mar 26 12:00:10 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: A process of this unit has been killed by the OOM killer.
Mar 26 12:00:10 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: Main process exited, code=killed, status=9/KILL
Mar 26 12:00:10 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: Failed with result 'oom-kill'.
Mar 26 12:00:11 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: Consumed 57.722s CPU time.
Mar 26 12:03:12 zero2w-01.jwillikers.io systemd[1]: Starting rpm-ostreed.service - rpm-ostree System Management Daemon...
Mar 26 12:03:12 zero2w-01.jwillikers.io rpm-ostree[1415]: Reading config file '/etc/rpm-ostreed.conf'
Mar 26 12:03:13 zero2w-01.jwillikers.io rpm-ostree[1415]: In idle state; will auto-exit in 64 seconds
Mar 26 12:03:13 zero2w-01.jwillikers.io systemd[1]: Started rpm-ostreed.service - rpm-ostree System Management Daemon.
Mar 26 12:03:13 zero2w-01.jwillikers.io rpm-ostree[1415]: Allowing active client :1.28 (uid 0)
Mar 26 12:03:13 zero2w-01.jwillikers.io rpm-ostree[1415]: client(id:cli dbus:1.28 unit:session-4.scope uid:0) added; new total=1
Mar 26 12:03:13 zero2w-01.jwillikers.io rpm-ostree[1415]: client(id:cli dbus:1.28 unit:session-4.scope uid:0) vanished; remaining=0
Mar 26 12:03:13 zero2w-01.jwillikers.io rpm-ostree[1415]: In idle state; will auto-exit in 61 seconds
Mar 26 12:04:14 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: Deactivated successfully.
Mar 26 12:05:26 zero2w-01.jwillikers.io systemd[1]: Starting rpm-ostreed.service - rpm-ostree System Management Daemon...
Mar 26 12:05:26 zero2w-01.jwillikers.io rpm-ostree[1486]: Reading config file '/etc/rpm-ostreed.conf'
Mar 26 12:05:26 zero2w-01.jwillikers.io systemd[1]: Started rpm-ostreed.service - rpm-ostree System Management Daemon.
Mar 26 12:05:26 zero2w-01.jwillikers.io rpm-ostree[1486]: In idle state; will auto-exit in 63 seconds
Mar 26 12:05:26 zero2w-01.jwillikers.io rpm-ostree[1486]: Allowing active client :1.30 (uid 0)
Mar 26 12:05:26 zero2w-01.jwillikers.io rpm-ostree[1486]: client(id:cli dbus:1.30 unit:session-4.scope uid:0) added; new total=1
Mar 26 12:05:26 zero2w-01.jwillikers.io rpm-ostree[1486]: Locked sysroot
Mar 26 12:05:26 zero2w-01.jwillikers.io rpm-ostree[1486]: Initiated txn PkgChange for client(id:cli dbus:1.30 unit:session-4.scope uid:0): /org/projectatomic/rpmostree1/fedora_iot
Mar 26 12:05:26 zero2w-01.jwillikers.io rpm-ostree[1486]: Process [pid: 1478 uid: 0 unit: session-4.scope] connected to transaction progress
Mar 26 12:05:27 zero2w-01.jwillikers.io rpm-ostree[1486]: Librepo version: 1.14.2 with CURL_GLOBAL_ACK_EINTR support (libcurl/7.81.0 OpenSSL/3.0.2 zlib/1.2.11 brotli/1.0.9 libidn2/2.3.2 libpsl/0.21.1 (+libidn2/2.3.2) libssh/0.9.6/openssl/zlib nghttp2/1.46.0 OpenLDAP/2.6.1)
Mar 26 12:07:03 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: A process of this unit has been killed by the OOM killer.
Mar 26 12:07:03 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: Main process exited, code=killed, status=9/KILL
Mar 26 12:07:03 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: Failed with result 'oom-kill'.
Mar 26 12:07:03 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: Consumed 35.293s CPU time.
Mar 26 12:15:22 zero2w-01.jwillikers.io systemd[1]: Starting rpm-ostreed.service - rpm-ostree System Management Daemon...
Mar 26 12:15:22 zero2w-01.jwillikers.io rpm-ostree[1806]: Reading config file '/etc/rpm-ostreed.conf'
Mar 26 12:15:22 zero2w-01.jwillikers.io rpm-ostree[1806]: In idle state; will auto-exit in 60 seconds
Mar 26 12:15:22 zero2w-01.jwillikers.io systemd[1]: Started rpm-ostreed.service - rpm-ostree System Management Daemon.
Mar 26 12:15:22 zero2w-01.jwillikers.io rpm-ostree[1806]: Allowing active client :1.33 (uid 0)
Mar 26 12:15:22 zero2w-01.jwillikers.io rpm-ostree[1806]: client(id:cli dbus:1.33 unit:session-4.scope uid:0) added; new total=1
Mar 26 12:15:22 zero2w-01.jwillikers.io rpm-ostree[1806]: Locked sysroot
Mar 26 12:15:22 zero2w-01.jwillikers.io rpm-ostree[1806]: Initiated txn PkgChange for client(id:cli dbus:1.33 unit:session-4.scope uid:0): /org/projectatomic/rpmostree1/fedora_iot
Mar 26 12:15:22 zero2w-01.jwillikers.io rpm-ostree[1806]: Process [pid: 1798 uid: 0 unit: session-4.scope] connected to transaction progress
Mar 26 12:15:23 zero2w-01.jwillikers.io rpm-ostree[1806]: Librepo version: 1.14.2 with CURL_GLOBAL_ACK_EINTR support (libcurl/7.81.0 OpenSSL/3.0.2 zlib/1.2.11 brotli/1.0.9 libidn2/2.3.2 libpsl/0.21.1 (+libidn2/2.3.2) libssh/0.9.6/openssl/zlib nghttp2/1.46.0 OpenLDAP/2.6.1)
Mar 26 12:15:24 zero2w-01.jwillikers.io rpm-ostree[1806]: Txn PkgChange on /org/projectatomic/rpmostree1/fedora_iot failed: Removing extensions/rpmostree/private/commit: Operation was cancelled
Mar 26 12:15:24 zero2w-01.jwillikers.io rpm-ostree[1806]: Unlocked sysroot
Mar 26 12:15:24 zero2w-01.jwillikers.io rpm-ostree[1806]: Process [pid: 1798 uid: 0 unit: session-4.scope] disconnected from transaction progress
Mar 26 12:15:24 zero2w-01.jwillikers.io rpm-ostree[1806]: client(id:cli dbus:1.33 unit:session-4.scope uid:0) vanished; remaining=0
Mar 26 12:15:24 zero2w-01.jwillikers.io rpm-ostree[1806]: In idle state; will auto-exit in 60 seconds
Mar 26 12:16:24 zero2w-01.jwillikers.io rpm-ostree[1806]: In idle state; will auto-exit in 60 seconds
Mar 26 12:16:24 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: Deactivated successfully.
Mar 26 12:16:24 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: Consumed 1.476s CPU time.
Mar 26 12:18:23 zero2w-01.jwillikers.io systemd[1]: Starting rpm-ostreed.service - rpm-ostree System Management Daemon...
Mar 26 12:18:24 zero2w-01.jwillikers.io rpm-ostree[2024]: Reading config file '/etc/rpm-ostreed.conf'
Mar 26 12:18:24 zero2w-01.jwillikers.io rpm-ostree[2024]: In idle state; will auto-exit in 64 seconds
Mar 26 12:18:24 zero2w-01.jwillikers.io systemd[1]: Started rpm-ostreed.service - rpm-ostree System Management Daemon.
Mar 26 12:18:24 zero2w-01.jwillikers.io rpm-ostree[2024]: Allowing active client :1.42 (uid 0)
Mar 26 12:18:24 zero2w-01.jwillikers.io rpm-ostree[2024]: client(id:cli dbus:1.42 unit:session-5.scope uid:0) added; new total=1
Mar 26 12:18:24 zero2w-01.jwillikers.io rpm-ostree[2024]: Locked sysroot
Mar 26 12:18:24 zero2w-01.jwillikers.io rpm-ostree[2024]: Initiated txn PkgChange for client(id:cli dbus:1.42 unit:session-5.scope uid:0): /org/projectatomic/rpmostree1/fedora_iot
Mar 26 12:18:24 zero2w-01.jwillikers.io rpm-ostree[2024]: Process [pid: 2017 uid: 0 unit: session-5.scope] connected to transaction progress
Mar 26 12:18:24 zero2w-01.jwillikers.io rpm-ostree[2024]: Librepo version: 1.14.2 with CURL_GLOBAL_ACK_EINTR support (libcurl/7.81.0 OpenSSL/3.0.2 zlib/1.2.11 brotli/1.0.9 libidn2/2.3.2 libpsl/0.21.1 (+libidn2/2.3.2) libssh/0.9.6/openssl/zlib nghttp2/1.46.0 OpenLDAP/2.6.1)
Mar 26 12:19:35 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: A process of this unit has been killed by the OOM killer.
Mar 26 12:19:35 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: Main process exited, code=killed, status=9/KILL
Mar 26 12:19:35 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: Failed with result 'oom-kill'.
Mar 26 12:19:35 zero2w-01.jwillikers.io systemd[1]: rpm-ostreed.service: Consumed 33.281s CPU time.
```
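
The excerpt above shows the `oom-kill` result three times. A minimal sketch for counting such failures in a unit's journal, and for checking by hand whether the kernel OOM killer or systemd-oomd logged the kill. The helper names `count_oom_kills` and `show_oom_context` are made up for illustration.

```shell
# Count how many times a unit failed with systemd's 'oom-kill' result.
# Reads journal text on stdin, e.g.:
#   journalctl -b -u rpm-ostreed | count_oom_kills
count_oom_kills() {
    # grep -c prints the number of matching lines (0 if none);
    # "|| true" hides grep's non-zero exit status when the count is 0.
    grep -c "Failed with result 'oom-kill'" || true
}

# Follow-up commands one might run by hand on the affected device.
# Wrapped in a function so nothing runs when this snippet is sourced.
show_oom_context() {
    journalctl -b -k | grep -i 'out of memory'   # kernel OOM killer messages, if any
    journalctl -b -u systemd-oomd                # systemd-oomd's own log for this boot
}
```

If the kernel log shows no out-of-memory messages while the systemd-oomd log does, that would point at systemd-oomd rather than the kernel; this report does not include either log, so the sketch makes no claim about which one acted here.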

Comment 1 Peter Robinson 2022-03-26 18:16:35 UTC
TBH I don't think this is rpm-ostree as I'm also seeing things die on a traditional rpm based system when trying to update a Zero2W. I think something else has changed, I think it might be a libdnf change or something that both dnf and rpm-ostree link against.

Comment 2 Jordan Williams 2022-03-26 18:20:54 UTC
(In reply to Peter Robinson from comment #1)
> TBH I don't think this is rpm-ostree as I'm also seeing things die on a
> traditional rpm based system when trying to update a Zero2W. I think
> something else has changed, I think it might be a libdnf change or something
> that both dnf and rpm-ostree link against.

Yes, I just tried Fedora Minimal 36. `dnf upgrade` is also killed by systemd-oomd.

Comment 4 Lukáš Hrázký 2022-03-28 14:32:54 UTC
Hello, the Raspberry Pi Zero 2 W has 512MB of RAM. This is way below the Fedora minimal spec of 2GB. Your process is being killed because you've run out of memory. I'm not saying Fedora shouldn't work on these minimal platforms, but it's a fact that dnf eats several hundreds of megabytes when reading the repository data into memory, so that's already pushing it. The amount of memory is obviously directly tied to the size of the repositories, so what likely changed is just that the repos have grown over time.

I'm not entirely sure if rpm-ostree changes something in the grand scheme of things but I don't think it does.

So it'd be best to investigate the repository size (possibly see what caused the increase) and see if it could be reduced.

There's a possible optimization on dnf side that we are planning for the next version of dnf (dnf 5), in not always loading the rpm filelists, which should reduce the footprint. But that's still somewhat far in the future (first release is aimed at fedora 38 right now) and is a bit problematic because the filelists are being used by dependency resolution (though that should really be migrated away from).
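
A rough pre-flight check along these lines, assuming the standard `/proc/meminfo` layout. The function names and the `NEED_KIB` default are placeholders standing in for "several hundreds of megabytes"; they are not measured values.

```shell
# Print one field of /proc/meminfo in KiB (the file labels the unit "kB").
# Usage: meminfo_kib FIELD [FILE]   (FILE defaults to /proc/meminfo)
meminfo_kib() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "${2:-/proc/meminfo}"
}

# Warn if total RAM is below a rough estimate of what dnf needs.
# NEED_KIB is a hypothetical threshold; override it in the environment.
check_ram_for_dnf() {
    need_kib="${NEED_KIB:-524288}"
    total_kib="$(meminfo_kib MemTotal "${1:-/proc/meminfo}")"
    if [ "$total_kib" -lt "$need_kib" ]; then
        echo "warning: ${total_kib} KiB RAM < ${need_kib} KiB estimated need"
        return 1
    fi
    echo "ok: ${total_kib} KiB RAM"
}
```

Calling `check_ram_for_dnf` with no arguments reads the live `/proc/meminfo`; passing a file path makes it easy to try the check against a saved copy from another device.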

Comment 5 Peter Robinson 2022-03-28 16:43:05 UTC
(In reply to Lukáš Hrázký from comment #4)
> Hello, the Raspberry Pi Zero 2 W has 512MB of RAM. This is way below the
> Fedora minimal spec of 2GB. Your process is being killed because you've run
> out of memory. I'm not saying Fedora shouldn't work on these minimal
> platforms, but it's a fact that dnf eats several hundreds of megabytes when
> reading the repository data into memory, so that's already pushing it. The
> amount of memory is obviously directly tied to the size of the repositories,
> so what likely changed is just that the repos have grown over time.

So the 2 GB minimum requirement for Fedora is for running anaconda and dealing with loading SELinux into memory in a RAM disk. Fedora has run just fine in even 265 MB of RAM on Arm devices for some time, including dnf; this is a recent regression in memory use.

In cases like cloud images and Arm images, where you're not running the anaconda installer, the memory usage is much lower. It's not uncommon to have cloud instances with just 512 MB of RAM.

> I'm not entirely sure if rpm-ostree changes something in the grand scheme of
> things but I don't think it does.

I also see this problem with a traditional Fedora on the Zero2W.

> So it'd be best to investigate the repository size (possibly see what caused
> the increase) and see if it could be reduced.

This was working fine until the last update of dnf.

Comment 6 Lukáš Hrázký 2022-03-29 09:19:03 UTC
What are the good and bad versions then? Could you make a comparison of the memory footprint using the exact same repositories?

Comment 7 Peter Robinson 2022-03-29 09:37:49 UTC
The last time dnf worked on my rpi-zero2w the following was upgraded:

[root@rpi-zero2w ~]# dnf upgrade --exclude=grub2-tools-extra --refresh
Fedora 36 -aarch64                         13 kB/s |  12 kB     00:00
Fedora 36 openh264 (From Cisco) -aarch64  1.0 kB/s | 990  B     00:00
Fedora 36 -aarch64 - Updates              9.3 kB/s |  19 kB     00:02
Fedora 36 -aarch64 - Test Updates          13 kB/s |  13 kB     00:00
Dependencies resolved.
===========================================================================================
 Package                          Architecture   Version            Repository        Size
===========================================================================================
Upgrading:
 NetworkManager                       aarch64    1:1.36.2-1.fc36    updates-testing   1.9 M
 NetworkManager-initscripts-updown    noarch     1:1.36.2-1.fc36    updates-testing    14 k
 NetworkManager-libnm                 aarch64    1:1.36.2-1.fc36    updates-testing   1.6 M
 NetworkManager-wifi                  aarch64    1:1.36.2-1.fc36    updates-testing   118 k
 bind-libs                            aarch64    32:9.16.27-1.fc36  updates-testing   1.2 M
 bind-license                         noarch     32:9.16.27-1.fc36  updates-testing    16 k
 bind-utils                           aarch64    32:9.16.27-1.fc36  updates-testing   206 k
 curl                                 aarch64    7.82.0-2.fc36      updates-testing   305 k
 dnf                                  noarch     4.11.1-1.fc36      updates-testing   454 k
 dnf-data                             noarch     4.11.1-1.fc36      updates-testing    42 k
 dnf-plugins-core                     noarch     4.1.0-1.fc36       updates-testing    34 k
 libcurl                              aarch64    7.82.0-2.fc36      updates-testing   294 k
 libdnf                               aarch64    0.66.0-1.fc36      updates-testing   613 k
 microdnf                             aarch64    3.8.1-1.fc36       updates-testing    48 k
 openssl-libs                         aarch64    1:3.0.2-1.fc36     updates-testing   2.0 M
 python-unversioned-command           noarch     3.10.3-1.fc36      updates-testing    10 k
 python3                              aarch64    3.10.3-1.fc36      updates-testing    27 k
 python3-dnf                          noarch     4.11.1-1.fc36      updates-testing   414 k
 python3-dnf-plugins-core             noarch     4.1.0-1.fc36       updates-testing   220 k
 python3-hawkey                       aarch64    0.66.0-1.fc36      updates-testing   101 k
 python3-libdnf                       aarch64    0.66.0-1.fc36      updates-testing   743 k
 python3-libs                         aarch64    3.10.3-1.fc36      updates-testing   7.3 M
 systemd                              aarch64    250.3-8.fc36       updates-testing   4.1 M
 systemd-libs                         aarch64    250.3-8.fc36       updates-testing   586 k
 systemd-networkd                     aarch64    250.3-8.fc36       updates-testing   516 k
 systemd-oomd-defaults                noarch     250.3-8.fc36       updates-testing    27 k
 systemd-pam                          aarch64    250.3-8.fc36       updates-testing   323 k
 systemd-resolved                     aarch64    250.3-8.fc36       updates-testing   258 k
 systemd-udev                         aarch64    250.3-8.fc36       updates-testing   1.8 M
 vim-data                             noarch     2:8.2.4579-1.fc36  updates-testing    28 k
 vim-minimal                          aarch64    2:8.2.4579-1.fc36  updates-testing   696 k
 wget                                 aarch64    1.21.3-1.fc36      updates-testing   771 k
 yum                                  noarch     4.11.1-1.fc36      updates-testing    40 k
Installing dependencies:
 NetworkManager-initscripts-ifcfg-rh  aarch64    1:1.36.2-1.fc36    updates-testing   113 k

Transaction Summary
===========================================================================================
Install   1 Package
Upgrade  33 Packages

Total download size: 27 M
Is this ok [y/N]:

Comment 8 Fedora Blocker Bugs Application 2022-04-01 13:11:44 UTC
Proposed as a Blocker for 36-final by Fedora user jwillikers using the blocker tracking app because:

 Basic Release Criterion: https://fedoraproject.org/wiki/Basic_Release_Criteria#Installing.2C_removing_and_updating_software

"The installed system must be able appropriately to install, remove, and update software with the default console tool for the relevant software type (e.g. default console package manager). This includes downloading of packages to be installed/updated."

Basic Release Criterion (Fedora IoT): https://fedoraproject.org/wiki/Basic_Release_Criteria#rpm-ostree_requirements

"It must be possible to install additional software with the rpm-ostree install command. Software installation must also include dependencies where necessary and installed software should provide the intended functionality."

This bug makes it impossible to install additional software via DNF or rpm-ostree on any system with 1/2 GiB of RAM, which is common for Arm devices and cloud images.

Comment 9 František Zatloukal 2022-04-04 18:59:05 UTC
Discussed during the 2022-04-04 blocker review meeting: [1]

The decision to classify this bug as a RejectedBlocker was made:

"The RPi Zero 2W is explicitly stated that it is not currently supported by Fedora ARM, so we reject this bug as a blocker."

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2022-04-04/f36-blocker-review.2022-04-04-16.00.log.html

Comment 10 Lukáš Hrázký 2022-04-05 14:23:24 UTC
A test in an F36 container on x86_64:

# dnf install time
...
# dnf rq --installed dnf
dnf-0:4.10.0-2.fc36.noarch
# dnf clean all
36 files removed
# /usr/bin/time -v dnf upgrade
...
Command exited with non-zero status 1
	Command being timed: "dnf upgrade"
	User time (seconds): 25.46
	System time (seconds): 1.47
	Percent of CPU this job got: 52%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:50.83
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 523572                <<<<<< the total memory consumed: 523MB
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 253
	Minor (reclaiming a frame) page faults: 347318
	Voluntary context switches: 12644
	Involuntary context switches: 378
	Swaps: 0
	File system inputs: 7672
	File system outputs: 625976
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 1
# dnf upgrade dnf
...
# dnf rq --installed dnf
dnf-0:4.11.1-2.fc36.noarch
# dnf clean all
36 files removed
# /usr/bin/time -v dnf upgrade
...
Command exited with non-zero status 1
	Command being timed: "dnf upgrade"
	User time (seconds): 26.40
	System time (seconds): 1.48
	Percent of CPU this job got: 72%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:38.49
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 533084                <<<<<< the total memory consumed: 533MB
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 245
	Minor (reclaiming a frame) page faults: 329871
	Voluntary context switches: 12366
	Involuntary context switches: 280
	Swaps: 0
	File system inputs: 32
	File system outputs: 564184
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 1

We can see an increase of 10MB in memory consumption. I ran the second part another time and it was 517MB, which is actually lower. I see no significant increase in memory consumption between the two versions, but feel free to provide a similar test in your environment to prove there is one in your case.

Unfortunately the older versions of the packages are no longer in F36 repos, I've downloaded them from here manually:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1881803
https://koji.fedoraproject.org/koji/buildinfo?buildID=1847273
https://koji.fedoraproject.org/koji/buildinfo?buildID=1847272

Again, the memory consumption is high indeed and we'd like for dnf to work on these low-spec devices, but it is what it is right now. If it wasn't clear originally, the memory is actually consumed by libsolv, which is what loads the repository data into memory. It is very non-trivial to fix, but we'd welcome any contributions in this space.
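
For anyone repeating the comparison, a small sketch that pulls the peak RSS out of `/usr/bin/time -v` output and works out the difference between two runs. The helper names are made up; the two numbers are the ones reported above for dnf 4.10.0 and dnf 4.11.1.

```shell
# Extract "Maximum resident set size (kbytes)" from `/usr/bin/time -v` output
# on stdin. GNU time writes its report to stderr, so capture it with 2>&1.
peak_rss_kib() {
    awk -F': ' '/Maximum resident set size/ { print $2 + 0; exit }'
}

# Print the change between two peak-RSS values as "DELTA_KIB PERCENT".
rss_delta() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%d %.1f\n", b - a, (b - a) * 100 / a }'
}

# Peak RSS before and after the dnf upgrade, as measured above:
rss_delta 523572 533084
```

This prints `9512 1.8`, that is, a difference of 9512 kbytes, or about 1.8%.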

Comment 11 Jordan Williams 2022-04-05 16:35:59 UTC
Does openSUSE organize their repositories differently?
I'm a little confused why zypper doesn't have this problem while DNF does since they both use libsolv.
Zypper, however, has much lower memory usage.
The following is the performance of using zypper to install a package on openSUSE Tumbleweed on the Raspberry Pi Zero 2 W.


# /usr/bin/time -v zypper install podman                                                              
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following 3 recommended packages were automatically selected:
  cni-plugins criu podman-cni-config

The following 23 NEW packages are going to be installed:
  catatonit cni cni-plugins conmon criu fuse-overlayfs iptables libbsd0
  libcontainers-common libfuse3-3 libip6tc2 libnet9 libnetfilter_conntrack3
  libnfnetlink0 libprotobuf-c1 libslirp0 podman podman-cni-config
  python38-ipaddr python38-protobuf runc slirp4netns xtables-plugins

23 new packages to install.
...
Checking for file conflicts: .............................................[done]
...
Executing %posttrans scripts .............................................[done]
        Command being timed: "zypper install podman"
        User time (seconds): 23.00
        System time (seconds): 8.51
        Percent of CPU this job got: 38%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:21.70
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 110612
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1757
        Minor (reclaiming a frame) page faults: 104141
        Voluntary context switches: 17447
        Involuntary context switches: 3463
        Swaps: 0
        File system inputs: 39024
        File system outputs: 434432
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Comment 12 Lukáš Hrázký 2022-04-06 09:06:12 UTC
So (as you've touched on), you have a whole different set of repositories there. They contain a different number of packages, that may be the most significant factor.

Second, I don't know the details of how zypper works, but did you clean its cache first? With how dnf works, it downloads repo metadata, loads them into memory (that's the memory peak) and then writes libsolv cache files for them. On another run, it only loads the cache files and the memory consumption is much lower due to some optimizations.

If I re-run dnf with loading from libsolv cache I get:
Maximum resident set size (kbytes): 160308

Another thing that could be causing this is that, from what I've heard, zypper doesn't use file provides for dependency resolution (they've gotten rid of it, as I think we should too), so it may not be loading the file lists, and that can make a big difference in memory consumption.

It'd be great if someone did a more thorough analysis and came up with some conclusions (there may also be savings to be made in how the repo metadata is generated). Right now we have higher priorities with dnf 5, which is also where a change such as not using file lists would be made anyway; we most likely wouldn't do it for dnf 4.
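
The cold-versus-cached difference described above can be reproduced with a sketch like the following. `cold_then_warm` is a hypothetical helper that is only defined here, not called; it needs root and GNU time installed at `/usr/bin/time`. `--assumeno` answers "no" at the transaction prompt, so nothing gets installed.

```shell
# Ratio between a cold-cache and a warm-cache peak RSS, to one decimal place.
rss_ratio() {
    awk -v cold="$1" -v warm="$2" 'BEGIN { printf "%.1f\n", cold / warm }'
}

# Cold run (metadata downloaded and parsed), then a warm run
# (libsolv cache files loaded). Prints the peak RSS line of each run.
cold_then_warm() {
    dnf clean all
    /usr/bin/time -v dnf upgrade --assumeno 2>&1 | grep 'Maximum resident set size'
    /usr/bin/time -v dnf upgrade --assumeno 2>&1 | grep 'Maximum resident set size'
}

# Cold peak from the earlier test vs. the cached figure quoted above:
rss_ratio 523572 160308
```

With those two figures the cold run peaks at roughly 3.3 times the cached run.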

Comment 13 Jordan Williams 2022-04-06 12:26:19 UTC
If I remove all the existing repository metadata and the cache, before installing a package, there is a slight increase in the max amount of memory usage, but it's still close to 100 MiB.
The command output is shown below.
Running the `zypper update` and `zypper refresh` commands separately shows similar results.

If I find some more time I may be able to profile DNF and compare the Fedora / openSUSE repositories.
It would be great to find someone with some working knowledge of Zypper.
I'd be happy to see this situation improved for DNF 5.
I appreciate all the feedback on this, thanks!


# zypper clean --all
All repositories have been cleaned up.

# /usr/bin/time -v zypper install podman                                                                                                                                                                                  
Retrieving repository 'openSUSE-Tumbleweed-Oss' metadata .................[done]
Building repository 'openSUSE-Tumbleweed-Oss' cache ......................[done]
Retrieving repository 'openSUSE-Tumbleweed-Update' metadata ..............[done]
Building repository 'openSUSE-Tumbleweed-Update' cache ...................[done]
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following 3 recommended packages were automatically selected:
  cni-plugins criu podman-cni-config

The following 23 NEW packages are going to be installed:
  catatonit cni cni-plugins conmon criu fuse-overlayfs iptables libbsd0
  libcontainers-common libfuse3-3 libip6tc2 libnet9 libnetfilter_conntrack3
  libnfnetlink0 libprotobuf-c1 libslirp0 podman podman-cni-config
  python38-ipaddr python38-protobuf runc slirp4netns xtables-plugins

23 new packages to install.
Overall download size: 27.4 MiB. Already cached: 0 B. After the operation,
additional 142.8 MiB will be used.
Continue? [y/n/v/...? shows all options] (y): y
...
Checking for file conflicts: .............................................[done]
...
Executing %posttrans scripts .............................................[done]
        Command being timed: "zypper install podman"
        User time (seconds): 51.63
        System time (seconds): 10.17
        Percent of CPU this job got: 52%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:56.78
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 111044
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1765
        Minor (reclaiming a frame) page faults: 157141
        Voluntary context switches: 37330
        Involuntary context switches: 4680
        Swaps: 0
        File system inputs: 40168
        File system outputs: 499456
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0


Refresh:

# zypper clean --all
All repositories have been cleaned up.
# /usr/bin/time -v zypper refresh                              
Retrieving repository 'openSUSE-Tumbleweed-Oss' metadata .................]
Building repository 'openSUSE-Tumbleweed-Oss' cache ......................]
Retrieving repository 'openSUSE-Tumbleweed-Update' metadata ..............]
Building repository 'openSUSE-Tumbleweed-Update' cache ...................]
All repositories have been refreshed.
        Command being timed: "zypper refresh"
        User time (seconds): 34.86
        System time (seconds): 3.38
        Percent of CPU this job got: 80%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:47.44
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 82956
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 4
        Minor (reclaiming a frame) page faults: 69197
        Voluntary context switches: 15830
        Involuntary context switches: 1683
        Swaps: 0
        File system inputs: 1400
        File system outputs: 99928
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
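
To put the three zypper figures in MiB, a small conversion sketch. The `kib_to_mib` name is made up for illustration; the "kbytes" that `time -v` reports are units of 1024 bytes.

```shell
# Convert a kbytes figure from `time -v` to MiB with one decimal place.
kib_to_mib() {
    awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib / 1024 }'
}

# Peak RSS of: cached install, clean-cache install, clean-cache refresh.
for kib in 110612 111044 82956; do
    printf '%s kbytes = %s MiB\n' "$kib" "$(kib_to_mib "$kib")"
done
```

That gives 108.0 MiB, 108.4 MiB and 81.0 MiB respectively.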

Comment 14 Neal Gompa 2022-04-06 12:51:49 UTC
(In reply to Lukáš Hrázký from comment #12)
> 
> Another thing that could be causing this is that from what I've heard,
> zypper doesn't use file provides for dependency resolution (they've gotten
> rid of it, as we also should I think), so it may not be loading the file
> lists and that can make a big difference in memory consumption.
> 

Zypper supports file provides just fine, they just don't load filelists.xml by default, so they're restricted to the ones in primary.xml.

It will load the filelists.xml on-demand if a non-primary.xml file dependency is detected, though.
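
The on-demand decision can be sketched as a simple path check. This assumes the usual createrepo convention for which file entries are copied into primary.xml (paths under `/etc/`, paths containing `bin/`, and `/usr/lib/sendmail`). The function names are made up, and this is an illustration rather than Zypper's actual code.

```shell
# Succeeds if a file path would be listed in primary.xml under the
# assumed createrepo convention; fails if only filelists.xml has it.
in_primary() {
    case "$1" in
        /etc/*)            return 0 ;;
        *bin/*)            return 0 ;;
        /usr/lib/sendmail) return 0 ;;
        *)                 return 1 ;;
    esac
}

# Print "yes" if any file dependency in the argument list falls outside
# primary.xml, meaning filelists.xml would have to be loaded; else "no".
# Dependencies that are not absolute paths (e.g. library names) are skipped.
filelists_needed() {
    for dep in "$@"; do
        case "$dep" in
            /*) in_primary "$dep" || { echo yes; return 0; } ;;
        esac
    done
    echo no
}
```

For example, `filelists_needed /usr/bin/sh /etc/passwd` would stay within primary.xml, while a dependency on a path under `/usr/share/` would trigger the load.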

Comment 15 Jaroslav Mracek 2022-04-13 13:40:17 UTC
In the next major version of the package manager (the DNF5 project) we plan to make it possible to disable loading of filelists. Of course, that will disable some functionality in DNF, which will hurt people who rely on it. But that is in the future: Fedora 39.

Comment 16 Jaroslav Mracek 2022-04-13 13:41:09 UTC
I am proposing to close it as deferred.

