Bug 2040170 - dnf check-update --changelog <package-name> causes OOM kill on 2GB RAM system supported for RHEL8
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: libsolv
Version: 8.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: amatej
QA Contact: swm-qe
URL:
Whiteboard:
Duplicates: 2008233 2126516
Depends On:
Blocks: 2064584
 
Reported: 2022-01-13 07:25 UTC by Masahiro Matsuya
Modified: 2023-09-18 04:30 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 2064584 (view as bug list)
Environment:
Last Closed: 2023-03-27 11:11:20 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
sosreport (13.34 MB, application/octet-stream), 2022-09-19 17:31 UTC, IBM Bug Proxy
sysctl output (33.14 KB, application/octet-stream), 2022-09-19 17:31 UTC, IBM Bug Proxy


Links
System                             ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker              RHELDST-11565    0        None      None    None     2022-05-16 12:54:48 UTC
Red Hat Issue Tracker              RHELPLAN-107749  0        None      None    None     2022-01-13 07:33:05 UTC
Red Hat Knowledge Base (Solution)  6960717          0        None      None    None     2022-05-25 14:41:45 UTC

Description Masahiro Matsuya 2022-01-13 07:25:23 UTC
Description of problem:

A customer reported an OOM kill problem with the following dnf command.

  # yum check-update --changelog nss
  ...
  Updating Subscription Management repositories.
  Killed

Jan 13 16:23:02 kvm-122-85 kernel: Out of memory: Killed process 2252 (dnf) total-vm:2932988kB, anon-rss:1499040kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:5116kB oom_score_adj:0
Jan 13 16:23:02 kvm-122-85 kernel: oom_reaper: reaped process 2252 (dnf), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB


The --changelog option is used to show the delta with the latest package,
and other.xml.gz repo data provides the changelog information.

The other.xml.gz file is very large, and it seems that the OOM kill happened when the file content was loaded into memory for XML parsing.

2GB of RAM is supported for RHEL8, and that RAM size is likely to be common in VM and container environments.
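
To get a rough feel for the size problem described above, the decompressed size of the metadata can be measured without holding the whole file in memory. The following is a minimal Python sketch and is not part of dnf or libsolv; the function names and the memory budget are hypothetical, and the parsed in-memory representation may well be larger than the raw XML, so treat the result as a rough indicator only.

```python
import gzip


def uncompressed_size(path, chunk_size=1 << 20):
    """Return the decompressed size of a gzip file in bytes.

    The file is read in fixed-size chunks, so memory use stays around
    chunk_size no matter how large the decompressed data is.
    """
    total = 0
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def fits_in_budget(path, budget_bytes):
    """Return True if the decompressed data is no larger than budget_bytes.

    budget_bytes is a hypothetical limit chosen by the caller, for example
    a fraction of the RAM available on the machine.
    """
    return uncompressed_size(path) <= budget_bytes
```

For example, `fits_in_budget("/path/to/other.xml.gz", 2 * 1024**3)` (with a placeholder path) would report whether the raw XML alone stays under 2GB.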


Version-Release number of selected component (if applicable):
libsolv-0.7.19-1.el8.x86_64
libxml2-2.9.7-9.el8_4.2.x86_64
libdnf-0.63.0-3.el8.x86_64
dnf-4.7.0-4.el8.noarch


How reproducible:
Always

Steps to Reproduce:
1. Prepare 2GB RAM system with RHEL8.5
2. Run dnf check-update --changelog nss

Actual results:
The dnf command failed with OOM Kill.

Expected results:
The dnf command can complete without OOM Kill.

Comment 2 Renaud Métrich 2022-02-03 07:12:02 UTC
Alternatively, on systems with a lot of memory (16GB in my test), "dnf changelog <package>" dies with a segmentation fault.

Comment 3 Renaud Métrich 2022-02-03 08:57:16 UTC
So I experimented on RHEL9 (beta 0 DVD) and Fedora Rawhide as well, trying the "dnf changelog tree" command.

1. RHEL9 works with DVD repos (baseos + appstream), which do not contain many packages

  Total packages: 6,298

2. RHEL9 **fails** with BaseOS repo from RHEL8

  Total packages: 9,538

3. Fedora Rawhide works with Fedora repos, which contain a lot of packages

  Total packages: 68,180

4. Fedora Rawhide **fails** with BaseOS repo from RHEL8


From this, we can state that there seems to be something wrong with the repository itself.
AppStream seems to work fine.

Comment 5 Jaroslav Mracek 2022-03-15 13:43:29 UTC
*** Bug 2008233 has been marked as a duplicate of this bug. ***

Comment 6 amatej 2022-03-23 07:46:17 UTC
The main cause here is that the BaseOS repo from RHEL8 is just too big. The uncompressed other.xml is around 3GB.

Due to the way libsolv parsing works we have to load it all into memory at once so there is currently no way to make this work on a 2GB RAM machine (if you want to load the changelogs - other.xml).
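
For contrast with the load-everything approach described above, here is a minimal sketch of a streaming reader that pulls the changelog entries of a single package out of other.xml while keeping only one package element in memory at a time. This is a different technique (Python's `xml.etree.ElementTree.iterparse`), not how libsolv works and not a proposed fix. The function name is hypothetical, and the namespace and element layout (`package`, `changelog` with `author` and `date` attributes) are my assumption about the rpm-md other.xml format.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of rpm-md other.xml (an assumption, not taken from the bug).
NS = "{http://linux.duke.edu/metadata/other}"


def iter_changelogs(fileobj, package_name):
    """Yield (author, date, text) for each changelog entry of one package.

    fileobj is any binary file-like object containing other.xml, for
    example the result of gzip.open(path, "rb").
    """
    context = ET.iterparse(fileobj, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == NS + "package":
            if elem.get("name") == package_name:
                for entry in elem.findall(NS + "changelog"):
                    text = (entry.text or "").strip()
                    yield entry.get("author"), entry.get("date"), text
            # Drop the finished package so the tree does not keep growing.
            root.clear()
```

Memory use then depends on the largest single package entry rather than on the whole file, at the cost of not building the shared string pool that the solver relies on.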

The other very related issue is that even if we have enough RAM available there is a Segmentation Fault because of an integer overflow caused by so many strings in the other.xml. Libsolv is kind of built on top of Ids (type integer) so this may be problematic to resolve.
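
To illustrate the kind of overflow meant here, the sketch below shows what happens when a counter stored in a 32-bit signed integer is incremented past its maximum. It is only an illustration in Python using `ctypes`, under the assumption that the integer type in question is 32 bits wide with two's complement wraparound; the helper names are hypothetical, and it does not reproduce the actual libsolv code path.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value of a 32-bit signed integer


def as_int32(value):
    """Reinterpret a Python int as a 32-bit signed integer (wraps silently)."""
    return ctypes.c_int32(value).value


def bump(counter):
    """Increment a counter the way wrapping 32-bit signed arithmetic would."""
    return as_int32(counter + 1)


if __name__ == "__main__":
    print(bump(INT32_MAX))  # -2147483648
```

Once such a value goes negative and is then used as an index or offset into a buffer, an out-of-bounds memory access and a segmentation fault are a plausible outcome.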

I made an upstream issue: https://github.com/openSUSE/libsolv/issues/493

Comment 21 Renaud Métrich 2022-07-01 06:04:15 UTC
Hello,

The issue doesn't only affect the minor "changelog" functionality, but also "repodiff", which is far more critical, e.g.:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# gdb --args /usr/libexec/platform-python /usr/bin/dnf repodiff --repofrompath=o,http://rhsm-pulp.corp.redhat.com/content/dist/rhel8/8.4/x86_64/baseos/os/ --repofrompath=n,http://rhsm-pulp.corp.redhat.com/content/dist/rhel8/8.6/x86_64/baseos/os/ --repo-old=o --repo-new=n --simple --size
[...]
(gdb) run
[...]
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff33c3328 in data_addid (sx=<optimized out>, xd=<optimized out>, xd=<optimized out>) at src/repodata.c:3138
3138	  *dp++ = x & 127;
(gdb) bt
#0  0x00007ffff33c3328 in data_addid (sx=<optimized out>, xd=<optimized out>, xd=<optimized out>) at src/repodata.c:3138
#1  0x00007ffff33c4ef6 in repodata_serialize_key (data=data@entry=0x555556422268, newincore=newincore@entry=0x7fffffffbe30, newvincore=newvincore@entry=0x7fffffffbe40, schema=schema@entry=0x555555cbbb70, val=1739564, key=<optimized out>, key=<optimized out>) at src/repodata.c:3424
#2  0x00007ffff33cf21f in repodata_internalize (data=data@entry=0x555556422268) at src/repodata.c:3697
#3  0x00007ffff317b5b8 in repo_add_rpmmd (repo=repo@entry=0x55555623c920, fp=fp@entry=0x555555ce1070, language=<optimized out>, language@entry=0x0, flags=flags@entry=16) at ext/repo_rpmmd.c:1165
#4  0x00007ffff41fd844 in load_other_cb (repo=repo@entry=0x55555623c920, fp=fp@entry=0x555555ce1070) at /usr/src/debug/libdnf-0.63.0-8.el8.x86_64/libdnf/dnf-sack.cpp:468
#5  0x00007ffff4200001 in load_ext (sack=0x5555561fd900, hrepo=0x555555a42530, which_repodata=_HY_REPODATA_OTHER, suffix=<optimized out>, which_filename=<optimized out>, cb=0x7ffff41fd830 <load_other_cb(Repo*, FILE*)>, error=0x7fffffffc658) at /usr/src/debug/libdnf-0.63.0-8.el8.x86_64/libdnf/dnf-sack.cpp:430
#6  0x00007ffff42008a6 in dnf_sack_load_repo (sack=0x5555561fd900, repo=0x555555a42530, flags=31, error=0x7fffffffc718) at /usr/src/debug/libdnf-0.63.0-8.el8.x86_64/libdnf/dnf-sack.cpp:1812
#7  0x00007fffe545964e in load_repo(_SackObject*, _object*, _object*) () from /usr/lib64/python3.6/site-packages/hawkey/_hawkey.so
#8  0x00007ffff753a004 in PyCFunction_Call (func=<built-in method load_repo of Sack object at remote 0x7fffdef54ce0>, args=<optimized out>, kwds=<optimized out>) at /usr/src/debug/python3-3.6.8-45.el8.x86_64/Objects/methodobject.c:103
 :
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Comment 27 Renaud Métrich 2022-09-13 15:09:28 UTC
I can confirm the CDN has been fixed:

- "dnf check-update --changelog nss" works fine on the latest RHEL8.6
- "/usr/libexec/platform-python /usr/bin/dnf repodiff --repofrompath=o,http://rhsm-pulp.corp.redhat.com/content/dist/rhel8/8.4/x86_64/baseos/os/ --repofrompath=n,http://rhsm-pulp.corp.redhat.com/content/dist/rhel8/8.6/x86_64/baseos/os/ --repo-old=o --repo-new=n --simple --size" works fine as well

I pushed a note in the linked KCS.

Now I don't know if the BZ should be closed as "currentrelease" or not.

Comment 28 Daniel Alley 2022-09-13 15:11:53 UTC
Probably "notabug" or "upstream" would be appropriate

Comment 29 amatej 2022-09-14 10:41:14 UTC
Thanks for resolving the issue!

We would like to keep the bz open if that is not a problem for anyone.
The bug is still present in libdnf (libsolv), it will still crash if the repo is big enough.

Comment 30 Renaud Métrich 2022-09-14 10:42:51 UTC
Sure, feel free to keep it open.
I will inform the customers on my side.

Comment 31 amatej 2022-09-19 13:57:58 UTC
*** Bug 2126516 has been marked as a duplicate of this bug. ***

Comment 33 Daniel Alley 2022-09-19 17:01:30 UTC
Re: 03171061

It sounds like they're using Satellite (hence "CV" or "Content View") 

In that event, the user may need to follow these directions: https://bugzilla.redhat.com/show_bug.cgi?id=2122344

This will be a recommended part of the Satellite 6.12 upgrade but they should be able to run it on Satellite 6.11 as well.

Comment 34 Daniel Alley 2022-09-19 17:05:26 UTC
re: my previous comment, two more points

It's not clear which version they are running. If it is something like 6.9, there isn't a pre-baked solution; if it is 6.10, the instructions may be a bit different (and I would probably recommend upgrading to 6.11 first in that case).

And afterwards: once that script has been run, the relevant repos ought to be re-published.

Comment 35 IBM Bug Proxy 2022-09-19 17:31:34 UTC
Created attachment 1912972 [details]
sosreport

Comment 36 IBM Bug Proxy 2022-09-19 17:31:36 UTC
Created attachment 1912973 [details]
sysctl output


------- Comment on attachment From Harish.N.J 2022-09-14 01:33 EDT-------


sysctl output attached.

Comment 38 Abhijeet Joshi 2023-03-06 09:28:59 UTC
Hello Team,

Is there any update we can provide to the customer? 

Thank you.

Comment 39 Daniel Alley 2023-03-06 14:39:00 UTC
@abjoshi What update are you looking for?  For users updating from the CDN directly, this problem will have ceased months ago.  For users updating via Red Hat Satellite, there are some procedures to be followed which will address the problem, mentioned in comment 33 and comment 34.

Comment 40 Jaroslav Mracek 2023-03-27 11:11:20 UTC
Summary of the issue
DNF is still unable to process overly large changelogs, but the issue was resolved by a change in the RHEL infrastructure. The currently available RHEL metadata has changelogs of a reasonable size, which means that the issue can no longer be reproduced in RHEL.

According to libsolv upstream (libsolv is the dnf solver that reads changelogs), supporting overly large changelogs would be very difficult, and sooner or later we would run into another limit. Additionally, the infrastructure had difficulty generating such metadata due to its resource requirements. Because the issue has a workaround that uses an alternative way of generating metadata, there is a low chance of having such a patch in the near future.

I am closing the bug, but I have difficulty choosing a resolution. I could use CURRENTRELEASE, because the issue was already resolved by the metadata update (comment 27), or CANTFIX, because we have no solution for supporting large changelogs. CURRENTRELEASE should raise fewer questions about the state in the RHEL distribution.

Comment 41 Red Hat Bugzilla 2023-09-18 04:30:09 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

