Bug 2040170
| Summary: | dnf check-update --changelog <package-name> causes OOM kill on 2GB RAM system supported for RHEL8 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Masahiro Matsuya <mmatsuya> |
| Component: | libsolv | Assignee: | amatej |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | swm-qe |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 8.5 | CC: | abjoshi, amatej, amepatil, bugproxy, dalley, luka, maxwell, mbanas, nico.van.roijen, oliver, r1ch11, redhatbugzilla, rmetrich, scott.worthington, sujagtap |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2064584 (view as bug list) | Environment: | |
| Last Closed: | 2023-03-27 11:11:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2064584 | | |
Description
Masahiro Matsuya
2022-01-13 07:25:23 UTC
Alternatively, on systems with a lot of memory (16GB in my test), "dnf changelog <package>" dies with a segmentation fault. So I experimented with RHEL9 (Beta 0 DVD) and Fedora Rawhide as well, running the "dnf changelog tree" command:

1. RHEL9 works with the DVD repos (BaseOS + AppStream), which do not contain many packages. Total packages: 6,298
2. RHEL9 **fails** with the BaseOS repo from RHEL8. Total packages: 9,538
3. Fedora Rawhide works with the Fedora repos, which contain a lot of packages. Total packages: 68,180
4. Fedora Rawhide **fails** with the BaseOS repo from RHEL8.

Because the Fedora repos, with far more packages, work fine, the package count alone is not the problem; something seems to be wrong with the repository itself. AppStream seems to work fine.

*** Bug 2008233 has been marked as a duplicate of this bug. ***

The main cause here is that the BaseOS repo from RHEL8 is simply too big: the uncompressed other.xml is around 3GB. Because of the way libsolv parsing works, we have to load it all into memory at once, so there is currently no way to make this work on a 2GB RAM machine if you want to load the changelogs (other.xml).

A closely related issue is that even with enough RAM available, there is a segmentation fault caused by an integer overflow due to the sheer number of strings in other.xml. Libsolv is built on top of Ids (of integer type), so this may be difficult to resolve. I opened an upstream issue: https://github.com/openSUSE/libsolv/issues/493

Hello,

The issue doesn't only affect the minor "changelog" functionality, but also "repodiff", which is far more critical, e.g.:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# gdb --args /usr/libexec/platform-python /usr/bin/dnf repodiff --repofrompath=o,http://rhsm-pulp.corp.redhat.com/content/dist/rhel8/8.4/x86_64/baseos/os/ --repofrompath=n,http://rhsm-pulp.corp.redhat.com/content/dist/rhel8/8.6/x86_64/baseos/os/ --repo-old=o --repo-new=n --simple --size
[...]
(gdb) run
[...]
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff33c3328 in data_addid (sx=<optimized out>, xd=<optimized out>, xd=<optimized out>) at src/repodata.c:3138
3138	  *dp++ = x & 127;
(gdb) bt
#0  0x00007ffff33c3328 in data_addid (sx=<optimized out>, xd=<optimized out>, xd=<optimized out>) at src/repodata.c:3138
#1  0x00007ffff33c4ef6 in repodata_serialize_key (data=data@entry=0x555556422268, newincore=newincore@entry=0x7fffffffbe30, newvincore=newvincore@entry=0x7fffffffbe40, schema=schema@entry=0x555555cbbb70, val=1739564, key=<optimized out>, key=<optimized out>) at src/repodata.c:3424
#2  0x00007ffff33cf21f in repodata_internalize (data=data@entry=0x555556422268) at src/repodata.c:3697
#3  0x00007ffff317b5b8 in repo_add_rpmmd (repo=repo@entry=0x55555623c920, fp=fp@entry=0x555555ce1070, language=<optimized out>, language@entry=0x0, flags=flags@entry=16) at ext/repo_rpmmd.c:1165
#4  0x00007ffff41fd844 in load_other_cb (repo=repo@entry=0x55555623c920, fp=fp@entry=0x555555ce1070) at /usr/src/debug/libdnf-0.63.0-8.el8.x86_64/libdnf/dnf-sack.cpp:468
#5  0x00007ffff4200001 in load_ext (sack=0x5555561fd900, hrepo=0x555555a42530, which_repodata=_HY_REPODATA_OTHER, suffix=<optimized out>, which_filename=<optimized out>, cb=0x7ffff41fd830 <load_other_cb(Repo*, FILE*)>, error=0x7fffffffc658) at /usr/src/debug/libdnf-0.63.0-8.el8.x86_64/libdnf/dnf-sack.cpp:430
#6  0x00007ffff42008a6 in dnf_sack_load_repo (sack=0x5555561fd900, repo=0x555555a42530, flags=31, error=0x7fffffffc718) at /usr/src/debug/libdnf-0.63.0-8.el8.x86_64/libdnf/dnf-sack.cpp:1812
#7  0x00007fffe545964e in load_repo(_SackObject*, _object*, _object*) () from /usr/lib64/python3.6/site-packages/hawkey/_hawkey.so
#8  0x00007ffff753a004 in PyCFunction_Call (func=<built-in method load_repo of Sack object at remote 0x7fffdef54ce0>, args=<optimized out>, kwds=<optimized out>) at /usr/src/debug/python3-3.6.8-45.el8.x86_64/Objects/methodobject.c:103
:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

I can confirm the CDN has been fixed:
- "dnf check-update --changelog nss" works fine on the latest RHEL 8.6
- "/usr/libexec/platform-python /usr/bin/dnf repodiff --repofrompath=o,http://rhsm-pulp.corp.redhat.com/content/dist/rhel8/8.4/x86_64/baseos/os/ --repofrompath=n,http://rhsm-pulp.corp.redhat.com/content/dist/rhel8/8.6/x86_64/baseos/os/ --repo-old=o --repo-new=n --simple --size" works fine as well

I pushed a note to the linked KCS. Now I don't know whether the BZ should be closed as "currentrelease" or not. Probably "notabug" or "upstream" would be appropriate.

Thanks for resolving the issue! We would like to keep the BZ open if that is not a problem for anyone. The bug is still present in libdnf (libsolv); it will still crash if the repo is big enough.

Sure, feel free to keep it open. I will inform the customers on my side.

*** Bug 2126516 has been marked as a duplicate of this bug. ***

Re: 03171061
It sounds like they're using Satellite (hence "CV" or "Content View"). In that case, the user may need to follow these directions: https://bugzilla.redhat.com/show_bug.cgi?id=2122344
This will be a recommended part of the Satellite 6.12 upgrade, but they should be able to run it on Satellite 6.11 as well.

Re: my previous comment, two more points. It's not clear which version they are running: if it is something like 6.9, there isn't a pre-baked solution; if it is 6.10, the instructions may be a bit different (and I would probably recommend upgrading to 6.11 first in that case). And afterwards, once that script has been run, the relevant repos ought to be re-published.

Created attachment 1912972 [details]
sosreport
Created attachment 1912973 [details]
sysctl output
------- Comment on attachment From Harish.N.J 2022-09-14 01:33 EDT-------
Attached the sysctl output.
Hello Team,

Is there any update we can provide to the customer? Thank you.

@abjoshi What update are you looking for? For users updating from the CDN directly, this problem ceased months ago. For users updating via Red Hat Satellite, there are some procedures to follow that will address the problem, mentioned in comment 33 and comment 34.

Summary of the issue:
DNF is still unable to process very large changelogs, but the issue was resolved by a change in the RHEL infrastructure. The currently available RHEL metadata has changelogs of a reasonable size, which means the issue can no longer be reproduced on RHEL. According to libsolv upstream (the solver library DNF uses, which also reads the changelogs), supporting very large changelogs would be very difficult, and sooner or later we would end up with another limit anyway. Additionally, the infrastructure had difficulty generating such metadata because of its resource requirements. Because the issue has a workaround (generating the metadata in an alternative way), there is little chance of such a patch in the near future.

I am closing the bug, but I had difficulty choosing a resolution. I could use CURRENTRELEASE, because the issue was already resolved by the metadata update (comment 27), or CANTFIX, because we have no solution for supporting large changelogs. CURRENTRELEASE should raise fewer questions about the state of the RHEL distribution.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.
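The comments in this bug attribute the OOM kills to libsolv loading the entire uncompressed other.xml (around 3GB for the RHEL8 BaseOS repo) into memory at once. As a rough diagnostic, the Python sketch below streams a downloaded other.xml file and reports its decompressed size in fixed-size chunks, so the payload never has to fit in RAM. It assumes the file is gzip-compressed (some repositories use other compression formats) and that you have already fetched the repository's `*-other.xml.gz` from its `repodata/` directory; the function name `uncompressed_size` is hypothetical and is not part of dnf, libdnf, or libsolv.

```python
import gzip


def uncompressed_size(path: str, chunk_size: int = 1 << 20) -> int:
    """Return the decompressed size in bytes of a gzip file.

    Hypothetical helper: reads the stream in chunk_size pieces (1 MiB by
    default) and only keeps a running total, so memory use stays bounded
    regardless of how large the decompressed other.xml is.
    """
    total = 0
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:  # empty bytes object means end of stream
                break
            total += len(chunk)
    return total
```

For example, `uncompressed_size("other.xml.gz") / 2**30` gives the size in GiB (the path is illustrative). A result in the gigabyte range on a machine with 2GB of RAM suggests that loading the changelogs through dnf is likely to hit the memory problem described above.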