Bug 1084174
| Summary: | [perf] memory consumption during filelist processing too large for low-end (256 MB) systems | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Dennis Gilmore <dennis> |
| Component: | libsolv | Assignee: | Honza Silhan <jsilhan> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | low | | |
| Version: | rawhide | CC: | akozumpl, awilliam, dennis, jsilhan, packaging-team-maint, pbrobinson, pnemade, rholy, robatino |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | hawkey-0.4.13-1.fc20 | Doc Type: | Bug Fix |
| Last Closed: | 2014-04-16 09:31:58 UTC | Type: | Bug |
| Bug Blocks: | 245418, 1043119 | | |
Description (Dennis Gilmore, 2014-04-03 18:24:35 UTC)
Dennis, if we are to look into this we will need easy, direct access to the hardware; the best option is root ssh access to a fresh F20 box. Lowering priority for the moment.

Get me an ssh key and I'll get you access. The system only works in rawhide.

Dennis, why is this blocking F21? DNF is not to become the non-experimental default until F22 [1].

[1] https://fedoraproject.org/wiki/Features/DNF

I have not provided you an ssh key yet, to save you some work, as no regular DNF contributor is scheduled to work on this yet; it seemed low priority (only one user complaining so far). Note this is distinctly different from saying it is not a bug.

That page does not refer to either F21 or F22 and has not been edited since October 2012.

Ales, I had it block F21 because dnf gets installed by default in F20 and up; it is visible to the end user because of the metadata refresh. It gives users on systems with limited memory a negative experience.

Analyzing this, it turns out that on the basic F20 repo (the biggest one, yet present on all F20 systems) the RSS jumps to 290 MB at the peak when libsolv internalizes the filelists:

99.52% (292,189,931B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->92.54% (271,699,626B) 0x3382048D3C: solv_realloc (in /usr/lib64/libsolv.so.0)
| ->35.65% (104,681,472B) 0x33820386B4: ??? (in /usr/lib64/libsolv.so.0)
| | ->35.65% (104,681,472B) 0x3382039B78: ??? (in /usr/lib64/libsolv.so.0)
| | | ->35.65% (104,681,472B) 0x3382046B13: repodata_internalize (in /usr/lib64/libsolv.so.0)
| | | ->35.65% (104,681,472B) 0x3F2841D818: repo_add_rpmmd (in /usr/lib64/libsolvext.so.0)
| | | | ->35.65% (104,681,472B) 0xCFCCE6F: load_filelists_cb (sack.c:378)
| | | | | ->35.65% (104,681,472B) 0xCFCCD68: load_ext (sack.c:359)
| | | | | ->35.65% (104,681,472B) 0xCFCE45C: hy_sack_load_yum_repo (sack.c:878)
| | | | | ->35.65% (104,681,472B) 0xCDAFB56: load_yum_repo (sack-py.c:420)

Possible ways to tackle this:

* See if something simple and not affecting performance can be done on the libsolv side to make the problem less severe (mls has kindly agreed to give it a shot for us; SUSE does not use filelists).
* Implement a special strategy in libsolv for loading filelists in constrained environments, even at the price of being very slow.
* Drop using filelists in Fedora. This shameful artifact of our packaging approach should have been dealt with a long time ago. This activity should be driven by the FPC.
* Disable filelists on systems with critically low memory. Some DNF operations will not fully work, but the OOMs should be gone.
* Disable 'automatic makecache' for systems with less than 1 GB of overall memory.

Dropping the blockers per:

http://docs.fedoraproject.org/en-US/Fedora/20/html/Release_Notes/sect-Release_Notes-Welcome_to_Fedora_.html

Dennis, please point to concrete blocker guidelines if you want this marked as a blocker. Also dropping the ARM tracker. The machines need to be speced properly.

Adding back the ARM tracker. This is something we need to track for these platforms. It's likely the cloud guys will be interested too, as there are lots of low-memory cloud instance use cases.

It should be a blocker as it has the ability for users to be affected by core functionality that we're asking them to test to make it ready to be default.
If you want to be a default utility in the core OS you need to be able to play nicely with _ALL_ the systems we support, and the "The machines need to be speced properly." PoV is just wrong. If you disagree that systems with 256Mb of RAM are specced wrongly, you can take it to FESCo and get them to change the devices we support, or else you can fix it.

If you think the support of filelists is a "shameful artifact of our packaging approach", why haven't you approached the FPC to get the problem fixed? If you have, what is the ticket number so it can be reviewed? Ultimately yum supports this just fine on these configurations, so there's no reason why DNF shouldn't. Just because you are fortunate enough to be able to afford a machine with large amounts of memory, it doesn't mean everyone else is able to do so. Ultimately, blaming everything else when the problem is dnf or its dependencies is the wrong attitude to have.

(In reply to Peter Robinson from comment #9)
> It should be a blocker as it has the ability for users to be affected by
> core functionality that we're asking them to test to make it ready to be
> default. If you want to be a default utility in the core OS you need to be
> able to play nicely with _ALL_ the systems we support and the "The machines
> need to be speced properly." PoV is just wrong.... if you disagree that
> systems with 256Mb of RAM are specced wrongly you can take it to FESCo and
> get them to change the devices we support or else you can fix it.

Please link to the mentioned list of supported devices/specs by FESCo. The only thing we have until then is:

http://docs.fedoraproject.org/en-US/Fedora/20/html/Release_Notes/sect-Release_Notes-Welcome_to_Fedora_.html

Also, DNF supports ARMs with 256 MB just fine, just not with very large repos.
(In reply to Ales Kozumplik from comment #10)
> (In reply to Peter Robinson from comment #9)
> > It should be a blocker as it has the ability for users to be affected by
> > core functionality that we're asking them to test to make it ready to be
> > default. If you want to be a default utility in the core OS you need to be
> > able to play nicely with _ALL_ the systems we support and the "The machines
> > need to be speced properly." PoV is just wrong.... if you disagree that
> > systems with 256Mb of RAM are specced wrongly you can take it to FESCo and
> > get them to change the devices we support or else you can fix it.
>
> Please link to the mentioned list of supported devices/specs by FESCO.
>
> The only thing we have until then is:
>
> http://docs.fedoraproject.org/en-US/Fedora/20/html/Release_Notes/sect-Release_Notes-Welcome_to_Fedora_.html
>
> Also, DNF supports ARMs with 256 MB just fine, just not with very large
> repos.

https://fedorahosted.org/fesco/ticket/387 is where FESCo decided that the XO-1 was the lowest supported i686 system; it only has 256 MB of RAM. That provides the baseline: Fedora needs to run in 256 MB of RAM, and dnf doesn't work there with the default enabled Fedora repos.

Please leave this bug blocking ARMTracker, as it's used by the ARM team to track issues on ARM systems. Please leave the F21 blocker there; it's up to QA to ultimately decide if the issue is a blocker or not. It is an issue that will be visible to users in F21, even if they never use dnf, because it is installed by default and the metadata refresh process gets killed.

"please leave the F21 blocker there its up to QA to ultimately decide if the issue is a blocker or not"

This is not in fact the case: "Blocker Bug Meetings are not owned by any one team in Fedora. They are a collaborative effort between Release Engineering, Quality Assurance, Development, and Project Management."
https://fedoraproject.org/wiki/QA:SOP_Blocker_Bug_Meeting

Those three teams together are responsible for blocker determinations; the escalation path isn't formally defined (it probably should be), but informally it has usually been a consensus at the go/no-go meeting.

Ales: "Dropping the blockers"

Please don't do this. In the Fedora blocker bug process, a bug that is blocking a blocker tracker bug but has no other 'special' attributes is a *proposed* blocker, and that status should never be withdrawn except by a) the proposer or b) a proper vote under the relevant SOP - https://fedoraproject.org/wiki/QA:SOP_blocker_bug_process#Reviewing_blocker_bugs. Please read that SOP for more information on the Fedora blocker process. Thanks!

(In reply to Adam Williamson from comment #13)
> Ales: "Dropping the blockers"
>
> please don't do this. In the Fedora blocker bug process, a bug that is
> blocking a blocker tracker bug but with no other 'special' attributes is a
> *proposed* blocker, and that status should never be withdrawn except by a)
> the proposer or b) a proper vote under the relevant SOP -
> https://fedoraproject.org/wiki/QA:SOP_blocker_bug_process#Reviewing_blocker_bugs.
>
> Please read that SOP for more information on the Fedora blocker process.
> Thanks!

Thanks for clarifying, Adam. I had forgotten how this worked, and it did in fact look wrong to me that bugs could be tagged blockers ad hoc.

Moving to Jan. Jan, Michael just told me he pushed two patches to libsolv today that should alleviate this; please go ahead with the rawhide rebase (you can close bug 1074126 then too). If the fix doesn't work out, I'll guide you through the next steps.

(Note from Michael: in my little solv example file I "switch" over to the solv file after I have written it to disk, so that the paging mechanism is enabled. Maybe hawkey might also want to do that.)
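Two of the mitigation options proposed earlier in this thread (disabling filelists on systems with critically low memory, and disabling 'automatic makecache' below 1 GB) would need a memory-detection heuristic. Below is a minimal sketch of such a check, assuming /proc/meminfo as the data source; the function names and the 1 GB threshold are illustrative only, not DNF's or hawkey's actual API:

```python
# Illustrative sketch only: helper names and threshold are hypothetical,
# not part of DNF or hawkey. Decides whether filelists / automatic
# makecache should be enabled, based on total system memory.

def mem_total_kb(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text (value is in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")

def should_load_filelists(meminfo_text, threshold_kb=1024 * 1024):
    """Skip filelists below ~1 GB of RAM, per the proposed mitigation."""
    return mem_total_kb(meminfo_text) >= threshold_kb

if __name__ == "__main__":
    # A 256 MB system like the XO-1 discussed in this bug:
    sample = "MemTotal:         262144 kB\nMemFree:           8192 kB\n"
    print(should_load_filelists(sample))  # prints: False
```

On a real system the text would come from open("/proc/meminfo").read(); where exactly to draw the threshold is a policy decision this sketch does not settle.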
Dennis or someone else, could you please try the new version of libsolv (libsolv-0.6.0-0.git05baf54.fc21) and tell me if the problem still exists?

hawkey-0.4.12-1.fc21.armv7hl
dnf-0.4.20-1.fc21.noarch
libsolv-0.6.0-0.git05baf54.fc21.armv7hl
libcomps-0.1.6-8.fc21.armv7hl

[ 2280.708063] dnf (704) used greatest stack depth: 3624 bytes left
[ 2610.474818] dnf invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[ 2610.483453] dnf cpuset=/ mems_allowed=0
[ 2610.487594] CPU: 0 PID: 698 Comm: dnf Not tainted 3.15.0-0.rc0.git9.1.fc21.armv7hl #1
[ 2610.495938] [<c0017de0>] (unwind_backtrace) from [<c001291c>] (show_stack+0x18/0x1c)
[ 2610.504126] [<c001291c>] (show_stack) from [<c071d548>] (dump_stack+0x8c/0xb8)
[ 2610.511808] [<c071d548>] (dump_stack) from [<c071a438>] (dump_header.isra.10+0x80/0x424)
[ 2610.520394] [<c071a438>] (dump_header.isra.10) from [<c0139768>] (oom_kill_process+0x7c/0x578)
[ 2610.529514] [<c0139768>] (oom_kill_process) from [<c013a34c>] (out_of_memory+0x50c/0x55c)
[ 2610.538182] [<c013a34c>] (out_of_memory) from [<c013f0d4>] (__alloc_pages_nodemask+0x880/0xbb8)
[ 2610.547385] [<c013f0d4>] (__alloc_pages_nodemask) from [<c0160e3c>] (handle_mm_fault+0x908/0xa90)
[ 2610.556807] [<c0160e3c>] (handle_mm_fault) from [<c0726af8>] (do_page_fault.part.11+0x14c/0x388)
[ 2610.566104] [<c0726af8>] (do_page_fault.part.11) from [<c0726d68>] (do_page_fault+0x34/0xa4)
[ 2610.575034] [<c0726d68>] (do_page_fault) from [<c00083d8>] (do_DataAbort+0x3c/0xa0)
[ 2610.583155] [<c00083d8>] (do_DataAbort) from [<c0725474>] (__dabt_usr+0x34/0x40)
[ 2610.590985] Exception stack(0xc13dffb0 to 0xc13dfff8)
[ 2610.596363] ffa0: b0635000 00d32ff6 b0634fff ffffffda
[ 2610.605017] ffc0: befd9914 00000075 00000000 befd9914 a64c0304 015ad0a4 aaa8a639 00116683
[ 2610.613692] ffe0: 00d32fff befd9860 b676ab3c b6769788 600f0010 ffffffff
[ 2610.620696] Mem-info:
[ 2610.623154] DMA per-cpu:
[ 2610.625883] CPU 0: hi: 90, btch: 15 usd: 73
[ 2610.630992] active_anon:12507 inactive_anon:12566 isolated_anon:0 active_file:46 inactive_file:108 isolated_file:0 unevictable:0 dirty:0 writeback:1261 unstable:0 free:477 slab_reclaimable:3990 slab_unreclaimable:28476 mapped:38 shmem:1 pagetables:537 bounce:0 free_cma:0
[ 2610.664051] DMA free:1908kB min:1952kB low:2440kB high:2928kB active_anon:50028kB inactive_anon:50264kB active_file:184kB inactive_file:432kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:261120kB managed:239696kB mlocked:0kB dirty:0kB writeback:5044kB mapped:152kB shmem:4kB slab_reclaimable:15960kB slab_unreclaimable:113904kB kernel_stack:808kB pagetables:2148kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1397 all_unreclaimable? yes
[ 2610.707915] lowmem_reserve[]: 0 0 0 0
[ 2610.711932] DMA: 477*4kB (UE) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB = 1908kB
[ 2610.724589] 1560 total pagecache pages
[ 2610.728595] 1401 pages in swap cache
[ 2610.732415] Swap cache stats: add 44097, delete 42696, find 6261/7490
[ 2610.739235] Free swap = 0kB
[ 2610.742322] Total swap = 124996kB
[ 2610.757425] 65536 pages of RAM
[ 2610.760805] 831 free pages
[ 2610.763740] 5612 reserved pages
[ 2610.767112] 30381 slab pages
[ 2610.770206] 33039 pages shared
[ 2610.773480] 1401 pages swap cached
[ 2610.777116] [ pid ]  uid  tgid  total_vm  rss  nr_ptes  swapents  oom_score_adj  name
[ 2610.785473] [ 276]  0  276  5151  14  12  63  0  systemd-journal
[ 2610.794737] [ 287]  0  287  3792  0  7  221  0  lvmetad
[ 2610.803309] [ 302]  0  302  3391  4  7  286  -1000  systemd-udevd
[ 2610.812427] [ 375]  0  375  3397  6  6  79  -1000  auditd
[ 2610.820905] [ 390]  0  390  49591  0  74  201  0  rsyslogd
[ 2610.829575] [ 397]  0  397  4336  1  11  211  0  abrtd
[ 2610.837964] [ 399]  0  399  11410  120  15  233  0  NetworkManager
[ 2610.847171] [ 403]  0  403  4248  4  11  146  0  abrt-watch-log
[ 2610.856374] [ 407]  0  407  1201  2  5  89  0  smartd
[ 2610.864851] [ 411]  0  411  9482  0  12  206  0  ModemManager
[ 2610.873875] [ 415]  0  415  494  1  5  13  0  rngd
[ 2610.882170] [ 424]  81  424  3043  31  6  74  -900  dbus-daemon
[ 2610.891100] [ 427]  32  427  1244  13  5  68  0  rpcbind
[ 2610.899697] [ 440]  0  440  2640  15  6  42  0  systemd-logind
[ 2610.908903] [ 442]  999  442  17434  0  16  733  0  polkitd
[ 2610.917472] [ 447]  0  447  1557  21  6  127  0  crond
[ 2610.925858] [ 450]  0  450  841  0  4  41  0  atd
[ 2610.934064] [ 467]  0  467  3748  3  9  136  0  login
[ 2610.942450] [ 468]  0  468  1033  4  5  20  0  agetty
[ 2610.950928] [ 470]  0  470  1272  2  5  56  0  oddjobd
[ 2610.959498] [ 471]  0  471  2275  16  7  107  0  certmonger
[ 2610.968340] [ 476]  0  476  2591  4  8  141  -1000  sshd
[ 2610.976687] [ 484]  29  484  1374  3  5  184  0  rpc.statd
[ 2610.985444] [ 485]  0  485  5111  4  12  1935  0  dhclient
[ 2610.994107] [ 543]  0  543  1241  0  5  96  0  systemd
[ 2611.002702] [ 548]  0  548  8793  6  10  512  0  (sd-pam)
[ 2611.011362] [ 550]  0  550  1462  4  7  214  0  bash
[ 2611.019682] [ 633]  0  633  4112  46  10  120  0  sssd
[ 2611.027977] [ 634]  0  634  10101  43  15  356  0  sssd_be
[ 2611.036544] [ 635]  0  635  6806  33  16  113  0  sssd_nss
[ 2611.045204] [ 636]  0  636  3899  32  10  117  0  sssd_pam
[ 2611.053862] [ 637]  0  637  4076  30  11  135  0  sssd_ssh
[ 2611.062520] [ 638]  0  638  4794  34  12  131  0  sssd_pac
[ 2611.071180] [ 640]  0  640  5415  11  11  213  0  sshd
[ 2611.079583] [ 645]  217600001  645  2973  0  7  100  0  systemd
[ 2611.088504] [ 647]  217600001  647  8860  9  11  513  0  (sd-pam)
[ 2611.097532] [ 649]  217600001  649  5415  37  11  189  0  sshd
[ 2611.106195] [ 651]  217600001  651  3171  4  7  211  0  bash
[ 2611.114858] [ 693]  217600001  693  3062  93  7  18  0  top
[ 2611.123425] [ 698]  0  698  58701  23050  119  22994  0  dnf
[ 2611.131621] Out of memory: Kill process 698 (dnf) score 491 or sacrifice child
[ 2611.139490] Killed process 698 (dnf) total-vm:234804kB, anon-rss:92140kB, file-rss:60kB

(trying to run "dnf update")

New version of Hawkey is released (hawkey-0.4.13-1.fc21) that should reduce memory consumption. Please try that out.

Let's call it ON_QA, then. Dennis?
With the updated hawkey, the process was not killed, and when doing a "dnf makecache", memory usage according to top topped out at 72%. So far it seems to have made the experience better, thanks.

hawkey-0.4.13-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/hawkey-0.4.13-1.fc20

Good to hear that. Mls takes credit for solving this bug.

hawkey-0.4.13-1.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
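A side note on verification: rather than eyeballing top's percentage, the peak resident memory of a process can be read from the standard VmHWM ("high water mark") field of /proc/<pid>/status on Linux. A small sketch of such a check; the helper name is mine, and the sample text just echoes the anon-rss figure the OOM killer reported above:

```python
# Hedged sketch: read a process's peak RSS (VmHWM) from /proc/<pid>/status.
# The helper name is illustrative; VmHWM itself is a standard Linux field.

def peak_rss_kb(status_text):
    """Return the VmHWM (peak resident set size) value in kB."""
    for line in status_text.splitlines():
        if line.startswith("VmHWM:"):
            return int(line.split()[1])
    raise ValueError("VmHWM not found")

if __name__ == "__main__":
    # Sample status text echoing values from the OOM report in this bug:
    sample = "Name:\tdnf\nVmPeak:\t  234804 kB\nVmHWM:\t   92140 kB\n"
    print(peak_rss_kb(sample))  # prints: 92140
```

On a live system one would run something like peak_rss_kb(open(f"/proc/{pid}/status").read()) after "dnf makecache" finishes, which gives an exact kB figure instead of a top percentage.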