Description of problem:
During the Fedora compose process, xz is run inside various mock setups. When it is used inside an i386 mock on x86_64 hardware (with 4 HT cores) it fails. This doesn't happen when used on x86_64 or ARM.

Version-Release number of selected component (if applicable):
xz-5.2.0-2.fc22.i686

How reproducible:
always

Steps to Reproduce:
1. mock -r fedora-22-i386 --init
2. mock -r fedora-22-i386 --shell
3. cd /root
4. find . -print0 | cpio --null --quiet -H newc -o | xz -T8 --check=crc32 -9 > ../foo.xz

Actual results:
xz: (stdin): Cannot allocate memory

Expected results:
A valid file

Additional info:
If you change it to use -T7 instead of -T8, it works. -T0 also fails in the same way as -T8.
It also succeeds if you leave -T8, but change -9 to -4.
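Before tuning -T, the compression level, and a memory limit by hand, it can help to see what xz itself thinks the memory situation is. `xz --info-memory` is a real xz option; treating it as a first debugging step here is my suggestion, not part of the original report:

```shell
# Show the amount of physical memory xz sees and its current
# compression/decompression memory usage limits.
xz --info-memory
```

Inside the i386 mock this shows the number xz will base any percentage-style limit on, which is the crux of the problem below.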
Some notes from the xz author on IRC:

<Larhzu> Hey
<Larhzu> It runs out of memory, or most likely out of address space on a 32-bit CPU.
<Larhzu> xz -T8 -9 needs almost 10 GiB of memory.
<adamw> zoiks. thanks
<Larhzu> One solution is to set a memory usage limit with --memlimit-compress=3GiB or something like that, which makes xz scale the settings down.
<adamw> yeah, i just now found a thread about that
<Larhzu> It's a question if xz should do that automatically on a 32-bit OS.
<adamw> if i try -M 256M I get:
<adamw> xz: Memory usage limit is too low for the given filter setup.
<adamw> xz: 1,250 MiB of memory is required. The limit is 256 MiB.
<adamw> so that means it has a hard minimum req of 1.25GB for that configuration?
<Larhzu> (Also, the threading implementation is a bit too memory hungry relative to what it could be, but that's a more complex issue to optimize.)
<Larhzu> More like the maximum.
<Larhzu> It doesn't allocate it all until it would be needed.
<Larhzu> That's why you can run out of memory in the middle of a file.
<Larhzu> This applies specifically to threaded encoding. In other situations the memory is usually allocated before doing anything.
<adamw> aha, i see. thanks
<Larhzu> But with threads, new threads are started as needed.
<Larhzu> Since the threading is done by splitting the data into blocks, many threads aren't needed with smaller files.
<Larhzu> With xz -9 the default block size (can be changed with --block-size) is 192 MiB.
<Larhzu> So for every 192 MiB of input a new thread gets started, unless an already running thread has become idle.
<adamw> would you accept a bug report that xz should automatically tone things down on a 32-bit platform?
<Larhzu> On a 32-bit CPU -T3 may be a bit much with -9; -T2 will do. With -6 or so, many threads will work.
<Larhzu> Well
<Larhzu> The memory usage limit is a controversial topic in xz.
<Larhzu> Some like it and others hate it.
<Larhzu> So whatever is done, someone will be annoyed or even angry.
<adamw> heh, we're familiar with that one...
<Larhzu> Perhaps it's worse with the decompression side having a limit.
<Larhzu> If there is a limit and xz adjusts things automatically, then some people may say that xz thinks too much on its own and should instead give an error.
<Larhzu> It could also be said that xz should read resource limits to find out how much memory the kernel would allow it to allocate, but currently that isn't done.
<Larhzu> (It's also a bit OS-specific.)
<Larhzu> Anyway, I'm open to suggestions.
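A minimal sketch of the memlimit approach Larhzu describes above. The 2GiB figure and the tiny input file are my own choices for illustration (2GiB comfortably covers the 1,250 MiB single-thread requirement quoted above), not values from the report:

```shell
# Compress with an explicit memory cap. In threaded mode xz scales the
# settings down to fit the limit instead of failing mid-file; --no-warn
# keeps the exit status at 0 if xz reports that it adjusted anything.
printf 'some test data\n' > /tmp/demo.txt
xz --keep --force --no-warn -9 -T4 --check=crc32 \
   --memlimit-compress=2GiB /tmp/demo.txt
xz -t /tmp/demo.txt.xz && echo "valid file"
```

This is the same idea as `--memlimit-compress=3GiB` from the chat, just with a limit low enough to demonstrate scaling on most machines.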
So basically this turns out to be a very squishy issue for xz: the combination of a high compression level and lots of threads that lorax requests requires a *lot* of memory, and xz apparently has users who don't want it to make a 'best effort' in that case; they want it to fail (and making a 'best effort' is in any case somewhat tricky).

Larhzu is currently looking into one thing that would help us out for this particular case: for a 32-bit xz process, treat the 'maximum' memory for the purposes of the options that limit memory usage as 'the host system's memory, *up to ~4GiB*', not 'the host system's memory'. Right now we can't just pass '-M 80%' and be happy, because the host system could have more than 4GiB of RAM if we're using a PAE kernel or running a 32-bit xz on a 64-bit host (which is actually what we do when building the installer images, via mock). If he can do that, we can just make pungi or lorax or something pass '-M 80%' or similar, and that would probably be a good-enough solution, so I guess we can use this bug to track the viability/progress of that fix.

For a short-term fix to the problem of 'the 22 Alpha images are busted', we'll probably just have releng force a 4GiB limit for all arches; that should be good enough for the short term. That will be tracked in the original bug - https://bugzilla.redhat.com/show_bug.cgi?id=1196481 .
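If the upstream change described above landed, the compose-side fix could be as simple as adding a percentage limit to the existing invocation. A hedged sketch of what that would look like (the real reproducer pipes cpio output into xz; a small printf stands in for it here so the example is self-contained, and --no-warn is my addition to keep the exit status at 0 if xz adjusts its settings):

```shell
# Hypothetical compose-side fix: same xz flags as the reproducer, plus
# -M 80%, so a (fixed) 32-bit xz would scale its settings down to fit
# its address space instead of dying with "Cannot allocate memory".
printf 'payload\n' | xz -T0 --check=crc32 -9 -M 80% --no-warn > /tmp/foo.xz
xz -t /tmp/foo.xz
```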
It does sound a bit like NOTABUG (w.r.t. xz). Can the package do something like:

threads=8
%ifarch %{arm} %{ix86}
# Be a bit less aggressive on 32-bit machines
threads=4
%endif
xz -T$threads ...
lorax *can* do something like ifarch ix86 -M 3G, but it's kind of icky, and I think we reached general agreement with the xz author that at least some improvement is possible upstream. The IRC chat went on longer; I can forward the whole thing to you if you like.
Some news from upstream:

<Larhzu> adamw: I don't really have news about the out-of-address-space problem. I think a future version will have the feature of not stopping if running out of memory as long as at least one thread is running, but I don't try to predict when.
<Larhzu> adamw: Making a simple change to --memlimit is still an option and could be done quickly, but I'm not any wiser about what that simple change should be.
<Larhzu> It doesn't sound so simple to figure that out.
<adamw> Larhzu: how much variance are we talking about?
<Larhzu> adamw: 32-bit Linux usually has a 3/1 gig memory split, giving about 3 GiB of address space to userspace.
<Larhzu> adamw: x86-64 Linux gives the full 4 GiB to 32-bit apps.
<Larhzu> adamw: a 32-bit kernel can be something other than 3/1, e.g. I used to build 2/2 kernels a decade ago.
<Larhzu> Since that avoided bigmem support for systems with exactly 1 GiB RAM.
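Until xz caps a percentage-style limit at the address-space size itself, a caller could approximate the behavior with a small wrapper. This is a hypothetical sketch, not anyone's actual code: the 2GiB figure is a conservative pick sitting below the ~3 GiB of a typical 3/1 split, and `getconf LONG_BIT` is used as a stand-in test for "is this a 32-bit userspace":

```shell
# Pick an absolute cap for 32-bit processes, since their address space is
# 2-4 GiB regardless of host RAM; use a percentage of RAM on 64-bit.
if [ "$(getconf LONG_BIT)" = "32" ]; then
    memlimit="--memlimit-compress=2GiB"
else
    memlimit="--memlimit-compress=80%"
fi
printf 'payload\n' | xz -9 -T0 --no-warn "$memlimit" > /tmp/wrapped.xz
xz -t /tmp/wrapped.xz
```

The 2/2-split case Larhzu mentions shows why even this is approximate: the safe absolute cap depends on how the kernel was built, not just on the word size.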
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle. Changing version to '23'. (As we did not run this process for some time, it could also affect bugs from before the Fedora 23 development cycle. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.) More information and the reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
This message is a reminder that Fedora 23 is nearing its end of life. Approximately 4 (four) weeks from now, Fedora will stop maintaining and issuing updates for Fedora 23. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '23'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue, and we are sorry that we were not able to fix it before Fedora 23 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.
This is still a live issue and, AFAICT, has never been resolved upstream (reading https://git.tukaani.org/?p=xz.git;a=blob;f=NEWS;hb=HEAD , I see no reference to any change that would address this).
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle. Changing version to '27'.
Given the low priority of this request, and since it has not bothered any users for years, I am closing this tracker. If you think this issue should be handled and investigated, feel free to reopen it.
Well, we do still have to work around this in lorax: https://github.com/weldr/lorax/commit/99e575e61b59330b9e0979a874fa41b07d1c9a56 . That code is still there today. It is better than the original hack, which limited us even on arches that could use much more memory. But we were expecting, all the way back in 2015, that this would be properly handled upstream so we didn't have to write in the limit ourselves downstream.
Thank you for the reply. I'm leaving this ticket open for tracking purposes and will contact upstream about this.
Contacted the upstream maintainer via personal email, as that is the official communication channel. Waiting for a response.
Still no response so far.
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.
Hi Adam,

Could you please check if this has been touched since you reported it in 2015? I'm not entirely sure what I should look for, so I'm asking you.
Lukas: last update was just less than a year ago, see above. I guess we can ask Ondrej if upstream ever replied.
Thank you, Adam. Unfortunately, Ondrej is no longer part of Red Hat, but he didn't get any response from upstream. I can write to them again and see if there is any change, but could you summarize the main thing you want them to fix? From the IRC conversations you've pasted here in the comments, there were a lot of questions about how you would imagine it working, but I can't tell what the solution is. Thanks again.
I don't recall all the ins and outs of the issue, but comment #3 looks like it has the main ask: "Larhzu is currently looking into one thing that would help us out for this particular case: for a 32-bit xz process, treat the 'maximum' memory for the purpose of the options that limit memory usage as 'the host system's memory *up to ~4GiB*', not 'the host system's memory'. Right now we can't just pass '-M 80%' and be happy, because the host system could have more than 4GiB of RAM if we're using a PAE kernel or running a 32-bit xz on a 64-bit host (which is actually what we do, when building the installer images, via mock)."
Oh, btw, thinking about it, we don't build 32-bit images for Fedora any more. Not 32-bit ARM, not 32-bit Intel. So at the Fedora level the issue is kind of academic now, I guess. The problem still exists and theoretically ought to be addressed, but it's probably not really an issue for us any more.
Okay, I will close this BZ in that case. We have less capacity on the team right now, and since you don't build 32-bit images anymore, this issue is low priority for us. Thanks for letting us know.