Bug 1196786 - Running xz -T8 from inside a i386 mock fails with xz: (stdin): Cannot allocate memory
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: xz
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Matej Mužila
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-02-26 18:27 UTC by Brian Lane
Modified: 2022-09-29 08:15 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-29 08:15:19 UTC
Type: Bug
Embargoed:



Description Brian Lane 2015-02-26 18:27:01 UTC
Description of problem:
The Fedora compose process runs xz inside various mock setups. When xz is run inside an i386 mock on x86_64 hardware (with 4 HT cores), it fails. This doesn't happen on x86_64 or arm.

Version-Release number of selected component (if applicable):
xz-5.2.0-2.fc22.i686

How reproducible:
always

Steps to Reproduce:
1. mock -r fedora-22-i386 --init
2. mock -r fedora-22-i386 --shell
3. cd /root
4. find . -print0 | cpio --null --quiet -H newc -o | xz -T8 --check=crc32 -9 > ../foo.xz


Actual results:
xz: (stdin): Cannot allocate memory

Expected results:
valid file

Additional info:
If you change it to use -T7 instead of -T8 it works. -T0 also fails in the same way as -T8.

Comment 1 Adam Williamson 2015-02-26 18:29:03 UTC
It also succeeds if you leave -T8, but change -9 to -4.

Comment 2 Adam Williamson 2015-02-26 19:00:31 UTC
Some notes from the xz author on IRC:

<Larhzu> Hey
<Larhzu> It runs out of memory or most likely out of address space on 32-bit CPU.
<Larhzu> xz -T8 -9 needs almost 10 GiB of memory.
<adamw> zoiks. thanks
<Larhzu> One solution is to set a memory usage limit with --memlimit-compress=3GiB or something like that which makes xz scale the settings down.
<adamw> yeah, i just now found a thread about that
<Larhzu> It's a question if xz should do that automatically on 32-bit OS.
<adamw> if i try -M 256M I get:
<adamw> xz: Memory usage limit is too low for the given filter setup.
<adamw> xz: 1,250 MiB of memory is required. The limit is 256 MiB.
<adamw> so that means it has a hard minimum req of 1.25GB for that configuration?
<Larhzu> (Also, the threading implementation is a bit too memory hungry relative to what it could be, but that's a more complex issue to optimize.)
<Larhzu> More like the maximum.
<Larhzu> It doesn't allocate it all until it would be needed.
<Larhzu> That's why you can run out of memory in the middle of a file.
<Larhzu> This applies specifically to threaded encoding. In other situations the memory usually is allocated before doing anything.
<adamw> aha, i see. thanks
<Larhzu> But with threads, new threads are started as needed.
<Larhzu> Since the threading is done by splitting the data into blocks, many threads aren't needed with smaller files.
<Larhzu> With xz -9 the default block size (can be changed with --block-size) is 192 MiB.
<Larhzu> So for every 192 MiB of input a new thread gets started unless an already running thread has become idle.
<adamw> would you accept a bug report that xz should automatically tone things down on a 32-bit platform?
<Larhzu> On 32-bit CPU -T3 may be a bit much with -9, -T2 will do. With -6 or so, many threads will work.
<Larhzu> Well
<Larhzu> The memory usage limit is a controversial topic in xz.
<Larhzu> Others like it and others hate it.
<Larhzu> So whatever is done, someone will be annoyed or even angry.
<adamw> heh, we're familiar with that one...
<Larhzu> Perhaps it's worse with the decompression side having a limit.
<Larhzu> If there is a limit and xz adjusts things automatically, then some people may say that xz thinks too much on its own and should instead give an error.
<Larhzu> It could also be said that xz should read resource limits to find out how much memory the kernel would allow it to allocate, but currently that isn't done.
<Larhzu> (It's also a bit OS-specific.)
<Larhzu> Anyway, I'm open to suggestions.
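
To put rough numbers on the discussion above, here is a sketch in POSIX shell arithmetic. It assumes that the 1,250 MiB figure in the quoted error message is the worst-case cost of one -9 encoder thread (the log does not say so explicitly), and it uses the 192 MiB default block size Larhzu mentions. The function names are made up for illustration and are not part of xz.

```shell
#!/bin/sh
# Illustrative arithmetic only; none of this calls xz.
# Assumption: one -9 encoder thread costs up to ~1250 MiB (figure taken
# from the error message quoted above; treating it as per-thread is a guess).
PER_THREAD_MIB=1250
BLOCK_MIB=192   # default block size at -9, per the IRC log

# threads_started INPUT_MIB MAX_THREADS
# Upper bound on threads started: one per block of input, capped at -T.
threads_started() {
    input_mib=$1
    max_threads=$2
    blocks=$(( (input_mib + BLOCK_MIB - 1) / BLOCK_MIB ))
    if [ "$blocks" -lt 1 ]; then blocks=1; fi
    if [ "$blocks" -gt "$max_threads" ]; then blocks=$max_threads; fi
    echo "$blocks"
}

# worst_case_mib THREADS
# Worst-case memory in MiB once that many threads are running.
worst_case_mib() {
    echo $(( $1 * PER_THREAD_MIB ))
}

worst_case_mib "$(threads_started 2048 8)"   # 2 GiB of input with -T8; prints 10000
```

Under these assumptions, -T8 -9 tops out at about 10,000 MiB, in line with the "almost 10 GiB" estimate, while an input smaller than one block never starts more than one thread. That would be consistent with failures showing up only partway through a large input.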

Comment 3 Adam Williamson 2015-02-26 20:24:25 UTC
So basically this turns out to be a very squishy issue for xz; the combination of high compression level and lots of threads that lorax requests requires a *lot* of memory, and xz apparently has users who don't want it to make a 'best effort' in that case, they want it to fail (and making a 'best effort' is in any case somewhat tricky).

Larhzu is currently looking into one thing that would help us out for this particular case: for a 32-bit xz process, treat the 'maximum' memory for the purpose of the options that limit memory usage as 'the host system's memory *up to ~4GiB*', not 'the host system's memory'. Right now we can't just pass '-M 80%' and be happy, because the host system could have more than 4GiB of RAM if we're using a PAE kernel or running a 32-bit xz on a 64-bit host (which is actually what we do, when building the installer images, via mock).

If he can do that, we can just make pungi or lorax or something pass '-M 80%' or similar and that would probably be a good-enough solution, so I guess we can use this bug to track the viability/progress of that fix.
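
The behaviour being asked for can be sketched as follows. This is a hypothetical illustration of the proposal, not how xz actually computes its limit; the function name and the exact 4096 MiB cap are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed behaviour; not xz's actual logic.
# Assumption: a 32-bit process can address at most 4096 MiB.
ADDR_CAP_MIB=4096

# effective_limit_mib HOST_RAM_MIB PERCENT PROCESS_BITS
# Prints the memory limit in MiB that "-M PERCENT%" would resolve to.
effective_limit_mib() {
    base_mib=$1
    percent=$2
    bits=$3
    # For a 32-bit process, never treat more than the cap as "total" memory.
    if [ "$bits" -eq 32 ] && [ "$base_mib" -gt "$ADDR_CAP_MIB" ]; then
        base_mib=$ADDR_CAP_MIB
    fi
    echo $(( base_mib * percent / 100 ))
}

effective_limit_mib 16384 80 32   # 32-bit xz on a 16 GiB host; prints 3276
effective_limit_mib 16384 80 64   # 64-bit xz on the same host; prints 13107
```

Without the cap, the 32-bit case would resolve to the same 13107 MiB as the 64-bit one, which a 32-bit process cannot address, so the limit would never take effect.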

For a short-term fix to the problem of 'the 22 Alpha images are busted', we'll probably just have releng force a 4GiB limit in for all arches, that should be good enough for the short term; that will be tracked in the original bug - https://bugzilla.redhat.com/show_bug.cgi?id=1196481 .

Comment 4 Richard W.M. Jones 2015-02-26 20:28:42 UTC
It does sound a bit like NOTABUG (w.r.t xz).  Can the package do something
like:

threads=8
%ifarch %{arm} %{ix86}
# Be a bit less aggressive on 32 bit machines
threads=4
%endif
xz -T$threads ...

Comment 5 Adam Williamson 2015-02-26 20:30:25 UTC
lorax *can* do something like ifarch ix86 -M 3G, but it's kind of icky, and i think we achieved a general agreement with the xz author that at least some improvement is possible upstream. the IRC chat went on longer, I can forward the whole thing to you if you like.
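
A downstream workaround of the kind described here might look like the sketch below. It is not the actual lorax code: the function name and the list of 32-bit architecture patterns are assumptions, and the 3GiB value is the one Larhzu suggested on IRC.

```shell
#!/bin/sh
# Hypothetical downstream workaround; not taken from lorax.
# xz_compress_args ARCH
# Prints the xz options to use when building for ARCH.
xz_compress_args() {
    case "$1" in
        i?86|armv[0-9]*)
            # Assumed 32-bit targets: cap memory so xz scales its settings down.
            echo "-T8 -9 --check=crc32 --memlimit-compress=3GiB"
            ;;
        *)
            echo "-T8 -9 --check=crc32"
            ;;
    esac
}

xz_compress_args i686
```

The downside noted in the comment still applies: the caller has to know which architectures are memory-constrained, which is information xz itself is better placed to work out.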

Comment 6 Adam Williamson 2015-03-15 16:58:35 UTC
Some news from upstream:

<Larhzu> adamw: I don't really have news about the out-of-address-space problem. I think a future version will have the feature of not stopping if running out of memory as long as at least one thread is running, but I don't try to predict when.
<Larhzu> adamw: Making a simple change to --memlimit is an option still and could be done quickly, but I'm not any wiser about what that simple change should be.
<Larhzu> It doesn't sound so simple to figure that out.

Comment 7 Adam Williamson 2015-03-15 17:03:51 UTC
<adamw> Larhzu: how much variance are we talking about?
<Larhzu> adamw: 32-bit Linux usually has a 3/1 gig memory split, giving about 3 GiB of address space to userspace.
<Larhzu> adamw: x86-64 Linux gives the full 4 GiB to 32-bit apps.
<Larhzu> adamw: A 32-bit kernel can use something other than 3/1, e.g. I used to build 2/2 kernels a decade ago.
<Larhzu> Since that avoided bigmem support for systems with exactly 1 GiB RAM.
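
The variance described above translates into different thread budgets. The sketch below is plain arithmetic under one assumption that the thread does not confirm: that a single -9 encoder thread may need about 1,250 MiB, the figure from the error message in comment 2. The function name is hypothetical.

```shell
#!/bin/sh
# Illustrative arithmetic only.
# Assumption: one -9 encoder thread may need ~1250 MiB (figure from the
# error message in comment 2; treating it as per-thread is a guess).
PER_THREAD_MIB=1250

# max_threads_for ADDRESS_SPACE_MIB
# Prints how many such threads fit, ignoring all other allocations.
max_threads_for() {
    echo $(( $1 / PER_THREAD_MIB ))
}

max_threads_for 4096   # 32-bit process on an x86-64 kernel; prints 3
max_threads_for 3072   # 32-bit kernel with a 3/1 split; prints 2
max_threads_for 2048   # 32-bit kernel with a 2/2 split; prints 1
```

These are upper bounds, since a real process also needs address space for code, stacks, and other buffers. They are roughly in line with the earlier remark that -T3 may be a bit much with -9 on a 32-bit CPU while -T2 will do.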

Comment 8 Jan Kurik 2015-07-15 14:29:27 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we have not run this process for some time, it could also affect bugs from before the Fedora 23
development cycle. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

Comment 9 Fedora Admin XMLRPC Client 2015-08-11 05:34:25 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 10 Fedora End Of Life 2016-11-24 11:29:44 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 11 Fedora End Of Life 2016-12-20 13:17:23 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 12 Adam Williamson 2017-05-02 23:52:12 UTC
This is still a live issue and, AFAICT, never has gotten resolved upstream (reading https://git.tukaani.org/?p=xz.git;a=blob;f=NEWS;hb=HEAD there's no reference to any change that would address this AFAICS).

Comment 13 Jan Kurik 2017-08-15 08:42:34 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle.
Changing version to '27'.

Comment 14 Ondrej Dubaj 2021-10-07 07:16:03 UTC
Given the low priority of this request, and since it has not bothered any user for years, I am closing this tracker. If you think this issue should be handled and investigated, feel free to reopen it.

Comment 15 Adam Williamson 2021-10-07 20:00:29 UTC
Well, we do still have to work around this in lorax:

https://github.com/weldr/lorax/commit/99e575e61b59330b9e0979a874fa41b07d1c9a56

that code is still there today. It is better than the original hack, which limited us even on arches that could use much more memory. But we were expecting, all the way back in 2015, that this would be properly handled upstream so we didn't have to write in the limit ourselves downstream.

Comment 16 Ondrej Dubaj 2021-10-08 07:06:13 UTC
Thank you for the reply. I am leaving this ticket open for tracking purposes and will contact upstream about it.

Comment 17 Ondrej Dubaj 2021-10-08 08:34:30 UTC
Contacted upstream maintainer via personal email, as it is the official communication channel. Waiting for response.

Comment 18 Ondrej Dubaj 2021-10-18 12:46:07 UTC
No response so far.

Comment 19 Fedora Admin user for bugzilla script actions 2021-12-07 00:23:14 UTC
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

Comment 20 Lukas Javorsky 2022-09-23 16:51:37 UTC
Hi Adam,

Could you please check if this has been touched since you reported it in 2015?

I'm not entirely sure what I should look for, so I'm asking you.

Comment 21 Adam Williamson 2022-09-23 17:47:56 UTC
Lukas: last update was just less than a year ago, see above. I guess we can ask Ondrej if upstream ever replied.

Comment 22 Lukas Javorsky 2022-09-27 11:30:48 UTC
Thank you, Adam,

Unfortunately, Ondrej is no longer at Red Hat, but he did not get any response from upstream.

I can write to them again and see if anything has changed, but could you summarize the main thing you want them to fix?
The IRC conversation you pasted in the comments raises a lot of questions about how you would imagine it working, but I can't tell what the proposed solution is.

Thanks again

Comment 23 Adam Williamson 2022-09-27 19:41:45 UTC
I don't recall all the ins and outs of the issue, but comment #3 looks like it has the main ask:

"Larhzu is currently looking into one thing that would help us out for this particular case: for a 32-bit xz process, treat the 'maximum' memory for the purpose of the options that limit memory usage as 'the host system's memory *up to ~4GiB*', not 'the host system's memory'. Right now we can't just pass '-M 80%' and be happy, because the host system could have more than 4GiB of RAM if we're using a PAE kernel or running a 32-bit xz on a 64-bit host (which is actually what we do, when building the installer images, via mock)."

Comment 24 Adam Williamson 2022-09-27 19:44:11 UTC
Oh, btw, thinking about it, we don't build 32-bit images for Fedora any more. Not 32-bit ARM, not 32-bit Intel. So at the Fedora level the issue is kind of academic now, I guess. The problem still exists and theoretically ought to be addressed, but it's probably not really an issue for us any more.

Comment 25 Lukas Javorsky 2022-09-29 08:15:19 UTC
Okay, I will close this BZ in that case. Our team has limited capacity right now, and since you no longer build 32-bit images, this issue is low priority for us.

Thanks for letting us know

