Bug 1729382 - "error: create archive failed: cpio: write failed - Cannot allocate memory" when writing rpms
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: rpm
Version: rawhide
Hardware: i686
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Packaging Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-12 07:25 UTC by Dan Horák
Modified: 2019-09-30 14:11 UTC (History)

Fixed In Version: rpm-4.15.0-0.rc1.1.fc31
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-10 01:20:55 UTC


Attachments
root.log (138.61 KB, text/plain)
2019-07-12 07:26 UTC, Dan Horák
build.log (652.47 KB, text/plain)
2019-07-12 07:26 UTC, Dan Horák


Links
System ID Priority Status Summary Last Updated
Github rpm-software-management rpm issues 804 None None None 2019-08-05 11:52:13 UTC

Description Dan Horák 2019-07-12 07:25:39 UTC
Description of problem:
When building new collectd rpms in Rawhide, the build failed with the following error, although per mock's log the resulting rpms were written out.

...
Provides: collectd-zookeeper-debuginfo = 5.9.0-1.fc31 collectd-zookeeper-debuginfo(x86-32) = 5.9.0-1.fc31 debuginfo(build-id) = 8537c07f73788f2dcd14367171c0eec9efc808d2
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Recommends: collectd-debugsource(x86-32) = 5.9.0-1.fc31
Checking for unpackaged file(s): /usr/lib/rpm/check-files /builddir/build/BUILDROOT/collectd-5.9.0-1.fc31.i386
BUILDSTDERR: error: create archive failed: cpio: write failed - Cannot allocate memory
BUILDSTDERR: error: create archive failed: cpio: write failed - Cannot allocate memory
BUILDSTDERR: error: create archive failed: cpio: write failed - Cannot allocate memory
BUILDSTDERR: error: create archive failed: cpio: write failed - Cannot allocate memory
BUILDSTDERR: error: create archive failed: cpio: write failed - Cannot allocate memory
BUILDSTDERR: error: create archive failed: cpio: write failed - Cannot allocate memory
BUILDSTDERR: error: create archive failed: cpio: write failed - Cannot allocate memory
BUILDSTDERR: error: create archive failed: cpio: write failed - Cannot allocate memory
BUILDSTDERR: error: create archive failed: cpio: write failed - Cannot allocate memory
BUILDSTDERR: error: create archive failed: cpio: write failed - Cannot allocate memory
BUILDSTDERR: error: create archive failed: cpio: write failed - Cannot allocate memory
BUILDSTDERR: error: create archive failed: cpio: write failed - Cannot allocate memory
Wrote: /builddir/build/RPMS/collectd-drbd-5.9.0-1.fc31.i686.rpm
Wrote: /builddir/build/RPMS/collectd-gps-5.9.0-1.fc31.i686.rpm
Wrote: /builddir/build/RPMS/collectd-notify_email-5.9.0-1.fc31.i686.rpm
...

It happened 3 times in a row, twice on (different) VMs, once on build-hw5 (with 128GB memory).

https://koji.fedoraproject.org/koji/taskinfo?taskID=36190075
https://koji.fedoraproject.org/koji/taskinfo?taskID=36190328
https://koji.fedoraproject.org/koji/taskinfo?taskID=36190805


Version-Release number of selected component (if applicable):
rpm-4.15.0-0.beta.1.fc31.i686

How reproducible:
was 100% yesterday, but no such problem today or the day before yesterday

Comment 1 Dan Horák 2019-07-12 07:26:11 UTC
Created attachment 1589781 [details]
root.log

Comment 2 Dan Horák 2019-07-12 07:26:46 UTC
Created attachment 1589782 [details]
build.log

Comment 3 Remi Collet 2019-07-16 13:25:37 UTC
Same with https://koji.fedoraproject.org/koji/taskinfo?taskID=36285998

Comment 4 DJ Delorie 2019-07-19 17:04:15 UTC
Seen yesterday on glibc builds...

These builds failed:
https://koji.fedoraproject.org/koji/taskinfo?taskID=36333585 on buildhw-04.phx2.fedoraproject.org
https://koji.fedoraproject.org/koji/taskinfo?taskID=36329825 on buildhw-02.phx2.fedoraproject.org

These builds worked:
https://koji.fedoraproject.org/koji/taskinfo?taskID=36333043 on buildvm-30.phx2.fedoraproject.org
https://koji.fedoraproject.org/koji/taskinfo?taskID=36335876 on buildvm-19.phx2.fedoraproject.org

Comment 5 Dan Horák 2019-07-19 17:18:24 UTC
I wonder if it could be some memory management issue, too much fragmentation or such, for the 32-bit user space (in chroot) on an x86_64 machine. Or maybe the error message from cpio is just bogus ... It would also be interesting to know whether the written rpms are correct or corrupted.

Comment 6 DJ Delorie 2019-07-19 17:23:08 UTC
I know it's a tiny sample size, but I note that the failed glibc builds were all on "hw" hosts, and the working ones were all on "vm" hosts... maybe running a vm on a 64-bit host gives you more usable address space?

Comment 7 Dan Horák 2019-07-19 17:42:22 UTC
DJ, you seem to be right, somehow I got confused with my builds, they were all on hw, same with Remi. I think that's useful information.

Comment 8 Florian Weimer 2019-07-23 11:31:03 UTC
This is likely caused by the auto-scaling in OpenMP:

rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
{
    rpmRC rc = RPMRC_OK;
    Package pkg;

    /* Run binary creation in parallel */
    #pragma omp parallel
    #pragma omp single
    for (pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
        #pragma omp task
        {
        pkg->rc = packageBinary(spec, pkg, cookie, cheating, &pkg->filename);

…

OpenMP cannot take into account the resource usage of each individual thread.  With 48 processors on the builders, there is less than 83 MiB per thread under the most ideal circumstances for a 32-bit process.  That might not be enough, depending on how much RPM tries to do in memory, in a non-streaming fashion.
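
The per-thread figure above is just the usable address space divided evenly by the thread count. A minimal sketch of that arithmetic in C follows; the function name is hypothetical, and how much address space a 32-bit process can really use is an assumption that the caller has to supply, since it depends on the kernel and on what else is already mapped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not rpm code: the even share of address space
 * (in MiB) that each thread gets if nthreads threads split it equally.
 * Integer division, so the result is rounded down. */
uint64_t per_thread_budget_mib(uint64_t address_space_mib, unsigned nthreads)
{
    assert(nthreads > 0);   /* dividing by zero threads makes no sense */
    return address_space_mib / nthreads;
}
```

For instance, assuming 3 GiB (3072 MiB) of usable address space, per_thread_budget_mib(3072, 48) gives 64 MiB per thread. Real headroom is lower still, because the share also has to cover thread stacks, the heap, and shared libraries.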

Comment 9 Panu Matilainen 2019-07-29 09:50:42 UTC
Right, compression can consume a significant amount of memory, and when you have lots of CPUs it's quite easy to eat up all the memory. There's already an RFE to take the amount of memory into consideration for parallelisation settings (see bug 1118734), we probably need to do something about that now.

FWIW, all parallel activity (including OpenMP) in rpm honors the $RPM_BUILD_NCPUS environment variable and %_smp_ncpus_max macro, so as a workaround for packages commonly hitting this you can do something like this in the spec:

%global _smp_ncpus_max 16
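
As an illustration of how a build tool could honor such settings, here is a minimal C sketch. It is not rpm's implementation: the function names are hypothetical, and the order of precedence (the environment value replaces the detected CPU count, then the maximum caps the result) is an assumption made for the example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a strictly positive decimal integer. Returns 0 if the text is
 * missing, malformed, or out of range. */
int parse_positive_int(const char *text)
{
    if (text == NULL || *text == '\0')
        return 0;
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0 || value > INT_MAX)
        return 0;
    return (int)value;
}

/* Hypothetical helper: start from the detected CPU count, let a valid
 * environment value replace it, then apply the maximum (a value <= 0
 * means "no maximum"). */
int effective_ncpus(int detected, const char *env_override, int ncpus_max)
{
    int n = detected > 0 ? detected : 1;
    int override = parse_positive_int(env_override);
    if (override > 0)
        n = override;
    if (ncpus_max > 0 && n > ncpus_max)
        n = ncpus_max;
    return n;
}
```

A caller would pass something like effective_ncpus(48, getenv("RPM_BUILD_NCPUS"), 16); with the variable unset, that yields 16 on a 48-CPU builder.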

Comment 10 Panu Matilainen 2019-07-29 10:02:41 UTC
Just remembered the XZ compression code has special logic for handling 32bit system memory use for parallel compression, but that doesn't trigger here because we're not parallel compressing a single stream but multiple independent streams. There's also a macro for limiting XZ memory use, but only during decompression. Argh.

Comment 11 Florian Weimer 2019-07-29 10:27:29 UTC
Hasn't rawhide switched to zstd, so different code paths are used?

For XZ, the command line tool prints the amount of memory used for compressing.  Can this information be used to limit the number of processors to something like 3 GiB divided by the amount of memory required for compression?

I don't know if zstd even allows one to compute the memory required beforehand, based on the compression flags.
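
The "budget divided by per-compressor memory" idea can be sketched in C as below. This is only an illustration under stated assumptions: the function name is hypothetical, and the per-thread memory figure is an input that would have to come from somewhere (for example the number the xz tool reports), since nothing in this report establishes how to obtain it for zstd.

```c
#include <stdint.h>

/* Hypothetical helper: how many compressor threads fit into a given
 * address-space budget, never more than the CPU count and never fewer
 * than one. */
unsigned threads_for_budget(uint64_t budget_bytes,
                            uint64_t per_thread_bytes,
                            unsigned ncpus)
{
    if (ncpus == 0)
        return 1;               /* always allow at least one thread */
    if (per_thread_bytes == 0)
        return ncpus;           /* unknown cost: fall back to the CPU count */

    uint64_t fit = budget_bytes / per_thread_bytes;
    if (fit < 1)
        fit = 1;                /* a single thread has to be attempted anyway */
    return fit < ncpus ? (unsigned)fit : ncpus;
}
```

With a 3 GiB budget and a made-up cost of 512 MiB per compressor, a 48-CPU builder would be limited to 6 threads, while a 4-CPU builder would stay at 4.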

Comment 12 Richard W.M. Jones 2019-08-02 13:13:58 UTC
Here's one which happened in Fedora Rawhide just minutes ago (on i686):

https://koji.fedoraproject.org/koji/taskinfo?taskID=36761432

Weirdly an earlier build of the exact same package against what must be
a very similar buildroot (from perhaps 30 mins before) on the same architecture
was successful.  Is it possible this bug is non-deterministic?

Comment 13 Florian Weimer 2019-08-02 13:16:48 UTC
(In reply to Richard W.M. Jones from comment #12)
> Here's one which happened in Fedora Rawhide just minutes ago (on i686):
> 
> https://koji.fedoraproject.org/koji/taskinfo?taskID=36761432
> 
> Weirdly an earlier build of the exact same package against what must be
> a very similar buildroot (from perhaps 30 mins before) on the same
> architecture
> was successful.  Is it possible this bug is non-deterministic?

It depends on the number of CPUs on the builder.  The VM builders have fewer CPUs and aren't affected.

Comment 14 Michael Cronenworth 2019-08-04 15:46:47 UTC
I'm seeing this issue with wine more frequently. I got it twice in a row with the latest update attempt. Wine is not that large of a package. The largest package is about 68MB.

Build 2: https://koji.fedoraproject.org/koji/taskinfo?taskID=36791708
Parent: https://koji.fedoraproject.org/koji/taskinfo?taskID=36791693

Comment 15 Dan Horák 2019-08-05 07:26:04 UTC
The problem isn't the size of the rpms, but the number of sub-packages/rpms it produces, which is quite high for wine.

Comment 16 Panu Matilainen 2019-08-05 11:50:01 UTC
FWIW there's now an upstream ticket on this for release tracking: https://github.com/rpm-software-management/rpm/issues/804

Comment 17 Ben Cotton 2019-08-13 17:04:31 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 18 Ben Cotton 2019-08-13 17:54:44 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to 31.

Comment 19 Panu Matilainen 2019-08-23 11:29:16 UTC
This should be fixed in rpm >= 4.15.0-0.beta.4, just test-driving the solution in rawhide a bit before -rc1 release.

Comment 20 Lukas Slebodnik 2019-08-26 18:42:10 UTC
(In reply to Panu Matilainen from comment #19)
> This should be fixed in rpm >= 4.15.0-0.beta.4, just test-driving the
> solution in rawhide a bit before -rc1 release.

Do you plan to release rpm-4.15.0-0.beta.5 also in f31 ?
It caused problems in samba
https://koji.fedoraproject.org/koji/taskinfo?taskID=37291661

Comment 21 Panu Matilainen 2019-08-27 05:50:31 UTC
> Do you plan to release rpm-4.15.0-0.beta.5 also in f31 ?

Nope, .5 is just test-driving the change. F31 will go straight to 4.15.0-rc (obviously with the same fix) once it's released, hopefully today.

Comment 22 Fedora Update System 2019-08-28 11:28:44 UTC
FEDORA-2019-e4b6ffd824 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-e4b6ffd824

Comment 23 Fedora Update System 2019-08-29 21:01:45 UTC
rpm-4.15.0-0.rc1.1.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-e4b6ffd824

Comment 24 Fedora Update System 2019-09-10 01:20:55 UTC
rpm-4.15.0-0.rc1.1.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

Comment 25 Sandro Mani 2019-09-29 10:58:16 UTC
I'm hitting this regularly on armv7 when trying to build mingw-python-qt5 on F32 [1]. The failure apparently happens when packing the debuginfo subpackage, which consists of some 50 *.debug files totaling approx 700MB. I've tried with %global _smp_ncpus_max 4 but it makes no difference.

[1] https://koji.fedoraproject.org/koji/buildinfo?buildID=1391932

Comment 26 Panu Matilainen 2019-09-30 07:57:35 UTC
(In reply to Sandro Mani from comment #25)
> I'm hitting this regularly on armv7 when trying to build mingw-python-qt5 on
> F32 [1]. The failure apparently happens when packing the debuginfo
> subpackage, which consist of some 50 *.debug files totaling approx 700MB.
> I've tried with %global _smp_ncpus_max 4 but it makes no difference.
> 
> [1] https://koji.fedoraproject.org/koji/buildinfo?buildID=1391932

Four is all the CPUs that builder has in the first place, and four is what rpm caps the thread limit to on 32bit architectures to try to avoid exceeding the ~2GB worth of usable address space, so no wonder trying to set it to 4 again makes no difference. The failing builder has a mere 2.5GB of physical memory so I guess it could be hitting that, or it could be the difference between address space for a 32bit process under a 64bit kernel (in case of i686 builders) vs a 32bit kernel.

Anyway, if it fails with 4, then just set the limit lower. I'd start with two and go down to one if all else fails. Note that you don't want to change %_smp_ncpus_max but %_smp_build_nthreads instead (this is a new tunable in 4.15). This way other parts of the build will still be parallel, it's just the number of threads that will be affected.
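
To make it concrete why setting the limit to 4 changes nothing while 2 does, here is a minimal C sketch of that kind of capping logic. It is not rpm's actual code: the function name and parameters are hypothetical, and only the cap of four threads on 32bit comes from the explanation above.

```c
#include <stddef.h>

/* Thread cap on 32bit architectures, as described in the comment above. */
enum { THREADS_MAX_32BIT = 4 };

/* Hypothetical helper: start from the CPU count, apply the 32bit cap,
 * then let a user-supplied limit lower the result further
 * (user_nthreads <= 0 means "not set"). */
int build_nthreads(int ncpus, int user_nthreads, size_t pointer_size)
{
    int n = ncpus > 0 ? ncpus : 1;

    if (pointer_size <= 4 && n > THREADS_MAX_32BIT)
        n = THREADS_MAX_32BIT;

    if (user_nthreads > 0 && user_nthreads < n)
        n = user_nthreads;

    return n;
}
```

On a 4-CPU 32bit builder, build_nthreads(4, 4, sizeof(void *)) still returns 4, whereas a user limit of 2 brings it down to 2.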

Comment 27 Sandro Mani 2019-09-30 08:47:07 UTC
Thanks, I'll give it a shot.

Comment 28 Kevin Fenzi 2019-09-30 11:58:02 UTC
Do note that the armv7 fedora builders have actually 24GB memory. They are vm's using the lpae kernel...

Comment 29 Panu Matilainen 2019-09-30 12:24:41 UTC
Oh, off by 0 :) Thanks for correcting, it's important since it means it's clearly not shortage of memory but address space again. Makes sense as Linux doesn't easily -ENOMEM otherwise.

If we're hitting address space limits with just four threads, then I'd say the compression code is using excessive amounts of memory. The existing zstd code makes no attempt to control the memory use, it probably should.

Comment 30 Panu Matilainen 2019-09-30 13:19:07 UTC
Annoyingly, all the interesting size-estimation functionality in zstd is in the "experimental, static linking only" section of the API, which means that rpm cannot rely on using it. Which I guess means we'll just have to guess.

Comment 31 Sandro Mani 2019-09-30 14:11:50 UTC
%global _smp_build_nthreads 2 did the trick, thanks!

