Bug 1729382
| Field | Value |
|---|---|
| Summary | "error: create archive failed: cpio: write failed - Cannot allocate memory" when writing rpms |
| Product | Fedora |
| Component | rpm |
| Version | rawhide |
| Hardware | i686 |
| OS | Unspecified |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | Dan Horák <dan> |
| Assignee | Packaging Maintenance Team <packaging-team-maint> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | dj, fedora, fweimer, igor.raits, kevin, lslebodn, manisandro, mike, mjw, packaging-team-maint, pmatilai, pmoravco, rjones, vmukhame, yaneti |
| Keywords | Triaged |
| Fixed In Version | rpm-4.15.0-0.rc1.1.fc31 |
| Last Closed | 2019-09-10 01:20:55 UTC |
| Type | Bug |
Description (Dan Horák, 2019-07-12 07:25:39 UTC)

Created attachment 1589781 [details]: root.log
Created attachment 1589782 [details]: build.log
Seen yesterday on glibc builds...

These builds failed:
https://koji.fedoraproject.org/koji/taskinfo?taskID=36333585 on buildhw-04.phx2.fedoraproject.org
https://koji.fedoraproject.org/koji/taskinfo?taskID=36329825 on buildhw-02.phx2.fedoraproject.org

These builds worked:
https://koji.fedoraproject.org/koji/taskinfo?taskID=36333043 on buildvm-30.phx2.fedoraproject.org
https://koji.fedoraproject.org/koji/taskinfo?taskID=36335876 on buildvm-19.phx2.fedoraproject.org

I wonder if it could be some memory management issue, too much fragmentation or such, for the 32-bit user space (in chroot) on an x86_64 machine. Or maybe the error message from cpio is just bogus... It would also be interesting to know whether the written rpms are correct or corrupted.

---

I know it's a tiny sample size, but I note that the failed glibc builds were all on "hw" hosts, and the working ones were all on "vm" hosts... maybe running a vm on a 64-bit host gives you more usable address space?

---

DJ, you seem to be right; somehow I got confused with my builds, they were all on hw, same with Remi.

---

I think that's a useful piece of information. This is likely caused by the auto-scaling in OpenMP:

```c
rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
{
    rpmRC rc = RPMRC_OK;
    Package pkg;

    /* Run binary creation in parallel */
    #pragma omp parallel
    #pragma omp single
    for (pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
        #pragma omp task
        {
            pkg->rc = packageBinary(spec, pkg, cookie, cheating, &pkg->filename);
            …
```

OpenMP cannot take into account the resource usage of each individual thread. With 48 processors on the builders, there is less than 83 MiB per thread under the most ideal circumstances for a 32-bit process. That might not be enough, depending on how much RPM tries to do in memory, in a non-streaming fashion.

---

Right, compression can consume a significant amount of memory, and when you have a lot of CPUs it's quite easy to eat up all the memory.
There's already an RFE to take the amount of memory into consideration for parallelisation settings (see bug 1118734); we probably need to do something about that now.

FWIW, all parallel activity (including OpenMP) in rpm honors the $RPM_BUILD_NCPUS environment variable and the %_smp_ncpus_max macro, so as a workaround for packages commonly hitting this you can do something like this in the spec:

%global _smp_ncpus_max 16

---

Just remembered: the XZ compression code has special logic for handling 32-bit system memory use during parallel compression, but that doesn't trigger here because we're not parallel-compressing a single stream but multiple independent streams. There's also a macro for limiting XZ memory use, but only during decompression. Argh.

---

Hasn't rawhide switched to zstd, so different code paths are used?

For XZ, the command line tool prints the amount of memory used for compressing. Can this information be used to limit the number of processors to something like 3 GiB divided by the amount of memory required for compression? I don't know if zstd even allows one to compute the memory required beforehand, based on the compression flags.

---

Here's one which happened in Fedora Rawhide just minutes ago (on i686):

https://koji.fedoraproject.org/koji/taskinfo?taskID=36761432

Weirdly, an earlier build of the exact same package against what must be a very similar buildroot (from perhaps 30 mins before) on the same architecture was successful. Is it possible this bug is non-deterministic?

---

(In reply to Richard W.M. Jones from comment #12)
> Weirdly an earlier build of the exact same package against what must be
> a very similar buildroot (from perhaps 30 mins before) on the same
> architecture was successful. Is it possible this bug is non-deterministic?

It depends on the number of CPUs on the builder.
The VM builders have fewer CPUs and aren't affected.

---

I'm seeing this issue with wine more frequently. I got it twice in a row with the latest update attempt. Wine is not that large a package; the largest rpm is about 68 MB.

Build 2: https://koji.fedoraproject.org/koji/taskinfo?taskID=36791708
Parent: https://koji.fedoraproject.org/koji/taskinfo?taskID=36791693

---

The problem isn't the size of the rpms, but the number of sub-packages/rpms it produces, which is quite high for wine.

---

FWIW there's now an upstream ticket on this for release tracking:
https://github.com/rpm-software-management/rpm/issues/804

---

This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle. Changing version to '31'.

---

This should be fixed in rpm >= 4.15.0-0.beta.4; just test-driving the solution in rawhide a bit before the -rc1 release.

---

(In reply to Panu Matilainen from comment #19)
> This should be fixed in rpm >= 4.15.0-0.beta.4, just test-driving the
> solution in rawhide a bit before -rc1 release.

Do you plan to release rpm-4.15.0-0.beta.5 also in f31? It caused problems in samba:
https://koji.fedoraproject.org/koji/taskinfo?taskID=37291661

---

> Do you plan to release rpm-4.15.0-0.beta.5 also in f31 ?
Nope, .5 is just test-driving the change. F31 will go straight to 4.15.0-rc (obviously with the same fix) once it's released, hopefully today.
FEDORA-2019-e4b6ffd824 has been submitted as an update to Fedora 31.
https://bodhi.fedoraproject.org/updates/FEDORA-2019-e4b6ffd824

---

rpm-4.15.0-0.rc1.1.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-e4b6ffd824

---

rpm-4.15.0-0.rc1.1.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

---

I'm hitting this regularly on armv7 when trying to build mingw-python-qt5 on F32 [1]. The failure apparently happens when packing the debuginfo subpackage, which consists of some 50 *.debug files totaling approx. 700 MB. I've tried with %global _smp_ncpus_max 4 but it makes no difference.

[1] https://koji.fedoraproject.org/koji/buildinfo?buildID=1391932

---

(In reply to Sandro Mani from comment #25)
> I'm hitting this regularly on armv7 when trying to build mingw-python-qt5 on
> F32 [1]. The failure apparently happens when packing the debuginfo
> subpackage, which consists of some 50 *.debug files totaling approx. 700 MB.
> I've tried with %global _smp_ncpus_max 4 but it makes no difference.

Four is all the CPUs that builder has in the first place, and four is what rpm caps the thread limit to on 32-bit architectures to try to avoid exceeding the ~2GB worth of usable address space, so no wonder setting it to 4 again makes no difference. The failing builder has a mere 2.5GB of physical memory, so I guess it could be hitting that, or it could be the difference in address space between a 32-bit process under a 64-bit kernel (as on the i686 builders) and one under a 32-bit kernel. Anyway, if it fails with 4, then just set the limit lower; I'd start with two and go down to one if all else fails.
Note that you don't want to change %_smp_ncpus_max but %_smp_build_nthreads instead (a new tunable in 4.15); this way other parts of the build will still be parallel, and only the number of packaging threads is affected.

---

Thanks, I'll give it a shot.

---

Do note that the armv7 fedora builders actually have 24GB of memory. They are vm's using the lpae kernel...

---

Oh, off by 0 :) Thanks for correcting; it's important, since it means it's clearly not a shortage of memory but address space again. Makes sense, as Linux doesn't easily -ENOMEM otherwise.

If we're hitting address space limits with just four threads, then I'd say the compression code is using excessive amounts of memory. The existing zstd code makes no attempt to control the memory use; it probably should. Annoyingly, all the interesting size-estimation functionality in zstd is in the "experimental, static linking only" section of the API, which means that rpm cannot rely on using it. Which I guess means we'll just have to guess.

---

%global _smp_build_nthreads 2 did the trick, thanks!

---

So I'm back with this, again with mingw-python-qt5: first %global _smp_build_nthreads 2 worked, some releases later I had to reduce to %global _smp_build_nthreads 1, and now even with that I end up with "create archive failed: cpio: write failed - Cannot allocate memory" while extracting debug info. Any other ideas? Build is https://koji.fedoraproject.org/koji/buildinfo?buildID=1492224
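Panu's recommendation above, as it would appear in a spec file. The value 2 is the one reported to work for mingw-python-qt5 earlier in this thread; treat it as a starting point, not a universal setting:

```spec
# Limit only the number of parallel packaging/compression threads
# (new tunable in rpm >= 4.15); other parts of the build keep their
# full %{?_smp_mflags} parallelism.
%global _smp_build_nthreads 2
```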
I suppose you could try toning down the compression level. The Fedora default of w19.zstdio is a rather extreme one and consumes gobs of memory and processing power for the last few % of gains. Something like:

%global _binary_payload w13.zstdio

...can make several minutes' worth of difference in extreme cases. If that still fails, lower it further (the zstd upstream default is actually just 3).

At _smp_build_nthreads 1, zstd should be able to assess the available memory reasonably, but I suspect it still assumes it has all of the available memory to itself, which is not the case when used from rpm - with a large package like this there's a considerable amount of "rpm stuff" in memory too.

---

That worked, thanks!
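For reference, the two spec-level workarounds discussed in this thread combined into one fragment; both macros appear verbatim in the comments above, and the specific values should be tuned per package:

```spec
# Limit parallel packaging/compression threads (rpm >= 4.15 tunable).
%global _smp_build_nthreads 2

# Drop payload compression from Fedora's default w19.zstdio; level 13
# trades a few % of compression ratio for far less memory and CPU.
%global _binary_payload w13.zstdio
```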