Bug 1729382
| Field | Value |
|---|---|
| Summary | "error: create archive failed: cpio: write failed - Cannot allocate memory" when writing rpms |
| Product | Fedora |
| Component | rpm |
| Version | rawhide |
| Hardware | i686 |
| OS | Unspecified |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | Dan Horák <dan> |
| Assignee | Packaging Maintenance Team <packaging-team-maint> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | dj, fedora, fweimer, igor.raits, kevin, lslebodn, manisandro, mike, mjw, packaging-team-maint, pmatilai, pmoravco, rjones, vmukhame, yaneti |
| Keywords | Triaged |
| Fixed In Version | rpm-4.15.0-0.rc1.1.fc31 |
| Last Closed | 2019-09-10 01:20:55 UTC |
| Type | Bug |
Description (Dan Horák, 2019-07-12 07:25:39 UTC)

Created attachment 1589781 [details]: root.log
Created attachment 1589782 [details]: build.log
Seen yesterday on glibc builds...

These builds failed:
https://koji.fedoraproject.org/koji/taskinfo?taskID=36333585 on buildhw-04.phx2.fedoraproject.org
https://koji.fedoraproject.org/koji/taskinfo?taskID=36329825 on buildhw-02.phx2.fedoraproject.org

These builds worked:
https://koji.fedoraproject.org/koji/taskinfo?taskID=36333043 on buildvm-30.phx2.fedoraproject.org
https://koji.fedoraproject.org/koji/taskinfo?taskID=36335876 on buildvm-19.phx2.fedoraproject.org

I wonder if it could be some memory management issue, too much fragmentation or such, for the 32-bit user space (in chroot) on an x86_64 machine. Or maybe the error message from cpio is just bogus... It would also be interesting to know whether the written rpms are correct or corrupted.

---

I know it's a tiny sample size, but I note that the failed glibc builds were all on "hw" hosts, and the working ones were all on "vm" hosts... maybe running a vm on a 64-bit host gives you more usable address space?

---

DJ, you seem to be right; somehow I got confused with my builds, they were all on hw, same with Remi.

---

I think that's a useful piece of information. This is likely caused by the auto-scaling in OpenMP:

```c
rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
{
    rpmRC rc = RPMRC_OK;
    Package pkg;

    /* Run binary creation in parallel */
    #pragma omp parallel
    #pragma omp single
    for (pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
        #pragma omp task
        {
            pkg->rc = packageBinary(spec, pkg, cookie, cheating, &pkg->filename);
            …
```

OpenMP cannot take into account the resource usage of each individual thread. With 48 processors on the builders, there is less than 83 MiB per thread under the most ideal circumstances for a 32-bit process. That might not be enough, depending on how much RPM tries to do in memory, in a non-streaming fashion.

---

Right, compression can consume a significant amount of memory, and when you have a lot of CPUs it's quite easy to eat up all the memory.
There's already an RFE to take the amount of memory into consideration for parallelisation settings (see bug 1118734); we probably need to do something about that now.

FWIW, all parallel activity (including OpenMP) in rpm honors the $RPM_BUILD_NCPUS environment variable and the %_smp_ncpus_max macro, so as a workaround for packages commonly hitting this you can do something like this in the spec:

%global _smp_ncpus_max 16

---

Just remembered: the XZ compression code has special logic for handling 32-bit system memory use during parallel compression, but that doesn't trigger here because we're not parallel-compressing a single stream but multiple independent streams. There's also a macro for limiting XZ memory use, but only during decompression. Argh.

---

Hasn't rawhide switched to zstd, so different code paths are used?

For XZ, the command line tool prints the amount of memory used for compressing. Can this information be used to limit the number of processors to something like 3 GiB divided by the amount of memory required for compression? I don't know if zstd even allows one to compute the memory required beforehand, based on the compression flags.

---

Here's one which happened in Fedora Rawhide just minutes ago (on i686):

https://koji.fedoraproject.org/koji/taskinfo?taskID=36761432

Weirdly, an earlier build of the exact same package against what must be a very similar buildroot (from perhaps 30 mins before) on the same architecture was successful. Is it possible this bug is non-deterministic?

---

(In reply to Richard W.M. Jones from comment #12)
> Weirdly an earlier build of the exact same package against what must be
> a very similar buildroot (from perhaps 30 mins before) on the same
> architecture was successful. Is it possible this bug is non-deterministic?

It depends on the number of CPUs on the builder.
The VM builders have fewer CPUs and aren't affected.

---

I'm seeing this issue with wine more frequently. I got it twice in a row with the latest update attempt. Wine is not that large a package; the largest rpm is about 68 MB.

Build 2: https://koji.fedoraproject.org/koji/taskinfo?taskID=36791708
Parent: https://koji.fedoraproject.org/koji/taskinfo?taskID=36791693

---

The problem isn't the size of the rpms, but the number of sub-packages/rpms it produces, which is quite high for wine.

---

FWIW there's now an upstream ticket on this for release tracking:
https://github.com/rpm-software-management/rpm/issues/804

---

This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle. Changing version to '31'.

---

This should be fixed in rpm >= 4.15.0-0.beta.4; just test-driving the solution in rawhide a bit before the -rc1 release.

---

(In reply to Panu Matilainen from comment #19)
> This should be fixed in rpm >= 4.15.0-0.beta.4, just test-driving the
> solution in rawhide a bit before -rc1 release.

Do you plan to release rpm-4.15.0-0.beta.5 also in f31? It caused problems in samba:
https://koji.fedoraproject.org/koji/taskinfo?taskID=37291661

---

> Do you plan to release rpm-4.15.0-0.beta.5 also in f31 ?
Nope, .5 is just test-driving the change. F31 will go straight to 4.15.0-rc (obviously with the same fix) once it's released, hopefully today.
FEDORA-2019-e4b6ffd824 has been submitted as an update to Fedora 31.
https://bodhi.fedoraproject.org/updates/FEDORA-2019-e4b6ffd824

---

rpm-4.15.0-0.rc1.1.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-e4b6ffd824

---

rpm-4.15.0-0.rc1.1.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

---

I'm hitting this regularly on armv7 when trying to build mingw-python-qt5 on F32 [1]. The failure apparently happens when packing the debuginfo subpackage, which consists of some 50 *.debug files totaling approx. 700 MB. I've tried with %global _smp_ncpus_max 4 but it makes no difference.

[1] https://koji.fedoraproject.org/koji/buildinfo?buildID=1391932

---

(In reply to Sandro Mani from comment #25)
> I'm hitting this regularly on armv7 when trying to build mingw-python-qt5 on
> F32 [1]. The failure apparently happens when packing the debuginfo
> subpackage, which consists of some 50 *.debug files totaling approx. 700 MB.
> I've tried with %global _smp_ncpus_max 4 but it makes no difference.

Four is all the CPUs that builder has in the first place, and four is what rpm caps the thread limit to on 32-bit architectures to try to avoid exceeding the ~2GB worth of usable address space, so no wonder setting it to 4 again makes no difference. The failing builder has a mere 2.5GB of physical memory, so I guess it could be hitting that, or it could be the difference in address space between a 32-bit process under a 64-bit kernel (as on the i686 builders) and one under a 32-bit kernel. Anyway, if it fails with 4, then just set the limit lower; I'd start with two and go down to one if all else fails.
Note that you don't want to change %_smp_ncpus_max but %_smp_build_nthreads instead (a new tunable in 4.15); this way other parts of the build will still be parallel, and only the number of packaging threads is affected.

---

Thanks, I'll give it a shot.

---

Do note that the armv7 fedora builders actually have 24GB of memory. They are vm's using the lpae kernel...

---

Oh, off by 0 :) Thanks for correcting; it's important, since it means it's clearly not a shortage of memory but address space again. Makes sense, as Linux doesn't easily -ENOMEM otherwise.

If we're hitting address space limits with just four threads, then I'd say the compression code is using excessive amounts of memory. The existing zstd code makes no attempt to control the memory use; it probably should. Annoyingly, all the interesting size-estimation functionality in zstd is in the "experimental, static linking only" section of the API, which means that rpm cannot rely on using it. Which I guess means we'll just have to guess.

---

%global _smp_build_nthreads 2 did the trick, thanks!

---

So I'm back with this, again with mingw-python-qt5: first %global _smp_build_nthreads 2 worked, some releases later I had to reduce to %global _smp_build_nthreads 1, and now even with that I end up with "create archive failed: cpio: write failed - Cannot allocate memory" while extracting debug info. Any other ideas? Build is https://koji.fedoraproject.org/koji/buildinfo?buildID=1492224
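Panu's recommendation above, as it would appear in a spec file. The value 2 is the one reported to work for mingw-python-qt5 earlier in this thread; treat it as a starting point, not a universal setting:

```spec
# Limit only the number of parallel packaging/compression threads
# (new tunable in rpm >= 4.15); other parts of the build keep their
# full %{?_smp_mflags} parallelism.
%global _smp_build_nthreads 2
```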
I suppose you could try toning down the compression level. The Fedora default of w19.zstdio is a rather extreme one and consumes gobs of memory and processing power for the last few % of gains. Something like:

%global _binary_payload w13.zstdio

...can make several minutes' worth of difference in extreme cases. If that still fails, lower it further (the zstd upstream default is actually just 3).

At _smp_build_nthreads 1, zstd should be able to assess the available memory reasonably, but I suspect it still assumes it has all of the available memory to itself, which is not the case when used from rpm - with a large package like this there's a considerable amount of "rpm stuff" in memory too.

---

That worked, thanks!
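For reference, the two spec-level workarounds discussed in this thread combined into one fragment; both macros appear verbatim in the comments above, and the specific values should be tuned per package:

```spec
# Limit parallel packaging/compression threads (rpm >= 4.15 tunable).
%global _smp_build_nthreads 2

# Drop payload compression from Fedora's default w19.zstdio; level 13
# trades a few % of compression ratio for far less memory and CPU.
%global _binary_payload w13.zstdio
```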