Red Hat Bugzilla – Bug 1259874
SIGSEGV during rpm transaction
Last modified: 2016-10-12 05:40:48 EDT
Description of problem:
lorax crashes with SIGSEGV while attempting to run the rpm transaction. The crash occurs in fpLookupSubdir, at the line
if (fp->subDirId == 0)
The call to rpmts_Run comes from dnf, from lorax's call to dnf.base.Base.do_transaction.
This is blocking rawhide composes.
Here is the core file: https://dshea.fedorapeople.org/lorax-crash.core.xz
Version-Release number of selected component (if applicable):
Steps to Reproduce:
lorax -p Fedora -v 24 -r 24 -s https://kojipkgs.fedoraproject.org/compose/rawhide/latest-Fedora-/compose/Everything/x86_64/os --logfile=lorax-build.log f24-lorax
Hmmm, can you please double-check the version of rpm you are using? rpm-4.12.90-7.fc24.x86_64 has been in rawhide for weeks, and I updated rpm to rpm-4.13.0-0.rc1.1 on Wednesday.
Upgraded to rpm-4.13.0-0.rc1.1.fc24.x86_64. I had run a dnf upgrade before all of this so I'm not sure what was holding it back the first time.
Same crash, new core file: https://dshea.fedorapeople.org/lorax-crash-2.core.xz
I couldn't really pinpoint the problem yet. But if I use a --cachedir, the error goes away (only to reveal others). This is really annoying (as I have to download all the packages over and over again during testing) and hints that something outside of rpm might be broken.
I can reproduce this on an f22 host with:
master branch of lorax
It happens if I use the kojipkgs repo above, but does not happen if I use http://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/x86_64/os/
So this may be a problem with how the rpms or repo are created.
Ok, sorry for being slow: this is bug 1208296. Switch to http and it works.
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle.
Changing version to '24'.
More information and reason for this action is here:
Looking at bug 1208296... multiple processes/threads; curl, which like rpm also uses NSS, which is notoriously picky about forks and the like; and switching from https to http "fixing" it.
Life's too short to debug these things when there's a working solution (by removing the unnecessary multiprocessing in lorax) deployed for more than a year now.
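For readers unfamiliar with why forks are a hazard here, the following is a minimal Python sketch of the general pattern, not of NSS, curl, or rpm themselves. It assumes a hypothetical `ForkSensitiveHandle` class standing in for any library state that is only valid in the process that initialized it; the class and the `check_in_child` helper are invented for illustration. The child inherits a copy of the handle across `fork()`, but the handle no longer belongs to the process using it.

```python
import os


class ForkSensitiveHandle:
    """Hypothetical stand-in for library state that must not cross a fork."""

    def __init__(self):
        # Remember which process performed the initialization.
        self.owner_pid = os.getpid()

    def usable(self):
        # The handle is only considered valid in the process that created it.
        return os.getpid() == self.owner_pid


def check_in_child(handle):
    """Fork, then report whether the inherited handle is usable in the child."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it has a copy of the handle, but a different PID.
        os.close(read_fd)
        os.write(write_fd, b"1" if handle.usable() else b"0")
        os._exit(0)
    # Parent: collect the child's answer and reap it.
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"


if __name__ == "__main__":
    handle = ForkSensitiveHandle()
    print("usable in parent:", handle.usable())
    print("usable in child: ", check_in_child(handle))
```

A library that skips a check like this can end up using stale state in the child, which is one way a forking tool can crash in code that looks unrelated. Avoiding the extra processes altogether, as described above, sidesteps the problem.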