Description of problem:
During our automated image creation, "dnf -y upgrade" sometimes fails after the "cleanup" phase without any apparent error message. Using strace shows a write() that fails with EPIPE towards the end.

Version-Release number of selected component (if applicable):
dnf-1.1.7-2.fc23.noarch

How reproducible:
Only in a very specific setting, hard to reproduce elsewhere

Steps to Reproduce:
1. git clone https://github.com/mvollmer/cockpit.git
2. cd cockpit
3. git checkout 5bf3073f489e4b9438309e0dd2d6fa0da2f3e86d
4. cd test
5. sudo ./vm-prep
6. ./vm-create fedora-23 -v

Actual results:
dnf -y upgrade runs and then ends like this:

  Cleanup : libnl3-3.2.27-0.1.fc23.x86_64 268/275
  Cleanup : libpng-2:1.6.17-2.fc23.x86_64 269/275
  Cleanup : bash-4.3.42-1.fc23.x86_64 270/275
  Cleanup : glibc-common-2.22-3.fc23.x86_64 271/275
  Cleanup : nss-softokn-freebl-3.20.0-1.0.fc23.x86_64 272/275
  Cleanup : glibc-2.22-3.fc23.x86_64 273/275
  Cleanup : tzdata-2015g-1.fc23.noarch 274/275
  Cleanup : libgcc-5.1.1-4.fc23.x86_64 275/275
  1039 blocks
  [1458118831.96] EVENT: Domain 'fedora-23-iqwr' (ID 1075) Shutdown Finished
  vm-create: setup failed with code 1

Expected results:
dnf enters the "verify" phase and completes successfully

Additional info:
I have captured a strace log: https://fedorapeople.org/~mvo/dnf.log (739 MB)

It has an EPIPE in it, but I have no idea whether that causes the failure. Some lines from the log:

  1005 pipe([50, 51]) = 0
  1005 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd8231cf9d0) = 1832
  ...
  1005 write(51, "hare/man/cy/man1\n/usr/share/man/"..., 4096 <unfinished ...>
  ...
  1832 exit_group(0) = ?
  1005 <... write resumed> ) = -1 EPIPE (Broken pipe)
  ...
  1005 exit_group(1) = ?
  1005 +++ exited with 1 +++
I've seen this several times in a row in a test VM.
It looks like dnf runs "mandb -q" (via sh) and writes a lot of stuff into its stdin, but mandb -q doesn't read stdin.
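For anyone who wants to see the same pattern outside of dnf, here is a minimal sketch of what the strace output suggests: a parent writes a large file list into a pipe while the child exits without ever reading it. The helper name `feed_stdin` and the 1 MiB payload are made up for illustration, and `sh -c true` only stands in for a scriptlet that ignores its stdin; this is not dnf's or rpm's actual code.

```python
import errno
import subprocess


def feed_stdin(child_cmd, payload):
    """Write payload to the child's stdin; return (exit_code, error_or_None)."""
    # bufsize=0 gives a raw pipe, so each write() maps onto a write(2) call.
    proc = subprocess.Popen(child_cmd, stdin=subprocess.PIPE, bufsize=0)
    error = None
    try:
        remaining = memoryview(payload)
        while remaining:
            written = proc.stdin.write(remaining)
            remaining = remaining[written:]
    except BrokenPipeError as exc:
        # The child exited and closed the read end: write(2) failed with EPIPE.
        error = exc
    finally:
        proc.stdin.close()
    return proc.wait(), error


if __name__ == "__main__":
    # Payload larger than a typical pipe buffer, so the writer cannot finish
    # before the child has gone away.
    file_list = b"/usr/share/man/man1/example.1.gz\n" * 32768
    code, err = feed_stdin(["sh", "-c", "true"], file_list)
    print("child exit code:", code)
    print("EPIPE seen:", err is not None and err.errno == errno.EPIPE)
```

Note that the child itself exits with 0, just like pid 1832 in the log; it is the writer that sees the error.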
Thanks for the report. It could be caused by the new, unofficial way the rpmdb cache is initialized through libsolv. This should fix it: https://github.com/openSUSE/libsolv/pull/123
(In reply to Jan Silhan from comment #3)
> Thanks for the report. It could happen because of new initializing of rpmdb
> cache through libsolv unofficial way. This should fix it:
> https://github.com/openSUSE/libsolv/pull/123

Note that EPIPE happens with "mandb", not "rpmdb".
This is not a bug in libsolv or dnf, but a bug in the man-db package. It uses the new file trigger mechanism to update the mandb if a man page is installed. But the file trigger scriptlet is supposed to read the list of triggering files from stdin, which it currently doesn't do. The resulting SIGPIPE terminates the complete software stack, as the transaction is done by the main dnf process.

So, two things IMHO need to be done:
1) the man-db package needs to read (and discard) the file list from stdin (i.e. cat > /dev/null)
2) rpm should also be changed so that it does not die when a scriptlet misbehaves.
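Both suggestions can be illustrated with a rough sketch. The function `run_scriptlet` below is hypothetical and is not how rpm implements its fix. It only shows the general idea of a parent that treats a closed pipe as "scriptlet stopped reading" instead of a fatal error, plus a well-behaved scriptlet that drains stdin with `cat > /dev/null`.

```python
import subprocess


def run_scriptlet(cmd, file_list):
    """Feed file_list to a scriptlet; return (exit_code, fully_delivered)."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, bufsize=0)
    delivered = True
    try:
        remaining = memoryview(file_list)
        while remaining:
            written = proc.stdin.write(remaining)
            remaining = remaining[written:]
    except BrokenPipeError:
        # Suggestion 2: a scriptlet that does not read its input should not
        # take the parent down; note it and carry on.
        delivered = False
    finally:
        proc.stdin.close()
    return proc.wait(), delivered


if __name__ == "__main__":
    file_list = b"/usr/share/man/man1/example.1.gz\n" * 32768
    # Suggestion 1: the scriptlet reads and discards the file list.
    print(run_scriptlet(["sh", "-c", "cat > /dev/null"], file_list))
    # A scriptlet that ignores stdin no longer causes a failure in the parent.
    print(run_scriptlet(["sh", "-c", "true"], file_list))
```

With the draining scriptlet the whole list is delivered; with the misbehaving one the parent just records that delivery was cut short.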
This is already fixed in newer rpm versions, but the installed rpm-4.13.0-0.rc1.3.fc23 package does not have the fix.
Yes, this is fixed since 4.13.0-0.rc1.6.fc23.