Bug 1318223 - dnf -y upgrade fails without error message
Summary: dnf -y upgrade fails without error message
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: rpm
Version: 23
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Packaging Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-16 10:10 UTC by Marius Vollmer
Modified: 2016-05-20 11:50 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-20 11:50:08 UTC
Type: Bug



Description Marius Vollmer 2016-03-16 10:10:33 UTC
Description of problem:

During our automated image creation, "dnf -y upgrade" sometimes fails after the "cleanup" phase without any apparent error message.

Using strace shows a write() that fails with EPIPE towards the end.
 
Version-Release number of selected component (if applicable):
dnf-1.1.7-2.fc23.noarch

How reproducible:
Only in a very specific setting, hard to reproduce elsewhere

Steps to Reproduce:
1. git clone https://github.com/mvollmer/cockpit.git
2. cd cockpit
3. git checkout 5bf3073f489e4b9438309e0dd2d6fa0da2f3e86d
4. cd test
5. sudo ./vm-prep
6. ./vm-create fedora-23 -v

Actual results:
dnf -y upgrade runs and then ends like this:

  Cleanup     : libnl3-3.2.27-0.1.fc23.x86_64                           268/275 
  Cleanup     : libpng-2:1.6.17-2.fc23.x86_64                           269/275 
  Cleanup     : bash-4.3.42-1.fc23.x86_64                               270/275 
  Cleanup     : glibc-common-2.22-3.fc23.x86_64                         271/275 
  Cleanup     : nss-softokn-freebl-3.20.0-1.0.fc23.x86_64               272/275 
  Cleanup     : glibc-2.22-3.fc23.x86_64                                273/275 
  Cleanup     : tzdata-2015g-1.fc23.noarch                              274/275 
  Cleanup     : libgcc-5.1.1-4.fc23.x86_64                              275/275 
1039 blocks
[1458118831.96] EVENT: Domain 'fedora-23-iqwr' (ID 1075) Shutdown Finished
vm-create: setup failed with code 1


Expected results:
dnf enters the "verify" phase and completes successfully

Additional info:
I have captured a strace log: https://fedorapeople.org/~mvo/dnf.log (739 MB) 

It has an EPIPE in it, but I have no idea whether that causes the failure.  Some lines from the log:

1005  pipe([50, 51])                    = 0
1005  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd8231cf9d0) = 1832
...
1005  write(51, "hare/man/cy/man1\n/usr/share/man/"..., 4096 <unfinished ...>
...
1832  exit_group(0)                     = ?
1005  <... write resumed> )             = -1 EPIPE (Broken pipe)
...
1005  exit_group(1)                     = ?
1005  +++ exited with 1 +++

Comment 1 Stef Walter 2016-03-16 12:17:12 UTC
I've seen this several times in a row in a test VM.

Comment 2 Marius Vollmer 2016-03-16 12:40:07 UTC
It looks like dnf runs "mandb -q" (via sh) and writes a lot of stuff into its stdin, but mandb -q doesn't read stdin.
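
The pattern described here can be reproduced outside dnf. The following is a minimal sketch, not dnf or rpm code: the helper name write_to_child is hypothetical, and "true" stands in for "mandb -q" as a command that exits without reading stdin. The payload is larger than a pipe buffer, so the write cannot complete before the child exits. Python ignores SIGPIPE by default, so the failed write surfaces as EPIPE, like the write() in the strace log.

```python
import errno
import subprocess


def write_to_child(command, size):
    """Write `size` bytes to the stdin of `sh -c command`.

    Returns (errno_or_None, child_exit_status). Hypothetical helper
    for illustration only.
    """
    child = subprocess.Popen(["sh", "-c", command], stdin=subprocess.PIPE)
    failure = None
    try:
        child.stdin.write(b"x" * size)
        child.stdin.flush()
    except BrokenPipeError as exc:
        # The read end of the pipe was closed before all data was written.
        failure = exc.errno
    try:
        child.stdin.close()
    except BrokenPipeError:
        pass  # closing may try to flush leftover data into the dead pipe
    return failure, child.wait()


if __name__ == "__main__":
    # "true" never reads stdin; 4 MiB does not fit in the pipe buffer.
    failure, status = write_to_child("true", 4 * 1024 * 1024)
    print(errno.errorcode.get(failure), status)
```

The child itself exits with status 0, as process 1832 does in the log; only the writer sees the error.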

Comment 3 Honza Silhan 2016-03-29 14:36:36 UTC
Thanks for the report. This could be caused by the new, unofficial way of initializing the rpmdb cache through libsolv. This should fix it: https://github.com/openSUSE/libsolv/pull/123

Comment 4 Marius Vollmer 2016-03-30 07:21:04 UTC
(In reply to Jan Silhan from comment #3)
> Thanks for the report. This could be caused by the new, unofficial way of
> initializing the rpmdb cache through libsolv. This should fix it:
> https://github.com/openSUSE/libsolv/pull/123

Note that EPIPE happens with "mandb", not "rpmdb".

Comment 5 Michael Schröder 2016-05-20 10:55:29 UTC
This is not a bug in libsolv or dnf, but a bug in the man-db package. It uses the new filetrigger mechanism to update the mandb if a man page is installed. But the filetrigger scriptlet is supposed to read the trigger files from stdin, which it currently doesn't do. The resulting SIGPIPE terminates the complete software stack, as the transaction is done by the main dnf process.

So, two things IMHO need to be done:
1) the man-db package needs to read (and discard) the file list from stdin (i.e. cat > /dev/null)
2) rpm should also be changed so that it does not die when a scriptlet misbehaves.
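
Both points can be illustrated with a small sketch. This is not rpm's actual fix: run_scriptlet is a hypothetical helper, "true" stands in for a scriptlet that ignores its stdin, and the generated paths are made up. The writer records a broken pipe instead of dying (point 2), and a scriptlet that drains stdin with "cat > /dev/null" receives the whole list (point 1).

```python
import contextlib
import subprocess


def run_scriptlet(command, file_list):
    """Feed file_list to the stdin of `sh -c command`, one path per line.

    Returns (delivered, exit_status). A closed pipe is recorded in
    `delivered` rather than aborting the caller. Hypothetical helper,
    not rpm code.
    """
    payload = "".join(path + "\n" for path in file_list).encode()
    child = subprocess.Popen(["sh", "-c", command], stdin=subprocess.PIPE)
    delivered = True
    try:
        child.stdin.write(payload)
        child.stdin.flush()
    except BrokenPipeError:
        # The scriptlet exited without reading everything; carry on.
        delivered = False
    finally:
        with contextlib.suppress(BrokenPipeError):
            child.stdin.close()
    return delivered, child.wait()


if __name__ == "__main__":
    # Made-up paths; the list is large enough to overflow the pipe buffer.
    paths = ["/usr/share/man/man1/page%d.1" % i for i in range(200000)]
    print(run_scriptlet("true", paths))             # scriptlet ignores stdin
    print(run_scriptlet("cat > /dev/null", paths))  # scriptlet drains stdin
```

A short file list can fit entirely in the kernel's pipe buffer, in which case the write succeeds even when nothing reads it, so the sketch uses a long list to trigger the failure reliably.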

Comment 6 Michael Schröder 2016-05-20 11:45:47 UTC
This is already fixed in newer rpm versions, but the installed rpm-4.13.0-0.rc1.3.fc23 package does not have the fix.

Comment 7 Ľuboš Kardoš 2016-05-20 11:50:08 UTC
Yes, this is fixed since 4.13.0-0.rc1.6.fc23.

