Created attachment 1354364 [details]
strace -f -ff -s 200 -o /tmp/rpm.strace rpm -ihv /var/cache/dnf/updates-2854b3113b7a3c6c/packages/sip-4.19.3-4.fc26.x86_64.rpm

Issue first noticed with dnf, but is reproducible using plain rpm.

$ rpm -q sip
package sip is not installed
$ sudo rpm -ihv /var/cache/dnf/updates-2854b3113b7a3c6c/packages/sip-4.19.3-4.fc26.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
   1:sip-4.19.3-4.fc26 ################################# [100%]
error: unpacking of archive failed on file /usr/lib64/python2.7/site-packages/sipconfig.pyo;5a0f5194: cpio: open
error: sip-4.19.3-4.fc26.x86_64: install failed

This happens with multiple packages. For each given package the file that fails to install is always the same. Other packages install fine.

- The file system is not full
- Apart from /boot there are no other subfilesystems
- fsck on the file system shows no errors

Attached is an strace of an rpm installation attempt of the above package.
Relevant installed packages:
rpm-libs-4.13.0.2-1.fc26.x86_64
rpm-4.13.0.2-1.fc26.x86_64
libdb-5.3.28-24.fc26.x86_64
Manually extracting the very same rpm files with rpm2cpio | cpio -id on the same file system works fine.
What does 'rpm -Kv <packages>' say about packages failing that way?
Also the output of 'rpm -Uvv <package>' might provide further clues.
Found the same for three (random?) packages during a usual `dnf update`: qt5-qtwebengine, kernel-devel, perf.

Example:

# rpm -Uhv ../qt5-qtwebengine-5.9.2-2.fc26.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
   1:qt5-qtwebengine-5.9.2-2.fc26 ################################# [ 50%]
error: unpacking of archive failed on file /usr/lib64/qt5/bin/qwebengine_convert_dict;5a12c8f4: cpio: open
error: qt5-qtwebengine-5.9.2-2.fc26.x86_64: install failed
error: qt5-qtwebengine-5.9.1-5.fc26.x86_64: erase skipped

rpm2cpio unpack is fine, but shows this (packed hardlinks?):

./usr/bin/qwebengine_convert_dict
cpio: ./usr/lib64/qt5/bin/qwebengine_convert_dict linked to ./usr/bin/qwebengine_convert_dict
./usr/lib64/qt5/bin/qwebengine_convert_dict

# ll -i ./usr/lib64/qt5/bin/qwebengine_convert_dict ./usr/bin/qwebengine_convert_dict
143522 -rwxr-xr-x. 2 root root 250872 Nov 20 14:21 ./usr/bin/qwebengine_convert_dict
143522 -rwxr-xr-x. 2 root root 250872 Nov 20 14:21 ./usr/lib64/qt5/bin/qwebengine_convert_dict

As requested:

# rpm -Kv ../qt5-qtwebengine-5.9.2-2.fc26.x86_64.rpm
../qt5-qtwebengine-5.9.2-2.fc26.x86_64.rpm:
    Header V3 RSA/SHA256 Signature, key ID 64dab85d: OK
    Header SHA1 digest: OK (9bf0ceebe0787d9dd8fdd0820001cf9aae439445)
    V3 RSA/SHA256 Signature, key ID 64dab85d: OK
    MD5 digest: OK (299b44fddf3b3cfdc2a1780476d8e37a)

# rpm -Uvv ../qt5-qtwebengine-5.9.2-2.fc26.x86_64.rpm
...
D: lsetfilecon: (/usr/share/qt5/translations/qtwebengine_locales/zh-CN.pak;5a12ccbd, system_u:object_r:usr_t:s0)
D: create 100644 1 ( 0, 0)230581 /usr/share/qt5/translations/qtwebengine_locales/zh-TW.pak;5a12ccbd
ufdio: 8 writes, 230581 total bytes in 0.000121 secs
D: Plugin: calling hook fsm_file_prepare in selinux plugin
D: lsetfilecon: (/usr/share/qt5/translations/qtwebengine_locales/zh-TW.pak;5a12ccbd, system_u:object_r:usr_t:s0)
D: create 100755 2 ( 0, 0) 0 /usr/bin/qwebengine_convert_dict;5a12ccbd
D: create 100755 2 ( 0, 0)250872 /usr/lib64/qt5/bin/qwebengine_convert_dict;5a12ccbd
fdio: 4731 reads, 136326444 total bytes in 3.352210 secs
error: unpacking of archive failed on file /usr/lib64/qt5/bin/qwebengine_convert_dict;5a12ccbd: cpio: open
ufdio: 6 reads, 47528 total bytes in 0.000015 secs
error: qt5-qtwebengine-5.9.2-2.fc26.x86_64: install failed
D: ========== +++ qt5-qtwebengine-5.9.1-5.fc26 x86_64-linux 0x2
error: qt5-qtwebengine-5.9.1-5.fc26.x86_64: erase skipped
D: running post-transaction scripts
...

There are a bunch of similar open issues with cpio rename:
https://bugzilla.redhat.com/show_bug.cgi?id=1500984
https://bugzilla.redhat.com/show_bug.cgi?id=1397543
https://bugzilla.redhat.com/show_bug.cgi?id=1493741
https://bugzilla.redhat.com/show_bug.cgi?id=1409904

And one for digest:
https://bugzilla.redhat.com/show_bug.cgi?id=1174137
Created attachment 1356045 [details]
Requested rpm -v outputs

rpm -Kv and rpm -Uvv outputs, as requested.
I concur that this probably has something to do with hardlinks; in my sip package test case the .pyo file is also a hard link (output of rpm2cpio | cpio -id of the rpm file):

$ find -type f -printf '%i %p\n'
967491 ./usr/lib64/python2.7/site-packages/sip.pyi
967492 ./usr/lib64/python2.7/site-packages/sip.so
967493 ./usr/lib64/python2.7/site-packages/sipconfig.py
967494 ./usr/lib64/python2.7/site-packages/sipdistutils.py
967505 ./usr/lib64/python2.7/site-packages/sipconfig.pyo
967505 ./usr/lib64/python2.7/site-packages/sipconfig.pyc
967506 ./usr/lib64/python2.7/site-packages/sipdistutils.pyo
967506 ./usr/lib64/python2.7/site-packages/sipdistutils.pyc
967498 ./usr/share/doc/sip/NEWS
967499 ./usr/share/doc/sip/README
967502 ./usr/share/licenses/sip/LICENSE
967503 ./usr/share/licenses/sip/LICENSE-GPL2
967504 ./usr/share/licenses/sip/LICENSE-GPL3

As you can see, sipconfig.pyo and sipconfig.pyc share an inode.
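(Not from the report itself, but possibly a useful shortcut for others: the hardlink groups can also be spotted straight from the package header, without extracting anything, assuming an rpm that knows the FILEINODES query tag.)

$ rpm -qp --queryformat '[%{FILEINODES} %{FILENAMES}\n]' sip-4.19.3-4.fc26.x86_64.rpm | sort -n

Entries sharing the same leading inode number form one hardlink group, so sipconfig.pyc and sipconfig.pyo should come out with an identical first column.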
Not sure if this is relevant in any way, but the underlying file system is btrfs in my case.
Mine is ext4, with plenty of free space.
Right, thanks for the data. I presume the problem goes away if you downgrade to rpm 4.13.0.1?

The only thing that I can think of causing this is a new sanity check regarding hardlinks failing in some situations, which would manifest as a cpio open failure (a bit hysterically, yes). Now if I could only figure out what exactly it is that makes it fail; I haven't been able to reproduce this so far.
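(For anyone wanting to test the downgrade, something along these lines should do it, assuming the previous build is still reachable through the repos or the local dnf cache; dnf should pull the matching rpm-libs along as a dependency.)

# dnf downgrade rpm
or, if several older builds are available, name the wanted version explicitly, e.g.:
# dnf downgrade 'rpm-4.13.0.1*'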
Okay, so here's the deal, visible in the strace from the initial description when you know what to look for:

umask(0577) = 022
open("/usr/lib64/python2.7/site-packages/sipconfig.pyc;5a0f5194", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0666) = 32
fcntl(32, F_SETFD, FD_CLOEXEC) = 0
umask(022) = 0577
close(32) = 0
lstat("/usr/lib64/python2.7/site-packages/sipconfig.pyc", 0x7ffd74b4b440) = -1 ENOENT (No such file or directory)
rename("/usr/lib64/python2.7/site-packages/sipconfig.pyc;5a0f5194", "/usr/lib64/python2.7/site-packages/sipconfig.pyc") = 0
link("/usr/lib64/python2.7/site-packages/sipconfig.pyc", "/usr/lib64/python2.7/site-packages/sipconfig.pyo;5a0f5194") = 0
umask(0577) = 022
open("/usr/lib64/python2.7/site-packages/sipconfig.pyo;5a0f5194", O_WRONLY|O_CREAT|O_APPEND, 0666) = 32
fcntl(32, F_SETFD, FD_CLOEXEC) = 0
umask(022) = 0577
fstat(32, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
                           ^^^^
close(32) = 0
lstat("/usr/lib64/python2.7/site-packages/sipconfig.pyo;5a0f5194", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
                                                                                    ^^^^

That file is supposed to have 0200 permissions, as it was only just created a couple of syscalls earlier under a 0577 umask, so rpm thinks there's something fishy going on and bails out. So actually this is exactly what the new sanity check is *supposed* to catch: somebody/something is allegedly goofing around behind its back.

Simplified, this is what is supposed to happen:

# umask 577; touch f; umask 022; ln f l; ls -l f l
--w-------. 2 root root 0 Nov 21 14:05 f
--w-------. 2 root root 0 Nov 21 14:05 l

...yet somehow in this case, the file ends up created as if the first umask never occurred (or was 022, which would be the default), or something changed the file's permissions in the fleeting moment between its creation and reopen. Doing so reproducibly seems extremely unlikely of course, so the question remains: what the heck?

Just to get it straight, are these physical systems or some sort of virtual/container/something else? Can you reproduce at will on other hosts too?
Mine is a physical host, and (thankfully) the only machine exhibiting this behaviour so far. I'll try the simple shell example above on the box in question and see if that also does funny things with umask.
Created attachment 1356669 [details]
strace rpm

My output from:

# strace -f -s 200 -o /tmp/rpm-strace-1.txt rpm -Uvh /var/cache/dnf/updates-2854b3113b7a3c6c/packages/qt5-qtwebengine-5.9.2-2.fc26.x86_64.rpm

Reproducible on two identical Fedora 26 installations (laptops), no VMs/containers involved (except systemd's default cgroups, etc).
Created attachment 1356672 [details]
log of dnf downgrade rpm

I can confirm that downgrading rpm-4.13.0.2-1.fc26 -> rpm-4.13.0.1-4.fc26 fixes that. There were also a bunch of similar errors during the downgrade itself; attaching the log.
Right, the strace from comment #13 shows the same symptoms, only with differently wrong permissions:

28421 open("/usr/lib64/qt5/bin/qwebengine_convert_dict;5a141f1b", O_WRONLY|O_CREAT|O_APPEND, 0666) = 32
28421 fcntl(32, F_SETFD, FD_CLOEXEC) = 0
28421 umask(022) = 0577
28421 fstat(32, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
28421 close(32) = 0
28421 lstat("/usr/lib64/qt5/bin/qwebengine_convert_dict;5a141f1b", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0

qwebengine_convert_dict and sipconfig.pyo would of course have different permissions eventually, but at this point that shouldn't make any difference at all; both should simply be 0200 here. The initial umask value differs between Andrew's and Ralf's logs, but that shouldn't make any difference either, since rpm always explicitly sets it as necessary, as is also visible in the logs. And the initial umask doesn't make any difference in my tests either (which is how it should be).

I still haven't been able to reproduce anything like it and am at a loss for immediate ideas where such behavior could be coming from (this looks so weird that it's likely either something blindingly simple and obvious that I'm just not seeing, or something dumbfoundingly complex :)

So I guess we can just try gathering more data to hopefully find an explanation eventually. Things like kernel versions, system upgrade history... and of course anything you can think of that might be special about these systems. If you guys can cross-test each other's failing cases (with rpm 4.13.0.2), that would tell us something about it. Also running under valgrind might bring up something. A sosreport (from the 'sos' package) will collect a huge amount of data from the system that might be helpful, but I certainly understand if that feels uncomfortable, so by no means required.

If/when you downgrade to rpm 4.13.0.1, you'll want to reinstall the python sub-packages afterwards to get the pyc/pyo hardlinks to install correctly, although it shouldn't matter for any functionality.

As yet another data point, rpm 4.14 (so Fedora >= 27) has the exact same new check and thus is expected to behave identically.
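(In case concrete command lines are easier to follow, something along these lines should cover the valgrind and sosreport suggestions; the valgrind options and log path are only a suggestion, and /path/to/failing-package.rpm stands for whichever package fails on your system.)

# valgrind --trace-children=yes --log-file=/tmp/rpm-valgrind.%p.log rpm -Uvv /path/to/failing-package.rpm
# dnf install sos && sosreport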
I'll see what I can find out. One "good" thing about this is that due to the failing transactions the impacted packages are never marked as installed/updated in the RPM database, so just running an update again (with a non-broken RPM) should hopefully sort everything out.
(In reply to Panu Matilainen from comment #15)
> If you guys can cross-test each other's failing cases (with rpm 4.13.0.2),
> that would tell us something about it.

You mean running `rpm -ivh sip-4.19.3-4.fc26.x86_64.rpm` for me and the same with qt5-qtwebengine-5.9.2-2.fc26.x86_64.rpm for Ralf, correct?

(In reply to Ralf Ertzinger from comment #16)
> just running an update again (with a non-broken RPM) should hopefully
> sort everything out.

That's what I already did after the rpm downgrade. At least the final update transaction in dnf seems fine (with the rpm package blacklisted).
So.... unless I'm being incredibly dense, something is very, very wrong here.

Unaffected machine:
-------------------

[sun ethan ~/temp/foo :) 8]$ umask 0077
[sun ethan ~/temp/foo :) 9]$ umask 0577
[sun ethan ~/temp/foo :) 10]$ touch f
[sun ethan ~/temp/foo :) 11]$ ls -l
total 1
--w-------. 1 sun users 0 Nov 21 19:43 f
[sun ethan ~/temp/foo :) 12]$ umask 0077
[sun ethan ~/temp/foo :) 13]$ ln f l
[sun ethan ~/temp/foo :) 14]$ ls -l
total 1
--w-------. 2 sun users 0 Nov 21 19:43 f
--w-------. 2 sun users 0 Nov 21 19:43 l
[sun ethan ~/temp/foo :) 15]$ uname -r
4.13.9-200.fc26.x86_64

Affected machine:
-----------------

[sun faith ~/temp/foo :) 14]$ umask 0077
[sun faith ~/temp/foo :) 15]$ umask 0577
[sun faith ~/temp/foo :) 16]$ touch f
[sun faith ~/temp/foo :) 17]$ ls -l
total 0
-rw-------. 1 sun users 0 Nov 21 20:43 f
[sun faith ~/temp/foo :) 18]$ umask 0077
[sun faith ~/temp/foo :) 19]$ ln f l
[sun faith ~/temp/foo :) 20]$ ls -l
total 0
-rw-------. 2 sun users 0 Nov 21 20:43 f
-rw-------. 2 sun users 0 Nov 21 20:43 l
[sun faith ~/temp/foo :) 21]$ uname -r
4.13.12-200.fc26.x86_64

Digging some more.... it seems my _entire_ root file system has suddenly grown some POSIX ACLs that I'm very sure it didn't have before.

[sun faith ~ :) 28]$ cd /
[sun faith / :) 29]$ getfacl .
# file: .
# owner: root
# group: root
user::r-x
group::r-x
other::r-x
default:user::r-x
default:group::r-x
default:other::r-x

Now, in order to track this problem down I did a full "tar up the entire /, remake the file system, tar everything back", and I did use the --acls flag in doing so, and somehow, somewhere, I might have messed things up and spread POSIX ACLs all over my FS. I'm pretty sure I don't actually need those, so I'll clean that up.
Booted into a rescue system, did a "setfacl -b -R /mnt/sysroot". Booted back into my normal system, and.... everything is fine. I can install packages again.

Now, if someone has an explanation for this that doesn't involve ghosts, or me sleepwalking and sprinkling ACLs all over my file systems, I'm all ears.

On this particular system dumping all ACLs isn't a big issue, as I'm not using them, but I have other systems where I do, and doing a forced cleanup there is going to be very annoying. Thankfully (again) none of my other systems have exhibited this behaviour so far.
Whoa. POSIX ACLs never even crossed my mind, but sure enough, it's actually explained right there in the umask(2) man page that they can override umask. A million thanks for investigating and figuring it out, Ralf! No doubt Andrew's case is the same, only with different default ACLs.

As to what could cause such a thing... bugs in backup / other system software or admin/install scripts, package scriptlets, human error and so on all come to mind. Rpm doesn't support setting ACLs, so it should be off the hook here, but there are a few packages that do so manually from their scriptlets in Fedora, and who knows what else is in other repos etc.

I'll need to think a bit about what to take home from this. There's already a better fix for CVE-2017-7501 upstream that eliminates the need for the check that is tripping up here, but restricted access to files while unpacking them is a security requirement from rpm's side, so it'll probably need to do an explicit chmod() too. Also, the error behavior here is quite terrible as it gives no hint whatsoever about what is actually wrong, but that'll go away with the revised upstream fix.
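For the record, the interaction is easy to demonstrate once you know to look for it; here's a minimal sketch using a throwaway directory (the directory name and ACL values are only an example, not taken from the affected systems):

# mkdir /tmp/acltest
# setfacl -d -m u::rw-,g::---,o::--- /tmp/acltest
# cd /tmp/acltest
# umask 0577; touch f; umask 0022
# ls -l f
-rw-------. 1 root root 0 Nov 22 12:00 f

With a default ACL on the parent directory the kernel ignores the process umask and masks the requested mode (0666 from touch) against the default ACL instead, so the file comes out 0600 rather than the 0200 rpm expects, and the new check trips.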
Can't verify full recovery for now, but yes, I also have unknown ACLs:

# ll / | grep +
drwxr-xr-x+ 3 andrew andrew 4096 May 11 2016 auto
dr-xr-xr-x+ 5 root root 4096 Nov 21 14:58 boot
drwxr-xr-x+ 6 root root 4096 May 2 2017 grub2
drwx------+ 2 root root 16384 Jul 1 2013 lost+found
drwxr-xr-x+ 2 root root 4096 Feb 10 2017 media
drwxr-xr-x+ 4 root root 4096 Feb 10 2017 mnt
drwxr-xr-x+ 8 root root 4096 Nov 20 16:52 opt
dr-xr-x---+ 35 root root 4096 Nov 19 21:13 root
drwxr-xr-x+ 2 root root 4096 Feb 10 2017 srv
drwxr-xr-x+ 21 root root 4096 Aug 10 20:02 usr

# getfacl usr
# file: usr
# owner: root
# group: root
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group::r-x
default:other::r-x
I've poked around in the scripts section of my installed rpms, but so far nothing has shown up. I also haven't run any software that I'd expect to do anything remotely related to this.

Andrew: any chance you could deduce from the affected files on your system when this started? I pretty much wiped everything on my side yesterday when I tested my hypothesis. I think this cannot have gone on for long, otherwise I'd have noticed it before.
Did some checks on one installation. RPM scriptlets seem not guilty; this command:

$ rpm -qa --scripts | grep -i acl

gave only this (not) relevant output:

# Apply ACL to the journal directory
setfacl -Rnm g:wheel:rx,d:g:wheel:rx,g:adm:rx,d:g:adm:rx /var/log/journal/ &>/dev/null || :

Also listed and sorted all ACLed files. The earliest were modified (used `ls -cl` to check) on May 10. The files are present in different locations, including some that should not be touched by any package, and it was not a distro update. My suspect is a manual move of the entire OS I may have done around that time. Some bad flags for tar, maybe.

I'll remove these ACLs and do an update to F27 soon. Will check again after that. Any suggestions are welcome.
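(One way to produce that kind of list, in case it helps anyone else; this is only a rough sketch, it will take a while on a full root file system, and the getfacl/stat options may need tweaking.)

# getfacl -R -s -p / 2>/dev/null | sed -n 's/^# file: //p' | xargs -d '\n' stat -c '%z %n' 2>/dev/null | sort | head

getfacl -s skips files that carry nothing beyond the base ACL entries, -p keeps the paths absolute, and sorting on the ctime (%z) gives a rough idea of when the ACLs appeared.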
rpm-4.14.1-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-cf91c45f15
rpm-4.14.1-1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-cf91c45f15
rpm-4.14.1-1.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.