Created attachment 1354364 [details]
strace -f -ff -s 200 -o /tmp/rpm.strace rpm -ihv /var/cache/dnf/updates-2854b3113b7a3c6c/packages/sip-4.19.3-4.fc26.x86_64.rpm

Issue first noticed with dnf, but is reproducible using plain rpm.

$ rpm -q sip
package sip is not installed
$ sudo rpm -ihv /var/cache/dnf/updates-2854b3113b7a3c6c/packages/sip-4.19.3-4.fc26.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
   1:sip-4.19.3-4.fc26 ################################# [100%]
error: unpacking of archive failed on file /usr/lib64/python2.7/site-packages/sipconfig.pyo;5a0f5194: cpio: open
error: sip-4.19.3-4.fc26.x86_64: install failed

This happens with multiple packages. For each given package the file that fails to install is always the same. Other packages install fine.

- The file system is not full
- Apart from /boot there are no other subfilesystems
- fsck on the file system shows no errors

Attached is an strace of an rpm installation attempt of the above package.
Relevant installed packages:
rpm-libs-4.13.0.2-1.fc26.x86_64
rpm-4.13.0.2-1.fc26.x86_64
libdb-5.3.28-24.fc26.x86_64
Manually extracting the very same rpm files with rpm2cpio | cpio -id on the same file system works fine.
What does 'rpm -Kv <packages>' say about packages failing that way?
Also the output of 'rpm -Uvv <package>' might provide further clues.
Found the same for three (random?) packages during a usual `dnf update`: qt5-qtwebengine, kernel-devel, perf.

Example:

# rpm -Uhv ../qt5-qtwebengine-5.9.2-2.fc26.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
   1:qt5-qtwebengine-5.9.2-2.fc26 ################################# [ 50%]
error: unpacking of archive failed on file /usr/lib64/qt5/bin/qwebengine_convert_dict;5a12c8f4: cpio: open
error: qt5-qtwebengine-5.9.2-2.fc26.x86_64: install failed
error: qt5-qtwebengine-5.9.1-5.fc26.x86_64: erase skipped

rpm2cpio unpack is fine, but shows this (packed hardlinks?):

./usr/bin/qwebengine_convert_dict
cpio: ./usr/lib64/qt5/bin/qwebengine_convert_dict linked to ./usr/bin/qwebengine_convert_dict
./usr/lib64/qt5/bin/qwebengine_convert_dict

# ll -i ./usr/lib64/qt5/bin/qwebengine_convert_dict ./usr/bin/qwebengine_convert_dict
143522 -rwxr-xr-x. 2 root root 250872 Nov 20 14:21 ./usr/bin/qwebengine_convert_dict
143522 -rwxr-xr-x. 2 root root 250872 Nov 20 14:21 ./usr/lib64/qt5/bin/qwebengine_convert_dict

As requested:

# rpm -Kv ../qt5-qtwebengine-5.9.2-2.fc26.x86_64.rpm
../qt5-qtwebengine-5.9.2-2.fc26.x86_64.rpm:
    Header V3 RSA/SHA256 Signature, key ID 64dab85d: OK
    Header SHA1 digest: OK (9bf0ceebe0787d9dd8fdd0820001cf9aae439445)
    V3 RSA/SHA256 Signature, key ID 64dab85d: OK
    MD5 digest: OK (299b44fddf3b3cfdc2a1780476d8e37a)

# rpm -Uvv ../qt5-qtwebengine-5.9.2-2.fc26.x86_64.rpm
...
D: lsetfilecon: (/usr/share/qt5/translations/qtwebengine_locales/zh-CN.pak;5a12ccbd, system_u:object_r:usr_t:s0)
D: create 100644 1 ( 0, 0)230581 /usr/share/qt5/translations/qtwebengine_locales/zh-TW.pak;5a12ccbd
ufdio: 8 writes, 230581 total bytes in 0.000121 secs
D: Plugin: calling hook fsm_file_prepare in selinux plugin
D: lsetfilecon: (/usr/share/qt5/translations/qtwebengine_locales/zh-TW.pak;5a12ccbd, system_u:object_r:usr_t:s0)
D: create 100755 2 ( 0, 0) 0 /usr/bin/qwebengine_convert_dict;5a12ccbd
D: create 100755 2 ( 0, 0)250872 /usr/lib64/qt5/bin/qwebengine_convert_dict;5a12ccbd
fdio: 4731 reads, 136326444 total bytes in 3.352210 secs
error: unpacking of archive failed on file /usr/lib64/qt5/bin/qwebengine_convert_dict;5a12ccbd: cpio: open
ufdio: 6 reads, 47528 total bytes in 0.000015 secs
error: qt5-qtwebengine-5.9.2-2.fc26.x86_64: install failed
D: ========== +++ qt5-qtwebengine-5.9.1-5.fc26 x86_64-linux 0x2
error: qt5-qtwebengine-5.9.1-5.fc26.x86_64: erase skipped
D: running post-transaction scripts
...

There are a bunch of similar open issues with cpio rename:
https://bugzilla.redhat.com/show_bug.cgi?id=1500984
https://bugzilla.redhat.com/show_bug.cgi?id=1397543
https://bugzilla.redhat.com/show_bug.cgi?id=1493741
https://bugzilla.redhat.com/show_bug.cgi?id=1409904

And one for digest:
https://bugzilla.redhat.com/show_bug.cgi?id=1174137
Created attachment 1356045 [details]
Requested rpm -v outputs

rpm -Kv and rpm -Uvv outputs, as requested.
I concur that this probably has something to do with hardlinks; in my sip package test case the .pyo file is also a hard link (output of rpm2cpio | cpio -id of the rpm file):

$ find -type f -printf '%i %p\n'
967491 ./usr/lib64/python2.7/site-packages/sip.pyi
967492 ./usr/lib64/python2.7/site-packages/sip.so
967493 ./usr/lib64/python2.7/site-packages/sipconfig.py
967494 ./usr/lib64/python2.7/site-packages/sipdistutils.py
967505 ./usr/lib64/python2.7/site-packages/sipconfig.pyo
967505 ./usr/lib64/python2.7/site-packages/sipconfig.pyc
967506 ./usr/lib64/python2.7/site-packages/sipdistutils.pyo
967506 ./usr/lib64/python2.7/site-packages/sipdistutils.pyc
967498 ./usr/share/doc/sip/NEWS
967499 ./usr/share/doc/sip/README
967502 ./usr/share/licenses/sip/LICENSE
967503 ./usr/share/licenses/sip/LICENSE-GPL2
967504 ./usr/share/licenses/sip/LICENSE-GPL3

As you can see, sipconfig.pyo and sipconfig.pyc share an inode.
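(Not from the report itself, but possibly a useful shortcut for others: the hardlink groups can also be spotted straight from the package header, without extracting anything, assuming an rpm that knows the FILEINODES query tag.)

$ rpm -qp --queryformat '[%{FILEINODES} %{FILENAMES}\n]' sip-4.19.3-4.fc26.x86_64.rpm | sort -n

Entries sharing the same leading inode number form one hardlink group, so sipconfig.pyc and sipconfig.pyo should come out with an identical first column.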
Not sure if this is relevant in any way, but the underlying file system is btrfs in my case.
Mine is ext4, with plenty of free space.
Right, thanks for the data. I presume the problem goes away if you downgrade to rpm 4.13.0.1?

The only thing that I can think of causing this is a new sanity check regarding hardlinks failing in some situations, which would manifest as a cpio open failure (a bit hysterically, yes). Now if I could only figure out what exactly it is that makes it fail; I haven't been able to reproduce this so far.
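(For anyone wanting to test the downgrade, something along these lines should do it, assuming the previous build is still reachable through the repos or the local dnf cache; dnf should pull the matching rpm-libs along as a dependency.)

# dnf downgrade rpm
or, if several older builds are available, name the wanted version explicitly, e.g.:
# dnf downgrade 'rpm-4.13.0.1*'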
Okay, so here's the deal, visible in the strace from the initial description when you know what to look for:

umask(0577) = 022
open("/usr/lib64/python2.7/site-packages/sipconfig.pyc;5a0f5194", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0666) = 32
fcntl(32, F_SETFD, FD_CLOEXEC) = 0
umask(022) = 0577
close(32) = 0
lstat("/usr/lib64/python2.7/site-packages/sipconfig.pyc", 0x7ffd74b4b440) = -1 ENOENT (No such file or directory)
rename("/usr/lib64/python2.7/site-packages/sipconfig.pyc;5a0f5194", "/usr/lib64/python2.7/site-packages/sipconfig.pyc") = 0
link("/usr/lib64/python2.7/site-packages/sipconfig.pyc", "/usr/lib64/python2.7/site-packages/sipconfig.pyo;5a0f5194") = 0
umask(0577) = 022
open("/usr/lib64/python2.7/site-packages/sipconfig.pyo;5a0f5194", O_WRONLY|O_CREAT|O_APPEND, 0666) = 32
fcntl(32, F_SETFD, FD_CLOEXEC) = 0
umask(022) = 0577
fstat(32, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
                           ^^^^
close(32) = 0
lstat("/usr/lib64/python2.7/site-packages/sipconfig.pyo;5a0f5194", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
                                                                                    ^^^^

That file is supposed to have 0200 permissions, as it was only just created a couple of syscalls earlier under a 0577 umask, so rpm thinks there's something fishy going on and bails out. So actually this is exactly what the new sanity check is *supposed* to catch: somebody/something is allegedly goofing around behind its back.

Simplified, this is what is supposed to happen:

# umask 577; touch f; umask 022; ln f l; ls -l f l
--w-------. 2 root root 0 Nov 21 14:05 f
--w-------. 2 root root 0 Nov 21 14:05 l

...yet somehow in this case, the file ends up created as if the first umask never occurred (or was 022, which would be the default), or something changed the file's permissions in the fleeting moment between its creation and reopen. Doing so reproducibly seems extremely unlikely of course, so the question remains: what the heck?

Just to get it straight, are these physical systems or some sort of virtual/container/something else? Can you reproduce at will on other hosts too?
Mine is a physical host, and (thankfully) the only machine exhibiting this behaviour so far. I'll try the simple shell example above on the box in question and see if that also does funny things with umask.
Created attachment 1356669 [details]
strace rpm

My output from:

# strace -f -s 200 -o /tmp/rpm-strace-1.txt rpm -Uvh /var/cache/dnf/updates-2854b3113b7a3c6c/packages/qt5-qtwebengine-5.9.2-2.fc26.x86_64.rpm

Reproducible on two identical Fedora 26 installations (laptops), no VMs/containers involved (except systemd's default cgroups, etc).
Created attachment 1356672 [details]
log of dnf downgrade rpm

I can confirm that downgrading rpm-4.13.0.2-1.fc26 -> rpm-4.13.0.1-4.fc26 fixes that. There were also a bunch of similar errors during the downgrade itself; attaching the log.
Right, the strace from comment #13 shows the same symptoms, only with differently wrong permissions:

28421 open("/usr/lib64/qt5/bin/qwebengine_convert_dict;5a141f1b", O_WRONLY|O_CREAT|O_APPEND, 0666) = 32
28421 fcntl(32, F_SETFD, FD_CLOEXEC) = 0
28421 umask(022) = 0577
28421 fstat(32, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
28421 close(32) = 0
28421 lstat("/usr/lib64/qt5/bin/qwebengine_convert_dict;5a141f1b", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0

qwebengine_convert_dict and sipconfig.pyo would of course have different permissions eventually, but at this point that shouldn't make any difference at all; both should simply be 0200 here. The initial umask value differs between Andrew's and Ralf's logs, but that shouldn't make any difference either, since rpm always explicitly sets it as necessary, as is also visible in the logs. And the initial umask doesn't make any difference in my tests either (which is how it should be).

I still haven't been able to reproduce anything like it and am at a loss for immediate ideas where such behavior could be coming from (this looks so weird that it's likely either something blindingly simple and obvious that I'm just not seeing, or something dumbfoundingly complex :)

So I guess we can just try gathering more data to hopefully find an explanation eventually. Things like kernel versions, system upgrade history... and of course anything you can think of that might be special about these systems. If you guys can cross-test each other's failing cases (with rpm 4.13.0.2), that would tell us something about it. Also running under valgrind might bring up something. A sosreport (from the 'sos' package) will collect a huge amount of data from the system that might be helpful, but I certainly understand if that feels uncomfortable, so by no means required.

If/when you downgrade to rpm 4.13.0.1, you'll want to reinstall the python sub-packages afterwards to get the pyc/pyo hardlinks to install correctly, although it shouldn't matter for any functionality.

As yet another data point, rpm 4.14 (so Fedora >= 27) has the exact same new check and thus is expected to behave identically.
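(In case concrete command lines are easier to follow, something along these lines should cover the valgrind and sosreport suggestions; the valgrind options and log path are only a suggestion, and /path/to/failing-package.rpm stands for whichever package fails on your system.)

# valgrind --trace-children=yes --log-file=/tmp/rpm-valgrind.%p.log rpm -Uvv /path/to/failing-package.rpm
# dnf install sos && sosreport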
I'll see what I can find out. One "good" thing about this is that due to the failing transactions the impacted packages are never marked as installed/updated in the RPM database, so just running an update again (with a non-broken RPM) should hopefully sort everything out.
(In reply to Panu Matilainen from comment #15)
> If you guys can cross-test each other's failing cases (with rpm 4.13.0.2),
> that would tell us something about it.

You mean running `rpm -ivh sip-4.19.3-4.fc26.x86_64.rpm` for me and the same with qt5-qtwebengine-5.9.2-2.fc26.x86_64.rpm for Ralf, correct?

(In reply to Ralf Ertzinger from comment #16)
> just running an update again (with a non-broken RPM) should hopefully
> sort everything out.

That's what I already did after the rpm downgrade. At least the final update transaction in dnf seems fine (with the rpm package blacklisted).
So.... unless I'm being incredibly dense, something is very, very wrong here.

Unaffected machine:
-------------------

[sun ethan ~/temp/foo :) 8]$ umask 0077
[sun ethan ~/temp/foo :) 9]$ umask 0577
[sun ethan ~/temp/foo :) 10]$ touch f
[sun ethan ~/temp/foo :) 11]$ ls -l
total 1
--w-------. 1 sun users 0 Nov 21 19:43 f
[sun ethan ~/temp/foo :) 12]$ umask 0077
[sun ethan ~/temp/foo :) 13]$ ln f l
[sun ethan ~/temp/foo :) 14]$ ls -l
total 1
--w-------. 2 sun users 0 Nov 21 19:43 f
--w-------. 2 sun users 0 Nov 21 19:43 l
[sun ethan ~/temp/foo :) 15]$ uname -r
4.13.9-200.fc26.x86_64

Affected machine:
-----------------

[sun faith ~/temp/foo :) 14]$ umask 0077
[sun faith ~/temp/foo :) 15]$ umask 0577
[sun faith ~/temp/foo :) 16]$ touch f
[sun faith ~/temp/foo :) 17]$ ls -l
total 0
-rw-------. 1 sun users 0 Nov 21 20:43 f
[sun faith ~/temp/foo :) 18]$ umask 0077
[sun faith ~/temp/foo :) 19]$ ln f l
[sun faith ~/temp/foo :) 20]$ ls -l
total 0
-rw-------. 2 sun users 0 Nov 21 20:43 f
-rw-------. 2 sun users 0 Nov 21 20:43 l
[sun faith ~/temp/foo :) 21]$ uname -r
4.13.12-200.fc26.x86_64

Digging some more.... it seems my _entire_ root file system has suddenly grown some POSIX ACLs that I'm very sure it didn't have before.

[sun faith ~ :) 28]$ cd /
[sun faith / :) 29]$ getfacl .
# file: .
# owner: root
# group: root
user::r-x
group::r-x
other::r-x
default:user::r-x
default:group::r-x
default:other::r-x

Now, in order to track this problem down I did a full "tar up the entire /, remake the file system, tar everything back", and I did use the --acls flag in doing so, and somehow, somewhere, I might have messed things up and spread POSIX ACLs all over my FS. I'm pretty sure I don't actually need those, so I'll clean that up.
Booted into a rescue system, did a "setfacl -b -R /mnt/sysroot". Booted back into my normal system, and.... everything is fine. I can install packages again.

Now, if someone has an explanation for this that doesn't involve ghosts, or me sleepwalking and sprinkling ACLs all over my file systems, I'm all ears.

On this particular system dumping all ACLs isn't a big issue, as I'm not using them, but I have other systems where I do, and doing a forced cleanup there is going to be very annoying. Thankfully (again) none of my other systems have exhibited this behaviour so far.
Whoa. POSIX ACLs never even crossed my mind, but sure enough, it's actually explained right there in the umask(2) man page that they can override umask. A million thanks for investigating and figuring it out, Ralf! No doubt Andrew's case is the same, only with different default ACLs.

As to what could cause such a thing... bugs in backup / other system software or admin/install scripts, package scriptlets, human error and so on all come to mind. Rpm doesn't support setting ACLs, so it should be off the hook here, but there are a few packages that do so manually from their scriptlets in Fedora, and who knows what else is in other repos etc.

I'll need to think a bit about what to take home from this. There's already a better fix for CVE-2017-7501 upstream that eliminates the need for the check that is tripping up here, but restricted access to files while unpacking them is a security requirement from rpm's side, so it'll probably need to do an explicit chmod() too. Also, the error behavior here is quite terrible as it gives no hint whatsoever about what is actually wrong, but that'll go away with the revised upstream fix.
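For the record, the interaction is easy to demonstrate once you know to look for it; here's a minimal sketch using a throwaway directory (the directory name and ACL values are only an example, not taken from the affected systems):

# mkdir /tmp/acltest
# setfacl -d -m u::rw-,g::---,o::--- /tmp/acltest
# cd /tmp/acltest
# umask 0577; touch f; umask 0022
# ls -l f
-rw-------. 1 root root 0 Nov 22 12:00 f

With a default ACL on the parent directory the kernel ignores the process umask and masks the requested mode (0666 from touch) against the default ACL instead, so the file comes out 0600 rather than the 0200 rpm expects, and the new check trips.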
Can't verify full recovery for now, but yes, I also have unknown ACLs:

# ll / | grep +
drwxr-xr-x+ 3 andrew andrew 4096 May 11 2016 auto
dr-xr-xr-x+ 5 root root 4096 Nov 21 14:58 boot
drwxr-xr-x+ 6 root root 4096 May 2 2017 grub2
drwx------+ 2 root root 16384 Jul 1 2013 lost+found
drwxr-xr-x+ 2 root root 4096 Feb 10 2017 media
drwxr-xr-x+ 4 root root 4096 Feb 10 2017 mnt
drwxr-xr-x+ 8 root root 4096 Nov 20 16:52 opt
dr-xr-x---+ 35 root root 4096 Nov 19 21:13 root
drwxr-xr-x+ 2 root root 4096 Feb 10 2017 srv
drwxr-xr-x+ 21 root root 4096 Aug 10 20:02 usr

# getfacl usr
# file: usr
# owner: root
# group: root
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group::r-x
default:other::r-x
I've poked around in the scripts section of my installed rpms, but so far nothing has shown up. I also haven't run any software that I'd expect to do anything remotely related to this.

Andrew: any chance you could deduce from the affected files on your system when this started? I pretty much wiped everything on my side yesterday when I tested my hypothesis. I think this cannot have gone on for long, otherwise I'd have noticed it before.
Did some checks on one installation. RPM scriptlets seem not guilty; this command:

$ rpm -qa --scripts | grep -i acl

gave only this (not) relevant output:

# Apply ACL to the journal directory
setfacl -Rnm g:wheel:rx,d:g:wheel:rx,g:adm:rx,d:g:adm:rx /var/log/journal/ &>/dev/null || :

Also listed and sorted all ACLed files. The earliest were modified (used `ls -cl` to check) on May 10. The files are present in different locations, including some that should not be touched by any package, and it was not a distro update. My suspect is a manual move of the entire OS I may have done around that time. Some bad flags for tar, maybe.

I'll remove these ACLs and do an update to F27 soon. Will check again after that. Any suggestions are welcome.
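(One way to produce that kind of list, in case it helps anyone else; this is only a rough sketch, it will take a while on a full root file system, and the getfacl/stat options may need tweaking.)

# getfacl -R -s -p / 2>/dev/null | sed -n 's/^# file: //p' | xargs -d '\n' stat -c '%z %n' 2>/dev/null | sort | head

getfacl -s skips files that carry nothing beyond the base ACL entries, -p keeps the paths absolute, and sorting on the ctime (%z) gives a rough idea of when the ACLs appeared.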
rpm-4.14.1-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-cf91c45f15
rpm-4.14.1-1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-cf91c45f15
rpm-4.14.1-1.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.