Description of problem: My F17 i686 system update ends with yum crash after "Running Transaction Test" with "Bus error" (SIGBUS) error message. Narrowing problem, I find that: - certain operations as package installation, deletion, verificatiob and RPM DB rebuild works fine - problem is with "php-common" package - operations as "rpm -V php-common", "rpm -e php-common", "rpm -Uv /tmp/php-common-5.4.6-2.fc17.i686.rpm" with him ends with same error. When I tried it just erase from RPM DB, same error: # rpm -e --justdb php-common BDB2053 Freeing read locks for locker 0x1c5: 22565/3076258176 BDB2053 Freeing read locks for locker 0x1c7: 22565/3076258176 BDB2053 Freeing read locks for locker 0x1c8: 22565/3076258176 BDB2053 Freeing read locks for locker 0x1c9: 22565/3076258176 BDB2053 Freeing read locks for locker 0x1ca: 22565/3076258176 BDB2053 Freeing read locks for locker 0x1cb: 22565/3076258176 BDB2053 Freeing read locks for locker 0x1cc: 22565/3076258176 BDB2053 Freeing read locks for locker 0x1cd: 22565/3076258176 BDB2053 Freeing read locks for locker 0x1ce: 22565/3076258176 BDB2053 Freeing read locks for locker 0x1cf: 22565/3076258176 BDB2053 Freeing read locks for locker 0x1d0: 22565/3076258176 BDB2053 Freeing read locks for locker 0x1d1: 22565/3076258176 BDB2053 Freeing read locks for locker 0x1d2: 22565/3076258176 Chyba na sběrnici (neplatná adresa) (SIGBUS) (rem: czech "Chyba na sběrnici (neplatná adresa)" is translatable as "Bus error (invalid address)") I tried debug this last command (delete "php-common just in RPM DB, without modify package files on filesystem) i got: # gdb rpm GNU gdb (GDB) Fedora (7.4.50.20120120-49.fc17) Copyright (C) 2012 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i686-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /usr/bin/rpm...Reading symbols from /usr/lib/debug/bin/rpm.debug...done. done. (gdb) run -e --justdb php-common Starting program: /usr/bin/rpm -e --justdb php-common [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/libthread_db.so.1". Program received signal SIGBUS, Bus error. SHA256_Update (ctx=0x8838160, input=0xb70bd000 <Address 0xb70bd000 out of bounds>, inputLen=61472) at sha512.c:444 444 memcpy(B, input, SHA256_BLOCK_LENGTH); (gdb) bt #0 SHA256_Update (ctx=0x8838160, input=0xb70bd000 <Address 0xb70bd000 out of bounds>, inputLen=61472) at sha512.c:444 #1 0xb7392b27 in SHA256_Update (cx=0x8838160, input=0xb70bc000 "[PHP]\n\n", ';' <repeats 19 times>, "\n; About php.ini ;\n", ';' <repeats 19 times>, "\n; PHP's initialization file, generally called php.ini, is responsible for\n; configuring many of the aspects of PHP's behavior.\n\n; PHP"..., inputLen=65568) at loader.c:990 #2 0xb7378931 in NSC_DigestUpdate (ulPartLen=65568, pPart=0xb70bc000 "[PHP]\n\n", ';' <repeats 19 times>, "\n; About php.ini ;\n", ';' <repeats 19 times>, "\n; PHP's initialization file, generally called php.ini, is responsible for\n; configuring many of the aspects of PHP's behavior.\n\n; PHP"..., hSession=17) at pkcs11c.c:1388 #3 NSC_DigestUpdate (hSession=17, pPart=0xb70bc000 "[PHP]\n\n", ';' <repeats 19 times>, "\n; About php.ini ;\n", ';' <repeats 19 times>, "\n; PHP's initialization file, generally called php.ini, is responsible for\n; configuring many of the aspects of PHP's behavior.\n\n; PHP"..., ulPartLen=65568) at pkcs11c.c:1376 #4 0xb765b7cd in PK11_DigestOp (context=0x887db80, in=0xb70bc000 "[PHP]\n\n", ';' <repeats 19 times>, "\n; About php.ini ;\n", ';' <repeats 19 times>, "\n; PHP's initialization file, generally called php.ini, is responsible for\n; configuring many of the aspects of PHP's behavior.\n\n; PHP"..., inLen=65568) at pk11cxt.c:819 #5 0xb764d683 in HASH_Update (context=0x88e70d8, src=0xb70bc000 "[PHP]\n\n", ';' <repeats 19 times>, "\n; About php.ini ;\n", ';' <repeats 19 times>, "\n; PHP's initialization file, generally called php.ini, is responsible for\n; configuring many of the aspects of PHP's behavior.\n\n; PHP"..., len=len@entry=65568) at sechash.c:422 #6 0x4e62d1ce in rpmDigestUpdate (ctx=0x88e1f58, data=data@entry=0xb70bc000, len=len@entry=65568) at digest.c:201 #7 0x4e63986f in rpmDoDigest (algo=8, fn=0x883dc00 "/etc/php.ini", asAscii=0, digest=0xbf8a4cc8 "\230M\212\277\220M\212\277", fsizep=0x0) at rpmfileutil.c:184 #8 0x4e607745 in handleOverlappedFiles (fi=0x88248b8, p=0x88265e8, ht=0x886e9a0, ts=0x881e6f8) at transaction.c:541 #9 rpmtsPrepare (ts=0x881e6f8) at transaction.c:1309 #10 rpmtsRun (ts=ts@entry=0x881e6f8, okProbs=okProbs@entry=0x0, ignoreSet=0) at transaction.c:1418 #11 0x4e5f89f0 in rpmcliTransaction (ts=ts@entry=0x881e6f8, numPackages=numPackages@entry=1, ia=0x4e6226e0, ia=0x4e6226e0) at rpminstall.c:273 #12 0x4e5f8deb in rpmErase (ts=ts@entry=0x881e6f8, ia=ia@entry=0x4e6226e0, argv=0x87f5a08) at rpminstall.c:692 #13 0x080491e0 in main (argc=4, argv=0xbf8a4fd4) at rpmqv.c:258 (gdb) I suspected some problem with file "/etc/php.ini", as backtrace show its first lines, and rpm had it open: # lsof|grep php rpm 22565 root mem REG 8,4 65568 789946 /etc/php.ini rpm 22565 root 25r REG 8,4 65568 789946 /etc/php.ini My circumstance was acknowledged with: # LANG=C cp -a /etc/php.ini /boot/php.ini.BCKP cp: reading `/etc/php.ini': Input/output error cp: failed to extend `/boot/php.ini.BCKP': Input/output error # LANG=C cat /etc/php.ini >/dev/null cat: /etc/php.ini: Input/output error and finally after rewriting this file with one extrahed from rpm package: cat /tmp/php.ini >/etc/php.ini solved my problem and I was able do all operations with RPM subsystem ("rpm -e php-common", "yum update", ...) But there is perhaps one or several minor bugs in RPM code: - I/O errors should be reported (namely in "rpm -V PackageName" operation) - when operate just on RPM DB, why are accessed files on filesystem? IMO they should not be accessed. - why "rpm -U PackageFilename.rpm" operation supervise files from old package? When it did owerwrite old files (as I did with /etc/php.ini manually), then problem should go away. (but... - this point is foolish, as rpm perhaps a) old files not owerwrite, but delete at end, in order to revert transaction when some comes bad; b) /etc/php.ini is config file and rpm should ascertain if it was changed in comparison with DB information and decide its reservation or renaming to .rpmsave or so) Version-Release number of selected component (if applicable): rpm-4.9.1.3-7.fc17.i686
Rpm needs to look at previous package contents when its a %config file which might've been modified and thus could need a backup. --justdb only affects writing of the new contents, not everything rpm do. Obviously rpm shouldn't crash in case of io-error on a file, but these kind of errors are rare and hard to reproduce/systematically test against. Do you happen to have a way of reproducing the "Input/output error" on that file?
*** Bug 877742 has been marked as a duplicate of this bug. ***
FWIW if somebody can easily reproduce disk IO errors, it wouldn't hurt to test with rpm >= 4.10.1 (ie Fedora >= 18), as the newer versions no longer use mmap() in this context.
dmsetup simulates IO errors with the error target. I've never tried to use device-mapper directly - just use lvm2. But I'll give it a try - should be educational. o I'll try to make a script that creates a file with a few megs, creates a loopback device, then a device mapper device that replaces selected sectors with error. o then that bad_disk is mounted on /var/test, and a test.rpm installs files to /var/test. o somehow, we ensure that the simulated bad blocks are in one of the files. I think debugfs can provide block mapping for files, and then maybe the dmsetup script above can be smart enough to *add* specific bad blocks while mounted. But that may be too advanced for me. Might have to create a few, and reinstall the rpm until it hits one.
Example block device with errors here: http://mbroz.fedorapeople.org/talks/DeviceMapperBasics/dm.pdf
Created attachment 648127 [details] SRPM of test case for rpm Created a simple test case for this rpm bug. Uploaded abrt from running it to bug#878201 (abrt doesn't like to let you specify an existing bug). Instructions: 1) inspect, build SRPM 2) install RPM 3) cd /var/testrpm 4) sh -x testrpm.sh [root@sdg testrpm]# sh -x testrpm.sh ++ losetup -f + loopdev=/dev/loop0 + '[' /dev/loop0 '!=' /dev/loop0 ']' + dd if=/dev/zero of=testdisk.img bs=1k count=2048 2048+0 records in 2048+0 records out 2097152 bytes (2.1 MB) copied, 0.0355833 s, 58.9 MB/s + mke2fs -F testdisk.img mke2fs 1.42.3 (14-May-2012) Discarding device blocks: done Filesystem label= OS type: Linux Block size=1024 (log=0) Fragment size=1024 (log=0) Stride=0 blocks, Stripe width=0 blocks 256 inodes, 2048 blocks 102 blocks (4.98%) reserved for the super user First data block=1 Maximum filesystem blocks=2097152 1 block group 8192 blocks per group, 8192 fragments per group 256 inodes per group Allocating group tables: done Writing inode tables: done Writing superblocks and filesystem accounting information: done + mv /mnt/testrpm /mnt/testrpm.save + mkdir -p /mnt/testrpm + mount -o loop testdisk.img /mnt/testrpm + rsync -rav /mnt/testrpm.save/ /mnt/testrpm sending incremental file list ./ testrpm.dat sent 1048796 bytes received 34 bytes 2097660.00 bytes/sec total size is 1048576 speedup is 1.00 + rpm -V testrpm + umount /mnt/testrpm + losetup /dev/loop0 /home/stuart/testrpm/testdisk.img + dmsetup create bad_disk testrpm.dm + mount /dev/mapper/bad_disk /mnt/testrpm + rpm -V testrpm testrpm.sh: line 28: 2813 Bus error (core dumped) rpm -V testrpm + umount /mnt/testrpm + dmsetup remove bad_disk + losetup -d /dev/loop0 + rmdir /mnt/testrpm + mv /mnt/testrpm.save /mnt/testrpm + rpm -V testrpm BDB2053 Freeing read locks for locker 0x501: 2813/3078138240 BDB2053 Freeing read locks for locker 0x502: 2813/3078138240 BDB2053 Freeing read locks for locker 0x503: 2813/3078138240 BDB2053 Freeing read locks for locker 0x504: 2813/3078138240 BDB2053 Freeing read locks for locker 0x505: 2813/3078138240 BDB2053 Freeing read locks for locker 0x506: 2813/3078138240 BDB2053 Freeing read locks for locker 0x507: 2813/3078138240 BDB2053 Freeing read locks for locker 0x508: 2813/3078138240 BDB2053 Freeing read locks for locker 0x509: 2813/3078138240 BDB2053 Freeing read locks for locker 0x50a: 2813/3078138240 BDB2053 Freeing read locks for locker 0x50b: 2813/3078138240 BDB2053 Freeing read locks for locker 0x50c: 2813/3078138240 BDB2053 Freeing read locks for locker 0x50d: 2813/3078138240 BDB2053 Freeing read locks for locker 0x50e: 2813/3078138240 [root@sdg testrpm]#
test case provided. Can install testrpm and run repeatedly until fixed.
*** Bug 878201 has been marked as a duplicate of this bug. ***
Ooh, nice. Thanks for the very nice reproducer. And education :) I had no idea I had such disk-error simulation tools right under my nose all this time. The good news is that rpm >= 4.10.1 doesn't crash on it, not using mmap() is where the difference comes from. It doesn't report an actual IO error on it though, just a "cannot compute" question mark but that's still a whole lot better than crashing: [root@localhost testrpm]# sh testrpm.sh 2>&1|grep /testrpm S.5....T. /var/testrpm/testrpm.sh S.?...... /mnt/testrpm/testrpm.dat S.5....T. /var/testrpm/testrpm.sh S.5....T. /var/testrpm/testrpm.sh [root@localhost testrpm]#
Is there a performance difference between mmap and read for verify?
I would like to see an E for Error replace the 5 when there is an IO Error. Although a '?' meets the specification ("the test could not be performed"), IO errors, when they occur, are something you *really* want to know about.
This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior to Fedora 17's end of life. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Closing NEXTRELEASE, there are no updates planned for F17 this point anymore. As for better indication of IO-errors, that's more of an upstream issue.