Bug 854798

Summary: rpm crash with SIGBUS in certain operations when package file has I/O error(s)
Product: [Fedora] Fedora Reporter: Frantisek Hanzlik <franta>
Component: rpmAssignee: Fedora Packaging Toolset Team <packaging-team>
Status: CLOSED NEXTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 17CC: ffesti, jzeleny, packaging-team, pknirsch, pmatilai, stuart
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-04 07:39:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
SRPM of test case for rpm none

Description Frantisek Hanzlik 2012-09-06 02:18:37 UTC
Description of problem:
My F17 i686 system update ends with yum crash after "Running Transaction Test" with "Bus error" (SIGBUS) error message. Narrowing problem, I find that:

- certain operations as package installation, deletion, verificatiob and RPM DB rebuild works fine

- problem is with "php-common" package - operations as "rpm -V php-common", "rpm -e php-common", "rpm -Uv /tmp/php-common-5.4.6-2.fc17.i686.rpm" with him ends with same error.
When I tried it just erase from RPM DB, same error:
# rpm -e --justdb php-common
BDB2053 Freeing read locks for locker 0x1c5: 22565/3076258176
BDB2053 Freeing read locks for locker 0x1c7: 22565/3076258176
BDB2053 Freeing read locks for locker 0x1c8: 22565/3076258176
BDB2053 Freeing read locks for locker 0x1c9: 22565/3076258176
BDB2053 Freeing read locks for locker 0x1ca: 22565/3076258176
BDB2053 Freeing read locks for locker 0x1cb: 22565/3076258176
BDB2053 Freeing read locks for locker 0x1cc: 22565/3076258176
BDB2053 Freeing read locks for locker 0x1cd: 22565/3076258176
BDB2053 Freeing read locks for locker 0x1ce: 22565/3076258176
BDB2053 Freeing read locks for locker 0x1cf: 22565/3076258176
BDB2053 Freeing read locks for locker 0x1d0: 22565/3076258176
BDB2053 Freeing read locks for locker 0x1d1: 22565/3076258176
BDB2053 Freeing read locks for locker 0x1d2: 22565/3076258176
Chyba na sběrnici (neplatná adresa) (SIGBUS)
(rem: czech "Chyba na sběrnici (neplatná adresa)" is translatable as "Bus error (invalid address)")

I tried debug this last command (delete "php-common just in RPM DB, without modify package files on filesystem) i got:

# gdb rpm
GNU gdb (GDB) Fedora (7.4.50.20120120-49.fc17)
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/rpm...Reading symbols from /usr/lib/debug/bin/rpm.debug...done.
done.
(gdb) run -e --justdb php-common
Starting program: /usr/bin/rpm -e --justdb php-common
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".

Program received signal SIGBUS, Bus error.
SHA256_Update (ctx=0x8838160, input=0xb70bd000 <Address 0xb70bd000 out of bounds>, inputLen=61472) at sha512.c:444
444             memcpy(B, input, SHA256_BLOCK_LENGTH);
(gdb) bt
#0  SHA256_Update (ctx=0x8838160, input=0xb70bd000 <Address 0xb70bd000 out of bounds>, inputLen=61472) at sha512.c:444
#1  0xb7392b27 in SHA256_Update (cx=0x8838160, 
    input=0xb70bc000 "[PHP]\n\n", ';' <repeats 19 times>, "\n; About php.ini   ;\n", ';' <repeats 19 times>, "\n; PHP's initialization file, generally called php.ini, is responsible for\n; configuring many of the aspects of PHP's behavior.\n\n; PHP"..., inputLen=65568) at loader.c:990
#2  0xb7378931 in NSC_DigestUpdate (ulPartLen=65568, 
    pPart=0xb70bc000 "[PHP]\n\n", ';' <repeats 19 times>, "\n; About php.ini   ;\n", ';' <repeats 19 times>, "\n; PHP's initialization file, generally called php.ini, is responsible for\n; configuring many of the aspects of PHP's behavior.\n\n; PHP"..., hSession=17) at pkcs11c.c:1388
#3  NSC_DigestUpdate (hSession=17, 
    pPart=0xb70bc000 "[PHP]\n\n", ';' <repeats 19 times>, "\n; About php.ini   ;\n", ';' <repeats 19 times>, "\n; PHP's initialization file, generally called php.ini, is responsible for\n; configuring many of the aspects of PHP's behavior.\n\n; PHP"..., ulPartLen=65568) at pkcs11c.c:1376
#4  0xb765b7cd in PK11_DigestOp (context=0x887db80, 
    in=0xb70bc000 "[PHP]\n\n", ';' <repeats 19 times>, "\n; About php.ini   ;\n", ';' <repeats 19 times>, "\n; PHP's initialization file, generally called php.ini, is responsible for\n; configuring many of the aspects of PHP's behavior.\n\n; PHP"..., 
    inLen=65568) at pk11cxt.c:819
#5  0xb764d683 in HASH_Update (context=0x88e70d8, 
    src=0xb70bc000 "[PHP]\n\n", ';' <repeats 19 times>, "\n; About php.ini   ;\n", ';' <repeats 19 times>, "\n; PHP's initialization file, generally called php.ini, is responsible for\n; configuring many of the aspects of PHP's behavior.\n\n; PHP"..., len=len@entry=65568) at sechash.c:422
#6  0x4e62d1ce in rpmDigestUpdate (ctx=0x88e1f58, data=data@entry=0xb70bc000, len=len@entry=65568) at digest.c:201
#7  0x4e63986f in rpmDoDigest (algo=8, fn=0x883dc00 "/etc/php.ini", asAscii=0, 
    digest=0xbf8a4cc8 "\230M\212\277\220M\212\277", fsizep=0x0) at rpmfileutil.c:184
#8  0x4e607745 in handleOverlappedFiles (fi=0x88248b8, p=0x88265e8, ht=0x886e9a0, ts=0x881e6f8) at transaction.c:541
#9  rpmtsPrepare (ts=0x881e6f8) at transaction.c:1309
#10 rpmtsRun (ts=ts@entry=0x881e6f8, okProbs=okProbs@entry=0x0, ignoreSet=0) at transaction.c:1418
#11 0x4e5f89f0 in rpmcliTransaction (ts=ts@entry=0x881e6f8, numPackages=numPackages@entry=1, ia=0x4e6226e0, ia=0x4e6226e0)
    at rpminstall.c:273
#12 0x4e5f8deb in rpmErase (ts=ts@entry=0x881e6f8, ia=ia@entry=0x4e6226e0, argv=0x87f5a08) at rpminstall.c:692
#13 0x080491e0 in main (argc=4, argv=0xbf8a4fd4) at rpmqv.c:258
(gdb)

I suspected some problem with file "/etc/php.ini", as backtrace show its first lines, and rpm had it open:

# lsof|grep php
rpm       22565         root  mem       REG        8,4     65568     789946 /etc/php.ini
rpm       22565         root   25r      REG        8,4     65568     789946 /etc/php.ini

My circumstance was acknowledged with:
# LANG=C cp -a /etc/php.ini /boot/php.ini.BCKP
cp: reading `/etc/php.ini': Input/output error
cp: failed to extend `/boot/php.ini.BCKP': Input/output error
# LANG=C cat /etc/php.ini >/dev/null
cat: /etc/php.ini: Input/output error

and finally after rewriting this file with one extrahed from rpm package:
cat /tmp/php.ini >/etc/php.ini
solved my problem and I was able do all operations with RPM subsystem
("rpm -e php-common", "yum update", ...)

But there is perhaps one or several minor bugs in RPM code:

- I/O errors should be reported (namely in "rpm -V PackageName" operation)

- when operate just on RPM DB, why are accessed files on filesystem? IMO they should not be accessed.

- why "rpm -U PackageFilename.rpm" operation supervise files from old package?
When it did owerwrite old files (as I did with /etc/php.ini manually), then problem should go away.
(but... - this point is foolish, as rpm perhaps a) old files not owerwrite, but delete at end, in order to revert transaction when some comes bad; b) /etc/php.ini is config file and rpm should ascertain if it was changed in comparison with DB information and decide its reservation or renaming to .rpmsave or so)

Version-Release number of selected component (if applicable):
rpm-4.9.1.3-7.fc17.i686

Comment 1 Panu Matilainen 2012-10-15 09:06:56 UTC
Rpm needs to look at previous package contents when its a %config file which might've been modified and thus could need a backup. --justdb only affects writing of the new contents, not everything rpm do.

Obviously rpm shouldn't crash in case of io-error on a file, but these kind of errors are rare and hard to reproduce/systematically test against. Do you happen to have a way of reproducing the "Input/output error" on that file?

Comment 2 Panu Matilainen 2012-11-19 12:03:13 UTC
*** Bug 877742 has been marked as a duplicate of this bug. ***

Comment 3 Panu Matilainen 2012-11-19 12:10:27 UTC
FWIW if somebody can easily reproduce disk IO errors, it wouldn't hurt to test with rpm >= 4.10.1 (ie Fedora >= 18), as the newer versions no longer use mmap() in this context.

Comment 4 Stuart D Gathman 2012-11-19 18:49:01 UTC
dmsetup simulates IO errors with the error target.  I've never tried to use device-mapper directly - just use lvm2.  But I'll give it a try - should be educational.  

 o I'll try to make a script that creates a file with a few megs, creates a loopback device, then a device mapper device that replaces selected sectors with error.

 o then that bad_disk is mounted on /var/test, and a test.rpm installs files to /var/test. 

 o somehow, we ensure that the simulated bad blocks are in one of the files.  I think debugfs can provide block mapping for files, and then maybe the dmsetup script above can be smart enough to *add* specific bad blocks while mounted.  But that may be too advanced for me.  Might have to create a few, and reinstall the rpm until it hits one.

Comment 5 Stuart D Gathman 2012-11-19 18:53:44 UTC
Example block device with errors here:

http://mbroz.fedorapeople.org/talks/DeviceMapperBasics/dm.pdf

Comment 6 Stuart D Gathman 2012-11-19 21:21:23 UTC
Created attachment 648127 [details]
SRPM of test case for rpm

Created a simple test case for this rpm bug.  Uploaded abrt from running it to bug#878201 (abrt doesn't like to let you specify an existing bug).

Instructions:

1) inspect, build SRPM
2) install RPM
3) cd /var/testrpm
4) sh -x testrpm.sh

[root@sdg testrpm]# sh -x testrpm.sh
++ losetup -f
+ loopdev=/dev/loop0
+ '[' /dev/loop0 '!=' /dev/loop0 ']'
+ dd if=/dev/zero of=testdisk.img bs=1k count=2048
2048+0 records in
2048+0 records out
2097152 bytes (2.1 MB) copied, 0.0355833 s, 58.9 MB/s
+ mke2fs -F testdisk.img
mke2fs 1.42.3 (14-May-2012)
Discarding device blocks: done                            
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
256 inodes, 2048 blocks
102 blocks (4.98%) reserved for the super user
First data block=1
Maximum filesystem blocks=2097152
1 block group
8192 blocks per group, 8192 fragments per group
256 inodes per group

Allocating group tables: done                            
Writing inode tables: done                            
Writing superblocks and filesystem accounting information: done

+ mv /mnt/testrpm /mnt/testrpm.save
+ mkdir -p /mnt/testrpm
+ mount -o loop testdisk.img /mnt/testrpm
+ rsync -rav /mnt/testrpm.save/ /mnt/testrpm
sending incremental file list
./
testrpm.dat

sent 1048796 bytes  received 34 bytes  2097660.00 bytes/sec
total size is 1048576  speedup is 1.00
+ rpm -V testrpm
+ umount /mnt/testrpm
+ losetup /dev/loop0 /home/stuart/testrpm/testdisk.img
+ dmsetup create bad_disk testrpm.dm
+ mount /dev/mapper/bad_disk /mnt/testrpm
+ rpm -V testrpm
testrpm.sh: line 28:  2813 Bus error               (core dumped) rpm -V testrpm
+ umount /mnt/testrpm
+ dmsetup remove bad_disk
+ losetup -d /dev/loop0
+ rmdir /mnt/testrpm
+ mv /mnt/testrpm.save /mnt/testrpm
+ rpm -V testrpm
BDB2053 Freeing read locks for locker 0x501: 2813/3078138240
BDB2053 Freeing read locks for locker 0x502: 2813/3078138240
BDB2053 Freeing read locks for locker 0x503: 2813/3078138240
BDB2053 Freeing read locks for locker 0x504: 2813/3078138240
BDB2053 Freeing read locks for locker 0x505: 2813/3078138240
BDB2053 Freeing read locks for locker 0x506: 2813/3078138240
BDB2053 Freeing read locks for locker 0x507: 2813/3078138240
BDB2053 Freeing read locks for locker 0x508: 2813/3078138240
BDB2053 Freeing read locks for locker 0x509: 2813/3078138240
BDB2053 Freeing read locks for locker 0x50a: 2813/3078138240
BDB2053 Freeing read locks for locker 0x50b: 2813/3078138240
BDB2053 Freeing read locks for locker 0x50c: 2813/3078138240
BDB2053 Freeing read locks for locker 0x50d: 2813/3078138240
BDB2053 Freeing read locks for locker 0x50e: 2813/3078138240
[root@sdg testrpm]#

Comment 7 Stuart D Gathman 2012-11-19 21:22:20 UTC
test case provided.  Can install testrpm and run repeatedly until fixed.

Comment 8 Stuart D Gathman 2012-11-19 22:36:41 UTC
*** Bug 878201 has been marked as a duplicate of this bug. ***

Comment 9 Panu Matilainen 2012-11-20 04:45:04 UTC
Ooh, nice. Thanks for the very nice reproducer. And education :) I had no idea I had such disk-error simulation tools right under my nose all this time.

The good news is that rpm >= 4.10.1 doesn't crash on it, not using mmap() is where the difference comes from. It doesn't report an actual IO error on it though, just a "cannot compute" question mark but that's still a whole lot better than crashing:

[root@localhost testrpm]# sh testrpm.sh 2>&1|grep /testrpm
S.5....T.    /var/testrpm/testrpm.sh
S.?......    /mnt/testrpm/testrpm.dat
S.5....T.    /var/testrpm/testrpm.sh
S.5....T.    /var/testrpm/testrpm.sh
[root@localhost testrpm]#

Comment 10 Stuart D Gathman 2012-11-20 16:11:43 UTC
Is there a performance difference between mmap and read for verify?

Comment 11 Stuart D Gathman 2012-11-20 17:23:56 UTC
I would like to see an E for Error replace the 5 when there is an IO Error.  Although a '?' meets the specification ("the test could not be performed"), IO errors, when they occur, are something you *really* want to know about.

Comment 12 Fedora End Of Life 2013-07-04 06:24:44 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter:  Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like  to see this bug fixed and are able to reproduce it 
against a later version  of Fedora, you are encouraged  change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Panu Matilainen 2013-07-04 07:39:05 UTC
Closing NEXTRELEASE, there are no updates planned for F17 this point anymore.

As for better indication of IO-errors, that's more of an upstream issue.