Description of problem:
I started seeing the following errors while trying to back up /var with rsync:

rsync: read errors mapping "/var/lib/rpm/Filemd5s": Input/output error (5)
rsync: read errors mapping "/var/lib/rpm/Filemd5s": Input/output error (5)
ERROR: var/lib/rpm/Filemd5s failed verification -- update discarded.
rsync error: some files could not be transferred (code 23) at main.c(977) [sender=2.6.9]

The kernel reports lots of:

attempt to access beyond end of device
sda5: rw=0, want=89128964, limit=1044162
Buffer I/O error on device sda5, logical block 44564481
attempt to access beyond end of device
sda5: rw=0, want=3018242, limit=1044162
Buffer I/O error on device sda5, logical block 1509120
attempt to access beyond end of device
sda5: rw=0, want=4849666, limit=1044162
Buffer I/O error on device sda5, logical block 2424832
attempt to access beyond end of device
sda5: rw=0, want=5242882, limit=1044162
Buffer I/O error on device sda5, logical block 2621440
....

There are no hardware disk errors that I can see, and smartctl reports a healthy drive. I've been having lots of problems with RPM database corruption in F7+ (see bug 230362), but perhaps the problem has been kernel-related (mmap and/or ext3fs) all along. I've saved an image of the /var (sda5) filesystem if you want it.
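The two kernel messages use different units. The following is a minimal sketch, assuming 512-byte sectors, a 1 KiB ext3 block size (the limit of 1044162 sectors is 522081 KiB, in line with the 522080-block /var1 filesystem e2fsck reports for this image), one-block read requests, and that `want` is the end sector of the request (start plus length). Under those assumptions each `want` maps exactly onto the logical block in the Buffer I/O error that follows it. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: relate "attempt to access beyond end of device" numbers to
# ext3 block numbers. ASSUMPTIONS: 512-byte sectors, 1 KiB filesystem
# blocks (2 sectors per block), one-block requests, and "want" being the
# END of the request (start sector + sector count).
# Helper names are hypothetical.

SECTORS_PER_BLOCK=2

# Start sector of the request, expressed as a 1 KiB logical block.
want_to_block() {
    echo $(( ($1 - SECTORS_PER_BLOCK) / SECTORS_PER_BLOCK ))
}

# Device size ("limit", in sectors) expressed in whole 1 KiB blocks.
limit_to_blocks() {
    echo $(( $1 / SECTORS_PER_BLOCK ))
}

# Print whether the block behind a "want" value fits on the device.
check_request() {
    blk=$(want_to_block "$1")
    nblocks=$(limit_to_blocks "$2")
    if [ "$blk" -ge "$nblocks" ]; then
        echo "block $blk: beyond end of device ($nblocks blocks)"
    else
        echo "block $blk: inside device ($nblocks blocks)"
    fi
}

# Values copied from the kernel log above.
for want in 89128964 3018242 4849666 5242882; do
    check_request "$want" 1044162
done
```

All four requests resolve to blocks far beyond the device's 522081 blocks, so the kernel rejects the reads outright. The same four numbers (44564481, 1509120, 2424832, 2621440) are the ones e2fsck later reports as illegal blocks in inode 24526.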
Yes, I'd like to see an image of the filesystem; an e2image would probably be sufficient for now, and it is certainly more portable. Does e2fsck report any problems with the image? (Please run it on a copy to keep the corrupted version pristine.)

Thanks,
-Eric
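The requested workflow (a metadata image for the bug, plus an e2fsck run that only ever touches a copy) can be scripted. This is a minimal sketch, assuming the filesystem has already been saved as a plain image file and that e2image and e2fsck are pointed at that file directly rather than through a loop device; `run`, `inspect_image` and the `DRY_RUN` switch are hypothetical names.

```shell
#!/bin/sh
# Sketch of the "inspect a copy, keep the original pristine" workflow.
# ASSUMPTIONS: the damaged filesystem was already saved as a plain image
# file (e.g. with dd), and e2fsprogs opens that file directly.
# Function names and the DRY_RUN switch are hypothetical.

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

inspect_image() (
    img=$1
    work=$img.work
    run e2image "$img" "$img.e2i"    # metadata-only image to attach
    run cp "$img" "$work"            # scratch copy for e2fsck
    run e2fsck -f -n "$work"         # report problems, change nothing
    run e2fsck -f -y "$work"         # then repair the copy only
)
```

For example, `DRY_RUN=1 inspect_image orizaba.var.sda5.img` prints the four commands without running them. The original image is only ever read; every write goes to the `.e2i` and `.work` files.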
Created attachment 208531
e2image of /var filesystem

I had made a dd image of the /var partition from the rescue media. You can get a copy from http://www.cora.nwra.com/~orion/orizaba.var.sda5.img.bz2 if you want (269M). I attached the image to a loopback device to take the e2image. Running e2fsck on the loopback device:

[root@saga test]# e2fsck /dev/loop0
e2fsck 1.40.2 (12-Jul-2007)
/var1: clean, 803/130560 files, 144333/522080 blocks
[root@saga test]# e2fsck -f -y /dev/loop0
e2fsck 1.40.2 (12-Jul-2007)
Pass 1: Checking inodes, blocks, and sizes
Inode 24526 has illegal block(s). Clear? yes
Illegal block #2577 (44564481) in inode 24526. CLEARED.
Illegal block #2578 (1509120) in inode 24526. CLEARED.
Illegal block #2580 (2424832) in inode 24526. CLEARED.
Illegal block #2582 (2621440) in inode 24526. CLEARED.
Illegal block #2583 (720896) in inode 24526. CLEARED.
Illegal block #2584 (4063232) in inode 24526. CLEARED.
Illegal block #2585 (524288) in inode 24526. CLEARED.
Illegal block #2586 (8978432) in inode 24526. CLEARED.
Illegal block #2587 (2293760) in inode 24526. CLEARED.
Illegal block #2588 (9371648) in inode 24526. CLEARED.
Illegal block #2589 (4718592) in inode 24526. CLEARED.
Too many illegal blocks in inode 24526.
Clear inode? yes
Restarting e2fsck from the beginning...
Pass 1: Checking inodes, blocks, and sizes
Extended attribute block 90375 has reference count 17, should be 16. Fix? yes
Pass 2: Checking directory structure
Entry 'Filemd5s' in /lib/rpm (22442) has deleted/unused inode 24526. Clear? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(99193--99200) -(99361--99363) -(99569--99613) -99616 -(99881--99884) -(99894--99919) -(100090--100092) -(100095--100097) -(100788--100792) -(100796--100798) -(100823--100824) -(100828--100831) -101897 -(101902--101904) -101908 -101912 -(101918--101920) -(101924--101928) -(101932--101936) -(101940--101942) -101944 -(101948--101952) -101956 -(102135--102136) -(102140--102144) -(102148--102152) -(102156--102160) -(102164--102168) -(102172--102176) -(102180--102184) -(102188--102192) -(102196--102198) -(102287--102288) -(102292--102296) -(102300--102304) -(102308--102310) -(103502--103504) -(103508--103512) -(103516--103520) -(103524--103526) -(103566--103568) -(103572--103575) -(103630--103631) -(103694--103695) -(103757--103760) -103764 -(103767--103768) -(103772--103776) -(103780--103784) -(103788--103792) -(103796--103800) -(103804--103805) -(103822--103823) -(103886--103888) -(103892--103896) -(103900--103901) -(103950--103952) -(103956--103959) -(104014--104016) -104020 -(104063--104064) -(104068--104070) -(104076--104080) -(104084--104087) -(104142--104144) -(104148--104152) -(104156--104158) -(104206--104208) -(104212--104216) -(104221--104224) -(104228--104232) -(104236--104240) -(104244--104248) -(104252--104256) -(104260--104264) -(104268--104269) -105300 -105304 -105311 -105319 -105326 -105334 -105341 -105348 -105352 -105359 -(108442--108498) -(108505--108631) -(108735--108737) -(108739--108744) -(111016--111098) -(111113--111142) -(111145--111231) -(111241--111279) -(111649--111706) -(112132--112156) -(113245--113281) -(113289--113362) -(113501--113503) -(113506--113537) -(113545--113599) -(113858--113864) -(113867--113872) -(114969--115002) -(115009--115013) -(115017--115019) -(115281--115301) -(115303--115305) -(115307--115472) -(115481--115546) -(115554--115699) -(115703--115712) -(115714--115802)
-(118884--118977) -118985 -(118993--119041) -(119049--119051) -119193 -119201 -119209 -119217 -(119225--119227) -119233 -119241 -119249 -(119257--119260) -(119265--119306) -(119313--119314) -119321 -119329 -(119337--119338) -119345 -119353 -(119361--119362) -(119369--119372) -(119377--119395) -(119489--119490) -(119908--119915) -(120420--120422) -(120932--120939) -(120962--120964) -(121444--121445) -(121956--121960) -(122468--122469) -122879 -(123143--123232) -(123236--123278) -(123281--123392) -(123394--123400) -(123402--123408) -(123739--123744) -123779 -(123786--123792) -(125482--125489) -(125945--125946) -125953 -125961 -(125969--125973) -126137 -126145 -(126254--126273) -(126281--126319) -(126321--126322) -126385 -(126481--126495) -(126497--126499) -(126501--126507) -127153 -127312 -127316 -(127493--127496) -(127811--127820) -(127824--127825) -(127827--127828) -(127832--127834) -(127905--127906) -(128172--128176) -128417 -(128886--128888) -(128897--128917) -(128919--128940) -(128945--128946) -128953 -(129204--129208) -129321 -129329 -(129436--129440) -129537 -(130129--130140) -130145 -130241 -130249 -130257 -(130265--130342) -(130349--130369) -132455 -(135177--135225) -(135227--135680) -(138884--138888) -(138890--138896) -138904 -(138906--138912) -(138955--138960) -(138962--138968) -(138970--138976) -(138978--138984) -(138986--138992) -(143406--143425) -143433 -(143441--143443) -143449 -143457 -(143465--143466) -143473 -(143481--143489) -143497 -143505 -143513 -143521 -(143529--143531) -(143537--143540) -(143545--143553) -(143561--143562) -143569 -143577 -(143585--143617) -143625 -143633 -143641 -(143649--143651) -143657 -143665 -143673 -143681 -143689 -(143697--143754) -(143772--143778) -(144089--144096) -(144128--144129) -(144265--144277) -(144281--144283) -(144615--144616) -(144978--144984) -(145493--145526) -(145854--145860) -146009 -(146026--146030) -(146538--146561) -146569 -146577 -(147071--147216) -(147394--147400) -(147738--147744) -147777
-(147785--147788) -147793 -147801 -147809 -147817 -(148201--148344) -148355 -148357 -148359 -148369 -148371 -(148401--148403) -(148405--148407) -(148409--148415) -148417 -(148420--148422) -148425 -(148433--148437) -148439 -(148557--148560) -149681 -150256 -150437 -(152314--152385) -(152393--152397) -(152401--152402) -(152409--152513) -152521 -(152529--152530) -152537 -(152545--152576) -(152649--152770) -(153686--153688) -(196931--196936) -(196938--196944) -(196964--196968) -(196972--196976) -(196978--196984) -(197106--197112) -(198217--198233) -(198561--198645) -198673 -200176 -(200273--200286) -200313 -200705 -(200729--200739) -(201794--201801) -(201804--201808) -(201858--201864) -(201869--201872) -(201874--201880) -(201882--201888) -(201890--201896) -(201898--201904) -(201922--201928) -(201930--201936) -(216661--216664) -(217101--217103) -(217410--217416) -217424 -(217426--217432) -(218128--218195) -(218201--218222) -(220224--220241) -(220249--220269) -(221701--221704) -(221706--221712) -(221717--221720) -(221722--221728) -(221731--221736) -(221743--221744) -(221747--221800) -(222283--222321) -(224340--224385) -(224393--224419) -(226460--226488) -(227503--227516) -(228530--228545) -(228553--228554) -(228561--228566) -228569 -(229641--229642) -229649 -229657 -229665 -(229673--229697) -(229705--229739) -230746 -230753 -(231770--231853) -(231857--231866) -(232866--232960) -(232969--233089) -(233097--233101) -233105 -233113 -233121 -233129 -233137 -233145 -233153 -234063 -234102 -234122 -235087 -(235146--235162) -(236172--236198) -237213 -(237218--237249) -(237257--237261) -237265 -(237273--237281) -237289 -237833 -(237841--237853) -237857 -237865 -237873 -237881 -237889 -(237897--238103) -238928 -238948 -239075 -(239077--239084) -(240101--240103) -(240105--240113)
Fix? yes
Free blocks count wrong for group #12 (1085, counted=1429). Fix? yes
Free blocks count wrong for group #13 (446, counted=1175). Fix? yes
Free blocks count wrong for group #14 (629, counted=1442). Fix? yes
Free blocks count wrong for group #15 (2409, counted=2997). Fix? yes
Free blocks count wrong for group #16 (2077, counted=2635). Fix? yes
Free blocks count wrong for group #17 (3561, counted=3993). Fix? yes
Free blocks count wrong for group #18 (1174, counted=1715). Fix? yes
Free blocks count wrong for group #24 (6051, counted=6285). Fix? yes
Free blocks count wrong for group #26 (5445, counted=5596). Fix? yes
Free blocks count wrong for group #27 (2166, counted=2430). Fix? yes
Free blocks count wrong for group #28 (7085, counted=7571). Fix? yes
Free blocks count wrong for group #29 (7686, counted=7935). Fix? yes
Free blocks count wrong (377747, counted=383136). Fix? yes
Inode bitmap differences: -24526
Fix? yes
Free inodes count wrong for group #12 (1994, counted=1995). Fix? yes
Free inodes count wrong (129757, counted=129758). Fix? yes

/var1: ***** FILE SYSTEM WAS MODIFIED *****
/var1: 802/130560 files (15.8% non-contiguous), 138944/522080 blocks
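The Pass 5 numbers can be cross-checked against each other. A minimal sketch, assuming one e2fsck message per line (as e2fsck prints them, not flattened): it sums `counted - stored` over the per-group "Free blocks count wrong" lines and compares that with the filesystem-wide line. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: cross-check the Pass 5 "Free blocks count wrong" corrections.
# Each per-group line reports (stored, counted=actual) free blocks; the
# per-group deltas should add up to the filesystem-wide correction.
# ASSUMPTION: one e2fsck message per line. Function names are hypothetical.

# Sum (counted - stored) over the per-group lines.
# Splitting on ( , = ) leaves stored in $2 and counted in $4.
sum_group_free_deltas() {
    awk -F'[(,=)]' '/Free blocks count wrong for group/ { t += $4 - $2 }
                    END { print t + 0 }'
}

# Delta on the single filesystem-wide line.
total_free_delta() {
    awk -F'[(,=)]' '/Free blocks count wrong \(/ { print $4 - $2; exit }'
}

# The twelve per-group lines and the total, copied from the log above.
PASS5_LOG='Free blocks count wrong for group #12 (1085, counted=1429).
Free blocks count wrong for group #13 (446, counted=1175).
Free blocks count wrong for group #14 (629, counted=1442).
Free blocks count wrong for group #15 (2409, counted=2997).
Free blocks count wrong for group #16 (2077, counted=2635).
Free blocks count wrong for group #17 (3561, counted=3993).
Free blocks count wrong for group #18 (1174, counted=1715).
Free blocks count wrong for group #24 (6051, counted=6285).
Free blocks count wrong for group #26 (5445, counted=5596).
Free blocks count wrong for group #27 (2166, counted=2430).
Free blocks count wrong for group #28 (7085, counted=7571).
Free blocks count wrong for group #29 (7686, counted=7935).
Free blocks count wrong (377747, counted=383136).'

printf '%s\n' "$PASS5_LOG" | sum_group_free_deltas   # prints 5389
printf '%s\n' "$PASS5_LOG" | total_free_delta        # prints 5389
```

Both print 5389, and the summary lines agree: used blocks drop from 144333 to 138944, which is also 5389. The group-level corrections therefore account for the whole change in the block count.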
Yeesh... yep, that's a mess.

You say you've hit this on several different machines? Are you running stock kernels? Have you ever run any non-Fedora kernel here? Any hint of hardware errors, or have you loaded any tainting modules? Have you mounted this ext3 filesystem with a Windows driver under Windows? Have you shared the underlying block device out over NBD, iSCSI or anything similar? Does the machine pass a memory test?

Sorry for all the questions, but if you are hitting this a lot, it seems like there might be something unique about your setup.

Thanks,
-Eric
And does that box pass memtest86? What hardware is it? Would it be an SI SATA controller with an Nvidia chipset?
I get rpmdb corruption a lot; this is the first time I've seen filesystem corruption. Of course, the only reason I noticed it was that it got so bad that the kernel was trying to access beyond the end of the device. A quick "e2fsck -f -n" of our running systems doesn't turn up anything suspicious, except on one system (see below). I had thought the rpm and filesystem corruption might be related, but maybe they aren't.

I'm running (and always run) stock kernels, though at the moment I've been running a Fedora kernel with an extra autofs patch from Ian Kent. No indication of hardware errors. I'm running memtest86+ on this machine at the moment, with no errors so far. No Windows access or any funny sharing. The chipset is an Intel 82830 with an 82801CAM IDE U100 controller; it's an old Dell PIII laptop.

The one other funny system:

saga: e2fsck 1.40.2 (12-Jul-2007)
saga: Warning! /dev/rootvg/var is mounted.
saga: Warning: skipping journal recovery because doing a read-only filesystem check.
saga: Pass 1: Checking inodes, blocks, and sizes
saga: Deleted inode 47750 has zero dtime. Fix? no
saga:
saga: Inodes that were part of a corrupted orphan linked list found. Fix? no
saga:
saga: Inode 47760 was part of the orphaned inode list. IGNORED.
saga: Inode 47770 was part of the orphaned inode list. IGNORED.
saga: Extended attribute block 126242 has reference count 24, should be 21. Fix? no
saga:
saga: Extended attribute block 165488 has reference count 2, should be 1. Fix? no
saga:
saga: Pass 2: Checking directory structure
saga: Pass 3: Checking directory connectivity
saga: Pass 4: Checking reference counts
saga: Pass 5: Checking group summary information
saga: Block bitmap differences: -66050 -98821 -102783 -(104011--104014) -104450 -(105554--105557) -(112640--112644) -131709
saga: Fix? no
saga:
saga: Free blocks count wrong (82222, counted=77968).
saga: Fix? no
saga:
saga: Inode bitmap differences: -47750 -47760 -47770
saga: Fix? no
saga:
saga: Free inodes count wrong (123247, counted=122855).
saga: Fix? no
saga:
saga:
saga: /dev/rootvg/var: ********** WARNING: Filesystem still has errors **********
saga:
saga: /dev/rootvg/var: 3729/126976 files (23.9% non-contiguous), 171730/253952 blocks

But this machine has been around forever: it was first installed with FC4 around 2005-09-22 and upgraded through FC5 and FC6 to F7.
At least some of the above output is attributable to running the check on a live filesystem.

-Eric
(In reply to comment #6)
> Some of the above output, at least, is attributable to running the check on a
> live filesystem.

Yeah, the "Free blocks count wrong" and "Free inodes count wrong" messages seem to be from that.
Mandriva got several reports from users who upgraded to the latest version, and I can now reproduce the problem on a test machine. Here is the information I have collected so far:

- Most of the time there is only rpmdb corruption, but I have already hit 5 filesystem corruptions in a week of testing.
- We only see this on systems with 1K blocks, but that may only mean the ext3 filesystem was formatted with the version of mke2fs we were shipping a year ago (due to a bug in the Mandriva installer, people who installed the 2007 version got 1K blocks instead of the usual 4K).
- I could not get any corruption with our 2.6.17 kernel or with vanilla 2.6.18.8.
- I got FS corruption with our different flavors of 2.6.22.9, and with vanilla 2.6.23.1 and 2.6.21.5.
- With vanilla 2.6.19.5 I only got db corruption, no FS corruption.

To trigger the corruption I just do:

1. rm -f /var/lib/rpm/__db.00*
2. rpm --rebuilddb
3. for i in /var/lib/rpm/[A-Z][a-z]*; do /usr/lib/rpm/rpmdb_verify $i; done
4. If the db is already broken, go back to step 1. If there are FS errors, just stop.
5. rpm -qa; for i in /var/lib/rpm/[A-Z][a-z]*; do /usr/lib/rpm/rpmdb_verify $i; done
6. urpme --auto esound (uninstall esound and everything that needs it)
7. urpmi --auto task-gnome (install GNOME and its dependencies)

(urpme and urpmi verify the db by themselves.)
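Steps 3 and 5 run the same verification loop. Below is a sketch of it as a reusable POSIX shell function; the name `verify_rpmdb` and its two optional arguments (database directory and verifier command) are hypothetical additions so the loop can be run against a copy of the database or with a stub verifier. Unlike the one-liner, it returns a non-zero status if any file fails, which lets step 4 be automated.

```shell
#!/bin/sh
# Sketch of the verification loop from steps 3 and 5. Defaults are the
# paths used above; the function name and the two optional arguments
# (directory, verifier command) are hypothetical.

verify_rpmdb() (
    dir=${1:-/var/lib/rpm}
    verify=${2:-/usr/lib/rpm/rpmdb_verify}
    status=0
    # Same glob as above: Packages, Filemd5s, ... but not __db.00*.
    for f in "$dir"/[A-Z][a-z]*; do
        [ -e "$f" ] || continue            # glob matched nothing
        if ! "$verify" "$f"; then
            echo "verify failed: $f" >&2
            status=1
        fi
    done
    exit "$status"
)

# Steps 1-4 (as root): rebuild until the database verifies cleanly, then
# continue with steps 5-7. Check the kernel log for EXT3-fs errors too.
# while :; do
#     rm -f /var/lib/rpm/__db.00*
#     rpm --rebuilddb
#     verify_rpmdb && break
# done
```

The driver loop is left commented out because it needs root and modifies the live database.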
Just FYI, here's the Mandriva bug report: http://qa.mandriva.com/show_bug.cgi?id=32547
Pascal, does fsck find any problems with the filesystem? If so, can you gather an e2image? What kernel version are you using? I'll try your test case...
Created attachment 260841
e2image after corruption

I couldn't physically access the machine for 2 days (due to a strike in France...), so I collected the information on the latest corruption "live". This is on a vanilla 2.6.23.1.

The kernel log says:

EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system zones - Block = 102, count = 1
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system zones - Block = 89, count = 1
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system zones - Block = 104, count = 1
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks not in datazone - block = 65404929, count = 1
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks not in datazone - block = 87097344, count = 1
EXT3-fs error (device sda1): ext3_free_blocks_sb: bit already cleared for block 2883584
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks not in datazone - block = 87097344, count = 1
EXT3-fs error (device sda1): ext3_free_blocks_sb: bit already cleared for block 2949120
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks not in datazone - block = 87097344, count = 1
EXT3-fs error (device sda1): ext3_free_blocks_sb: bit already cleared for block 3014656
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks not in datazone - block = 87097344, count = 1
EXT3-fs error (device sda1): ext3_free_blocks_sb: bit already cleared for block 3080192

fsck output:

fsck -n /dev/sda1
fsck 1.40.2 (12-Jul-2007)
e2fsck 1.40.2 (12-Jul-2007)
Warning! /dev/sda1 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/sda1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix? no
Inode 973827 was part of the orphaned inode list. IGNORED.
Inode 973828 was part of the orphaned inode list. IGNORED.
Inode 973834 was part of the orphaned inode list. IGNORED.
Inode 973835 was part of the orphaned inode list. IGNORED.
Inode 973836 was part of the orphaned inode list. IGNORED.
Deleted inode 973839 has zero dtime. Fix? no

Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 825370: 6881280
Multiply-claimed block(s) in inode 825394: 6750208
Multiply-claimed block(s) in inode 825396: 6815744
Multiply-claimed block(s) in inode 840740: 6815744
Multiply-claimed block(s) in inode 840762: 6750208
Multiply-claimed block(s) in inode 854021: 6881280
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 6 inodes containing multiply-claimed blocks.)

File /var/tmp/test (inode #825370, mod time Tue Nov 13 00:22:52 2007)
  has 1 multiply-claimed block(s), shared with 1 file(s):
    /var/lib/rpm/Packages (inode #854021, mod time Fri Nov 16 03:33:33 2007)
Clone multiply-claimed blocks? no
Delete file? no

File /var/lib/urpmi/hdlist.Contrib.cz (inode #825394, mod time Fri Nov 9 19:18:04 2007)
  has 1 multiply-claimed block(s), shared with 1 file(s):
    /var/lib/rpmrebuilddb.8472/Filedigests (inode #840762, mod time Fri Nov 16 08:06:31 2007)
Clone multiply-claimed blocks? no
Delete file? no

File /var/lib/urpmi/hdlist.Non-free.cz (inode #825396, mod time Fri Nov 9 19:18:06 2007)
  has 1 multiply-claimed block(s), shared with 1 file(s):
    /var/lib/rpmrebuilddb.8472/Requireversion (inode #840740, mod time Fri Nov 16 08:06:31 2007)
Clone multiply-claimed blocks? no
Delete file? no

File /var/lib/rpmrebuilddb.8472/Requireversion (inode #840740, mod time Fri Nov 16 08:06:31 2007)
  has 1 multiply-claimed block(s), shared with 1 file(s):
    /var/lib/urpmi/hdlist.Non-free.cz (inode #825396, mod time Fri Nov 9 19:18:06 2007)
Clone multiply-claimed blocks? no
Delete file? no

File /var/lib/rpmrebuilddb.8472/Filedigests (inode #840762, mod time Fri Nov 16 08:06:31 2007)
  has 1 multiply-claimed block(s), shared with 1 file(s):
    /var/lib/urpmi/hdlist.Contrib.cz (inode #825394, mod time Fri Nov 9 19:18:04 2007)
Clone multiply-claimed blocks? no
Delete file? no

File /var/lib/rpm/Packages (inode #854021, mod time Fri Nov 16 03:33:33 2007)
  has 1 multiply-claimed block(s), shared with 1 file(s):
    /var/tmp/test (inode #825370, mod time Tue Nov 13 00:22:52 2007)
Clone multiply-claimed blocks? no
Delete file? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: +1 +89 +102 +104 +65536 +262144 +327680 +393216 +786432 +2819840 +3145728 +3211264 +3276800 +3342336 +3407872 +3473408 +3538944 +3604480 +3670016 +3735552 +3801088 +3866624 +3932160 +3997696 +4063232 +4128768 +4194304 +4325376 +4390912 +4456448 +4587520 +4653056 +4784128 +4915200 +4980736 +5111808 +5177344 +5242880 +5373952 +6619136 +6684672 +6706889 -6706974 -(6984383--6984384) -(6984505--6984506) -6984772 -(6993462--6993463) -6993986 -6994241 -(6994568--6994569) -(6994982--6995014) -6995026 -6995459 -7792130 -7794180 -7796225 -7797250 -7797321 -7797761
Fix? no
Free blocks count wrong (4370391, counted=4379413).
Fix? no
Inode bitmap differences: -(973827--973828) -(973834--973836) -973839
Fix? no
Free inodes count wrong (845472, counted=845468).
Fix? no

/dev/sda1: ********** WARNING: Filesystem still has errors **********
/dev/sda1: 178528/1024000 files (2.3% non-contiguous), 3814693/8185084 blocks
Created attachment 260851
e2image after a previous corruption (under 2.6.22.9)
Pascal, has this only been seen in 2.6.22.9 and later, or have you seen it before that? If it looks like 2.6.22.9+, since you have a way to reproduce this, can you try with this patch backed out:
http://lkml.org/lkml/2007/9/17/266
That patch is in 2.6.22.9 and 2.6.23, but not in 2.6.22.8.

Orion, what kernel did you originally hit this on?

Thanks,
-Eric
(In reply to comment #13)
> Orion, what kernel did you originally hit this on?

I think it was 2.6.23.1. I haven't seen the FS corruption again myself, though.
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

Are you still seeing these errors? Comment #14 indicates this may now be resolved for you. If the problem no longer exists, please close this bug; if no additional information is added, I'll close it in a few days.