Bug 170964 - attempt to access beyond end of device, corrupted filesystem
Status: CLOSED DUPLICATE of bug 163470
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Stephen Tweedie
QA Contact: Brian Brock
Reported: 2005-10-16 10:02 EDT by Frode Tennebø
Modified: 2007-11-30 17:11 EST (History)
CC: 2 users

Last Closed: 2005-12-13 16:04:59 EST

Attachments: None
Description Frode Tennebø 2005-10-16 10:02:10 EDT
From Bugzilla Helper:
User-Agent: Opera/8.5 (X11; Linux i686; U; en)

Description of problem:
While looking into bug 163470 I created a big file on my device. This resulted in:

Oct 16 04:43:33 leia kernel: attempt to access beyond end of device
Oct 16 04:43:33 leia kernel: dm-0: rw=0, want=26690829744, limit=18874368
Oct 16 04:43:33 leia kernel: EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=81926, block=3336353717
Oct 16 04:43:33 leia kernel: Aborting journal on device dm-0.
Oct 16 04:43:34 leia kernel: EXT3-fs error (device dm-0) in ext3_reserve_inode_write: IO failure
Oct 16 04:43:34 leia kernel: EXT3-fs error (device dm-0) in ext3_dirty_inode: IO failure
Oct 16 04:43:34 leia kernel: ext3_abort called.
Oct 16 04:43:34 leia kernel: EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Oct 16 04:43:34 leia kernel: Remounting filesystem read-only
Oct 16 04:45:00 leia kernel: bio too big device md1 (16 > 8)
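
The last line, "bio too big device md1 (16 > 8)", is the block layer rejecting a request larger than the queue's limit: a 16-sector (8 KiB) bio against an 8-sector (4 KiB) cap, which looks like the same stacked-device limit mismatch tracked in bug 163470. A rough sketch of how to compare the per-layer limits (the sysfs attribute is an assumption; it may not be exposed by every 2.6.13-era kernel):

  # compare per-request size limits across the stacked block devices
  for dev in sdd md0 md1 dm-0; do
      echo -n "$dev: "
      cat /sys/block/$dev/queue/max_sectors_kb 2>/dev/null || echo n/a
  done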

This resulted in (part of) the filesystem becoming corrupt:

[root@leia ~]# ls -ltr /opt
total 1603184
?---------   ? ?    ?             ?            ? xrt
?---------   ? ?    ?             ?            ? wp8
?---------   ? ?    ?             ?            ? vmware
?---------   ? ?    ?             ?            ? uimx
?---------   ? ?    ?             ?            ? toolworks
?---------   ? ?    ?             ?            ? tmp
?---------   ? ?    ?             ?            ? textmakerbeta
?---------   ? ?    ?             ?            ? src
?---------   ? ?    ?             ?            ? sniff
?---------   ? ?    ?             ?            ? RealPlayer7
?---------   ? ?    ?             ?            ? openMSX
?---------   ? ?    ?             ?            ? office60b
?---------   ? ?    ?             ?            ? office52
?---------   ? ?    ?             ?            ? nvg
?---------   ? ?    ?             ?            ? netscape
?---------   ? ?    ?             ?            ? ncd
?---------   ? ?    ?             ?            ? motif
?---------   ? ?    ?             ?            ? lost+found
?---------   ? ?    ?             ?            ? imdb
?---------   ? ?    ?             ?            ? hds
?---------   ? ?    ?             ?            ? hdb
?---------   ? ?    ?             ?            ? games.zip
?---------   ? ?    ?             ?            ? Games
?---------   ? ?    ?             ?            ? G
?---------   ? ?    ?             ?            ? edh
?---------   ? ?    ?             ?            ? dted
?---------   ? ?    ?             ?            ? bxpro-6.0
?---------   ? ?    ?             ?            ? azureus
?---------   ? ?    ?             ?            ? applix
drwxr-xr-x   6 ft   ft         4096 Aug  2  2004 acrobat5
drwxr-xr-x   6 ft   ft         4096 Oct  1 14:34 opera
drwxr-xr-x  10 root root       4096 Oct 16 01:11 arkeia
-rw-r--r--   1 root root 1640030208 Oct 16 04:43 home.tar.bz2
[root@leia ~]# ls -l /opt/applix
ls: /opt/applix: Input/output error

Now, the big problem is that it appears I don't have a backup of this partition, since it's the backup database itself which is corrupt (ref. bug 163470), so I'm a bit fscked here.

I would appreciate any help in getting out of this fix.
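
One standard recovery path when ext3 group descriptors are damaged is to point e2fsck at a backup superblock. A minimal sketch, assuming default mke2fs parameters for this filesystem (mke2fs -n is a dry run: it only prints where the backups would live, and writes nothing):

  # list backup superblock locations without modifying the device
  mke2fs -n /dev/vg1/opt
  # then force a check against one of them, e.g. the copy at block 32768
  e2fsck -b 32768 /dev/vg1/opt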


Version-Release number of selected component (if applicable):
kernel-2.6.13-1.1526_FC4

How reproducible:
Didn't try

Steps to Reproduce:

  

Additional info:

[root@leia ~]# df -k /opt
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg1-opt    9289080 -73786976294836574068  10449620 101% /opt

[root@leia ~]# cat /proc/mdstat
Personalities : [raid0] [raid1]
md1 : active raid1 md0[1] sdd1[0]
      35842944 blocks [2/2] [UU]

md0 : active raid0 sdc1[1] sda1[0]
      35856832 blocks 64k chunks

unused devices: <none>
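
Note the stacking here: md1 is a RAID1 mirror whose two members are a plain partition (sdd1) and another md array (md0, a RAID0 stripe of sda1 and sdc1). A quick way to confirm the topology, assuming mdadm is installed:

  mdadm --detail /dev/md1    # mirror: members sdd1 and md0
  mdadm --detail /dev/md0    # stripe: members sda1 and sdc1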

[root@leia ~]# fdisk -l /dev/sdc /dev/sda /dev/md0 /dev/sdd

Disk /dev/sdc: 18.3 GB, 18351958528 bytes
255 heads, 63 sectors/track, 2231 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1        2231    17920476   fd  Linux raid autodetect

Disk /dev/sda: 18.3 GB, 18373205504 bytes
255 heads, 63 sectors/track, 2233 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1        2233    17936541   fd  Linux raid autodetect

Disk /dev/md0: 36.7 GB, 36717395968 bytes
2 heads, 4 sectors/track, 8964208 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/sdd: 36.7 GB, 36703933952 bytes
64 heads, 32 sectors/track, 35003 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       35003    35843056   fd  Linux raid autodetect


[root@leia ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               vg1
  PV Size               34.18 GB / not usable 0
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              8750
  Free PE               6446
  Allocated PE          2304
  PV UUID               cW7juX-w3Sz-uXkD-hvqk-Kz3N-gHt7-CPdXYz

[root@leia ~]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/vg1/opt
  VG Name                vg1
  LV UUID                9qqaNV-qPmT-l1Io-y3Gq-mxbL-3tMt-ttQ7AX
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                9.00 GB
  Current LE             2304
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0

[root@leia ~]#
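
The "Block device 253:0" above is dm-0, the device named in the kernel errors, and the numbers there fit this 9 GB LV: limit=18874368 is in 512-byte sectors, which is exactly 9 GiB, while want=26690829744 sectors is about 12 TiB, and the bad inode block 3336353717 (at 8 sectors per 4 KiB block) lands at essentially the same impossible offset. Quick shell arithmetic:

  echo $(( 18874368 * 512 / 1024 / 1024 / 1024 ))             # 9 (GiB), the LV size
  echo $(( 26690829744 * 512 / 1024 / 1024 / 1024 / 1024 ))   # 12 (TiB), far past the device
  echo $(( 3336353717 * 8 ))                                  # 26690829736, ~ the "want" sector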
Comment 1 Frode Tennebø 2005-10-21 19:22:01 EDT
I did an fsck which did the trick of restoring the file system:

[root@leia etc]# fsck /opt
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
Group descriptors look bad... trying backup blocks...
/dev/vg1/opt contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #5 (13669, counted=1565).
Free blocks count wrong for group #6 (1552, counted=0).
Free blocks count wrong for group #7 (1551, counted=0).
Free blocks count wrong for group #8 (1538, counted=0).
Free blocks count wrong for group #9 (7060, counted=0).
Free blocks count wrong for group #10 (9326, counted=7).
Free blocks count wrong for group #11 (30654, counted=6).
Free blocks count wrong for group #12 (29893, counted=13).
Free blocks count wrong for group #13 (31451, counted=11).
Free blocks count wrong for group #14 (26845, counted=6).
Free blocks count wrong for group #15 (30602, counted=20).
Free blocks count wrong for group #16 (32035, counted=11).
Free blocks count wrong for group #17 (26511, counted=6).
Free blocks count wrong for group #18 (24510, counted=13).
Free blocks count wrong for group #19 (27201, counted=6).
Free blocks count wrong for group #20 (30318, counted=13).
Free blocks count wrong for group #21 (30860, counted=19).
Free blocks count wrong for group #22 (30993, counted=9).
Free blocks count wrong for group #23 (31198, counted=15270).
Free blocks count wrong for group #25 (31182, counted=19731).
Free blocks count wrong (814238, counted=401995).
Free inodes count wrong for group #5 (15939, counted=15938).
Free inodes count wrong for group #25 (16319, counted=14860).
Directories count wrong for group #25 (0, counted=35).
Free inodes count wrong (1123380, counted=1121920).
/dev/vg1/opt: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vg1/opt: 57728/1179648 files (0.7% non-contiguous), 1957301/2359296 blocks

I still think it's a bit drastic for the filesystem to become corrupt that easily. Here's the log from reconstructing the md arrays (I'm creating md0 from sda1 and sdc1):

Oct 13 22:24:25 leia kernel: md: md0 stopped.
Oct 13 22:24:25 leia kernel: md: unbind<sda1>
Oct 13 22:24:25 leia kernel: md: export_rdev(sda1)
Oct 13 22:24:39 leia kernel: md: bind<sda1>
Oct 13 22:24:39 leia kernel: md: bind<sdc1>
Oct 13 22:24:39 leia kernel: md0: setting max_sectors to 128, segment boundary to 32767
Oct 13 22:24:39 leia kernel: raid0: looking at sdc1
Oct 13 22:24:39 leia kernel: raid0:   comparing sdc1(17920384) with sdc1(17920384)
Oct 13 22:24:39 leia kernel: raid0:   END
Oct 13 22:24:39 leia kernel: raid0:   ==> UNIQUE
Oct 13 22:24:39 leia kernel: raid0: 1 zones
Oct 13 22:24:39 leia kernel: raid0: looking at sda1
Oct 13 22:24:39 leia kernel: raid0:   comparing sda1(17936448) with sdc1(17920384)
Oct 13 22:24:39 leia kernel: raid0:   NOT EQUAL
Oct 13 22:24:39 leia kernel: raid0:   comparing sda1(17936448) with sda1(17936448)
Oct 13 22:24:39 leia kernel: raid0:   END
Oct 13 22:24:39 leia kernel: raid0:   ==> UNIQUE
Oct 13 22:24:39 leia kernel: raid0: 2 zones
Oct 13 22:24:39 leia kernel: raid0: FINAL 2 zones
Oct 13 22:24:39 leia kernel: raid0: zone 1
Oct 13 22:24:39 leia kernel: raid0: checking sda1 ... contained as device 0
Oct 13 22:24:39 leia kernel:   (17936448) is smallest!.
Oct 13 22:24:39 leia kernel: raid0: checking sdc1 ... nope.
Oct 13 22:24:39 leia kernel: raid0: zone->nb_dev: 1, size: 16064
Oct 13 22:24:39 leia kernel: raid0: current zone offset: 17936448
Oct 13 22:24:39 leia kernel: raid0: done.
Oct 13 22:24:39 leia kernel: raid0 : md_size is 35856832 blocks.
Oct 13 22:24:39 leia kernel: raid0 : conf->hash_spacing is 35840768 blocks.
Oct 13 22:24:39 leia kernel: raid0 : nb_zone is 2.
Oct 13 22:24:39 leia kernel: raid0 : Allocating 8 bytes for hash.
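
The zone arithmetic in that log is self-consistent: zone 0 stripes across both members, 2 * 17920384 = 35840768 blocks (the hash_spacing figure), and zone 1 is the tail of the larger member, 17936448 - 17920384 = 16064 blocks, which together give the reported md_size:

  echo $(( 17936448 - 17920384 ))     # 16064, the single-member zone
  echo $(( 2 * 17920384 + 16064 ))    # 35856832, matches "md_size is 35856832 blocks"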

Then I'm hot-adding md0 into the md1 mirror (alongside sdd1):

Oct 13 22:25:55 leia kernel: md: bind<md0>
Oct 13 22:25:55 leia kernel: RAID1 conf printout:
Oct 13 22:25:55 leia kernel:  --- wd:1 rd:2
Oct 13 22:25:55 leia kernel:  disk 0, wo:0, o:1, dev:sdd1
Oct 13 22:25:55 leia kernel:  disk 1, wo:1, o:1, dev:md0
Oct 13 22:25:55 leia kernel: md: syncing RAID array md1
Oct 13 22:25:55 leia kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Oct 13 22:25:55 leia kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Oct 13 22:25:55 leia kernel: md: using 128k window, over a total of 35842944 blocks.
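
For reference, a hedged reconstruction of the mdadm commands that would produce logs like the two blocks above (the exact original invocation isn't shown in this report; the chunk size is taken from the 64k value in /proc/mdstat):

  # recreate the RAID0 stripe from the two SCSI partitions
  mdadm --create /dev/md0 --level=0 --chunk=64 --raid-devices=2 /dev/sda1 /dev/sdc1
  # hot-add the stripe back into the md1 mirror alongside sdd1
  mdadm /dev/md1 --add /dev/md0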
Comment 2 Dave Jones 2005-11-10 14:39:08 EST
2.6.14-1.1637_FC4 has been released as an update for FC4.
Please retest with this update, as a large amount of code has been changed in
this release, which may have fixed your problem.

Thank you.
Comment 3 Stephen Tweedie 2005-12-13 16:04:59 EST
This looks like a request for help in recovering the fs, not an actual bug
report: the bug here appears to be exactly the same one as in 163470.  

To be honest, if the underlying driver layer is coming up with IO errors, then
it's really not possible for the fs to guard against all forms of corruption. 
Lost data is lost data.  All the fs can do is to spot the problem, Do No Harm by
taking the journal offline and turning the mount point readonly, and then defer
recovery to fsck: and that appears to be working in this case.

So I'll close this report as a dup, but feel free to reopen if there is specific
filesystem behaviour that's occurring here that needs to be corrected.

*** This bug has been marked as a duplicate of 163470 ***
