Bug 163470

Summary: "bio too big device md1" with data corruption
Product: [Fedora] Fedora
Reporter: Frode Tennebø <frodet>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 5
CC: davej, jonstanley, pfrields, wtogami
Keywords: Reopened
Hardware: All
OS: Linux
Whiteboard: MassClosed
Doc Type: Bug Fix
Last Closed: 2008-01-20 04:39:03 UTC

Description Frode Tennebø 2005-07-17 19:33:22 UTC
Description of problem:

I have the following setup I want to realise:

/dev/md1 (raid1) ---|-- /dev/sdd1 (36.7G)
                    |-- /dev/md0 (raid0) --|-- /dev/sda1 (18.3G)
                                           |-- /dev/sdc1 (18.3G)

On top of md1 I have LVM. I hot added /dev/md0 and during the process I got:

Jul 17 00:38:07 leia kernel: md: bind<md0>
Jul 17 00:38:07 leia kernel: RAID1 conf printout:
Jul 17 00:38:07 leia kernel:  --- wd:1 rd:2
Jul 17 00:38:07 leia kernel:  disk 0, wo:0, o:1, dev:sdd1
Jul 17 00:38:07 leia kernel:  disk 1, wo:1, o:1, dev:md0
Jul 17 00:38:07 leia kernel: ..<6>md: syncing RAID array md1
Jul 17 00:38:07 leia kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jul 17 00:38:07 leia kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Jul 17 00:38:07 leia kernel: md: using 128k window, over a total of 35842944 blocks.
Jul 17 00:41:56 leia kernel: bio too big device md1 (56 > 8)
Jul 17 00:41:56 leia kernel: bio too big device md1 (32 > 8)
Jul 17 00:41:56 leia kernel: bio too big device md1 (136 > 8)
Jul 17 00:41:56 leia kernel: bio too big device md1 (64 > 8)
Jul 17 00:41:56 leia kernel: bio too big device md1 (224 > 8)
Jul 17 00:41:56 leia kernel: bio too big device md1 (216 > 8)
Jul 17 00:41:56 leia kernel: bio too big device md1 (144 > 8)
Jul 17 00:41:56 leia kernel: bio too big device md1 (32 > 8)
Jul 17 00:41:56 leia kernel: bio too big device md1 (96 > 8)
Jul 17 00:45:00 leia kernel: bio too big device md1 (64 > 8)
Jul 17 00:45:00 leia kernel: bio too big device md1 (184 > 8)
:
:
Jul 17 02:30:36 leia kernel: bio too big device md1 (56 > 8)
Jul 17 02:31:11 leia kernel: bio too big device md1 (48 > 8)
Jul 17 02:35:00 leia kernel: bio too big device md1 (16 > 8)
Jul 17 02:36:21 leia kernel: bio too big device md1 (32 > 8)
Jul 17 02:38:27 leia kernel: md: md1: sync done.
Jul 17 02:38:27 leia kernel: RAID1 conf printout:
Jul 17 02:38:27 leia kernel:  --- wd:2 rd:2
Jul 17 02:38:27 leia kernel:  disk 0, wo:0, o:1, dev:sdd1
Jul 17 02:38:27 leia kernel:  disk 1, wo:0, o:1, dev:md0
Jul 17 02:39:08 leia kernel: bio too big device md1 (64 > 8)
Jul 17 02:39:08 leia kernel: bio too big device md1 (16 > 8)
Jul 17 02:39:08 leia kernel: bio too big device md1 (144 > 8)
Jul 17 02:39:08 leia kernel: bio too big device md1 (16 > 8)
:
:
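
For reading the numbers in these messages, here is a minimal sketch. It assumes that the two values in parentheses are counts of 512-byte sectors: the size of the rejected request on the left and the device queue's limit on the right. That is my understanding of the 2.6 block layer, not something stated in this report. The function name is hypothetical.

```python
import re

SECTOR_BYTES = 512  # assumption: the kernel reports sizes in 512-byte sectors

# Matches e.g. "bio too big device md1 (56 > 8)"
BIO_RE = re.compile(r"bio too big device (\S+) \((\d+) > (\d+)\)")


def parse_bio_too_big(line):
    """Return (device, requested_kib, limit_kib), or None if the line does not match."""
    m = BIO_RE.search(line)
    if m is None:
        return None
    requested = int(m.group(2))
    limit = int(m.group(3))
    return m.group(1), requested * SECTOR_BYTES / 1024, limit * SECTOR_BYTES / 1024


if __name__ == "__main__":
    sample = "Jul 17 00:41:56 leia kernel: bio too big device md1 (56 > 8)"
    print(parse_bio_too_big(sample))  # ('md1', 28.0, 4.0)
```

Under that reading, the limit of 8 sectors is 4 KiB, and the largest request in the excerpt (224 sectors) is 112 KiB.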

Now, if I install new software to the drive I also get:

Jul 17 18:27:54 leia kernel: bio too big device md1 (64 > 8)
Jul 17 18:27:54 leia kernel: bio too big device md1 (32 > 8)
Jul 17 18:27:54 leia kernel: bio too big device md1 (96 > 8)
Jul 17 18:27:54 leia kernel: bio too big device md1 (248 > 8)
Jul 17 18:27:57 leia last message repeated 30 times
:

The files appear normal:

[ft@leia bin]$ date
Sun Jul 17 18:29:13 CEST 2005
[ft@leia bin]$ ls -l opera
-rwxr-xr-x  1 ft ft 5143 Jul 17 18:28 opera
[ft@leia bin]$ file opera
opera: Bourne shell script text executable
[ft@leia bin]$ head -4 opera
#!/bin/sh

# Location of the Opera binaries
OPERA_BINARYDIR=/opt/opera/8.02p1/lib/opera/8.02-20050705.1


However, after some time:

[ft@leia bin]$ date
Sun Jul 17 21:29:27 CEST 2005
[ft@leia bin]$ ls -l opera
-rwxr-xr-x  1 ft ft 5143 Jul 17 19:43 opera
[ft@leia bin]$ file opera
opera: data
[ft@leia bin]$ head -4 opera
@charset "UTF-8";
/*
Name: Disable tables
Version: 1.01
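
Since the corruption only shows up some time after the write, one way to catch it systematically is to record checksums right after installing and compare them later. The following is a minimal sketch of that idea, not a tool used in this report. The function names and the manifest file are hypothetical, and the manifest should be stored on a device that is not part of the affected array.

```python
import hashlib
import json
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def snapshot(root):
    """Map each regular file under root (symlinks skipped) to its digest."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                digests[path] = sha256_of(path)
    return digests


def changed_files(old, new):
    """Return paths present in both snapshots whose digests differ."""
    return sorted(p for p in old if p in new and old[p] != new[p])


def check(root, manifest_path):
    """First call writes the manifest; later calls return files that changed since."""
    current = snapshot(root)
    if not os.path.exists(manifest_path):
        with open(manifest_path, "w") as f:
            json.dump(current, f)
        return []
    with open(manifest_path) as f:
        previous = json.load(f)
    return changed_files(previous, current)
```

Calling `check("/opt", manifest)` once after the install and again a few hours later would list files such as `opera` whose content changed. Reads may be served from the page cache rather than from disk, so unmounting and remounting the filesystem between the two runs is probably needed for the comparison to mean anything.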


Version-Release number of selected component (if applicable):

[root@leia iptables]# mdadm -V
mdadm - v1.11.0 - 11 April 2005

[root@leia iptables]# uname -a
Linux leia 2.6.12-prep #1 Sat Jul 16 07:35:19 CEST 2005 i686 i686 i386 GNU/Linux
(It's kernel-2.6.12-1.1398_FC4, but I had to add the EATA driver, which is
missing.)

[root@leia iptables]# lvm.static version
  LVM version:     2.01.08 (2005-03-22)
  Library version: 1.01.01 (2005-03-29)
  Driver version:  4.4.0


How reproducible:
This has happened all three times I have tried hot adding /dev/md0. The first 
two times were with kernel-2.6.10-1.760_FC3 (but with the EATA driver enabled). 

Steps to Reproduce:
(from memory)
1. fdisk /dev/sdd .....
2. mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd1 missing
3. pvcreate /dev/md1
4. vgcreate vg1 /dev/md1
5. lvcreate -L9G -nopt vg1
6. mount /dev/vg1/opt /opt
7. rsync -a /mnt/tmp /opt #(a backup is kept here)
8. umount /mnt/tmp
9. fdisk /dev/sd[ac] ....
10. mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda1 /dev/sdc1
11. mdadm /dev/md1 -a /dev/md0

At this point the RAID itself appears to be intact: the data that already
resided on md1 before md0 was hot-added shows no corruption, even though I get
tons of "bio too big" messages.

12. Install a tar file, or run e.g.
    dd if=/dev/zero of=./test bs=65536 count=1000

Either produces an equal amount of "bio too big" messages.
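
To put a number on "an equal amount", the messages can be tallied from the syslog output, taking into account that syslog compresses repeats into "last message repeated N times" lines. This is a minimal sketch under that assumption. The function name is hypothetical, and it treats a repeat line as repeating the "bio too big" message immediately before it.

```python
import re
from collections import Counter

BIO_RE = re.compile(r"bio too big device (\S+) \((\d+) > (\d+)\)")
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def count_bio_messages(lines):
    """Count "bio too big" messages per requested size, expanding syslog repeat lines."""
    counts = Counter()
    last_size = None  # size from the most recent "bio too big" line, if any
    for line in lines:
        m = BIO_RE.search(line)
        if m:
            last_size = int(m.group(2))
            counts[last_size] += 1
            continue
        r = REPEAT_RE.search(line)
        if r and last_size is not None:
            # The repeat line refers to the preceding "bio too big" message.
            counts[last_size] += int(r.group(1))
        else:
            # Any other line breaks the association with the previous message.
            last_size = None
    return counts
```

Passing the lines of `/var/log/messages` from just before and just after the `dd` run to this function gives a per-size tally that can be compared between test runs and kernel versions.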


Actual results:

Expected results:

Additional info:

[root@leia iptables]# cat /proc/mdstat
Personalities : [raid0] [raid1]
md1 : active raid1 md0[1] sdd1[0]
      35842944 blocks [2/2] [UU]

md0 : active raid0 sdc1[1] sda1[0]
      35872896 blocks 8k chunks

unused devices: <none>

[root@leia iptables]# more /etc/mdadm.conf
DEVICE /dev/sd[acd]1
ARRAY /dev/md0 level=0 devices=/dev/sda1,/dev/sdc1

DEVICE /dev/md0
ARRAY /dev/md1 level=1 devices=/dev/sdd1,/dev/md0

MAILADDR ft
PROGRAM 96 bytes

[root@leia linux]# fdisk -l /dev/sd[acd] /dev/md[01]

Disk /dev/sda: 18.3 GB, 18373205504 bytes
255 heads, 63 sectors/track, 2233 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1        2233    17936541   fd  Linux raid autodetect

Disk /dev/sdc: 18.3 GB, 18373205504 bytes
255 heads, 63 sectors/track, 2233 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1        2233    17936541   fd  Linux raid autodetect

Disk /dev/sdd: 36.7 GB, 36703933952 bytes
64 heads, 32 sectors/track, 35003 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       35003    35843056   fd  Linux raid autodetect

Disk /dev/md0: 36.7 GB, 36733845504 bytes
2 heads, 4 sectors/track, 8968224 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/md1: 36.7 GB, 36703174656 bytes
2 heads, 4 sectors/track, 8960736 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Comment 1 Frode Tennebø 2005-07-17 19:39:05 UTC
*** Bug 163469 has been marked as a duplicate of this bug. ***

Comment 2 Dave Jones 2005-08-04 05:20:02 UTC
kernel-2.6.12-1.1411_FC4 just got pushed out to updates-testing.  This has one
block layer fix that could explain this.


Comment 3 Frode Tennebø 2005-08-07 21:53:46 UTC
Sorry, this took some time. I'm not sure about the data corruption yet: I've
only just started the RAID reconstruction, and together with the inherent Time
Before Corruption that means it will take somewhat longer. I just wanted to let
you know that the errors (or whatever they are) are still there:

Aug  7 23:26:26 leia kernel: md: bind<md0>
Aug  7 23:26:26 leia kernel: RAID1 conf printout:
Aug  7 23:26:26 leia kernel:  --- wd:1 rd:2
Aug  7 23:26:26 leia kernel:  disk 0, wo:0, o:1, dev:sdd1
Aug  7 23:26:26 leia kernel:  disk 1, wo:1, o:1, dev:md0
Aug  7 23:26:26 leia kernel: ..<6>md: syncing RAID array md1
Aug  7 23:26:26 leia kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Aug  7 23:26:26 leia kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Aug  7 23:26:26 leia kernel: md: using 128k window, over a total of 35842944 blocks.
Aug  7 23:26:26 leia kernel: cfq: depth 4 reached, tagging now on
Aug  7 23:26:26 leia kernel: cfq: depth 4 reached, tagging now on
:
Aug  7 23:34:57 leia kernel: bio too big device md1 (16 > 8)
Aug  7 23:40:00 leia kernel: bio too big device md1 (16 > 8)
Aug  7 23:45:37 leia kernel: bio too big device md1 (16 > 8)
:


Comment 4 Dave Jones 2005-09-30 06:31:35 UTC
Mass update to all FC4 bugs:

An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream
kernel (2.6.13.2). As there were ~3500 changes upstream between this and the
previous kernel, it's possible your bug has been fixed already.

Please retest with this update, and update this bug if necessary.

Thanks.


Comment 5 Frode Tennebø 2005-10-15 23:06:08 UTC
I still get these:

Oct 14 23:49:24 leia kernel: bio too big device md1 (16 > 8)
Oct 14 23:49:25 leia kernel: bio too big device md1 (64 > 8)
Oct 14 23:49:25 leia kernel: bio too big device md1 (248 > 8)
Oct 14 23:49:26 leia last message repeated 12 times
Oct 14 23:49:26 leia kernel: bio too big device md1 (136 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (136 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (248 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (144 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (184 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (248 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (248 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (152 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (248 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (248 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (88 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (192 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (136 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (40 > 8)
Oct 14 23:49:26 leia kernel: bio too big device md1 (248 > 8)
:
:


Comment 6 Frode Tennebø 2005-10-15 23:10:37 UTC
BTW: I have also recently experienced strange data corruption in a database I
use. However, this could have happened before the update in
kernel-2.6.12-1.1411_FC4 (as mentioned above). I will try to figure this out
soonish.

Comment 7 Dave Jones 2005-11-10 19:30:51 UTC
2.6.14-1.1637_FC4 has been released as an update for FC4.
Please retest with this update, as a large amount of code has been changed in
this release, which may have fixed your problem.

Thank you.


Comment 8 Stephen Tweedie 2005-12-13 21:05:26 UTC
*** Bug 170964 has been marked as a duplicate of this bug. ***

Comment 9 Dave Jones 2006-02-03 05:33:27 UTC
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.


Comment 10 John Thacker 2006-05-05 01:22:44 UTC
Closing per previous comment.

Comment 11 Frode Tennebø 2006-05-12 21:44:46 UTC
Sorry for the delay, but life's been busy. I have upgraded to FC5 and still get:

May 11 22:28:27 leia kernel: bio too big device md1 (48 > 8)
May 11 22:28:27 leia kernel: bio too big device md1 (16 > 8)
May 11 22:28:27 leia kernel: bio too big device md1 (80 > 8)
May 11 22:28:27 leia kernel: bio too big device md1 (88 > 8)
May 11 22:28:27 leia kernel: bio too big device md1 (32 > 8)
May 11 22:28:27 leia kernel: bio too big device md1 (24 > 8)
May 11 22:28:27 leia kernel: bio too big device md1 (32 > 8)
May 11 22:28:27 leia kernel: bio too big device md1 (24 > 8)
May 11 22:28:28 leia kernel: bio too big device md1 (80 > 8)
May 11 22:28:28 leia kernel: bio too big device md1 (16 > 8)
May 11 22:28:31 leia kernel: bio too big device md1 (24 > 8)
May 11 22:28:31 leia kernel: bio too big device md1 (56 > 8)
May 11 22:28:31 leia kernel: bio too big device md1 (80 > 8)
May 11 22:28:31 leia kernel: bio too big device md1 (16 > 8)
May 11 22:29:33 leia kernel: bio too big device md1 (48 > 8)
May 11 22:29:33 leia kernel: bio too big device md1 (64 > 8)
May 11 22:29:33 leia kernel: bio too big device md1 (24 > 8)
May 11 22:29:33 leia kernel: bio too big device md1 (120 > 8)
May 11 22:29:33 leia kernel: bio too big device md1 (64 > 8)
May 11 22:29:33 leia kernel: bio too big device md1 (64 > 8)
May 11 22:29:33 leia kernel: bio too big device md1 (128 > 8)
May 11 22:57:47 leia kernel: bio too big device md1 (32 > 8)
May 11 22:57:47 leia kernel: bio too big device md1 (24 > 8)
May 11 22:58:15 leia kernel: bio too big device md1 (16 > 8)
May 11 22:58:15 leia kernel: bio too big device md1 (16 > 8)
May 11 22:58:15 leia kernel: bio too big device md1 (24 > 8)
May 11 22:58:15 leia kernel: bio too big device md1 (16 > 8)
May 11 22:58:15 leia kernel: bio too big device md1 (40 > 8)
May 11 22:58:15 leia kernel: bio too big device md1 (16 > 8)

I'm running kernel-2.6.16-1.2096_FC5.

Comment 12 Dave Jones 2006-10-16 18:12:00 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 13 Jon Stanley 2008-01-20 04:39:03 UTC
(this is a mass-close to kernel bugs in NEEDINFO state)

As indicated previously, there has been no update on the progress of this bug;
therefore I am closing it as INSUFFICIENT_DATA. Please re-open if the issue
still occurs for you and I will try to assist in its resolution. Thank you for
taking the time to report the initial bug.

If you believe that this bug was closed in error, please feel free to reopen
this bug.