Bug 513169 - /boot fsck error with Intel BIOS RAID using 2 raid10 sets sharing disks
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: mkinitrd
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Assignee: Hans de Goede
QA Contact: Release Test Team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-07-22 11:59 UTC by Krzysztof Wojcik
Modified: 2010-07-01 21:53 UTC (History)
6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-07-01 21:53:02 UTC
Target Upstream Version:
Embargoed:


Attachments
screenshot (62.91 KB, image/jpeg)
2009-07-22 11:59 UTC, Krzysztof Wojcik
screenshot 1 (79.95 KB, image/jpeg)
2009-07-22 12:00 UTC, Krzysztof Wojcik
configuration (1.47 KB, text/plain)
2009-07-22 12:00 UTC, Krzysztof Wojcik
Anaconda logs (64.94 KB, application/x-gzip)
2009-07-23 09:15 UTC, Krzysztof Wojcik
PATCH: fix dmraid 10 boot ugly (but harmless) error messages (658 bytes, patch)
2009-07-29 14:10 UTC, Hans de Goede
the log files (69.91 KB, application/x-gzip)
2009-08-12 13:32 UTC, Rafal Marszewski

Description Krzysztof Wojcik 2009-07-22 11:59:43 UTC
Created attachment 354679 [details]
screenshot

Steps to reproduce:
Create two RAID 10 volumes on the same set of hard drives:
a) RAID 10, stripe size 64, size: 150GB
b) RAID 10, stripe size 64, size: <rest>

Try to install:
/boot on the 1st raid (size 23GB)
/ (root) on LVM on the 2nd raid (size 190GB)
swap on LVM - size 7GB
Verify that after the first reboot, root intervention is needed to fix an unexpected inconsistency.
(reproduced twice)
(for more information see the kickstart file and screenshot)

Comment 1 Krzysztof Wojcik 2009-07-22 12:00:13 UTC
Created attachment 354680 [details]
screenshot 1

Comment 2 Krzysztof Wojcik 2009-07-22 12:00:43 UTC
Created attachment 354681 [details]
configuration

Comment 3 Joel Andres Granados 2009-07-22 12:19:59 UTC
Does this work on f11 or f10?  This is relevant as the fix might be there.

Comment 4 Krzysztof Wojcik 2009-07-22 13:29:35 UTC
We have not tested this case on any Fedora release.

Comment 5 Chris Lumens 2009-07-22 14:25:38 UTC
Please attach /var/log/anaconda.log to this bug report.

In addition, yes: in order to verify whether or not this will also be a RHEL6 bug, it's worth testing with Fedora too.

Comment 6 Krzysztof Wojcik 2009-07-23 09:15:19 UTC
Created attachment 354828 [details]
Anaconda logs

Comment 7 Hans de Goede 2009-07-27 11:21:25 UTC
Hi,

(In reply to comment #0)
> Verify that after 1 reboot, the root intervention is needed to fix a unexpected
> inconsistency.
> (reproduced twice)
> (for more information see kickstart file and screen shot)  

I have some questions:

1) How do you verify this inconsistency (which command do you use,
   what is the output, and what should it have been)?

2) Do you mean this inconsistency happens on the first reboot of the installed
   system? IOW, the first boot after install is OK, but when you then reboot,
   the inconsistency happens (so the 2nd boot of the installed system)?

   Or do you mean that this inconsistency happens on the first boot after
   install (so after rebooting from the installer)?

As for the errors in the screenshot you've attached, I do not think they are related to the cause of this. This is an mkinitrd bug; can you please file a separate bug for it?

Comment 8 Krzysztof Wojcik 2009-07-29 08:53:55 UTC
1. I haven't verified the inconsistency. This is the information I have after installing the OS.
2. This inconsistency happens during the first boot after the installation.

Comment 9 Hans de Goede 2009-07-29 09:54:15 UTC
(In reply to comment #8)
> 1. I haven't verified the inconsistency. This is the information I have after
> installing the OS.

IOW, when you say "inconsistency" you mean the error messages shown during boot, which are shown in the screenshot ?

Comment 10 Rafal Marszewski 2009-07-29 11:55:51 UTC
Hi,
Yes, I mean the error messages shown during the boot.

Comment 11 Hans de Goede 2009-07-29 14:08:27 UTC
(In reply to comment #10)
> Hi,
> Yes, I mean the error messages shown during the boot.  

Ah, OK. Well, those have nothing to do with an inconsistency (or uncleanness) of the RAID set. They are caused by an error in the mkinitrd script which
causes it to try and explicitly activate the subsets of the (nested) RAID 10
set. These errors are ugly, but otherwise 100% harmless.

I'm changing the component and description to match, and I'll give this a devel ack for fixing these ugly (but totally harmless) messages for 5.5.

I'll attach an (already tested) patch against mkinitrd that fixes this.

One last note: this happens with any dmraid RAID 10 setup; it is not necessary to create 2 sets using the same disks to reproduce it.
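A minimal sketch of the kind of filtering Hans describes (this is not the attached patch; the helper name is hypothetical). dmraid names the mirror subsets of a nested RAID 10 set by appending -0/-1 to the parent set name, so an activation loop can skip them by excluding names that end in a -<digit> suffix. The set names below are taken from the /dev/mapper listing in comment 22.

```shell
#!/bin/sh
# Hypothetical sketch: keep only top-level dmraid set names, dropping the
# -0/-1 subset names of a nested RAID 10 set so the initrd does not try to
# activate them explicitly (which is what produced the ugly boot errors).
filter_toplevel() {
    # Exclude names ending in "-<single digit>" (the subset naming pattern).
    grep -v -e '-[0-9]$'
}

# Set names as reported in comment 22 of this bug.
printf '%s\n' \
    isw_dgedhccjda_r10d4n0s64 \
    isw_dgedhccjda_r10d4n0s64-0 \
    isw_dgedhccjda_r10d4n0s64-1 \
    isw_dgedhccjda_r10d4n0s64-150 \
    isw_dgedhccjda_r10d4n0s64-150-0 \
    isw_dgedhccjda_r10d4n0s64-150-1 \
| filter_toplevel
```

In a real initrd each surviving name would then be activated (e.g. with `dmraid -ay <name>`); activating only the top-level sets avoids the harmless but noisy errors from the subsets.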

Comment 12 Hans de Goede 2009-07-29 14:10:00 UTC
Created attachment 355550 [details]
PATCH: fix dmraid 10 boot ugly (but harmless) error messages

Comment 13 Rafal Marszewski 2009-08-11 16:04:47 UTC
Hi Hans
Sorry for the delay (I was on vacation). The problem is that after the first boot, the user must give the root password and fix the filesystem (see screenshot 1), so the problem is not only with the message (which I agree is harmless); we can't log in after the first boot without repairing the system.

Comment 14 Hans de Goede 2009-08-11 17:36:01 UTC
(In reply to comment #13)
> Hi Hans
> Sorry for delay (I was on vacation). The problem is that after first boot, the
> user must give root password and fix the filesystem (see screenshot 1), so the
> problem is not only with message (which I agree is harmless), but we can't
> login after first boot withour repairing the system.  

Hmm, is this reproducible? IOW, does this happen each time you do an install as described in the description of this bug?

I've tried to reproduce this, but for me things work fine.

Comment 15 Rafal Marszewski 2009-08-12 09:05:58 UTC
Yes, we could reproduce this several times. Please note that the root file system is on the second volume, while the rest of the files are on the 1st volume.
There is no problem when everything is on one volume.

Comment 16 Hans de Goede 2009-08-12 09:10:12 UTC
(In reply to comment #15)
> Yes, We could reproduce this several times. Please notice, that root file
> system is on the second volume, while the rest of the files are on 1st volume. 
> There is no problem when everything is on the one volume.  

Ah, OK, so / was on the second volume and /boot was on the first? And where
was grub installed?

Comment 17 Rafal Marszewski 2009-08-12 09:14:46 UTC
bootloader --location=mbr --driveorder=mapper/isw_cidahbjeeg_r10d4n0s64-150 --append=pci=nommconf rhgb quiet

clearpart --initlabel --linux --drives=mapper/isw_cidahbjeeg_r10d4n0s64
part pv.21 --size=0 --grow --ondisk=mapper/isw_cidahbjeeg_r10d4n0s64
volgroup VolGroup00 --pesize=32768 pv.21
logvol swap --fstype swap --name=LogVol00 --vgname=VolGroup00 --size=7680 
clearpart --initlabel --linux --drives=mapper/isw_cidahbjeeg_r10d4n0s64-150
part /boot --fstype ext2 --size=23859 --ondisk=mapper/isw_cidahbjeeg_r10d4n0s64-150
logvol / --fstype ext3 --name=LogVol01 --vgname=VolGroup00 --size=190873 

This is a part of kickstart file (see attached configuration file)

Comment 18 Hans de Goede 2009-08-12 10:10:45 UTC
Ok,

I've done exactly the same, except that my first RAID 10 set was 100GB (as my disks aren't that large) and my second set was only 50GB.

Other than that everything was the same, and everything works fine. Could it be that you are perhaps having hardware issues?

Are there any I/O errors during the installation / during boot (check dmesg on tty2 during the install)?

Have you tried this on another machine / tried replacing your hard drives and SATA cables?

Comment 19 Rafal Marszewski 2009-08-12 13:30:37 UTC
We have just repeated the same steps (this is an automatic test, so the steps are the same, except for the volume name) on a different board with different hard drives, and the problem has been reproduced.
I am attaching the whole log directory (/var/log).

Comment 20 Rafal Marszewski 2009-08-12 13:32:03 UTC
Created attachment 357165 [details]
the log files

Comment 21 Hans de Goede 2009-08-12 18:25:04 UTC
Hmm,

I cannot find anything useful in the log files and I cannot reproduce this.
I'm adding Heinz (the dmraid maintainer) to the CC. Heinz, do you have any clues
as to what might be going on here?

Comment 22 Rafal Marszewski 2009-08-13 09:33:10 UTC
BTW:
When I execute the command:
# mount -a
I get this error:
mount: wrong fs type, bad option, bad superblock on /dev/mapper/isw_dgedhccjda_r10d4n0s64-150-0p1,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

mount: devpts already mounted or /dev/pts busy
mount: sysfs already mounted or /sys busy

My fstab file looks like this:
/dev/VolGroup00/LogVol01 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext2    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
LABEL=DUT-SWAP          swap                    swap    defaults        0 0
/dev/VolGroup00/LogVol00 swap                    swap    defaults        0 0

And /dev/mapper has these devices:

isw_dgedhccjda_r10d4n0s64
isw_dgedhccjda_r10d4n0s64-0
isw_dgedhccjda_r10d4n0s64-1
isw_dgedhccjda_r10d4n0s64-150
isw_dgedhccjda_r10d4n0s64-150-0
isw_dgedhccjda_r10d4n0s64-150-0p1
isw_dgedhccjda_r10d4n0s64-150-1
isw_dgedhccjda_r10d4n0s64-150p1
isw_dgedhccjda_r10d4n0s64p1
VolGroup00-LogVol00
VolGroup00-LogVol01

--
Maybe this information can be useful in reproducing the problem.
Best regards,
Rafal
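The device list and the mount error above fit together: the node that failed to mount, isw_dgedhccjda_r10d4n0s64-150-0p1, is a partition node on one -0/-1 half (subset) of the nested RAID 10 set, while the mountable /boot partition is the top-level node's partition (isw_dgedhccjda_r10d4n0s64-150p1). A minimal sketch that classifies the nodes by that naming convention; the helper and its name are hypothetical, and the convention is inferred from the list above rather than from dmraid documentation.

```shell
#!/bin/sh
# Hypothetical helper: flag /dev/mapper names that contain a -0/-1 subset
# component (either a bare subset, or a partition node on a subset). Such
# nodes sit on one half of the nested RAID 10 and must not be mounted.
is_subset_node() {
    case "$1" in
        *-[01]|*-[01]p[0-9]*) return 0 ;;  # subset, or partition on a subset
        *) return 1 ;;
    esac
}

# The node from the mount error vs. the top-level partition node.
for n in isw_dgedhccjda_r10d4n0s64-150-0p1 isw_dgedhccjda_r10d4n0s64-150p1; do
    if is_subset_node "$n"; then
        echo "$n: subset node, do not mount"
    else
        echo "$n: top-level node"
    fi
done
```

If LABEL=/boot in fstab resolves to the subset partition node instead of the top-level one (both halves of a mirror carry the same filesystem label), a failed mount like the one shown above is the expected symptom.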

Comment 23 RHEL Program Management 2009-09-25 17:43:26 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 24 Hans de Goede 2009-09-29 13:41:37 UTC
Removing devel ack flag as that was meant for the ugly (but harmless) error messages, and changing summary to track the /boot fsck inconsistency.

I've created a new bug to track the fixing of the ugly (but harmless) error messages: bug 526246.

Comment 26 Denise Dumas 2009-10-13 13:59:40 UTC
Moving to 5.6 since this is not reproducing and resources are constrained.

