Bug 475385
| Summary: | RAID10 - Install ERROR appears during installation of RAID10 isw dmraid raid array in RHEL 5.3 Snapshot5 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Ed Ciechanowski <ed.ciechanowski> |
| Component: | anaconda | Assignee: | Joel Andres Granados <jgranado> |
| Status: | CLOSED ERRATA | QA Contact: | Release Test Team <release-test-team-automation> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | ||
| Version: | 5.3 | CC: | agk, atodorov, borgan, coughlan, cward, ddumas, dwysocha, fernando, hdegoede, heinzm, Jacek.Danecki, jgranado, jjarvis, jvillalo, keve.a.gabbert, krzysztof.wojcik, lvm-team, martinez, mbroz, naveenr, pjones, prockai, rpacheco, rvokal, syeghiay, tao |
| Target Milestone: | rc | Keywords: | OtherQA, Reopened |
| Target Release: | --- | ||
| Hardware: | i386 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-01-20 20:47:50 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 471689 | ||
| Bug Blocks: | 476866 | ||
| Attachments: | |||
Description
Ed Ciechanowski
2008-12-09 00:52:07 UTC
RAID10 (0+1) – Install ERROR appears during installation of a RAID10 isw dmraid array in RHEL 5.3 Snapshot5. SEE ATTACHMENT .JPG. If logs are needed, let me know which ones.

Is this a regression or a critical error? It's getting very late to introduce new changes into the release. Please make your case as soon as possible; otherwise we'll be forced to defer to 5.4. If fixing this in 5.4 is OK, please let me know that too.

This is part of the pyblock changes to address the modifications in the dmraid API. When pyblock searches for the dmraid set it first passes the dmraid drive name and calls group_set with this info. When that fails it shows the error message. If it fails, a second call to group_set is done with {NULL} as an argument.
My tests show that for all Intel-type RAID the error is shown, as it expects a {NULL} to be passed. I didn't completely change over to calling with the {NULL} argument for fear of breaking other RAID types.
This will not break anything; it just looks ugly. If need be, and if reassured that it won't break anything, I can take away the first call and just leave the group_set({NULL}).
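As a rough illustration of the call pattern Joel describes (this is not the actual pyblock source; the Python-level `group_set` wrapper name, its argument convention and its return value are assumptions made for the sketch), the fallback looks roughly like this:

```python
# Hedged sketch of the group_set fallback described above.  The wrapper name
# and the "None stands for {NULL}" convention are assumptions; in the real
# code path pyblock calls into the dmraid library from C.

def find_isw_raid_set(group_set, drive_name):
    """Try grouping the set by dmraid drive name first, then retry with NULL.

    For Intel (isw) arrays the first call is expected to fail -- dmraid prints
    "ERROR: only one argument allowed for this option" -- and the second,
    NULL-argument call is the one that actually groups the set.
    """
    rs = group_set([drive_name])      # first attempt: pass the drive name
    if rs is not None:
        return rs
    return group_set(None)            # second attempt: equivalent of {NULL}
```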
Ed, this is caused by the way the new-in-RHEL-5.3 isw RAID 5 / RAID 10 support has been implemented, as explained by Joel. The implementation of the new isw RAID 5 / RAID 10 support is being tracked in bug 437184 (which is still in progress), so I'm closing this as a dup of 437184.

*** This bug has been marked as a duplicate of bug 437184 ***

Hans, Joel, does this error appear when using kickstart? If so it will break all automated installs. Moving to ASSIGNED until I receive an answer to the above question.

This will not break automated installs.

Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
RHEL 5.3 adds support for installing to ISW RAID 5 / RAID 10 setups. Due to the way this support is implemented, when you try to install RHEL 5.3 on a system which has an ISW RAID 5 / RAID 10 setup you will see an error message like this one: "ERROR: only one argument allowed for this option". You can safely ignore this error; the installation will continue normally and the RAID array will be available to install to.

Deleted Release Notes Contents. Old Contents: (identical to the release note text above).

Please ignore comments #3 to #9. I was testing with RAID0 and was looking at a different error message. My bad, sorry for the noise.

"During the installing of RHEL 5.3 Snapshot 2 no sw raid devices are active to install OS to." Which do you mean by this -- that anaconda doesn't see the disks, or that you've disabled software RAID and yet you're seeing these errors anyway?

Sorry, but I am not sure if these questions are for me.

Test results of installs for RHEL 5.3 Snapshot5 to isw RAIDs:

RAID 1 (Mirror) – Works correctly; a true mirror is created. You can boot to either drive if one is removed. I would close this bugzilla as verified fixed.

Issues that still exist in this area:
RAID0 (Stripe) – produces I/O errors on boot. The striped RAID system boots and works normally except for the I/O errors on boot. Cloned this bugzilla – see bug 475384
RAID10 (0+1) – Install ERROR appears during installation of a RAID10 isw dmraid array in RHEL 5.3 Snapshot5. Cloned this bugzilla – see bug 475385
RAID5 – Install ERROR appears during installation of a RAID5 isw dmraid array in RHEL 5.3 Snapshot5. Cloned this bugzilla – see bug 475386

If you need logs from any of these issues please let me know. I can collect them today.

(In reply to comment #12)
> Sorry, but I am not sure if these Questions are to me.
>
> Test Results of installs for RHEL5.3 Snapshot5 to isw raids:
>
> RAID 1 (Mirror) – Work correctly, a true mirror is created. You can boot to
> either drive if one is removed. I would closed this bugzilla as verified fixed.

Yes, this has been dealt with and has been verified AFAIK.

> Issues that still exists in this area:
> RAID0 (Strip) – produces i/o errors on boot. Seems to be a Strip RAID system
> boots and work normally except for i/o errors on boot. Cloned this Bugzilla –
> see bug 475384

Yep, this is being handled in bug 475384, and whatever you can add to that bug will be appreciated; please look at my comments there.

> RAID10 (0+1) – Install ERROR appears during installation of RAID10 isw dmraid
> raid array in RHEL 5.3 Snapshot5. Cloned this Bugzilla – see bug 475385

Yes, also correct. 475385 and 475386: in your tests they present the same behaviour and seem to sprout from the same place. These two issues will be handled in this bug, 475385, unless it is proven that their causes are different.

> RAID5 – Install ERROR appears during installation of RAID5 isw dmraid raid
> array in RHEL 5.3 Snapshot5. Cloned this Bugzilla – see bug 475386

The confusion sprouted from the fact that 475385 and 475384 were cloned from 471689, which was a bug containing some issues dating back to snap2. Additionally, 471689 has some output that also added to the confusion (the output there is relevant to neither 475385 nor 475384). The issue in 471689 and the issues in 475385 and 475384 are different and present themselves at different moments in the install/boot.

Since it is now clear that this is a new bug, can you please answer pjones's question: have you disabled RAID? Are you testing with snap5? Furthermore: during install, can you change to tty2, execute `dmraid -ay -vvv -ddd`, and post the output?

I'm currently testing the same scenario as you and get slightly different behaviour: I don't get the error message, but the raid device is not seen. I will continue to investigate and post my findings; hopefully together we may find the solution to this issue. If you can, please provide the logs for the installation. They are usually left in the /root directory.

Created attachment 326396 [details]
txt from tty2 running 'dmraid -ay -vvv -ddd'
Created attachment 326397 [details]
output from the command "dmesg > dmesgout.log"
Created attachment 326398 [details]
syslog file if this will help
If other items are needed, please specify the commands to run. Thanks.
I have been looking at this, and in my tests the installer does not recognize the raid sets; it just shows the underlying devices as separate partitions.

1. When I execute `dmraid -ay` in the installer (tty2) it complains that the kernel does not support raid45.
2. When I execute `dmraid -ay` in the installed system, it correctly activates the raid device and all is dandy.
3. The installer does not handle dmraid through the dmraid command; it handles it through python-pyblock. pyblock correctly detects the raid sets in the installer but fails to activate them (in the installer). Still looking into this, but it has something to do with the table that is received from the dmraid libraries. On my machine it looks like this: "0 312614656 raid45 core 2 65536 nosync raid5_la 1 128 3 -95466056 /dev/sdb 0 /dev/sdc 0 /dev/sdd 0". The not-so-obvious problem is the number that comes before the device list. According to the kernel code this is supposed to be the number of devices to initialize, and its range is -1 to the number of raid devices, so a value of -95466056 is definitely wrong. We don't really modify this string in pyblock; we use whatever libdmraid_make_table gives us. I'm still investigating, but I would really like to hear from Heinz about what might be causing this issue.
4. The same as 3.

I am aware that the symptoms the user sees are slightly different from what I am seeing on my machine, but I trust that fixing whatever is wrong on my test machine will give us a clearer picture of what is going on.

Please test with http://jgranado.fedorapeople.org/temp/raid45.img as an updates image. You need to append updates=http://jgranado.fedorapeople.org/temp/raid45.img to the kernel args to make this work. The image adds 1-second sleeps before calling the dmraid function libdmraid_make_table. In my tests on my running system it fixed pyblock's behaviour; hopefully this will avoid the erratic behaviour in the installer. Before the change pyblock received a strange device-to-init parameter (part of the device-mapper table). It is supposed to be from -1 to the number of devices, but dmraid gives some *really* big positive numbers. When pyblock uses a table containing this bogus number, device-mapper complains that the table has the wrong format. Adding a sleep before the call seems to normalize things. The bug is seen when you execute `for a in 1 2 3 4 5 6 7 8 9 ; do dmraid -tay ; done`; one can see the *big* numbers that go before the device list. An example of the output on my machine is: "isw_bafgeadidc_Volume0: 0 312614656 raid45 core 2 65536 nosync raid5_la 1 128 3 58168736 /dev/sdb 0 /dev/sdc 0 /dev/sdd 0". Notice the 58168736; here it should be -1 <= x <= 3.

One of the reasons this did not work on my test machine is that anaconda did not have the correct modules. I have just added dm-raid45, dm-mem-cache, dm-region_hash and dm-message to the mix. Heinz: do you think some other module should be added?

Joel, why are these modules not in by default, since they are in the 126.el5 kernel?

This fixes the behavior with the big numbers I was seeing. I still need confirmation from Intel about the images. Intel: can we please get a test ASAP!

(In reply to comment #24)
> Joel,
>
> why are these modules not in by default, since they are in the 126.el5 kernel ?

In the installer we have a list of modules that we use for the installation. We only put in stuff that is needed for installs, to cut down on the size of the install images. The list contains various dmraid modules but did not have these specific ones.
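As a hedged sketch of the sanity check implied above (this is not installer code; the parsing heuristics are assumptions based on the raid45 table layout quoted in the comment), the "devices to initialize" field can be validated like this:

```python
def check_devs_to_init(table_line):
    """Check the 'devices to initialize' field of a raid45 DM table line.

    Assumes the layout quoted above: devices appear at the end as
    "<path> <offset>" pairs, and the field right before the first device
    path is the number of devices to initialize (valid range: -1 up to
    the number of raid devices).
    """
    fields = table_line.split()
    dev_paths = [f for f in fields if f.startswith("/dev/")]
    nr_devs = len(dev_paths)
    nr_to_init = int(fields[fields.index(dev_paths[0]) - 1])
    if not (-1 <= nr_to_init <= nr_devs):
        raise ValueError("bogus devices-to-init value %d, expected -1..%d"
                         % (nr_to_init, nr_devs))
    return nr_to_init

# The corrupt table quoted above fails the check:
# check_devs_to_init("0 312614656 raid45 core 2 65536 nosync raid5_la "
#                    "1 128 3 58168736 /dev/sdb 0 /dev/sdc 0 /dev/sdd 0")
# -> ValueError, since 58168736 is not in -1..3.
```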
Assigned to myself after Joel's test result shows that my lib/activate/active.c fix seems appropriate. Joel, do you still need 'needinfo' from Ed, or can you '-' it?

Comment on attachment 326792 [details]
Test SRPM for Joel
Setting attachment to public for Intel to test.
Heinz: to be sure that the dmraid fix actually fixed the anaconda strangeness the reporter was seeing, I still need him to test with my updates image. So I would leave needinfo at "?".

Heinz and Joel,
Does the raid45.img file contain the 'attachment 326792 [details]'? Should the 'attachment 326792 [details]' be included during the install procedure? We will test this as soon as possible. EDC

Ed, Joel's image will do. The SRPM is to allow for complete test coverage on your end.

Joel, please see the email I sent to you directly - jgranado. Thanks, *EDC*

(In reply to comment #35)
> Heinz and Joel,
>
> Does the raid45.img file contain the 'attachment 326792 [details]'?
> Should the 'attachment 326792 [details]' be included during procedure of
> install?

No, the attachment is a src rpm that can be used in a running system. Getting it into the installer is more difficult, and I would like to test with the current image before starting down that road. The image adds a 1-second wait before the call to libdmraid_make_table. This should make the buggy version of dmraid work with the installer.

To use the image you must append the following text to the installation parameters: "updates=http://jgranado.fedorapeople.org/temp/raid45.img". You do not need to download it to a USB stick or do anything else. If the machine you are installing has access to the internet there should be no problem. If you have trouble connecting the machine to the internet, I suggest you use an internal HTTP server to host the updates image and get it from there. HTTP is the easiest way.

Attached are the anaconda log, syslog and dmesg.txt files from installs of RAID10 and RAID5, using "updates=http://jgranado.fedorapeople.org/temp/raid45.img". The error in both anaconda.log files is as follows:

23:04:52 INFO : file location: http://jgranado.fedorapeople.org/temp/raid45.img
23:04:52 INFO : transferring http://jgranado.fedorapeople.org//temp/raid45.img to a fd
23:08:02 ERROR : failed to retrieve http://jgranado.fedorapeople.org///temp/raid45.img

I need to find out if it is the network, or set up an HTTP server of my own. See attached logs. EDC. Will update again soon.

Created attachment 327031 [details]
Anaconda.log file for RAID10 install
Created attachment 327032 [details]
Anaconda.log file for RAID5 install
Created attachment 327037 [details]
Anaconda.log file for RAID10 install w USB img
Created attachment 327038 [details]
output from the command "dmesg > raid10wUSBimg/dmesout.txt"
Created attachment 327039 [details]
syslog file if this will help for RAID10wUSBimg
Created attachment 327067 [details]
Raid 10 with network img still failing
Raid 10 with network img still failing
Created attachment 327068 [details]
Raid 5 with network img still failing
Raid 5 with network img still failing.
I have not seen the 1-second wait resolve the original issue.
The .img file is transferring over the network at home. Started the install with the following command: "boot: linux updates=http://jgranado.fedorapeople.org/temp/raid45.img". It looks like the right .img file is loading, but it still does not recognize the RAID 5 and RAID 10 arrays. Installation still FAILS with raid10 and raid5 installs.

Attached are the tar files from both raid10 and raid5 with the network image; each .tgz contains:
anaconda.log
syslog
dmesg out from install
dmraid_out from TTY2

Please find attached raid10wNETimg.tgz and raid5wNETimg.tgz respectively. TTY1 shows the messages below:

Starting Graphical installation…
ERROR: only one argument allowed for this option
Using updates image… sleeping for 2 secs…
Using updates image… sleeping for 2 secs…
Error
Error opening /dev/mapper/isw_dgdjhihjig_Volume0: no such device or address 80

I have not seen the 1-second wait resolve the original issue. I have a system with me now so I can run quick tests even early or late PST. Thanks, EDC

The first batch of tests failed most likely because the network was misconfigured, but the last output (comment #52) tells me that the sleep before the call did not work. This basically means that the issue Ed is seeing must have another root cause. We have already solved one dmraid issue, but there is still something causing the failure of dmraid at install time. I don't have anything concrete ATM, but I will post to the bug once I do.

~~~ Attention Partners ~~~
The *last* RHEL 5.3 Snapshot 6 is now available at partners.redhat.com. A fix for this bug should be present. Please test and update this bug with test results as soon as possible. If the fix present in Snap6 meets all the expected requirements for this bug, please add the keyword PartnerVerified. If any new bugs are discovered, please CLONE this bug and describe the issues encountered there.

Apologies. The fix for this bug won't be present in Snap6, but it is scheduled for inclusion in the RC release, which will be available at a later date.

I finally got this to work! Stuff that was going wrong:

1. We didn't have the modules in the installer. I have already committed the change to take care of this. The installer basically did not see the dmraid set; it just saw the separate raid devices. It would not detect that there is a raid set, and would finish the install correctly if one continued the installation onto one of the devices. This brings me to the current situation seen by the Intel tests. As far as I can tell the raid sets can be activated at install time (https://bugzilla.redhat.com/attachment.cgi?id=326396, where you can clearly see that the sets were initialized). So something strange is happening here, because if snap5 or 6 was being used, the modules should not be in the installer. So the Intel test is dealing with another type of raid, or it is putting in some dm modules that are making the process break in a different way.

2. dmraid had an issue with the table creation. Heinz's package has fixed this and now everything seems to work normally. This could be seen in a running system only, not in the installer, as the installer failed because of the lack of modules. So one had to actually install and fiddle with dmraid to observe the bug.

Ed: please start the install and wait for the window asking you for the key. Go to tty2 and execute `lsmod` without executing anything else, and post the output here. This is to know what modules you have by default. Ed: also run `cat /.buildstamp` and post the output please.

Created attachment 327188 [details]
Builstamp.txt from RAID5
lsmodout.txt from RAID5
Builstamp.txt from RAID5
lsmod_R10.txt from RAID10
buildstamp_R10.txt from RAID10
Created attachment 327189 [details]
lsmod from raid5
Created attachment 327190 [details]
lsmod from RAID10
Created attachment 327191 [details]
buildstamp from RAID10
All the lsmod and buildstamp outputs above were gathered from installs started with the boot command "linux updates=http://jgranado.fedorapeople.org/temp/raid45.img" over the known-working (home) network.

Comment #60 reveals that there is still no dm-raid45 module loaded, hence RAID 5 activation will fail. Comment #61 shows dm-mirror and related modules *but* no dm-stripe, hence RAID10 activation has to fail. So the aforementioned modules are still not installed.

Heinz: I expected this. And even though the modules are not there, `dmraid -ay` works for him, as https://bugzilla.redhat.com/attachment.cgi?id=326396 clearly shows. So my conclusion is that we are looking at a different type of raid. Can we confirm this somehow with a dmraid command?

Joel, I am pretty sure Ed is basing this only on Intel Matrix RAID (dmraid isw format). "dmraid -b" shows any discovered block devices, "dmraid -r" any discovered RAID devices together with their format and more, and "dmraid -s" any discovered RAID sets. So, if neither the "-s" nor the "-r" output shows any supported RAID -> confirmation.

Created attachment 327243 [details]
dmraid commands in a .tgz
I am only using dmraid, and only isw raid. I believe this would show up on any SW dmraid devices.
I initialized the install and waited for the window asking me for the key. Went to
tty2 and executed:
dmraid -b -vvv -ddd > dmraid_b_out.txt
dmraid -r -vvv -ddd > dmraid_r_out.txt
dmraid -s -vvv -ddd > dmraid_s_out.txt
dmraid -tay -vvv -ddd > dmraid_tay_out.txt
All attached in a .tgz.
I am available earlier today for quick tests, if you need!
The attachment in comment #67 shows the presence of a RAID01 set. If this is the 127.el5 kernel, all mappings should be available, because RAID0 (the striped target) is built in. Is this test based on the 127.el5 kernel?

uname -a shows:
Linux localhost.localdomain 2.6.18-125.el5 #1 SMP Mon Dec 1 17:38:25 EST 2008 x86_64 unknown

I am downloading Snap6 now.
1). Will that bring me to > 127.el5?
2). Will the raid45.img be needed on the boot arg line?
2a). Will the image work with snap6?
3). What would you like to see from the snap6 tests, if they are not working?
It will take at least an hour to try this. Thanks.

(In reply to comment #69)
> uname -a shows:
> Linux localhost.localdomain 2.6.18-125.el5 #1 SMP Mon Dec 1 17:38:25 EST 2008
> x86_64 unknown
>
> I am downloading Snap6 now.
>
> 1). Will that bring me to > 127.el5?

It will probably be 126.el5.

> 2). Will the raid45.img be needed on the arg boot line?

No. The image worked for me; it made my raid5 installable.

> 2a). Will the image work with snap6?

It will work, but your installations will probably still be failing. Don't use the image anymore.

> 3). What would you like to see from the snap6 tests, if not working?

Please run `dmsetup targets` and post your output.

> It will take at least a hour to try this.
> Thanks.

Created attachment 327285 [details]
dmsetup targets output
dmsetup targets output. I will run some more tests and tar them up here like previously.
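To make the missing-target diagnosis above easier to repeat, here is a hedged sketch (not part of anaconda; the list of required targets is taken from Heinz's comments about dm-raid45, dm-stripe and dm-mirror, and the script only parses `dmsetup targets` output) that reports which device-mapper targets are absent:

```python
import subprocess

# Per the comments above: raid45 is needed for isw RAID5, and striped plus
# mirror are needed for isw RAID10 (0+1).  Adjust the list as appropriate.
REQUIRED_TARGETS = set(["raid45", "striped", "mirror"])

def missing_dm_targets(required=REQUIRED_TARGETS):
    """Return the required device-mapper targets that `dmsetup targets`
    does not list on the running kernel."""
    out = subprocess.Popen(["dmsetup", "targets"],
                           stdout=subprocess.PIPE).communicate()[0]
    present = set(line.split()[0] for line in out.decode().splitlines()
                  if line.strip())
    return sorted(required - present)

if __name__ == "__main__":
    missing = missing_dm_targets()
    if missing:
        print("missing DM targets: " + ", ".join(missing) +
              " (dm-raid45 / dm-stripe / dm-mirror not loaded?)")
    else:
        print("all required DM targets are present")
```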
Created attachment 327288 [details]
Snap6 raid10 install logs
Started the install with no .img arg; just pressed <enter> at the boot: prompt.
This is at the point of the install when the graphical interface shows the Skip and Back buttons for the Installation Number.
Installation of RHEL 5.3 Snapshot6 still FAILS with raid10 and raid5 installs.
Attached are .tgz files of everything from both the raid10 and raid5 runs with no network image; each .tgz contains:
anaconda.log
syslog
buildstamp_out
lsmod_out
uname -a out
dmesg out from install
dmraid_r_out from TTY2
dmraid_s_out from TTY2
dmraid_b_out from TTY2
dmraid_tay_out from TTY2
dmsetup table from TTY2
dmsetup targets from TTY2
Please find attached snap6R10Logs.tgz and raid5Snap6.tgz respectively.
Thanks,
EDC
Created attachment 327289 [details]
RAID 5 w/Snap6 logs
Attached is a .tgz of all files from raid5; it contains:
anaconda.log
syslog
buildstamp_out
lsmod_out
uname -a out
dmesg out from install
dmraid_r_out from TTY2
dmraid_s_out from TTY2
dmraid_b_out from TTY2
dmraid_tay_out from TTY2
dmsetup table from TTY2
dmsetup targets from TTY2
raid10 update: I found the issue on raid10 to be pyblock. Still doing tests, but the issue will no longer be handled by this bug. Ed: let's use https://bugzilla.redhat.com/show_bug.cgi?id=475386 to track this new issue. I just changed the description to correctly describe the situation. Thanks.

Correcting component+assignment.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0078.html