524168 – Wrong RAID10 recognition in anaconda while dmraid shows correct values


Summary: Wrong RAID10 recognition in anaconda while dmraid shows correct values

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	anaconda
Sub Component:
Version:	11
Hardware:	All
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Hans de Goede
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	F12AnacondaBlocker

Reported:	2009-09-18 07:49 UTC by Markus Mehrwald
Modified:	2013-10-29 10:12 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-10-28 08:12:06 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
log files after anaconda started (43.17 KB, application/x-tar) 2009-09-18 18:16 UTC, Markus Mehrwald	no flags
Image of sda (3.54 KB, application/x-gzip) 2009-10-01 19:12 UTC, Markus Mehrwald	no flags
Image of sdb (2.58 KB, application/x-gzip) 2009-10-01 19:12 UTC, Markus Mehrwald	no flags
Logfiles after the installer started (28.15 KB, application/x-gzip) 2009-10-02 20:14 UTC, Markus Mehrwald	no flags
Logfiles with deactivated RAID sets (35.67 KB, application/x-gzip) 2009-10-03 00:19 UTC, Markus Mehrwald	no flags
Log with update image (78.37 KB, text/plain) 2009-10-07 19:19 UTC, Markus Mehrwald	no flags

Description Markus Mehrwald 2009-09-18 07:49:02 UTC

Description of problem:
Anaconda does not recognise my RAID 10. The set has 4 disks, but instead of the RAID set anaconda shows only one disk, named /dev/sda. It does not matter whether I use --nodmraid or --dmraid; both options give the same result. In the forum discussion you can find the (correct) output of dmraid with various parameters after booting from the livecd.
http://forums.fedoraforum.org/showthread.php?t=230225
I tried with both the F12 alpha and the current F11.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Configure RAID 10 in the BIOS
2. Start with the livecd or directly with the install cd
3. Proceed to the disk management screen => only /dev/sda is shown
  
Actual results:


Expected results:


Additional info:

Comment 1 Hans de Goede 2009-09-18 09:21:18 UTC

Interesting: Promise BIOS RAID 10, I've never encountered that before :)

IOW you are the first to try this with Fedora, and maybe even with Linux in general, so chances are there will be some bugs left and right. The good news
is we already have Intel BIOS RAID 10 working, so the generic BIOS RAID code path can handle this.

This means that dmraid itself is a likely culprit.

For starters, can you please run the following commands
(from the live cd, or from tty 2 (ctrl + alt + f2) on the installer's GUI welcome
screen), and paste the output here?

blkid -o udev -p /dev/sda
blkid -o udev -p /dev/sdb
blkid -o udev -p /dev/sdc
blkid -o udev -p /dev/sdd

dmraid -ay -t

And when on the welcome screen, or on the livecd after making sure none
of your disks are mounted in any way:
dmraid -ay
ls /dev/mapper

Thanks!

Comment 2 Hans de Goede 2009-09-18 09:22:58 UTC

Oh, also, can you please boot the installer cd / dvd, press next on the initial GUI screen, wait for the "finding storage devices" dialog to disappear, and then get all the log files under /tmp and attach them here?

You can use scp from tty2 to get them to another machine.

Comment 3 Markus Mehrwald 2009-09-18 17:51:10 UTC

OK, I did what you asked for.
Because the -o udev option caused an error, I changed it to full; here is the output:

[root@localhost ~]# blkid -o udev /dev/sda
Invalid output format udev. Choose from value,
	device, list, or full
[root@localhost ~]# blkid -o full /dev/sda
[root@localhost ~]# blkid -o full /dev/sdb
[root@localhost ~]# blkid -o full /dev/sdc
[root@localhost ~]# blkid -o full /dev/sdd

[root@localhost ~]# dmraid -ay -t
pdc_bafbccaha-0: 0 1953279744 striped 2 128 /dev/sda 0 /dev/sdb 0
pdc_bafbccaha-1: 0 1953279744 striped 2 128 /dev/sdc 0 /dev/sdd 0
pdc_bafbccaha: 0 1953279744 mirror core 2 131072 nosync 2 /dev/mapper/pdc_bafbccaha-0 0 /dev/mapper/pdc_bafbccaha-1 0 1 handle_errors

[root@localhost ~]# dmraid -ay 
RAID set "pdc_bafbccaha" already active

[root@localhost ~]# ls /dev/mapper/
control         live-rw        pdc_bafbccaha-0
live-osimg-min  pdc_bafbccaha  pdc_bafbccaha-1

The log files will follow in a few minutes.
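The three lines printed by `dmraid -ay -t` in comment 3 are device-mapper tables of the form `name: start length target args...`: two `striped` sets of 2 disks each (chunk of 128 sectors, i.e. 64 KiB) and a `mirror` (with an in-memory `core` log) on top of them, which is how this RAID 10 is built. A minimal sketch that parses those lines and resolves the nested set down to its disks; the function names are hypothetical, only the `striped` and `mirror` targets seen in this bug are handled, and the argument layout follows the kernel's device-mapper table format as I understand it, not dmraid's own code.

```python
# Minimal sketch: parse "dmraid -ay -t" output (device-mapper tables) and
# expand the nested RAID 10 set down to its member disks.
# Only the "striped" and "mirror" targets that appear in this bug are handled.

def parse_table_line(line):
    """Split 'name: start length target args...' into a dict."""
    name, rest = line.split(":", 1)
    fields = rest.split()
    entry = {
        "name": name.strip(),
        "start": int(fields[0]),
        "length": int(fields[1]),  # in 512-byte sectors
        "target": fields[2],
    }
    args = fields[3:]
    if entry["target"] == "striped":
        # striped <#stripes> <chunk size> <dev> <offset> ...
        count = int(args[0])
        entry["chunk_sectors"] = int(args[1])
        entry["devices"] = [args[2 + 2 * i] for i in range(count)]
    elif entry["target"] == "mirror":
        # mirror <log type> <#log args> <log args...> <#mirrors> <dev> <offset> ... [features]
        pos = 2 + int(args[1])
        count = int(args[pos])
        entry["devices"] = [args[pos + 1 + 2 * i] for i in range(count)]
    return entry


def member_disks(tables, name):
    """Recursively resolve /dev/mapper/<name> references to the underlying disks."""
    disks = []
    for dev in tables[name]["devices"]:
        child = dev.rsplit("/", 1)[-1]
        if dev.startswith("/dev/mapper/") and child in tables:
            disks.extend(member_disks(tables, child))
        else:
            disks.append(dev)
    return disks


TABLE = """\
pdc_bafbccaha-0: 0 1953279744 striped 2 128 /dev/sda 0 /dev/sdb 0
pdc_bafbccaha-1: 0 1953279744 striped 2 128 /dev/sdc 0 /dev/sdd 0
pdc_bafbccaha: 0 1953279744 mirror core 2 131072 nosync 2 /dev/mapper/pdc_bafbccaha-0 0 /dev/mapper/pdc_bafbccaha-1 0 1 handle_errors"""

tables = {e["name"]: e for e in map(parse_table_line, TABLE.splitlines())}
print(tables["pdc_bafbccaha"]["devices"])
# -> ['/dev/mapper/pdc_bafbccaha-0', '/dev/mapper/pdc_bafbccaha-1']
print(member_disks(tables, "pdc_bafbccaha"))
# -> ['/dev/sda', '/dev/sdb', '/dev/sdc', '/dev/sdd']
```

Resolving recursively rather than hard-coding two levels means the same code also shows the member order of the later table in comment 27, where the mirror lists the `-1` stripe first and the `-0` stripe lists sdb before sda.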

Comment 4 Markus Mehrwald 2009-09-18 18:16:51 UTC

Created attachment 361702
log files after anaconda started

Comment 5 Hans de Goede 2009-09-21 14:36:14 UTC

As I suspected, the logs show that somehow sda does not get identified
as a member of the raidset, which explains what we are seeing here.

As for the empty blkid output for all disks, that is my bad: F-11 uses
vol_id, not blkid. Can you please run the following and copy and paste the output here:

/lib/udev/vol_id --export /dev/sda
/lib/udev/vol_id --export /dev/sdb
/lib/udev/vol_id --export /dev/sdc
/lib/udev/vol_id --export /dev/sdd

Also, could you perhaps download boot.iso from today's rawhide and give that
a try? If that does not work either, please switch to tty2 (ctrl + alt + f2) once
in the GUI part of the installer, and run the following:

blkid -o udev -p /dev/sda
blkid -o udev -p /dev/sdb
blkid -o udev -p /dev/sdc
blkid -o udev -p /dev/sdd

Thanks!

Comment 6 Markus Mehrwald 2009-09-27 15:00:25 UTC

Here is the output you asked for:

[root@localhost ~]# /lib/udev/vol_id --export /dev/sda
unknown or non-unique volume type (--probe-all lists possibly conflicting types)
[root@localhost ~]# /lib/udev/vol_id --export /dev/sdb
ID_FS_USAGE=raid
ID_FS_TYPE=promise_fasttrack_raid_member
ID_FS_VERSION=
ID_FS_UUID=
ID_FS_UUID_ENC=
ID_FS_LABEL=
ID_FS_LABEL_ENC=
[root@localhost ~]# /lib/udev/vol_id --export /dev/sdc
ID_FS_USAGE=raid
ID_FS_TYPE=promise_fasttrack_raid_member
ID_FS_VERSION=
ID_FS_UUID=
ID_FS_UUID_ENC=
ID_FS_LABEL=
ID_FS_LABEL_ENC=
[root@localhost ~]# /lib/udev/vol_id --export /dev/sdd
ID_FS_USAGE=raid
ID_FS_TYPE=promise_fasttrack_raid_member
ID_FS_VERSION=
ID_FS_UUID=
ID_FS_UUID_ENC=
ID_FS_LABEL=
ID_FS_LABEL_ENC=

I will try boot.iso in the next few days.
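Both `vol_id --export` and `blkid -o udev -p` print `KEY=VALUE` lines, and for sda vol_id printed only an error message, with no properties at all. Assuming, as comment 7 suggests, that the installer's view of each disk depends on these properties, a minimal sketch of that check (the helper names are hypothetical, not anaconda's code) makes the difference between sda and the other disks explicit:

```python
# Minimal sketch (hypothetical helpers): parse the KEY=VALUE lines printed by
# "vol_id --export" or "blkid -o udev -p" and decide whether a disk looks like
# a Promise FastTrak RAID member.

PROMISE_TYPE = "promise_fasttrack_raid_member"


def parse_export(text):
    """Collect ID_FS_* properties; lines without '=' (e.g. error messages) are ignored."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.startswith("ID_FS_"):
            props[key] = value
    return props


def is_promise_member(props):
    return props.get("ID_FS_USAGE") == "raid" and props.get("ID_FS_TYPE") == PROMISE_TYPE


# Output copied from the comment above.
SDA = "unknown or non-unique volume type (--probe-all lists possibly conflicting types)"
SDB = """ID_FS_USAGE=raid
ID_FS_TYPE=promise_fasttrack_raid_member
ID_FS_VERSION=
ID_FS_UUID="""

for disk, output in (("sda", SDA), ("sdb", SDB)):
    print(disk, is_promise_member(parse_export(output)))
# -> sda False
# -> sdb True
```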

Comment 7 Hans de Goede 2009-09-27 18:18:46 UTC

Thanks,

The output is as expected (as in: it explains the problem). Let's hope blkid in F-12 does better; otherwise I'll get in touch with the blkid maintainer and we'll see from there.

Comment 8 Markus Mehrwald 2009-09-28 14:30:15 UTC

I have now tried the current boot.iso: blkid gives nearly the same output as vol_id, and sda has the same problem.

Comment 9 Karel Zak 2009-09-29 08:58:51 UTC

The "unknown or non-unique volume type" message usually means that there are multiple valid superblocks on the device. Unfortunately, vol_id does not provide a proper error message about it. blkid (from util-linux-ng!) should be more verbose.

Please use

 # BLKID_DEBUG=0xffff blkid -p /dev/sda

if you want to see more details about the device probing.

You need to wipe old superblocks from the device; libblkid cannot resolve conflicts between non-unique signatures. It is the responsibility of mkfs (mkswap, the RAID initializer, ...) to remove old signatures from the disk.
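The vol_id message "unknown or non-unique volume type" covers two different situations: no signature matched at all, or more than one did. A simplified model of that distinction (explicitly not libblkid's real algorithm; the probers below are hypothetical toys that search a region instead of checking exact offsets) shows why the more verbose blkid debug output was needed to tell them apart. Comments 12 to 14 later show that sda turned out to be the "no signature" case.

```python
# A simplified model, NOT libblkid's real algorithm: run every prober over the
# device contents and only report a type when exactly one signature matches.

PROMISE_MAGIC = b"Promise Technology, Inc."
SECTOR = 512


def probe(data, probers):
    """Return the single matching type, or raise LookupError explaining why not."""
    matches = [name for name, check in probers.items() if check(data)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError("no known signature found")
    raise LookupError("non-unique signatures: " + ", ".join(matches))


# Hypothetical probers for illustration; real probers check exact on-disk offsets.
PROBERS = {
    "promise_fasttrack_raid_member": lambda d: PROMISE_MAGIC in d[-400 * SECTOR:],
    "stale_old_fs": lambda d: d.startswith(b"OLDFS"),
}

disk = bytearray(1024 * SECTOR)
offset = len(disk) - 63 * SECTOR
disk[offset:offset + len(PROMISE_MAGIC)] = PROMISE_MAGIC
print(probe(bytes(disk), PROBERS))  # -> promise_fasttrack_raid_member

disk[:5] = b"OLDFS"  # leftover signature from an earlier use of the disk
for data in (bytes(disk), bytes(len(disk))):
    try:
        probe(data, PROBERS)
    except LookupError as err:
        print(err)
```

The second loop prints the "non-unique" case (stale signature plus RAID metadata) and then the "nothing found" case (an all-zero disk), which vol_id would both have reported with the same message.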

Comment 10 Markus Mehrwald 2009-09-29 17:04:44 UTC

Is it possible that it is some kind of hardware problem? I swapped the cables, and after that I had the same problem, but with sdb. Nevertheless, I think anaconda should show me something other than the "broken" disk.

I tried BLKID_DEBUG as well, but it gave no new information, even after deleting everything with fdisk.

Comment 11 Karel Zak 2009-09-29 18:02:29 UTC

(In reply to comment #10)
> Is it possible that it is some kind of hardware problem? I swaped the cables
> and after that I had the same problem but with sdb. Nevertheless I think

I guess the disk names are allocated dynamically.

> anaconda should show me something different than the "broken" disk.
> 
> I tried with BLKID_DEBUG as well but no new informations. Also after deleting
> everything with fdisk. 

Do you have blkid from util-linux-ng? For example:

  $ rpm -qf /sbin/blkid
  util-linux-ng-2.15.1-1.fc12.x86_64

Comment 12 Markus Mehrwald 2009-09-29 18:27:39 UTC

Yes, I do. Here is the output again; the topmost lines show the util-linux-ng version:

[root@localhost ~]# rpm -qf /sbin/blkid
util-linux-ng-2.16-10.fc12.x86_64
[root@localhost ~]# BLKID_DEBUG=0xffff blkid -p /dev/sda
libblkid: debug mask set to 0xffff.
reseting blkid_probe
ready for low-probing, offset=0, size=500106780160
--> starting probing loop [idx=-1]
linux_raid_member: call probefunc()
ddf_raid_member: call probefunc()
isw_raid_member: call probefunc()
lsi_mega_raid_member: call probefunc()
via_raid_member: call probefunc()
silicon_medley_raid_member: call probefunc()
nvidia_raid_member: call probefunc()
promise_fasttrack_raid_member: call probefunc()
highpoint_raid_member: call probefunc()
adaptec_raid_member: call probefunc()
jmicron_raid_member: call probefunc()
vfat: magic sboff=510, kboff=0
vfat: call probefunc()
ufs: call probefunc()
sysv: call probefunc()
<-- leaving probing loop (failed) [idx=49]
[root@localhost ~]# BLKID_DEBUG=0xffff blkid -p /dev/sdb
libblkid: debug mask set to 0xffff.
reseting blkid_probe
ready for low-probing, offset=0, size=500107862016
--> starting probing loop [idx=-1]
linux_raid_member: call probefunc()
ddf_raid_member: call probefunc()
isw_raid_member: call probefunc()
lsi_mega_raid_member: call probefunc()
via_raid_member: call probefunc()
silicon_medley_raid_member: call probefunc()
nvidia_raid_member: call probefunc()
promise_fasttrack_raid_member: call probefunc()
assigning TYPE
assigning USAGE
<-- leaving probing loop (type=promise_fasttrack_raid_member) [idx=7]
returning TYPE value
/dev/sdb: TYPE="promise_fasttrack_raid_member" returning USAGE value
USAGE="raid" 
[root@localhost ~]# BLKID_DEBUG=0xffff blkid -p /dev/sdc
libblkid: debug mask set to 0xffff.
reseting blkid_probe
ready for low-probing, offset=0, size=500107862016
--> starting probing loop [idx=-1]
linux_raid_member: call probefunc()
ddf_raid_member: call probefunc()
isw_raid_member: call probefunc()
lsi_mega_raid_member: call probefunc()
via_raid_member: call probefunc()
silicon_medley_raid_member: call probefunc()
nvidia_raid_member: call probefunc()
promise_fasttrack_raid_member: call probefunc()
assigning TYPE
assigning USAGE
<-- leaving probing loop (type=promise_fasttrack_raid_member) [idx=7]
returning TYPE value
/dev/sdc: TYPE="promise_fasttrack_raid_member" returning USAGE value
USAGE="raid" 
[root@localhost ~]# BLKID_DEBUG=0xffff blkid -p /dev/sdd
libblkid: debug mask set to 0xffff.
reseting blkid_probe
ready for low-probing, offset=0, size=500107862016
--> starting probing loop [idx=-1]
linux_raid_member: call probefunc()
ddf_raid_member: call probefunc()
isw_raid_member: call probefunc()
lsi_mega_raid_member: call probefunc()
via_raid_member: call probefunc()
silicon_medley_raid_member: call probefunc()
nvidia_raid_member: call probefunc()
promise_fasttrack_raid_member: call probefunc()
assigning TYPE
assigning USAGE
<-- leaving probing loop (type=promise_fasttrack_raid_member) [idx=7]
returning TYPE value
/dev/sdd: TYPE="promise_fasttrack_raid_member" returning USAGE value
USAGE="raid"

Comment 13 Karel Zak 2009-09-29 19:35:05 UTC

Hmm, strange: it really seems that there is nothing useful (no superblock) on the /dev/sda device.

Please also check the dmesg output; maybe you will find something strange about your sda or sdb.

The Promise RAID superblock should be at the end of the device. Please send me the last 2 MiB of your sda and sdb disks, with something like:

 # dd if=/dev/sda of=~/promise-sda.img skip=$(( ($(blockdev --getsize64 /dev/sda) / (1024 * 1024)) - 2 )) bs=1MiB

 # gzip ~/promise-sda.img


Note: the Promise magic string is "Promise Technology, Inc." and should be at
sector 63, 255, 256, 16, 399 or 0 from the end of the device. For example,

 # for sec in 63 255 256 16 399 0; do let offset="($(blockdev --getsz /dev/sda) - $sec ) * 512"; hexdump -C -s $offset -n 24 /dev/sda; done

should be able to find the magic string.
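The shell loop above computes `(size in sectors - sec) * 512` for each candidate sector and dumps 24 bytes there. A minimal Python sketch of the same scan, assuming only what the comment states (the magic string and the candidate sector list); `magic_offset` and `find_promise_magic` are hypothetical names, and the sketch works on either a block device or a disk image file such as the 2 MiB dumps requested above (read-only).

```python
import os

# Magic string quoted in comment 13 (24 bytes, hence "hexdump -n 24").
PROMISE_MAGIC = b"Promise Technology, Inc."
SECTOR = 512
# Candidate metadata locations, in sectors counted back from the end of the device.
CANDIDATE_SECTORS = (63, 255, 256, 16, 399, 0)


def magic_offset(size_bytes, sectors_from_end):
    """Same formula as the shell loop: (size in sectors - sec) * 512."""
    return (size_bytes // SECTOR - sectors_from_end) * SECTOR


def find_promise_magic(path):
    """Return the candidate positions (sectors from the end) that hold the magic."""
    hits = []
    with open(path, "rb") as dev:
        # Seeking to the end also works for block devices, where
        # os.path.getsize() would report 0.
        size = dev.seek(0, os.SEEK_END)
        for sec in CANDIDATE_SECTORS:
            dev.seek(magic_offset(size, sec))
            if dev.read(len(PROMISE_MAGIC)) == PROMISE_MAGIC:
                hits.append(sec)
    return hits
```

As a cross-check against the hexdump in the next comment: with sdb's size of 500107862016 bytes (from the blkid debug output), `hex(magic_offset(500107862016, 63))` is `0x7470bfe200`, exactly where the magic shows up in sdb's dump, so on those disks one would expect `find_promise_magic("/dev/sdb")` to report `[63]` and `/dev/sda` to report nothing.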

Comment 14 Markus Mehrwald 2009-10-01 19:09:55 UTC

OK, here is the problem: as you can see in the hexdump, sda does not have the magic string:

[root@localhost ~]# for sec in 63 255 256 16 399 0; do let offset="($(blockdev --getsz /dev/sda)
> - $sec ) * 512"; hexdump -C -s $offset -n 24 /dev/sda; done
7470af6000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
7470af6010  00 00 00 00 00 00 00 00                           |........|
7470af6018
7470ade000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
7470ade010  00 00 00 00 00 00 00 00                           |........|
7470ade018
7470adde00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
7470adde10  00 00 00 00 00 00 00 00                           |........|
7470adde18
7470afbe00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
7470afbe10  00 00 00 00 00 00 00 00                           |........|
7470afbe18
7470acc000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
7470acc010  00 00 00 00 00 00 00 00                           |........|
7470acc018
7470afde00
[root@localhost ~]# for sec in 63 255 256 16 399 0; do let offset="($(blockdev --getsz /dev/sdb) - $sec ) * 512"; hexdump -C -s $offset -n 24 /dev/sdb; done
7470bfe200  50 72 6f 6d 69 73 65 20  54 65 63 68 6e 6f 6c 6f  |Promise Technolo|
7470bfe210  67 79 2c 20 49 6e 63 2e                           |gy, Inc.|
7470bfe218
7470be6200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
7470be6210  00 00 00 00 00 00 00 00                           |........|
7470be6218
7470be6000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
7470be6010  00 00 00 00 00 00 00 00                           |........|
7470be6018
7470c04000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
7470c04010  00 00 00 00 00 00 00 00                           |........|
7470c04018
7470bd4200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
7470bd4210  00 00 00 00 00 00 00 00                           |........|
7470bd4218
7470c06000

This would also explain why sdb was the problem after swapping the cables. It seems to me that the RAID controller does not write the correct data to the disk that is plugged into (I think) SATA_0.

Comment 15 Markus Mehrwald 2009-10-01 19:12:00 UTC

Created attachment 363389
Image of sda

Comment 16 Markus Mehrwald 2009-10-01 19:12:21 UTC

Created attachment 363391
Image of sdb

Comment 17 Markus Mehrwald 2009-10-01 19:48:17 UTC

I have now copied the blocks from a working disk to sda. blkid now seems to give the correct output:

[root@localhost ~]# BLKID_DEBUG=0xffff blkid -o udev -p /dev/sda
libblkid: debug mask set to 0xffff.
reseting blkid_probe
ready for low-probing, offset=0, size=500106780160
--> starting probing loop [idx=-1]
linux_raid_member: call probefunc()
ddf_raid_member: call probefunc()
isw_raid_member: call probefunc()
lsi_mega_raid_member: call probefunc()
via_raid_member: call probefunc()
silicon_medley_raid_member: call probefunc()
nvidia_raid_member: call probefunc()
promise_fasttrack_raid_member: call probefunc()
assigning TYPE
assigning USAGE
<-- leaving probing loop (type=promise_fasttrack_raid_member) [idx=7]
returning TYPE value
ID_FS_TYPE=promise_fasttrack_raid_member
returning USAGE value
ID_FS_USAGE=raid
[root@localhost ~]# BLKID_DEBUG=0xffff blkid -o udev -p /dev/sdb
libblkid: debug mask set to 0xffff.
reseting blkid_probe
ready for low-probing, offset=0, size=500107862016
--> starting probing loop [idx=-1]
linux_raid_member: call probefunc()
ddf_raid_member: call probefunc()
isw_raid_member: call probefunc()
lsi_mega_raid_member: call probefunc()
via_raid_member: call probefunc()
silicon_medley_raid_member: call probefunc()
nvidia_raid_member: call probefunc()
promise_fasttrack_raid_member: call probefunc()
assigning TYPE
assigning USAGE
<-- leaving probing loop (type=promise_fasttrack_raid_member) [idx=7]
returning TYPE value
ID_FS_TYPE=promise_fasttrack_raid_member
returning USAGE value
ID_FS_USAGE=raid

But now the problem is that anaconda's disk manager does not show any disk at all, and reports an error that no device for installation was found.

Comment 18 Hans de Goede 2009-10-01 20:45:38 UTC

(In reply to comment #17)
> But the problem now is, that the diskmanager from anaconda does not show a
> single disk cause an error that no device for installation was found.  

That is sort of to be expected after copying the metadata from one disk to the next, as the disks now probably have some identical disk serial number inside the metadata.

Can you:
1) see if dmraid will still recognize the set (I bet it won't)
2) see if the BIOS will still recognize the set
3) recreate the set in the BIOS (back up first!) and see if that
   maybe fixes things completely?

Comment 19 Markus Mehrwald 2009-10-01 21:20:22 UTC

ad 1) Indeed it does. I can activate it and get the output shown above, including the created nodes in /dev/mapper.
ad 2) The BIOS recognises the RAID as before.
ad 3) I will try, but I guess I will get the same results as before.

I will also have another look at the log files, because if that is the case, something should complain about the ambiguous identifier.

Comment 20 Hans de Goede 2009-10-02 07:17:44 UTC

Interesting,

Can you:

1) re-copy the blocks from a working disk to sda (to make sure
   blkid identifies sda properly, as it might have been reset by some of your
   other tests)

2) run the installer from F-12 (rawhide / development), up to the screen where you
   select which disks to use for the install

3) Copy /tmp/anaconda.log /tmp/storage.log /tmp/program.log /tmp/syslog somewhere
   (from tty2, use scp) and attach them here?

It seems that we may have another issue on top of not identifying sda properly
(oh joy).

Thanks!

Comment 21 Markus Mehrwald 2009-10-02 20:14:55 UTC

Created attachment 363525
Logfiles after the installer started

I activated the RAID with dmraid -ay before starting the installer. I found something strange for sda, but I do not know where it comes from.

Comment 22 Hans de Goede 2009-10-02 20:38:18 UTC

Judging from the logs, things would probably have worked if you had not
done a dmraid -ay first; you really should not do that. In fact, since you are using the livecd, you may need to do a "dmraid -an" before starting the installer. Can you please give that a try? Then it will hopefully find your raidset, after which we can get back to the problem of why sda does not get recognized as a RAID member by default (which would then be the only problem left).

Comment 23 Markus Mehrwald 2009-10-03 00:19:41 UTC

Created attachment 363537
Logfiles with deactivated RAID sets

Sorry, but this also has no effect. I deactivated the RAID set (the nodes in /dev/mapper were gone), but the screen looks the same as before. I checked twice, with the RAID set activated and deactivated, and it is always the same screen. I have attached the logs with the RAID set deactivated.

Comment 24 Hans de Goede 2009-10-05 13:53:58 UTC

Hi Markus,

Hmm, this bug keeps on being interesting. Many thanks for your efforts in helping debugging this.

I'm afraid I'm going to ask a bit more of your patience and help.

Can you please start a recent F-12 livecd, and then start a terminal, and on
that terminal do "dmraid -an" and then start python (as root).

Then on the python prompt type (*exactly* as given):

import block
rss = block.getRaidSetFromRelatedMem(uuid=None, name="sda", major=8, minor=0)
len(rss)
rss[0].name
rss[0].activate(mknod=True)
exit()

Then do ls /dev/mapper

If your raid set was not activated, please start python again and try:

import block
c = block.dmraid.context()
rss = c.get_raidsets([])
len(rss)
rss[0].name
rss[0].activate(mknod=True)
exit()

And copy and paste the output of the above here (or attach it).

Thanks,

Hans

Comment 25 Markus Mehrwald 2009-10-05 19:25:41 UTC

Here we go, unfortunately with a Python error:

[root@localhost ~]# ls /dev/mapper/
control  live-osimg-min  live-rw
[root@localhost ~]# python
Python 2.6.2 (r262:71600, Jul 30 2009, 17:08:54) 
[GCC 4.4.1 20090725 (Red Hat 4.4.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import block
>>> rss = block.getRaidSetFromRelatedMem(uuid=None, name="sda", major=8, minor=0)
>>> len(rss)
0
>>> rss[0].name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> rss[0].activate(mknod=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> exit()
[root@localhost ~]# ls /dev/mapper/
control  live-osimg-min  live-rw
[root@localhost ~]# python
Python 2.6.2 (r262:71600, Jul 30 2009, 17:08:54) 
[GCC 4.4.1 20090725 (Red Hat 4.4.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import block
>>> c = block.dmraid.context()
>>> rss = c.get_raidsets([])
>>> len(rss)
1
>>> rss[0].name
'pdc_baafdefhg'
>>> rss[0].activate(mknod=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dmraid.raidset' object has no attribute 'activate'
>>> exit()

Comment 26 Hans de Goede 2009-10-06 07:12:05 UTC

Hi,

OK, so we are making progress: the low-level python -> dmraid glue code sees your raidset, but for some reason the getRaidSetFromRelatedMem function fails
to figure out that sda is part of the raidset the low-level code finds.

I don't often ask this, but given that this problem can be reproduced outside the installer, and that it needs some deep diving into the pyblock code, could you perhaps boot the box from the livecd, start sshd (do a yum install first if necessary) and allow me to remotely debug this on the actual hardware in question?

You can find me on IRC to make a "date" for this remote debugging session: I'm hansg on the freenode IRC network, and you can find me in #anaconda.

Note we seem to have 2 issues on this box:
1) blkid failing to see sda as a raid member
2) pyblock getRaidSetFromRelatedMem function failing to match sda to your
   raidset

Thanks,

Hans

p.s.

1) If you feel uncomfortable giving a complete stranger access, that's fine; then I'll come up with the next set of Python commands for you to type in, and we will debug this that way.

2) Many thanks for your patience, this sure is one hard bug to hunt down.

Comment 27 Hans de Goede 2009-10-06 20:37:42 UTC

Markus, note this comment is mainly meant for fellow developers.

Heinz, adding you to the CC because I'm seeing some weird dmraid stuff here.

So Markus has given me ssh access to the machine in question and I've been looking
at why pyblock (which uses libdmraid) does not like his raidset.

And then things get weird.

get_raidsets reports one set, with 6 members, like this:

1 raidset, with 2 raid devs as members
1 raiddev sdb
1 raiddev sda
1 raidset, with 2 raid devs as members
1 raiddev sdc
1 raiddev sdd

Two things are weird here. First, sdb and sda are swapped in order, which would be normal / OK
if the same were true for sdc and sdd, but it is not; this feels rather weird.

Second, and even weirder: although RAID 10 is a nested RAID level, the top-level set
reports not only 2 subsets but also the 4 disks directly as devices.

And then we get to the problem that is causing anaconda not to see the set:
both subsets report that they are degraded.


Here are some dmraid logs:

[root@localhost ~]# dmraid -ay -t -vvv
WARN: locking /var/lock/dmraid/.lock
NOTICE: /dev/sdd: asr     discovering
NOTICE: /dev/sdd: ddf1    discovering
NOTICE: /dev/sdd: hpt37x  discovering
NOTICE: /dev/sdd: hpt45x  discovering
NOTICE: /dev/sdd: isw     discovering
NOTICE: /dev/sdd: jmicron discovering
NOTICE: /dev/sdd: lsi     discovering
NOTICE: /dev/sdd: nvidia  discovering
NOTICE: /dev/sdd: pdc     discovering
NOTICE: /dev/sdd: pdc metadata discovered
NOTICE: /dev/sdd: sil     discovering
NOTICE: /dev/sdd: via     discovering
NOTICE: /dev/sdc: asr     discovering
NOTICE: /dev/sdc: ddf1    discovering
NOTICE: /dev/sdc: hpt37x  discovering
NOTICE: /dev/sdc: hpt45x  discovering
NOTICE: /dev/sdc: isw     discovering
NOTICE: /dev/sdc: jmicron discovering
NOTICE: /dev/sdc: lsi     discovering
NOTICE: /dev/sdc: nvidia  discovering
NOTICE: /dev/sdc: pdc     discovering
NOTICE: /dev/sdc: pdc metadata discovered
NOTICE: /dev/sdc: sil     discovering
NOTICE: /dev/sdc: via     discovering
NOTICE: /dev/sdb: asr     discovering
NOTICE: /dev/sdb: ddf1    discovering
NOTICE: /dev/sdb: hpt37x  discovering
NOTICE: /dev/sdb: hpt45x  discovering
NOTICE: /dev/sdb: isw     discovering
NOTICE: /dev/sdb: jmicron discovering
NOTICE: /dev/sdb: lsi     discovering
NOTICE: /dev/sdb: nvidia  discovering
NOTICE: /dev/sdb: pdc     discovering
NOTICE: /dev/sdb: pdc metadata discovered
NOTICE: /dev/sdb: sil     discovering
NOTICE: /dev/sdb: via     discovering
NOTICE: /dev/sda: asr     discovering
NOTICE: /dev/sda: ddf1    discovering
NOTICE: /dev/sda: hpt37x  discovering
NOTICE: /dev/sda: hpt45x  discovering
NOTICE: /dev/sda: isw     discovering
NOTICE: /dev/sda: jmicron discovering
NOTICE: /dev/sda: lsi     discovering
NOTICE: /dev/sda: nvidia  discovering
NOTICE: /dev/sda: pdc     discovering
NOTICE: /dev/sda: pdc metadata discovered
NOTICE: /dev/sda: sil     discovering
NOTICE: /dev/sda: via     discovering
NOTICE: added /dev/sdd to RAID set "pdc_baafdefhg"
NOTICE: added /dev/sdc to RAID set "pdc_baafdefhg"
NOTICE: added /dev/sdb to RAID set "pdc_baafdefhg"
NOTICE: added /dev/sda to RAID set "pdc_baafdefhg"
pdc_baafdefhg-0: 0 1953124864 striped 2 128 /dev/sdb 0 /dev/sda 0
pdc_baafdefhg-1: 0 1953124864 striped 2 128 /dev/sdc 0 /dev/sdd 0
pdc_baafdefhg: 0 1953124864 mirror core 2 131072 nosync 2 /dev/mapper/pdc_baafdefhg-1 0 /dev/mapper/pdc_baafdefhg-0 0 1 handle_errors
INFO: Activating raid10 raid set "pdc_baafdefhg"
NOTICE: discovering partitions on "pdc_baafdefhg"
NOTICE: /dev/mapper/pdc_baafdefhg: dos     discovering
NOTICE: /dev/mapper/pdc_baafdefhg: dos metadata discovered
NOTICE: created partitioned RAID set(s) for /dev/mapper/pdc_baafdefhg
WARN: unlocking /var/lock/dmraid/.lock

dmraid -s -s :

*** Active Superset
name   : pdc_baafdefhg
size   : 1953124864
stride : 128
type   : raid10
status : ok
subsets: 2
devs   : 4
spares : 0
--> Active Subset
name   : pdc_baafdefhg-0
size   : 1953124864
stride : 128
type   : stripe
status : ok
subsets: 0
devs   : 2
spares : 0
--> Active Subset
name   : pdc_baafdefhg-1
size   : 1953124864
stride : 128
type   : stripe
status : ok
subsets: 0
devs   : 2
spares : 0

dmraid -r:
/dev/sdd: pdc, "pdc_baafdefhg-1", stripe, ok, 976562432 sectors, data@ 0
/dev/sdc: pdc, "pdc_baafdefhg-1", stripe, ok, 976562432 sectors, data@ 0
/dev/sdb: pdc, "pdc_baafdefhg-0", stripe, ok, 976562432 sectors, data@ 0
/dev/sda: pdc, "pdc_baafdefhg-0", stripe, ok, 976562432 sectors, data@ 0


Heinz, is it normal that the superset contains both sets and devs? IIRC isw raid10 does not have this? Also, any idea why the 2 subsets would report being degraded? Maybe after creation in the BIOS the set needs some OS-level tool to sync the mirrors?

Comment 28 Heinz Mauelshagen 2009-10-06 23:03:06 UTC

Hans,

yes, the created mapping tables look ok.

I fail to spot where you've seen 'degraded'. Status is 'ok'.

Once the RAID set has been created, any write will update both mirrors, hence offering data redundancy. No filesystem will rely on data it hasn't written before anyway.


Markus,
in order to judge the proper sequence of disks in each striped set,
a "dmraid -n /dev/sd[a-d]" would help.

Comment 29 Hans de Goede 2009-10-07 07:13:55 UTC

Heinz,

I've dived into the pyblock code to trace the Python degraded attribute back to
libdmraid code. The problem is that for the 2 subsets the following check
in pyblock fails:
rs->total_devs == rs->found_devs

This might just be pyblock doing evil stuff, or it might be a dmraid issue.

Comment 30 Heinz Mauelshagen 2009-10-07 09:49:15 UTC

Hans,

the above dmraid output (under reserve of the "dmraid -n" output WRT the disk ordering) looks sane so it seems pyblock has an issue.

Comment 31 Heinz Mauelshagen 2009-10-07 10:42:08 UTC

Hans and I checked the pyblock code and found a fix: use libdmraid's S_OK() macro on rs->status, avoiding the member checks mentioned in comment #29 altogether.

Hans will create a patch and will ask Markus to test it.

Comment 32 Hans de Goede 2009-10-07 14:50:18 UTC

Markus,

Here is an updates.img which contains a fixed pyblock, which will hopefully recognize your raidset:
http://people.fedoraproject.org/~jwrdegoede/updates-524168-x86_64.img

To use this, specify:
updates=http://people.fedoraproject.org/~jwrdegoede/updates-524168-x86_64.img

on the syslinux command line when you start anaconda. This means you will need to use
boot.iso and do a network install from rawhide; updates= is not supported from the livecd.

I hope this works, then we are back to only having to tackle the blkid problem, please let me know how it goes.

Heinz, note I only changed the check from rs->total_devs != rs->found_devs to
S_INCONSISTENT. I did not use S_OK, as sets freshly created in the BIOS often have "need sync" set in the metadata (at least they do for ISW), yet we still want
to install on them.
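The three "is this set degraded?" predicates discussed in comments 29, 31 and 32 can be contrasted in a small model. This is a sketch only: `RaidSet` is a hypothetical stand-in for the libdmraid fields named in comment 29, the flag values are placeholders (the real status constants and the `S_OK()` / `S_INCONSISTENT()` macros come from libdmraid), and the member counts in the example are made up, since the bug only says the count check failed for the subsets.

```python
from dataclasses import dataclass

# Placeholder bit values for this sketch only; the real status constants and
# the S_OK()/S_INCONSISTENT() macros come from libdmraid's headers.
S_INCONSISTENT = 0x04
S_NOSYNC = 0x08
S_OK = 0x10


@dataclass
class RaidSet:
    """Hypothetical stand-in for the raid set fields named in comment 29."""
    name: str
    total_devs: int
    found_devs: int
    status: int


def degraded_by_member_count(rs):
    # Original pyblock logic (comment 29): degraded unless every expected member was found.
    return rs.total_devs != rs.found_devs


def degraded_unless_ok(rs):
    # S_OK-based variant from comment 31: any set not flagged OK is rejected.
    return not (rs.status & S_OK)


def degraded_if_inconsistent(rs):
    # Variant chosen in comment 32: only an inconsistent set counts as degraded.
    return bool(rs.status & S_INCONSISTENT)


# Made-up member counts: the bug only says the count check failed for the subsets.
subset = RaidSet("pdc_baafdefhg-0", total_devs=4, found_devs=2, status=S_OK)
# Hypothetical freshly created BIOS set whose metadata says it still needs a sync.
fresh = RaidSet("fresh_set", total_devs=2, found_devs=2, status=S_NOSYNC)

for rs in (subset, fresh):
    print(rs.name, degraded_by_member_count(rs), degraded_unless_ok(rs),
          degraded_if_inconsistent(rs))
# -> pdc_baafdefhg-0 True False False
# -> fresh_set False True False
```

The second row illustrates the reasoning in comment 32: a check based on S_OK would reject a set that merely needs a sync, while the S_INCONSISTENT check still accepts it.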

Comment 33 Markus Mehrwald 2009-10-07 19:19:07 UTC

Created attachment 364027
Log with update image

I tried the update image but it causes an exception. Maybe I should turn something on/off before trying?

Comment 34 Hans de Goede 2009-10-07 19:31:46 UTC

Markus, hmm, we are once more one step further, but for some reason there still is an issue. I need to remote-debug this on your box again, but not tonight, as I'm currently in the middle of debugging something else.

Maybe tomorrow evening. I'll keep this bug on my todo list and ping you here
when I've got a free evening to work on this.

Comment 35 Hans de Goede 2009-10-08 11:03:01 UTC

I'm not going to be able to work on this tonight or tomorrow night, could
I perhaps log in to the machine next Monday around 19:00 CET ?

Comment 36 Markus Mehrwald 2009-10-08 12:17:39 UTC

I can't say yet whether I will be there. Wednesday will most likely be possible. I will join IRC if I can.

Comment 37 Hans de Goede 2009-10-08 12:20:53 UTC

OK, just ping me on IRC when you've got time; I usually show up on IRC around 19:00 - 19:15 every weekday evening.

Comment 38 Hans de Goede 2009-10-20 06:02:55 UTC

Markus,

Thanks for the remote access to your machine. As already discussed on irc we've done a new build (and tested it from the livecd environment you provided, thanks!) which fixes the activation issues with Promise RAID sets.

I've also verified this fixes the exception seen in the latest backtrace by doing
the activation / deactivation from pyblock, and that exception no longer occurs.

This new build is dmraid-1.0.0.rc16-4.fc12:
http://koji.fedoraproject.org/koji/buildinfo?buildID=136909

rel-eng: nominating this for F12Blocker status, as without the fixed dmraid,
installation to any Promise BIOS RAID set will not be possible.

Regards,

Hans

Comment 39 Markus Mehrwald 2009-10-20 12:44:43 UTC

Hans, will I get it via a yum update? Are there any things left to do for me or for you? Otherwise I will install the machine (unfortunately) with the alpha of F12.

Regards,
Markus

Comment 40 Hans de Goede 2009-10-20 15:01:40 UTC

(In reply to comment #39)
> Hans, will I get it via a yum update? Are there any things left to do for me or
> for you? Otherwise I will install the machine (unfortunately) with the alpha of
> F12.
> 
> Regards,
> Markus  

Hi Markus,

If accepted for F-12 final inclusion, the update should show up in rawhide in the next couple of days, and then you can do a net install using boot.iso from rawhide. This would be the path I would like you to take, as then the installer
path gets tested completely with all the fixed bits.

Thanks,

Hans

Comment 41 Adam Williamson 2009-10-23 17:29:04 UTC

This went into Rawhide on the 21st (2009-10-21). Please test an install with a tree or live image from that date or later. Thanks!

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 42 Markus Mehrwald 2009-10-26 12:58:45 UTC

I just tried the current rawhide image and it seems to work. I did not get any errors during file system creation. The size of about 950GB also seems OK, although I expected 1TB, as the RAID tool tells me. Unfortunately, due to a file conflict with some x-devel packages, I was not able to complete the installation, but since the file system was created without errors, I think the installation will also work.

Thank you everyone for helping me with my problem. If you need some more testing please drop me a line.
Markus
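The gap between the expected 1 TB and the roughly 950 GB shown may simply be a unit difference. A worked calculation, assuming 512-byte sectors (the unit of the dmraid tables above) and without knowing which unit the installer used for display: the set from comment 27 is 1953124864 sectors, which is almost exactly 1 TB in decimal units but only about 931 GiB in binary units, in the same ballpark as the figure reported.

```python
SECTOR = 512


def describe_size(sectors):
    """Return (bytes, decimal GB, binary GiB) for a size given in 512-byte sectors."""
    size = sectors * SECTOR
    return size, round(size / 10**9, 1), round(size / 2**30, 1)


# Size of pdc_baafdefhg as reported by "dmraid -s -s" in comment 27.
print(describe_size(1953124864))  # -> (999999930368, 1000.0, 931.3)
```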

Comment 43 Hans de Goede 2009-10-26 13:10:18 UTC

Hi Markus,

Many thanks for testing the latest rawhide! I hope I can ask one more good deed of you. Now that we have the dmraid -> pyblock -> anaconda chain working, I would
like to revisit the blkid issue.

Could you destroy the RAID set in the BIOS and recreate it, and then boot a
livecd (or the installer up to the welcome screen, then switch to tty2 with ctrl+alt+F2) and run blkid on /dev/sda:
blkid -o udev -p /dev/sda

If blkid again misidentifies sda (i.e. does not recognize that it is part of
a Promise BIOS RAID set), can you then please back up the sector you overwrote last time to get past this point?

Once it is backed up, feel free to do the copy-the-same-sector-from-another-disk trick again
(if necessary), and then install whatever you like, but please
do write down the offset of the sector.

Then, once we have developed a potential fix for the blkid issue, you can put the sector
back and test that way; there won't be a need to test a full re-install for that issue.

Many Thanks!

Regards,

Hans

Comment 44 Markus Mehrwald 2009-10-27 19:39:26 UTC

Hans,

I recreated my RAID with the BIOS tool. After that I booted my livecd and tried blkid, with this result:
[root@localhost ~]# blkid -o udev -p /dev/sda
ID_FS_TYPE=promise_fasttrack_raid_member
ID_FS_USAGE=raid
[root@localhost ~]# blkid -o udev -p /dev/sdb
ID_FS_TYPE=promise_fasttrack_raid_member
ID_FS_USAGE=raid

It seems that the RAID controller does not write anything to sda at all. Of course I can overwrite sector 63 from the end with zeros again, but then we will get the same output as in comment 14. I do not think it is a problem with blkid but with the RAID controller itself.
If you want me to, I can overwrite the sector, because I decided to wait for the stable release of F12 before installing it for real use. Please let me know if this is necessary. And if so, can you please provide the shell commands for copying sector 63 from the end of one disk to another? I am not that good at shell programming, and the last time I did it, it was an awful, nasty hack :)

Regards,
Markus
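Comment 43 asks to back up the overwritten sector and write down its offset, and comment 44 asks how to copy sector 63 from the end of one disk to another. A minimal Python sketch of both steps, under loudly stated assumptions: the helper names are hypothetical; it must run as root on disks that are not mounted and whose RAID set is deactivated (`dmraid -an`); how many sectors the full Promise metadata occupies is not stated in this report, so `count` is left as a parameter; and, per comment 18, copying metadata between disks may duplicate disk-specific fields such as serial numbers. Writing to the wrong device destroys data.

```python
import os

SECTOR = 512


def offset_from_end(f, sectors_from_end):
    """Byte offset of the sector lying `sectors_from_end` sectors before the end of f."""
    if sectors_from_end < 1:
        raise ValueError("sectors_from_end must be at least 1")
    size = f.seek(0, os.SEEK_END)  # also works for block devices
    return (size // SECTOR - sectors_from_end) * SECTOR


def _write_synced(f, offset, data):
    f.seek(offset)
    f.write(data)
    f.flush()
    os.fsync(f.fileno())


def copy_sector_from_end(src_path, dst_path, sectors_from_end, backup_path, count=1):
    """Back up `count` sectors of dst, then overwrite them with the same sectors of src.

    Positions are counted from the end of each device, so disks of slightly
    different size line up. Returns the destination byte offset to write down.
    """
    if count > sectors_from_end:
        raise ValueError("range would run past the end of the device")
    with open(src_path, "rb") as src:
        src.seek(offset_from_end(src, sectors_from_end))
        data = src.read(count * SECTOR)
    if len(data) != count * SECTOR:
        raise ValueError("short read from source")
    with open(dst_path, "r+b") as dst:
        offset = offset_from_end(dst, sectors_from_end)
        dst.seek(offset)
        old = dst.read(count * SECTOR)
        with open(backup_path, "wb") as backup:
            _write_synced(backup, 0, old)  # keep the original bytes first
        _write_synced(dst, offset, data)
    return offset


def restore_sector(dst_path, offset, backup_path):
    """Put previously backed-up bytes back at the recorded byte offset."""
    with open(backup_path, "rb") as backup:
        old = backup.read()
    with open(dst_path, "r+b") as dst:
        _write_synced(dst, offset, old)
```

Hypothetical usage on the disks in this report: `off = copy_sector_from_end("/dev/sdb", "/dev/sda", 63, "/root/sda-sector63.bak")`, note `off`, and later put the original bytes back with `restore_sector("/dev/sda", off, "/root/sda-sector63.bak")`.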

Comment 45 Hans de Goede 2009-10-28 08:12:06 UTC

Markus,

Thanks for testing. It might be that the RAID BIOS does not write anything to sda at all, or, which I find more likely, that last time around something had overwritten the sector in question; this could even have been anaconda itself (it zeros out the beginning and end of the disk in certain cases).

So I think the blkid issue is a non-issue (but I'll keep it in mind in case I get similar bug reports from other users).

I think I can conclude with: case closed!

Many thanks for all your testing!

Regards,

Hans
