Bug 474697 - Nvidia hardware RAID not initializing properly
Summary: Nvidia hardware RAID not initializing properly
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: dmraid
Version: 10
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Heinz Mauelshagen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-12-04 21:15 UTC by Andrew
Modified: 2009-12-18 11:48 UTC
CC List: 17 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-12-18 07:09:51 UTC
Type: ---
Embargoed:


Attachments
output of dmraid -rD (414 bytes, application/x-bzip2)
2008-12-05 16:33 UTC, Vik Heyndrickx
Output of dmraid -rD (511 bytes, application/x-bzip2)
2008-12-05 17:41 UTC, Andrew
output of dmraid -rD (438 bytes, application/x-bzip2)
2008-12-05 18:59 UTC, Sergey
Output of the dmraid -rD (510 bytes, application/x-bzip2)
2008-12-06 14:32 UTC, Vladimir Duloglo
Output of dmraid -rD (438 bytes, application/x-bzip)
2008-12-06 16:41 UTC, Maurice Pijpers
output of dmraid -rD (426 bytes, application/x-bzip)
2008-12-09 16:12 UTC, Sergey

Description Andrew 2008-12-04 21:15:25 UTC
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.4) Gecko/2008102920 Firefox/3.0.4

The Nvidia RAID volume was properly recognized by Anaconda during install (/dev/mapper/nvidia_xxxxxxxx1), and I installed to it.  After finishing the install and rebooting, the drives are mounted as /dev/sda1, /dev/sda3, etc. instead of the /dev/mapper/nvidia_xxxxx paths.

For more info from several others experiencing the issue, please see these forum threads:

http://forums.fedoraforum.org/showthread.php?t=206206
http://forums.fedoraforum.org/showthread.php?t=206284

Reproducible: Always

Steps to Reproduce:
1.
2.
3.
Actual Results:  
Volumes are mounted on /dev/sdax paths instead of /dev/mapper/nvidia_xxxxxx paths.

Expected Results:  
Volumes are mounted on /dev/mapper/nvidia_xxxxxx paths.
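(Something like "/dev/mapper/nvidia_dceibdfa3 on / type ext3 (rw)" in the mount output - the set name is taken from the dmraid output below, and the partition suffix is illustrative.)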

# dmraid -v -d -r
/dev/sdb: "pdc" and "nvidia" formats discovered (using nvidia)!
/dev/sda: "pdc" and "nvidia" formats discovered (using nvidia)!
INFO: RAID devices discovered:

/dev/sdb: nvidia, "nvidia_dceibdfa", mirror, ok, 625142446 sectors, data@ 0
/dev/sda: nvidia, "nvidia_dceibdfa", mirror, ok, 625142446 sectors, data@ 0


# mount
/dev/sda3 on / type ext3 (rw)
/proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

Comment 1 Andrew 2008-12-04 22:14:16 UTC
Note that this issue is new to Fedora 10.  I used Fedora 9 on this same computer with no issue.

This is Fedora 10 x86_64 installed on an Athlon64 machine.  ASUS M2N-SLI motherboard.

I'm happy to provide any logs, troubleshooting data, or run tests...just let me know what you need.

Comment 2 Vladimir Duloglo 2008-12-04 22:39:33 UTC
Same problem. My configuration is as follows: x86_64 AMD Phenom X4, ASUS M3A motherboard.

Additional information:

Volumes are mounted correctly as /dev/mapper/nvidia_xxxxxx during installation and when booting in rescue mode. In normal mode, volumes are mounted as /dev/sdax.

Comment 3 Vik Heyndrickx 2008-12-04 23:41:03 UTC
Also having the same problem. Fedora 10 (from DVD, no updates) on a Jetway JNC62K with nvidia softRAID controller, with 2 mirrored disks.

[root@localhost ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb3            236109048   1219308 222896016   1% /
/dev/sdb1               124427     13184    104819  12% /boot
tmpfs                   964108         0    964108   0% /dev/shm

These should be /dev/mapper/... entries and not /dev/sdb* or /dev/sda*.

[root@localhost ~]# dmraid -r
/dev/sdb: nvidia, "nvidia_ifaddeda", mirror, ok, 488397166 sectors, data@ 0
/dev/sda: nvidia, "nvidia_ifaddeda", mirror, ok, 488397166 sectors, data@ 0

[root@localhost ~]# modprobe dm-mirror
FATAL: Module dm_mirror not found.

[root@localhost ~]# /sbin/dmraid -ay -i -p -t
nvidia_ifaddeda: 0 488397166 mirror core 2 131072 nosync 2 /dev/sda 0 /dev/sdb 0 1 handle_errors

resolve_dm_name nvidia_ifaddeda in /etc/rc.sysinit resolves to an empty string
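
If I understand it right, that helper just maps the set name to its node under /dev/mapper, so a rough by-hand equivalent is:

[root@localhost ~]# dmsetup ls
[root@localhost ~]# ls -l /dev/mapper/

Neither shows an nvidia_ifaddeda entry while the set fails to activate.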

[root@localhost ~]# /sbin/dmraid -v -v -ay -i -p nvidia_ifaddeda
NOTICE: /dev/sdb: asr     discovering
...
NOTICE: /dev/sdb: lsi     discovering
NOTICE: /dev/sdb: nvidia  discovering
NOTICE: /dev/sdb: nvidia metadata discovered
NOTICE: /dev/sdb: pdc     discovering
...
NOTICE: /dev/sda: via     discovering
NOTICE: added /dev/sdb to RAID set "nvidia_ifaddeda"
NOTICE: added /dev/sda to RAID set "nvidia_ifaddeda"
RAID set "nvidia_ifaddeda" was not activated

While running the above command, the following is logged in /var/log/messages:
Dec  5 00:39:14 localhost kernel: device-mapper: table: 253:0: mirror: Device lookup failure
Dec  5 00:39:14 localhost kernel: device-mapper: ioctl: error adding target to table

Here I am lost. Anyone?

Comment 4 Heinz Mauelshagen 2008-12-05 13:25:13 UTC
dm_mirror is linked into the Fedora kernel, hence the module load failure.
See "dmsetup targets" output. It should show up therein.

Please attach the bzip2'ed + tar'ed output files of "dmraid -rD" (*.{dat,offset,size} files) for analysis.
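
Roughly (exact file names and layout vary by dmraid version, hence the catch-all tar):

# mkdir /tmp/dump && cd /tmp/dump
# dmraid -rD
# tar cjf ../dmraid-rD.tar.bz2 .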

This may be related to bz#471689 too, where a nash bug prevents the named RAID set from being activated...

Comment 5 Vik Heyndrickx 2008-12-05 16:33:43 UTC
Created attachment 325875 [details]
output of dmraid -rD

As requested, the tar+bzip2ed output of dmraid -rD.

Comment 6 Andrew 2008-12-05 17:41:15 UTC
Created attachment 325887 [details]
Output of dmraid -rD

Output of dmraid -rD, as requested.

Comment 7 Sergey 2008-12-05 18:59:20 UTC
Created attachment 325895 [details]
output of dmraid -rD

as requested

Comment 8 Vladimir Duloglo 2008-12-06 14:32:23 UTC
Created attachment 326000 [details]
Output of the dmraid -rD

Output of the dmraid -rD

Comment 9 Maurice Pijpers 2008-12-06 16:41:28 UTC
Created attachment 326007 [details]
Output of dmraid -rD

Same thing happens to me with a striped diskset (RAID0) on M57SLI-S4 mainboard with Fedora 10 (2.6.27.5-117.fc10.x86_64).

Comment 10 Roel Gloudemans 2008-12-07 11:42:00 UTC
I'm affected by this too. Watch out when resolving this bug. When the raidset is not initialized, the second disk from the set is used. If the raidset _is_ initialized, metadata is read from the first disk, destroying the changes on the second. I found this out the hard way after booting from the rescue disk (which initializes the set fine) and making changes on the disk.

Comment 11 Andrew 2008-12-08 19:21:05 UTC
Roel-  I noticed this, too, and I'm worried about the implications when this bug is fixed.  Hopefully there will either be a simple way to rebuild the array from the second drive to fix the problem, or Fedora will provide some method/process to get back to a properly-working array without serious data loss.  Fingers crossed.  (Hopefully they don't just publish a new kernel or package that fixes this, gets downloaded through yum, and auto-hoses us.)

Comment 12 Andrew 2008-12-08 19:26:22 UTC
One other warning to all experiencing this bug, though I'm not certain whether this is caused by this bug or by another one.  I noticed that if I modify my grub.conf file (I tried to remove the "hiddenmenu" line) while the array is not properly initialized (e.g., / mounted from /dev/sda3), then my bootloader config is totally hosed.  When I reboot, it drops into the GRUB shell and I have to know my exact GRUB config to manually boot it again.  Has anybody had this problem (or successfully changed their grub.conf while having this issue)?
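
For reference, getting back up from the GRUB shell goes roughly like this on my box (kernel version, device, and root= are illustrative; adjust to your layout):

grub> root (hd0,0)
grub> kernel /vmlinuz-2.6.27.5-117.fc10.x86_64 ro root=/dev/sda3
grub> initrd /initrd-2.6.27.5-117.fc10.x86_64.img
grub> boot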

Comment 13 Roel Gloudemans 2008-12-08 20:14:34 UTC
Andrew- That is exactly what happened to me. I commented out "hiddenmenu" and then fixed that with the rescue CD, hosing my disk.

As a precaution you could disconnect the first disk of the array. When you reconnect it again, the array should be rebuilt, but I have never tried this.

Comment 14 Heinz Mauelshagen 2008-12-09 12:28:38 UTC
I assume this is another duplicate of bz#471689.

Comment 15 Roel Gloudemans 2008-12-09 12:43:12 UTC
Could be, though the error messages are not the same. Also, because Fedora did work during installation and because the new spiffy boot screen is hiding all the errors, there might actually be a lot of users out there who do not know they are affected. They will get a nasty surprise when they install the patch.

I'm not in a position to try anything more. I needed my PC, so I switched to software RAID instead.

Comment 16 Sergey 2008-12-09 16:12:21 UTC
Created attachment 326342 [details]
output of dmraid -rD

I have exactly the same problem with RHEL 5.2:

2.6.18-92.1.18.el5

Attached output of dmraid -rD

Is it possible that GRUB does something funny and confuses the fakeraid? It looks like this problem happens only when you boot from the hard drive - if you boot from a rescue CD or install from CD, everything looks fine. I am not an expert :) so I'm most likely wrong.

Comment 17 Andrew 2008-12-09 17:29:12 UTC
Heinz- This may be a dup of 471689, but as Roel says, it presents differently.  Is there any ETA when the fix for 471689 will be released in Fedora 10 packages?  I am unable to update my kernel package, modify my grub.conf, or otherwise change my /boot filesystem without rendering my system completely unbootable.  Makes me a bit twitchy.

Comment 18 Heinz Mauelshagen 2008-12-10 16:19:52 UTC
Andrew,

Anaconda colleagues are looking into it.
No ETA yet but ASAP.

Keeping the component for the time being until we've got tests with that future fix on Fedora 10.

Comment 19 Andrew 2008-12-10 17:45:32 UTC
(In reply to comment #18)

Thanks, Heinz.  Again, I'm happy to collect any further info or even test patches, workarounds, etc.  Just let me know.

Comment 20 Richie Rich 2008-12-12 15:58:21 UTC
I wonder if bug #474074 is also related to this.

I'm running Sil and Promise cards using software RAID (2-disk striped RAID sets).  During a normal boot, lvscan sees the volume groups and activates one VG, but the rest fail.  Device-mapper spits out something about "striped" not being able to parse the destination (probably to do with the volume not being activated properly).  The rescue disk works fine (mounts show up as /dev/dm-x instead of /dev/mapper/vg-xxx, but at least I can access the data).
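
For anyone comparing notes, these are the standard LVM/device-mapper checks involved (nothing specific to my setup):

# lvscan
# vgchange -ay
# dmsetup table | grep striped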

I've attached tonnes of information to the above bug # if it helps anyone out.

Comment 21 David Jansen 2008-12-16 21:21:25 UTC
I encountered a similar problem with intel onboard raid, bug #474399. May be related.

Comment 22 Andrew 2009-01-14 18:42:48 UTC
I'm gathering that our definitions of "ASAP" may be a little different...

Comment 23 Lloyd Matthews 2009-01-31 16:24:25 UTC
I have an Nvidia-based motherboard, and I have found that the solutions from bug #473305 did the trick.  I used scsi_mod.scan=sync on the DVD install image and on the installed kernel.  Then I updated only nash and mkinitrd, then all the rest of the updates.  This way, the initrd for the new kernel was built with the fixed mkinitrd.
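
Roughly, the sequence (package names as shipped in F10; grub.conf path assumes the usual /boot layout):

At the DVD boot prompt:  linux scsi_mod.scan=sync
After install, append scsi_mod.scan=sync to the kernel line in /boot/grub/grub.conf, then:

# yum update nash mkinitrd
# yum update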

Comment 24 Darin May 2009-02-26 11:26:12 UTC
If this is fixed in mkinitrd, would a fresh network install or a newly built jigdo ISO solve the problem?

Comment 25 Nicolas Chauvet (kwizart) 2009-04-14 16:21:19 UTC
I was able to install Rawhide (F-11) on such an nvidia RAID 0 configuration.
But I haven't re-tested on F-10... Is this bug supposed to be closed now? Or are we waiting for confirmation on F-10?

Comment 26 Nicolas Chauvet (kwizart) 2009-06-30 12:56:13 UTC
I don't feel empowered to close this bug, but unless something is silently missing, it should be closed, at least to avoid false-positive search results.

Comment 27 Bryn M. Reeves 2009-06-30 13:40:11 UTC
Hi Nicolas, yes - someone with appropriate hardware needs to test on the release the bug is filed against (F10). If there were ever an F11 version of this BZ, it could be closed based on your testing, but this one requires testing on F10.

Comment 28 Maurice Pijpers 2009-07-01 13:40:47 UTC
I retested this after upgrading to kernel 2.6.27.25-170.2.72.fc10.x86_64 on F10 and the problem remains. dmraid shows the same output as before and there is no /dev/mapper/nvidia_xxxxxxxx entry. Has anyone heard that this should be fixed in F10? Or are we assuming this because it was fixed in F11 (which also works for me)?

Comment 29 Bug Zapper 2009-11-18 09:38:24 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 30 Darin May 2009-11-18 18:24:02 UTC
This is still fixed in the latest Fedora release, so please move the bug accordingly.

Comment 31 Bug Zapper 2009-12-18 07:09:51 UTC
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

