Bug 479116

Summary: NVidia dmraid metadata - wrong offset for volume
Product: Fedora
Reporter: Milan Broz <mbroz>
Component: dmraid
Assignee: Heinz Mauelshagen <heinzm>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 10
CC: agk, bmr, dwysocha, heinzm, kwizart, lvm-team, mbroz, prockai, pvrabec
Hardware: i386   
OS: Linux   
Last Closed: 2009-01-09 15:21:12 UTC
Attachments:
- dmraid -rD on Fedora 10 x86_64
- fdisk on kwizatz partition
- diff between the previous commands

Description Milan Broz 2009-01-07 11:28:48 UTC
+++ This bug was initially created as a clone of Bug #474074 +++

--- Additional comment from kwizart on 2009-01-05 06:03:30 EDT ---

Same problem here with FakeRaid, using an nForce 3 stripe:
00:09.0 IDE interface [0101]: nVidia Corporation CK8S Serial ATA Controller (v2.5) [10de:00ee] (rev a2)
00:0a.0 IDE interface [0101]: nVidia Corporation CK8S Serial ATA Controller (v2.5) [10de:00e3] (rev a2)

I will provide a comparison with a multiboot on the same hardware using:
- Fedora 10 x86_64 with failing LVM
- Centos 5.2 i686 with accurate LVM

--- Additional comment from kwizart on 2009-01-05 06:04:57 EDT ---

Created an attachment (id=328182)
lvmdump -a on Fedora 10 x86_64 with failing LVM

--- Additional comment from kwizart on 2009-01-05 06:05:58 EDT ---

Created an attachment (id=328183)
lvmdump -a on CentOS 5.2 with accurate LVM

--- Additional comment from mbroz on 2009-01-05 07:14:21 EDT ---

(In reply to comment #15)
> same problem here with : FakeRaid using nForce 3 stripe

No, this is a different problem. The mapping is really (over)complicated here...

You have 3 VGs on the system mapped over the Nvidia dmraid (striped) mapping.
IOW - 7 nvidia_cebfcccgN volumes, some of which are PVs for the 3 LVM VolGroup0[0-2].

In the CentOS lvmdump, there is one striped dmraid device (nvidia_cebfcccg), over it 7 nvidia devices (nvidia_cebfcccgp1,5,6,7,8,9,10), and volumes 7,8,10 are PVs for the three VolGroup0[0-2].

In the Fedora lvmdump, dmraid activated the Nvidia volumes differently: there is the same striped dmraid device (nvidia_cebfcccg), but nvidia volumes 5,6,7,8,9 are now stacked over a new nvidia volume 2 (nvidia_cebfcccgp2).

Only VolGroup00 and 02 are activated here (VolGroup01 is missing - is this the reported problem?).

Maybe just order of activation in initscripts...

If you run manually "vgchange -a y" after the Fedora boots, is the VolGroup01 now activated?
If not, please can you post output of "vgchange -vvvv -a y" here?

(Anyway, the problem in comments #15-#17 is different - it is dmraid (fake raid) related; the former problem is pure LVM mapping.)
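
A quick way to cross-check the stacking described above (assuming the device-mapper tools are available and the names match the lvmdump - this is only a sketch, not taken from the dumps themselves):

dmsetup ls --tree                      # shows which dm device is stacked on which
dmsetup table | grep nvidia_cebfcccg   # shows the striped/linear targets and their offsets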

--- Additional comment from kwizart on 2009-01-05 08:39:58 EDT ---

(In reply to comment #18)
> (In reply to comment #15)
...
> Only VolGroup00 and 02 are activated here (VolGroup01 is missing - is this the
> reported problem?).
Yes, but there is also nvidia_cebfcccg5, which is reported as NTFS by fdisk but cannot be mounted:
[root@kwizatz ~]# LANG=C;mount /dev/mapper/nvidia_cebfcccg5 /mnt/
mount: you must specify the filesystem type

> Maybe just order of activation in initscripts...
> 
> If you run manually "vgchange -a y" after the Fedora boots, is the VolGroup01
> now activated?
[root@kwizatz ~]# vgchange -a y
  1 logical volume(s) in volume group "VolGroup02" now active
So, nope :( VolGroup02 was already activated.
> If not, please can you post output of "vgchange -vvvv -a y" here?
Output follows.
> (Anyway, the problem in comments #15-#17 is different - it is dmraid (fake raid)
> related; the former problem is pure LVM mapping.)
Should I report another bug?

--- Additional comment from kwizart on 2009-01-05 08:40:56 EDT ---

Created an attachment (id=328195)
output of vgchange-vvvv-a-y on kwizatz

--- Additional comment from mbroz on 2009-01-05 12:59:58 EDT ---

Well, the command was not exactly correct... :-)
#lvmcmdline.c:914         Processing: vgchange -vvvv -a y 2
..
#toollib.c:493   Volume group "2" not found


But the problem is that the scan doesn't find the PV correctly.

#device/dev-io.c:439         Opened /dev/dm-6 RO O_DIRECT
#device/dev-io.c:134         /dev/dm-6: block size is 2048 bytes
#label/label.c:184       /dev/dm-6: No label detected

and here there should be a PV label. There isn't one, because dmraid wrongly shifts the device to a completely different offset.
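
If needed, the label presence can be checked by hand (assuming the PV label sits in sector 1 of the device, which is the LVM2 default - a sketch only):

dd if=/dev/dm-6 bs=512 skip=1 count=1 2>/dev/null | strings | grep LABELONE
# no output here means no LVM2 label at the expected position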

CentOS:
253:0 -> nvidia_cebfcccg: 0 796593920 striped 2 128 8:16 0 8:0 0
253:6 -> nvidia_cebfcccgp8: 0 41945652 linear 253:0 609008148

Fedora:
253:0 -> nvidia_cebfcccg: 0 796593920 striped 2 128 8:16 0 8:0 0
253:2 -> nvidia_cebfcccgp2: 0 670745880 linear 253:0 125837145
253:6 -> nvidia_cebfcccgp8: 0 41945652 linear 253:2 609008148

So the device is mapped through the nvidia_cebfcccgp2 device, which is shifted by 125837145 sectors!
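
A quick cross-check of the numbers above (taking the CentOS table as the reference):

# CentOS: p8 starts at sector 609008148 on the striped set (253:0)
# Fedora: p8 starts at sector 609008148 on p2, and p2 itself starts at
#         125837145 on the striped set, so p8's data is read from
#         125837145 + 609008148 = 734845293
# i.e. everything stacked on p2 ends up 125837145 sectors past where the
# PV label (and filesystem) actually live.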

This is clearly a dmraid bug (or an incompatibility?); I'll clone the bug for dmraid.

--- Additional comment from mbroz on 2009-01-05 13:11:43 EDT ---

Maybe it is bug 474697 ...

Heinz, is the comment #21 a known nvidia dmraid related bug?


...

--- Additional comment from heinzm on 2009-01-07 04:35:46 EDT ---

(In reply to comment #22)
> Maybe it is bug 474697 ...
> 
> Heinz, is the comment #21 a known nvidia dmraid related bug?

Referring to your "shifted by 125837145 sectors" comment: I wonder if there could be another partition not being discovered/activated ahead of p2.


This is not a known dmraid bug.

We'd need the metadata retrieved with
"dmraid -rD ; tar jcvf nvidia-bz474074-raid0.tar.bz2 *.{dat,offset,size}"
attached here to analyze whether the partition discovery/activation is bogus.


BTW (Richie): does "dmraid -pay ; kpartx -a /dev/mapper/nvidia_cebfcccg" do the right thing?


On the MD front: is the nvidia metadata detected by dmraid just legacy and hence superfluous, and obviously invalid WRT the partition tables?
In that case, "dmraid -rE" would be appropriate before creating an MD array to remove it.
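
For that MD case, a possible sequence (only a sketch - erasing metadata is destructive, so verify the member disks first):

dmraid -r     # list the disks that carry the nvidia metadata
dmraid -rE    # erase that metadata before building an MD array on the disks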

Comment 1 Milan Broz 2009-01-07 11:31:18 UTC
Please attach metadata for analysing this, thanks.

"dmraid -rD ; tar jcvf nvidia-b479116-raid0.tar.bz2 *.{dat,offset,size}"

Comment 2 Nicolas Chauvet (kwizart) 2009-01-07 11:48:54 UTC
Created attachment 328366 [details]
dmraid -rD on Fedora 10 x86_64

Would a comparison with another "Family" OS be interesting?

Comment 3 Heinz Mauelshagen 2009-01-07 13:31:12 UTC
(In reply to comment #2)
> Created an attachment (id=328366) [details]
> dmraid -rD on Fedora 10 x86_64
> 
> Would a comparison with another "Family" OS be interesting?

The striped mapping as of this metadata activates fine here.

So we're down to partition discovery and handling:
can you try "dmraid -p -ay" (presumably the RAID set isn't active) to activate the basic stripe only, then "kpartx -a /dev/mapper/nvidia_cebfcccg" to activate the partitions via kpartx rather than dmraid itself, and report the results here?
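
Spelled out as commands, the two-step test (assuming the set is not already active):

dmraid -p -ay                            # activate only the basic striped set, no partition mappings
kpartx -a /dev/mapper/nvidia_cebfcccg    # create the partition mappings via kpartx instead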

Comment 4 Nicolas Chauvet (kwizart) 2009-01-07 13:46:35 UTC
[root@kwizatz ~]# dmraid -p -ay
RAID set "nvidia_cebfcccg" already active
[root@kwizatz ~]# kpartx -a /dev/mapper/nvidia_cebfcccg
device-mapper: reload ioctl failed: Invalid argument
[root@kwizatz ~]# kpartx -a /dev/mapper/nvidia_cebfcccg5
[root@kwizatz ~]# mount /dev/mapper/nvidia_cebfcccg5 /mnt/
mount: you must specify the filesystem type
[root@kwizatz ~]# kpartx -l /dev/mapper/nvidia_cebfcccg
nvidia_cebfcccg1 : 0 125837082 /dev/mapper/nvidia_cebfcccg 63
nvidia_cebfcccg2 : 0 670745880 /dev/mapper/nvidia_cebfcccg 125837145
nvidia_cebfcccg5 : 0 335549592 /dev/dm-1 63
nvidia_cebfcccg6 : 0 819252 /dev/dm-1 335549718
nvidia_cebfcccg7 : 0 146801907 /dev/dm-1 336369033
nvidia_cebfcccg8 : 0 41945652 /dev/dm-1 483171003
nvidia_cebfcccg9 : 0 401562 /dev/dm-1 525116718
nvidia_cebfcccg10 : 0 125837082 /dev/dm-1 525518343


I'm not sure I understand how LVM/device-mapper/dmraid work together, so I'll let you drive.
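
For orientation, a rough sketch of the layering involved (member disks inferred from the 8:0/8:16 entries in the striped table, i.e. sda/sdb):

sda + sdb (8:0, 8:16)
  -> nvidia_cebfcccg          dmraid striped set across both disks
    -> nvidia_cebfcccgpN      partition mappings created by dmraid -p (or kpartx)
      -> VolGroup00/01/02     LVM VGs assembled from PVs on some of those pN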

Comment 5 Heinz Mauelshagen 2009-01-07 13:50:42 UTC
To start out clean, mappings have to be inactive:

dmraid -an
dmsetup ls # should not show any nvidia* devices
dmraid -pay
kpartx -a /dev/mapper/nvidia_cebfcccg

Comment 6 Nicolas Chauvet (kwizart) 2009-01-08 13:20:30 UTC
So...
I don't know how I can umount the "/" partition since I'm using it.
vgchange -an said that partitions 7 and 10 cannot be unmounted either, for some reason.
Could the use of a livecd help? Or a rescue cd?

Comment 7 Heinz Mauelshagen 2009-01-08 14:33:21 UTC
Either live or rescue should do.

Comment 8 Nicolas Chauvet (kwizart) 2009-01-08 16:39:06 UTC
With a 64-bit live image generated from today's Everything + updates repositories
and transferred to a bootable USB disk:
 
[root@localhost ~]# kpartx -v -a /dev/mapper/nvidia_cebfcccg 
add map nvidia_cebfcccg1 (253:3): 0 125837082 linear /dev/mapper/nvidia_cebfcccg 63
add map nvidia_cebfcccg2 (253:4): 0 670745880 linear /dev/mapper/nvidia_cebfcccg 125837145
add map nvidia_cebfcccg5 : 0 335549592 linear 253:4 125837208
add map nvidia_cebfcccg6 : 0 819252 linear 253:4 461386863
add map nvidia_cebfcccg7 : 0 146801907 linear 253:4 462206178
add map nvidia_cebfcccg8 : 0 41945652 linear 253:4 609008148
add map nvidia_cebfcccg9 : 0 401562 linear 253:4 650953863
device-mapper: reload ioctl failed: Invalid argument
add map nvidia_cebfcccg10 : 0 125837082 linear 253:4 651355488

device-mapper-1.02.27-7.fc10.x86_64
dmesg |tail
device-mapper: table: device 253:4 too small for target
device-mapper: table: 253:10: linear: dm-linear: Device lookup failed
device-mapper: ioctl: error adding target to table
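
The dmesg error matches the numbers in the kpartx output above (a quick cross-check):

# p2 (253:4) is 670745880 sectors long
# p10 is mapped as linear 253:4 651355488 with length 125837082
# 651355488 + 125837082 = 777192570 > 670745880
# so the p10 table runs past the end of p2 -> "device 253:4 too small for target"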

Comment 9 Heinz Mauelshagen 2009-01-09 12:19:44 UTC
As assumed, this looks like a bogus partition table.

Can you provide an attachment with the output of:

fdisk -l /dev/mapper/nvidia_cebfcccg ; parted -l

for completeness, please?

Comment 10 Nicolas Chauvet (kwizart) 2009-01-09 13:07:07 UTC
Created attachment 328542 [details]
fdisk on kwizatz partition

If the partitions are bogus, then Fedora 10 is less tolerant than Fedora 9/CentOS 5.2, since the partitions are detected fine in those cases.

I've first reported a problem here:
https://bugzilla.redhat.com/show_bug.cgi?id=473601
As I said in c#2, Fedora 10 broke the partitions shared with CentOS 5.2 (nvidia_cebfcccg7 with swap and ext3 labeled as /share in it). This appeared on first boot, after the install, where mounting the /share partition failed and fsck was run. CentOS 5 wasn't able to detect it after that.

So I deleted everything beyond nvidia_cebfcccg5, reinstalled CentOS 5, and installed Fedora 10, which wasn't able to detect the partitions outside of its own LVM and the first ntfs one.
Then I deleted Fedora 10 and re-installed Fedora 9, where everything was detected just fine.

If Fedora 10 cannot mount partitions created by CentOS 5, that's really annoying.

Comment 11 Milan Broz 2009-01-09 13:15:24 UTC
/dev/mapper/nvidia_cebfcccg2            7834       49585   335372940    f  W95 Ext'd (LBA)

Hmm, why does kpartx create an "extended partition" mapping at all?
(There is already a similar bug 475283.)

Comment 12 Nicolas Chauvet (kwizart) 2009-01-09 14:21:37 UTC
Created attachment 328550 [details]
diff between the previous commands

With the suggested patch from bug #475283 I'm able to discover and mount every partition from within Fedora 10.

btw - I've enabled our $RPM_OPT_FLAGS on the device-mapper-multipath package build.

Should we worry about this?
-------------
Error: Invalid partition table on /dev/mapper/nvidia_cebfcccgp2 -- wrong signature e9b7
-------------

I think this bug could be closed as a duplicate of #475283.

Comment 13 Heinz Mauelshagen 2009-01-09 14:32:12 UTC
Nicolas,

one last question before we close this one:

does "dmraid -ay" activate the partitions properly from your livecd ?

Comment 14 Nicolas Chauvet (kwizart) 2009-01-09 15:13:54 UTC
(In reply to comment #13)
> does "dmraid -ay" activate the partitions properly from your livecd ?
yes, it worked. I was able to mount nvidia_cebfcccg5 on /mnt

Comment 15 Heinz Mauelshagen 2009-01-09 15:21:12 UTC
OK, so we're down to the kpartx bug, bz#475283.

Closing as duplicate.

*** This bug has been marked as a duplicate of bug 475283 ***