Bug 592059

Summary: When RAID initialization fails, initscripts mounts the partitions inside directly
Product: Fedora
Component: udev
Version: 13
Status: CLOSED WONTFIX
Severity: urgent
Priority: low
Reporter: Andy Lutomirski <luto>
Assignee: Harald Hoyer <harald>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: amluto, harald, iarlyy, jonathan, kzak, notting, plautrba, thomas.moschny
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-06-27 16:20:42 UTC

Description Andy Lutomirski 2010-05-13 18:54:17 UTC
I have an Intel firmware RAID array which is currently degraded.  My boot log shows:

ERROR: isw: wrong number of devices in RAID set "isw_dajhdjieaf_Volume0" [1/2] on /dev/sdb
Setting up Logical Volume Management:   No volume groups found
                                                           [  OK  ]
Checking filesystems
F13-Beta-x86_64-: clean, 165261/3842048 files, 1557759/15360000 blocks
/dev/sdb1: clean, 47/128016 files, 82549/512000 blocks
/dev/sdb5: clean, 78294/26017792 files, 3406593/104047360 blocks
                                                           [  OK  ]
Remounting root filesystem in read-write mode:             [  OK  ]

Sure, there's a bug (#537329, in fact) related to why mdadm failed.  But as far as I'm concerned, initscripts (and, for that matter, any other scripts) must not, UNDER ANY CIRCUMSTANCES WHATSOEVER, directly mount partitions that are supposed to be part of some kind of RAID array.

The proper fix would probably be to do something like

dmraid -r -cdevpath | xargs partx -d  (plus something similar for mdadm metadata types 0.90 and 1.0)

before trying to mount *anything*, including the root device.  The point is that, even if dmraid / mdadm setup fails, the partition devices should be deleted.

A more complete fix would add a command like "blkid -p -t USAGE=raid [path] && partx -d [path]" to a very early udev script run on all non-partition block devices (on add only).
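As a sketch, such an early udev rule might look like the following (the file name and priority are hypothetical, and it assumes an earlier rule has already imported blkid's ID_FS_USAGE into the udev environment):

```
# 59-raid-member-partitions.rules (hypothetical name and priority)
# If a whole disk carries a RAID-member superblock, delete its partition
# device nodes so nothing can mount them directly, even if assembly fails.
ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", \
    ENV{ID_FS_USAGE}=="raid", RUN+="/sbin/partx -d $tempnode"
```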

Of course, it doesn't help that partx has an infinite loop somewhere...  Will file that bug too.

Comment 1 Bill Nottingham 2010-05-13 19:00:52 UTC
What do you have specified in /etc/fstab? UUID?

Comment 2 Andy Lutomirski 2010-05-13 19:50:14 UTC
Yes, a UUID, because Anaconda put it there.

But even if the raid array wasn't used to boot and wasn't in fstab at all, we're still exposing block devices (to udisks if nothing else) that should never be mounted.

Here's another thought: blkid is smart enough to figure this out, but nothing uses that information.  Maybe blkid should return something special for partitions of RAID devices that tells udev and everything else not to use them.

Comment 3 Bill Nottingham 2010-05-13 19:55:40 UTC
mount is using libblkid.

Comment 4 Andy Lutomirski 2010-05-13 20:10:38 UTC
blkid says:

# blkid /dev/sdb
/dev/sdb: TYPE="isw_raid_member" 
# blkid /dev/sdb2
/dev/sdb2: LABEL="F13-Beta-x86_64-" UUID="1b0d5ef6-474b-4955-a40a-281867ff2981" TYPE="ext4" 

and bypassing cache:

# blkid -p /dev/sdb
/dev/sdb: VERSION="1.1.00" TYPE="isw_raid_member" USAGE="raid" 
# blkid -p /dev/sdb2
/dev/sdb2: LABEL="F13-Beta-x86_64-" UUID="1b0d5ef6-474b-4955-a40a-281867ff2981" VERSION="1.0" TYPE="ext4" USAGE="filesystem" 

Perhaps if blkid /dev/sdb2 just reported some placeholder like DO_NOT_USE=1 then all would be well.  That might be a layering violation, though.  I still think that mount shouldn't be responsible and that somehow /dev/sdb2 shouldn't exist at all.

Comment 5 Karel Zak 2010-05-14 10:59:39 UTC
(In reply to comment #4)
> # blkid -p /dev/sdb
> /dev/sdb: VERSION="1.1.00" TYPE="isw_raid_member" USAGE="raid" 
> # blkid -p /dev/sdb2
> /dev/sdb2: LABEL="F13-Beta-x86_64-" UUID="1b0d5ef6-474b-4955-a40a-281867ff2981"
> VERSION="1.0" TYPE="ext4" USAGE="filesystem" 
> 
> Perhaps if blkid /dev/sdb2 just reported some placeholder like DO_NOT_USE=1
> then all would be well.  That might be a layering violation, though. 

 This is implementable -- the upstream version of blkid already probes the whole disk when a partition (e.g. /dev/sdb2) is requested. I can add an extra check for RAIDs too. (BTW, RAID probing is already a pretty complex task; see bug #543749.)

> I still think that mount shouldn't be responsible and that somehow /dev/sdb2 
> shouldn't exist at all.    

 Sure. This problem has to be resolved on the udev level.

Comment 6 Karel Zak 2010-05-18 12:56:41 UTC
Harald, I have talked about this with Kay, and this problem could perhaps be resolved in dracut. I think that dracut (and also initscripts) has all the necessary information to remove unwanted partitions.

I can add something like DO_NOT_USE=1 (comment #4) to libblkid, but it seems like overkill to check for a RAID superblock on the whole disk every time a partition is probed. The problem should be resolved during boot.

You can easily check all whole disks (from /proc/partitions, /sys/block, or the udev db) and remove, with delpart(8), all partitions from disks where blkid returns ID_FS_USAGE=raid.
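A dry-run sketch of that loop (the function name is hypothetical, and it reads pre-collected "<disk> <usage>" pairs on stdin rather than calling blkid itself; a real script would gather those pairs by running `blkid -p -o value -s USAGE` against each whole disk, and would run the printed commands instead of echoing them):

```shell
#!/bin/sh
# Dry-run sketch: read "<disk> <usage>" pairs on stdin and print the
# partx command that would remove partitions from RAID-member disks.
raid_partition_cleanup() {
    while read -r disk usage; do
        # blkid reports USAGE="raid" for RAID-member superblocks
        if [ "$usage" = "raid" ]; then
            echo "partx -d /dev/$disk"
        fi
    done
}

printf 'sda filesystem\nsdb raid\n' | raid_partition_cleanup
# prints: partx -d /dev/sdb
```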

Or you can add an extra check to the udev rules: if the parent is a RAID member, remove the device, etc.

I don't think we have to hardcode this test to libblkid.

Comments?

Comment 7 Andy Lutomirski 2010-05-18 13:07:11 UTC
I *think* (but haven't really tested well yet) that something like (typing from memory since I'm away from that machine right now):

SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", ENV{ID_FS_TYPE}=="*_raid_member", RUN+="/sbin/partx -d $tempnode"

around 62 in the udev rule order will work.  And dracut needs to know about this rule so that it gets copied into the initramfs and partx gets installed.

But see #592061 -- until that's fixed you'll need --nr 0-1023 or something like that to work around an infinite loop.

Comment 8 Bill Nottingham 2010-05-18 16:26:51 UTC
(In reply to comment #6)
> Harald, I have talked about it with Kay and maybe this problem could be
> resolved in dracut. I think that dracut (and also initscripts) has all
> necessary information to remove unwanted partitions.

I really don't think this is initscripts' job - it just does 'mount -a',
which pulls the UUID from fstab. It's not in a position to muck around with device nodes, nor should it be.

Comment 9 Andy Lutomirski 2010-05-18 16:33:50 UTC
Want to change the component to udev, then?  I don't think I have permissions.

Comment 10 Harald Hoyer 2010-05-19 05:51:51 UTC
In dracut I already provide the correct "partx" commands to delete the partitions.

61-dmraid-imsm.rules:

SUBSYSTEM!="block", GOTO="dm_end"
ACTION!="add|change", GOTO="dm_end"
ENV{ID_FS_TYPE}=="linux_raid_member", GOTO="dm_end"
ENV{ID_FS_TYPE}!="*_raid_member", GOTO="dm_end"
ENV{ID_FS_TYPE}=="isw_raid_member", ENV{rd_NO_MDIMSM}!="?*", GOTO="dm_end"
ENV{rd_NO_DM}=="?*", GOTO="dm_end"

PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/dm-[0-9]*; do [ -e $$i ] && exit 0; done; exit 1;' ", \
    GOTO="dm_end"

ENV{DEVTYPE}!="partition", \
    RUN+="/sbin/partx -d --nr 1-1024 $env{DEVNAME}"

LABEL="dm_end"




65-md-incremental-imsm.rules:

ACTION!="add|change", GOTO="md_inc_end"
SUBSYSTEM!="block", GOTO="md_inc_end"
ENV{ID_FS_TYPE}!="linux_raid_member|isw_raid_member", GOTO="md_inc_end"
ENV{ID_FS_TYPE}=="isw_raid_member", ENV{rd_NO_MDIMSM}=="?*", GOTO="md_inc_end"
ENV{rd_NO_MD}=="?*", GOTO="md_inc_end"

PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/md[0-9]*; do [ -e $$i ] && exit 0; done; exit 1;' ", \
    GOTO="md_inc_end"

ENV{DEVTYPE}!="partition", \
    RUN+="/sbin/partx -d --nr 1-1024 $env{DEVNAME}"

LABEL="md_inc_end"



This should probably be part of the dmraid and mdadm rpm packages.

Comment 11 Andy Lutomirski 2010-05-19 11:36:12 UTC
(In reply to comment #10)
> In dracut I already provide the correct "partx" commands to delete the
> partitions.
> 
> 61-dmraid-imsm.rules:

(Part of 90dmraid dracut module)

> 
> SUBSYSTEM!="block", GOTO="dm_end"
> ACTION!="add|change", GOTO="dm_end"
> ENV{ID_FS_TYPE}=="linux_raid_member", GOTO="dm_end"
> ENV{ID_FS_TYPE}!="*_raid_member", GOTO="dm_end"
> ENV{ID_FS_TYPE}=="isw_raid_member", ENV{rd_NO_MDIMSM}!="?*", GOTO="dm_end"
> ENV{rd_NO_DM}=="?*", GOTO="dm_end"
> 
> PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/dm-[0-9]*; do [ -e $$i ]
> && exit 0; done; exit 1;' ", \
>     GOTO="dm_end"
> 
> ENV{DEVTYPE}!="partition", \
>     RUN+="/sbin/partx -d --nr 1-1024 $env{DEVNAME}"
> 
> LABEL="dm_end"
> 
> 
> 
> 
> 65-md-incremental-imsm.rules:

(Part of 90mdraid dracut module)

> 
> ACTION!="add|change", GOTO="md_inc_end"
> SUBSYSTEM!="block", GOTO="md_inc_end"
> ENV{ID_FS_TYPE}!="linux_raid_member|isw_raid_member", GOTO="md_inc_end"
> ENV{ID_FS_TYPE}=="isw_raid_member", ENV{rd_NO_MDIMSM}=="?*", GOTO="md_inc_end"
> ENV{rd_NO_MD}=="?*", GOTO="md_inc_end"
> 
> PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/md[0-9]*; do [ -e $$i ] &&
> exit 0; done; exit 1;' ", \
>     GOTO="md_inc_end"
> 
> ENV{DEVTYPE}!="partition", \
>     RUN+="/sbin/partx -d --nr 1-1024 $env{DEVNAME}"
> 
> LABEL="md_inc_end"
> 
> 
> 
> This should probably be part of the dmraid and mdadm rpm packages.    

I disagree.  As it stands, this is incredibly fragile.  It only works if:

1. 90dmraid is installed, mdraid didn't already pick up the device, the device *isn't* a standard linux_raid_member, and it isn't (imsm with rd_NO_IMSM set).  [Not sure how mdraid is supposed to pick up the device at priority level 61, though.]

--or--

2. 90mdraid is installed, the device is a standard Linux RAID member or (imsm without rd_NO_IMSM), and 64-md-raid.rules didn't already claim it (but that's commented out in my copy -- go figure).


Here are a few ways to break it.
1. Boot an initramfs that's missing 90dmraid and/or 90mdraid.  Easy way: start without RAID, then set up RAID, then boot an old kernel.  *Poof* -- there goes your filesystem.

2. Have one of the .../holders/... checks trigger.  That'll happen if RAID autodetect works (maybe -- haven't tested that), or if some other new script sets up an array.

3. Use ddf RAID w/o 90dmraid (which would happen if dmraid ever gets smart enough to skip its own installation when only mdraid is in use).

4. Use imsm and boot with rd_NO_IMSM, which should probably be renamed to "rd_DESTROY_MY_IMSM" unless something changes.

etc.


My point is that the partx -d call should not depend on anything complicated and should happen regardless of which dracut modules are installed.  Just because a particular initramfs was meant to boot a non-RAID device does not mean it should destroy filesystems on RAID 1.  Note that I managed to trigger one (or more) of these failure modes when I filed this bug in the first place; I only avoided corrupting anything because my RAID 1 was degraded at the time.

On my system, I added 62-bad-raid-partitions.rules:

ACTION=="add|change", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", ENV{ID_FS_TYPE}=="*_raid_member", RUN+="/sbin/partx --nr 0-15 -d $tempnode"

which appears to work.  I made it part of the 95udev-rules dracut module so it's always there.  If you want an rd_ALLOW_DANGEROUS_RAID_PARTITIONS option or something to disable it, that would be fine.

It would be even cleaner if there were a 60-call-blkid.rules file (or perhaps even earlier) -- it's rather tough right now to figure out when blkid is guaranteed to have been called, and adding yet another call just slows things down.

Comment 12 Bug Zapper 2011-06-02 14:04:27 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 13 Bug Zapper 2011-06-27 16:20:42 UTC
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.