Bug 592059
| Summary: | When RAID initialization fails, initscripts mounts the partitions inside directly | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Andy Lutomirski <luto> |
| Component: | udev | Assignee: | Harald Hoyer <harald> |
| Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | urgent | Docs Contact: | |
| Priority: | low | | |
| Version: | 13 | CC: | amluto, harald, iarlyy, jonathan, kzak, notting, plautrba, thomas.moschny |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-06-27 16:20:42 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Andy Lutomirski
2010-05-13 18:54:17 UTC
> What do you have specified in /etc/fstab? UUID?

Yes, UUID, because Anaconda put it there. But even if the RAID array wasn't used to boot and wasn't in fstab at all, we're still exposing block devices (to udisks if nothing else) that should never be mounted.

Here's another thought: blkid is smart enough to figure this out, but nothing uses that information. Maybe blkid should return something special for partitions of RAID devices that tells udev and everything else not to use them.

mount is using libblkid. blkid says:

```
# blkid /dev/sdb
/dev/sdb: TYPE="isw_raid_member"
# blkid /dev/sdb2
/dev/sdb2: LABEL="F13-Beta-x86_64-" UUID="1b0d5ef6-474b-4955-a40a-281867ff2981" TYPE="ext4"
```

and bypassing the cache:

```
# blkid -p /dev/sdb
/dev/sdb: VERSION="1.1.00" TYPE="isw_raid_member" USAGE="raid"
# blkid -p /dev/sdb2
/dev/sdb2: LABEL="F13-Beta-x86_64-" UUID="1b0d5ef6-474b-4955-a40a-281867ff2981" VERSION="1.0" TYPE="ext4" USAGE="filesystem"
```

Perhaps if blkid /dev/sdb2 just reported some placeholder like DO_NOT_USE=1, then all would be well. That might be a layering violation, though. I still think that mount shouldn't be responsible and that somehow /dev/sdb2 shouldn't exist at all.

(In reply to comment #4)
> # blkid -p /dev/sdb
> /dev/sdb: VERSION="1.1.00" TYPE="isw_raid_member" USAGE="raid"
> # blkid -p /dev/sdb2
> /dev/sdb2: LABEL="F13-Beta-x86_64-" UUID="1b0d5ef6-474b-4955-a40a-281867ff2981"
> VERSION="1.0" TYPE="ext4" USAGE="filesystem"
>
> Perhaps if blkid /dev/sdb2 just reported some placeholder like DO_NOT_USE=1
> then all would be well. That might be a layering violation, though.

This is implementable -- the upstream version of blkid already probes the whole disk when a partition (e.g. /dev/sdb2) is requested. I can add an extra check for RAIDs too. (BTW, RAID probing is already a pretty complex task; see bug #543749.)

> I still think that mount shouldn't be responsible and that somehow /dev/sdb2
> shouldn't exist at all.

Sure. This problem has to be resolved at the udev level.
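The whole-disk check discussed above can be sketched in shell. This is an illustrative sketch, not code from the bug: the `whole_disk` sample is captured `blkid -p -o export`-style output (values taken from the probe shown earlier in this report) rather than a live probe, so the logic runs without real RAID hardware.

```shell
#!/bin/sh
# Sketch: before trusting a partition's filesystem signature, look at
# what blkid reports for the parent whole disk.  A USAGE=raid result
# means the partition table inside the RAID member must not be used.
is_raid_member() {
    # $1 = key=value output in the style of "blkid -p -o export <disk>"
    printf '%s\n' "$1" | grep -qx 'USAGE=raid'
}

# Sample probe output for /dev/sdb (illustrative; on a live system this
# would come from: blkid -p -o export /dev/sdb)
whole_disk='DEVNAME=/dev/sdb
VERSION=1.1.00
TYPE=isw_raid_member
USAGE=raid'

if is_raid_member "$whole_disk"; then
    echo "DO_NOT_USE: parent disk is a RAID member"
fi
```

On the sample data this prints the DO_NOT_USE line, which is roughly the placeholder semantics proposed in comment #4.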
Harald, I have talked about it with Kay, and maybe this problem could be resolved in dracut. I think that dracut (and also initscripts) has all the necessary information to remove unwanted partitions.

I can add something like DO_NOT_USE=1 (comment #4) to libblkid, but it seems like overkill to check the whole disk for a RAID superblock every time any partition is requested. The problem should be resolved during boot. You can easily check all whole disks (from /proc/partitions, /sys/block, or the udev db) and remove with delpart(8) all partitions from disks where blkid returns ID_FS_USAGE=raid. Or you can add an extra check to the udev rules: if the parent is a RAID, remove the device. Etc. I don't think we have to hardcode this test into libblkid. Comments?

I *think* (but haven't really tested well yet) that something like this (typing from memory since I'm away from that machine right now):

```
SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", ENV{ID_FS_USAGE}=="*_raid_member", RUN+="/sbin/partx -d $tempnode"
```

placed around 62 in the udev rule order will work. And dracut needs to know about this rule so it gets copied, and it needs to install partx. But see bug #592061 -- until that's fixed you'll need --nr 0-1023 or something like that to work around an infinite loop.

(In reply to comment #6)
> Harald, I have talked about it with Kay and maybe this problem could be
> resolved in dracut. I think that dracut (and also initscripts) has all
> necessary information to remove unwanted partitions.

I really don't think this is initscripts' job - it just does 'mount -a', which pulls the UUID from fstab. It's not in a position to go mucking around with the device nodes, nor should it be.

Want to change the component to udev, then? I don't think I have permissions.

In dracut I already provide the correct "partx" commands to delete the partitions.
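The boot-time cleanup loop suggested above (scan each whole disk, then drop the partitions of RAID members with delpart(8)) might look roughly like the dry-run sketch below. Everything here is illustrative: `probe` stubs out the real `blkid -p -o export` call, the disk names and partition numbers are made up, and the delpart commands are echoed rather than executed.

```shell
#!/bin/sh
# Dry-run sketch of the suggested boot-time cleanup.  A real version
# would read the disk list from /sys/block (or /proc/partitions), call
# "blkid -p -o export /dev/$disk" instead of probe(), enumerate the
# actual partition numbers, and run delpart instead of echoing it.
probe() {
    case "$1" in
        sda) echo 'USAGE=filesystem' ;;   # plain disk: leave alone
        sdb) echo 'USAGE=raid' ;;         # RAID member: remove partitions
    esac
}

for disk in sda sdb; do
    if probe "$disk" | grep -qx 'USAGE=raid'; then
        for part in 1 2; do
            echo "delpart /dev/$disk $part"
        done
    fi
done
```

On this sample data the sketch emits delpart commands only for /dev/sdb, the disk whose probe reports USAGE=raid.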
61-dmraid-imsm.rules:

```
SUBSYSTEM!="block", GOTO="dm_end"
ACTION!="add|change", GOTO="dm_end"
ENV{ID_FS_TYPE}=="linux_raid_member", GOTO="dm_end"
ENV{ID_FS_TYPE}!="*_raid_member", GOTO="dm_end"
ENV{ID_FS_TYPE}=="isw_raid_member", ENV{rd_NO_MDIMSM}!="?*", GOTO="dm_end"
ENV{rd_NO_DM}=="?*", GOTO="dm_end"

PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/dm-[0-9]*; do [ -e $$i ] && exit 0; done; exit 1;' ", \
    GOTO="dm_end"

ENV{DEVTYPE}!="partition", \
    RUN+="/sbin/partx -d --nr 1-1024 $env{DEVNAME}"

LABEL="dm_end"
```

65-md-incremental-imsm.rules:

```
ACTION!="add|change", GOTO="md_inc_end"
SUBSYSTEM!="block", GOTO="md_inc_end"
ENV{ID_FS_TYPE}!="linux_raid_member|isw_raid_member", GOTO="md_inc_end"
ENV{ID_FS_TYPE}=="isw_raid_member", ENV{rd_NO_MDIMSM}=="?*", GOTO="md_inc_end"
ENV{rd_NO_MD}=="?*", GOTO="md_inc_end"

PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/md[0-9]*; do [ -e $$i ] && exit 0; done; exit 1;' ", \
    GOTO="md_inc_end"

ENV{DEVTYPE}!="partition", \
    RUN+="/sbin/partx -d --nr 1-1024 $env{DEVNAME}"

LABEL="md_inc_end"
```

This should probably be part of the dmraid and mdadm rpm packages.

(In reply to comment #10)
> In dracut I already provide the correct "partx" commands to delete the
> partitions.
>
> 61-dmraid-imsm.rules: (Part of 90dmraid dracut module)
>
> SUBSYSTEM!="block", GOTO="dm_end"
> ACTION!="add|change", GOTO="dm_end"
> ENV{ID_FS_TYPE}=="linux_raid_member", GOTO="dm_end"
> ENV{ID_FS_TYPE}!="*_raid_member", GOTO="dm_end"
> ENV{ID_FS_TYPE}=="isw_raid_member", ENV{rd_NO_MDIMSM}!="?*", GOTO="dm_end"
> ENV{rd_NO_DM}=="?*", GOTO="dm_end"
>
> PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/dm-[0-9]*; do [ -e $$i ]
> && exit 0; done; exit 1;' ", \
>     GOTO="dm_end"
>
> ENV{DEVTYPE}!="partition", \
>     RUN+="/sbin/partx -d --nr 1-1024 $env{DEVNAME}"
>
> LABEL="dm_end"
>
> 65-md-incremental-imsm.rules: (Part of 90mdraid dracut module)
>
> ACTION!="add|change", GOTO="md_inc_end"
> SUBSYSTEM!="block", GOTO="md_inc_end"
> ENV{ID_FS_TYPE}!="linux_raid_member|isw_raid_member", GOTO="md_inc_end"
> ENV{ID_FS_TYPE}=="isw_raid_member", ENV{rd_NO_MDIMSM}=="?*", GOTO="md_inc_end"
> ENV{rd_NO_MD}=="?*", GOTO="md_inc_end"
>
> PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/md[0-9]*; do [ -e $$i ] &&
> exit 0; done; exit 1;' ", \
>     GOTO="md_inc_end"
>
> ENV{DEVTYPE}!="partition", \
>     RUN+="/sbin/partx -d --nr 1-1024 $env{DEVNAME}"
>
> LABEL="md_inc_end"
>
> This should probably be part of the dmraid and mdadm rpm packages.

I disagree. As it stands, this is incredibly fragile. It only works if:

1. 90dmraid is installed, and mdraid didn't already pick up the device, and the device *isn't* a standard linux_raid_member, and not (imsm and rd_NO_IMSM). [Not sure how mdraid is supposed to pick up the device at priority level 61, though.]

--or--

2. 90mdraid is installed, the device is a standard Linux RAID or (imsm and not rd_NO_IMSM), and 64-md-raid.rules didn't already claim it (but that's commented out in my copy -- go figure).

Here are a few ways to break it:

1. Boot an initramfs that's missing 90dmraid and/or 90mdraid. Easy way: start without RAID, then set up RAID, then boot an old kernel. *Poof*, there goes your filesystem.
2. Have one of the .../holders/... checks trigger. That'll happen if RAID autodetect works (maybe -- haven't tested that), or if some other new script sets up an array.
3. Use ddf RAID without 90dmraid (which would happen if dmraid ever gets smart enough to skip its own installation when only mdraid is in use).
4. Use imsm and boot with rd_NO_IMSM, which should probably be renamed to "rd_DESTROY_MY_IMSM" unless something changes.

Etc. My point is that the partx -d call should not depend on anything complicated and should happen regardless of which dracut modules are installed. Just because a particular initramfs was meant to boot a non-RAID device does not mean it should destroy filesystems on RAID 1.

Note that I managed to trigger one (or more) of these failure modes when I filed the bug in the first place. I only avoided corrupting anything because my RAID 1 was degraded at the time.

On my system, I added 62-bad-raid-partitions.rules:

```
ACTION=="add|change", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", ENV{ID_FS_TYPE}=="*_raid_member", RUN+="/sbin/partx --nr 0-15 -d $tempnode"
```

which appears to work. I made it part of 95udev-rules so it's always there. If you want an rd_ALLOW_DANGEROUS_RAID_PARTITIONS or something to disable it, that would be fine.

It would be even cleaner if there were a 60-call-blkid.rules file (or even earlier, perhaps) -- it's rather tough right now to figure out when blkid is guaranteed to have been called, and adding yet another call just slows things down.

This message is a reminder that Fedora 13 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '13'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 13 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.