497259 – F10-F11: /dev/dm-N devices cause loss of all rescue ability

Summary: F10-F11: /dev/dm-N devices cause loss of all rescue ability

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	e2fsprogs
Sub Component:
Version:	11
Hardware:	All
OS:	Linux
Priority:	low
Severity:	urgent
Target Milestone:	---
Assignee:	Eric Sandeen
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-04-23 03:26 UTC by Gerry Reno
Modified:	2009-08-08 19:25 UTC
CC List:	20 users
Fixed In Version:	1.41.4-12.fc11
Clone Of:
Environment:
Last Closed:	2009-07-23 19:12:10 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Filter to substitute /dev/dm-N by correct device (found by UUID) (540 bytes, text/plain) 2009-04-27 14:58 UTC, Marian Csontos
tarball of ramdisk /tmp (35.03 KB, application/x-compressed-tar) 2009-05-18 17:50 UTC, Gerry Reno

Description Gerry Reno 2009-04-23 03:26:22 UTC

Description of problem:
On all our systems upgraded to F10 or F11-b we've noticed that devices are getting renamed from their normal /dev/mapper/VolGroup00-LogVol00 names to /dev/dm-N type names.  When this happens and you have to recover the system for any reason, you cannot do so, because none of the F9, F10, or F11 rescue disks can find the Linux partitions after this weird renaming of the devices.

We lost one machine and had to reinstall it to get it working again.


Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Install/Upgrade to F10/F11
2. wait
3. devices will get renamed to /dev/dm-N style
  
Actual results:
once devices get renamed to /dev/dm-N style you cannot rescue the machine any longer.

Expected results:
devices never get renamed and machine stays stable and rescueable.

Additional info:

Comment 1 Dave Jones 2009-04-23 19:56:26 UTC

device node naming is handled by udev, not the kernel.

Comment 2 Harald Hoyer 2009-04-24 06:41:03 UTC

/dev/mapper/ nodes are created by lvm and dmsetup

I cannot reproduce this on any of my F11 machines.

$ ls /dev/mapper/
control                             isw_bfadchbffa_Volume0p2
data                                swap
DATA-lvol0                          VolGroup00-cryptharald
_dev_mapper_VolGroup00_cryptharald  VolGroup00-LogVol00
isw_bfadchbffa_Volume0              VolGroup00-LogVol01
isw_bfadchbffa_Volume0p1            VolGroup00-luks--swap

Comment 3 Milan Broz 2009-04-24 06:53:18 UTC

Please do not create new duplicates of bugs.

The device mapper nodes are created properly. device-mapper never created /dev/dm-X nodes; some udev rule creates them by mistake.

The proper dev nodes were there (lvm2 activation depends on them internally), so this is _not_ an lvm2 problem.

I guess the failure was a two-step problem: first some update broke the grub.conf line, and later during an update some script rewrote fstab according to this bad line.
Neither is under lvm2 control.

*** This bug has been marked as a duplicate of bug 475773 ***

Comment 4 Gerry Reno 2009-04-24 13:58:05 UTC

Milan, your analysis is wrong.  The /dev/dm-N devices are NOT showing up in /etc/fstab.  Only UUIDs are in /etc/fstab.  These devices are showing up in 'mount' and 'df' and 'blkid' and other command outputs.  And blkid shows that there are duplicate UUIDs.  An LVM device like /dev/mapper/VolGroup00-LogVol00 has the same UUID as /dev/dm-0.  So if you're trying to extract the device by parsing blkid by UUID you get two devices.

I don't care whether this bug is duplicate or not or what group needs to make some change.  What I want to know is what is the fix?  What change is going to stop this behavior of devices getting randomly changed from /dev/mapper/VolGroup00-LogVol00 to /dev/dm-0 in these commands?
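
A minimal sketch of the duplicate-UUID situation described above, assuming blkid's usual `device: KEY="value"` line format. The sample devices and UUIDs are made up for illustration, and `devices_by_uuid` and `preferred_device` are hypothetical helper names, not part of blkid. It groups devices by UUID and, when a UUID maps to more than one path, picks the one that is not a bare /dev/dm-N node.

```python
import re
from collections import defaultdict

# Hypothetical blkid output; device names and UUIDs are invented for illustration.
SAMPLE = '''\
/dev/mapper/VolGroup00-LogVol00: UUID="1111-aaaa" TYPE="ext3"
/dev/dm-0: UUID="1111-aaaa" TYPE="ext3"
/dev/sda1: LABEL="/boot" UUID="2222-bbbb" TYPE="ext3"
'''


def devices_by_uuid(blkid_text):
    """Group device paths by the UUID that blkid reports for them."""
    groups = defaultdict(list)
    for line in blkid_text.splitlines():
        dev, sep, attrs = line.partition(": ")
        match = re.search(r'\bUUID="([^"]*)"', attrs)
        if sep and match:
            groups[match.group(1)].append(dev)
    return dict(groups)


def preferred_device(paths):
    """Pick a path that is not /dev/dm-N when one exists, else the first path."""
    named = [p for p in paths if not re.fullmatch(r"/dev/dm-\d+", p)]
    return (named or paths)[0]


if __name__ == "__main__":
    for uuid, paths in devices_by_uuid(SAMPLE).items():
        print(uuid, "->", preferred_device(paths))
```

A script that looks up a device by UUID could apply the same preference instead of assuming that each UUID appears only once.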

Comment 5 Milan Broz 2009-04-24 14:59:53 UTC

ah so, you did not mention mount, so I expected that it is the same problem with fstab.

The /dev/dm-X and /dev/mapper/<real_name> are the same devices, so the scan will return the same UUID. This is correct.

But for some reason the tools sometimes prefer using /dev/dm-X devices.

I think the default udev rule (explicitly /lib/udev/rules.d/50-udev-default.rules) which creates /dev/dm-X devices is wrong, but I am really not sure whether some other subsystem uses them...

Why do we need those dm-X devices at all?

Comment 6 Alasdair Kergon 2009-04-24 15:13:29 UTC

(In reply to comment #5)
> Why do we need those dm-X devices at all?

We don't.  But nobody seems brave enough to delete them!  As you know, Peter is redoing the udev rules for dm+lvm2 for F-12 so they will certainly disappear then.  After the new scheme is working we'll see if it's feasible to backport it to F-11.

Comment 7 Marian Csontos 2009-04-27 14:58:03 UTC

Created attachment 341438 [details]
Filter to substitute /dev/dm-N by correct device (found by UUID)

Could the attached script provide a temporary workaround?
"Filter to substitute /dev/dm-N by correct device (found by UUID)"

Just filter mount or whatever output is using dm-N through it.

Be cautious, I am a Perl learner. Use at your own risk!

-- Marian
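
The attached Perl script is not reproduced in this report. As a rough illustration of the same filtering idea only, here is a sketch that rewrites /dev/dm-N in arbitrary text. It assumes the caller already knows the N-to-name mapping (the attachment finds it by UUID); `replace_dm_names` and the sample names are hypothetical.

```python
import re


def replace_dm_names(text, names):
    """Replace /dev/dm-N with /dev/mapper/<name> using a {N: name} mapping.

    Occurrences whose N has no known name are left untouched.
    """
    def substitute(match):
        name = names.get(int(match.group(1)))
        return "/dev/mapper/" + name if name else match.group(0)

    # \d+ is greedy, so /dev/dm-10 is looked up as 10, never as 1.
    return re.sub(r"/dev/dm-(\d+)", substitute, text)


if __name__ == "__main__":
    import sys

    # Hypothetical mapping; on a real system it would have to be discovered.
    known = {0: "VolGroup00-LogVol00", 1: "VolGroup00-LogVol01"}
    sys.stdout.write(replace_dm_names(sys.stdin.read(), known))
```

Saved as a script, it could be used as a filter in the way described above, for example by piping the output of mount through it.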

Comment 8 Peter Rajnoha 2009-04-28 08:51:04 UTC

Well, I've looked at this and tried to find out exactly which rule creates these nodes, but the finding really amazed me: there's no such rule (in the end, I simply deleted all the rules and then just ran dmsetup create). Since no rule creates these nodes, udevd itself creates them by default under /dev with the internal kernel name (dm-N) it sees in the uevent. This seems to be the default behaviour. Since we will provide our udev rules for LVM/DM soon, this will be solved then.

Comment 9 Milan Broz 2009-04-28 09:29:12 UTC

Please also see (the problem is also in blkid library)

[PATCH] blkid: use /dev/mapper/<name> rather than /dev/dm-<N>

http://article.gmane.org/gmane.comp.file-systems.ext4/12896

Comment 10 Bryn M. Reeves 2009-04-28 10:14:34 UTC

In reply to comment #8: this is how udev behaves, afaik. At least at some point we had an explicit ignore rule for DM devices, something like:

KERNEL=="dm-[0-9]*", OPTIONS+="ignore_device"

Comment 11 Karel Zak 2009-04-28 22:14:02 UTC

(In reply to comment #7)
> Created an attachment (id=341438) [details]
> Filter to substitute /dev/dm-N by correct device (found by UUID)

  # cat /sys/block/dm-<N>/dm/name

returns the "real name" on Linux >= 2.6.29 -- in such a case I don't think you need to call blkid(8).
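
A small sketch of that sysfs lookup, assuming the /sys/block/dm-<N>/dm/name attribute mentioned above is present. `dm_name` and `mapper_path` are hypothetical helper names, and the `sysfs_root` parameter exists only so the function can be exercised against a scratch directory.

```python
import os


def dm_name(kernel_name, sysfs_root="/sys"):
    """Return the device-mapper name for a kernel name such as 'dm-0'.

    Returns None when the dm/name attribute cannot be read, for example
    on kernels that lack it or when the device is not a dm device.
    """
    path = os.path.join(sysfs_root, "block", kernel_name, "dm", "name")
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def mapper_path(kernel_name, sysfs_root="/sys"):
    """Return /dev/mapper/<name> for a dm-N kernel name, or None if unknown."""
    name = dm_name(kernel_name, sysfs_root)
    return "/dev/mapper/" + name if name else None


if __name__ == "__main__":
    print(mapper_path("dm-0"))
```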

Comment 15 Eric Sandeen 2009-05-12 20:19:07 UTC

So based on http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=commitdiff;h=4271e23942bdc60e1fa6c0b26bc666a94a8b3e1d this is a blkid bug, so I should reassign it to myself & get this merged to fedora, yes? :)

Sorry, wasn't even aware of this bug.

Now, is this really breaking rescue mode?  Why hasn't anyone else reported this ... do we need it before GA?  We're in the final days (hours?)

Comment 16 Eric Sandeen 2009-05-12 20:28:34 UTC

Gerry, what exactly failed for you in rescue?

I don't want to ship a broken rescue cd in F11, but I don't want to destabilize things at the last minute either, and I don't know of any other reports that rescue mode has been broken for the life of F10/F11....

-Eric

Comment 17 Gerry Reno 2009-05-14 02:53:54 UTC

I reported this when all we would get was the message that we didn't have any Linux partitions when we were in rescue.  This was on machines that had raid as well as volume groups and where we were experiencing lvm devices getting renamed as /dev/dm-N randomly.  So can I pinpoint the exact cause?  Hard to do.  But, on our machines like this, prior to F10 and this /dev/dm-N renaming business, we didn't see anaconda have any problems mounting our existing system in rescue.

Comment 18 David Lehman 2009-05-14 03:22:50 UTC

There is no renaming of lvm devices to /dev/dm-N. There are always /dev/dm-N nodes. That does not rule out /dev/mapper/VolGroup-lv_foo nodes with the same major/minor, which are in effect the same device under a different filename.
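
One way to check the "same major/minor" point is to compare the device numbers behind two paths. This is a sketch, not something taken from anaconda or lvm2: `device_id` and `same_block_device` are hypothetical helpers, and the `stat_fn` parameter is there only so the check can be tried without real device nodes.

```python
import os
import stat


def device_id(path, stat_fn=os.stat):
    """Return (major, minor) for a block device node, or None otherwise."""
    info = stat_fn(path)  # os.stat follows symlinks to the real node
    if not stat.S_ISBLK(info.st_mode):
        return None
    return os.major(info.st_rdev), os.minor(info.st_rdev)


def same_block_device(path_a, path_b, stat_fn=os.stat):
    """True when both paths are block device nodes with equal major/minor."""
    id_a = device_id(path_a, stat_fn)
    return id_a is not None and id_a == device_id(path_b, stat_fn)
```

On an affected system, `same_block_device("/dev/dm-0", "/dev/mapper/VolGroup00-LogVol00")` would be expected to return True if the two names really are the same device; the paths here are only examples.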

Comment 19 Gerry Reno 2009-05-17 00:48:25 UTC

Yes, these new /dev/dm-N devices that have started showing up in the various command outputs are the same device as some other device such as /dev/mapper/VG-LV.  When you first install the system you will see command outputs that reference only /dev/mapper/VG-LV type entries and then slowly over time these begin to get converted to /dev/dm-N style entries in these command outputs.  And eventually a whole list of /dev/dm-N entries tells you exactly nothing about what the devices might be.  And scripts that parse some of these command outputs don't work right if they're expecting or looking for one style of device naming (which has worked for years) and another style shows up.

Comment 20 David Lehman 2009-05-18 15:52:58 UTC

If the basic problem here involves anaconda's rescue mode, please attach /tmp/storage.log, /tmp/program.log, /tmp/anaconda.log from a failed rescue attempt, along with a basic description of the actual disk/lvm/mdraid configuration. Thanks.

Just a note:  Anaconda in F11 does not use blkid -- it uses udev's vol_id.

Comment 21 Gerry Reno 2009-05-18 16:54:45 UTC

Looking at this, the rescue failures occurred on the same systems where the superblock preferred minors had been changed after multiple failed installation attempts.  After I checked and matched up all the superblock preferred minors with the arrays, anaconda can now find the system in rescue mode.  These superblock preferred minor inconsistencies should not have prevented anaconda from assembling the system.  They are only recommendations, not prescriptive.  The running system had experienced no problems with them and would assemble and boot just fine.  I think anaconda needs to be a little more tolerant of preferred minors and rely on UUIDs and scans to assemble the system.

Comment 22 David Lehman 2009-05-18 17:20:24 UTC

I cannot discuss what anaconda should or should not do if you do not provide logs so I can see what it is actually doing.

Comment 23 Gerry Reno 2009-05-18 17:50:17 UTC

Created attachment 344494 [details]
tarball of ramdisk /tmp

We didn't save anything from the ramdisk when the failure occurred, and since matching up all the superblock minors we have not been able to reproduce the original situation.  However, the attached tarball of the ramdisk /tmp, from a failed install attempt on the same machine before the minors were matched, is as close as I can get.  It's from an attempt to use the same partition/device layout to do an install.  It suffered from the same inability to properly assemble the existing system.

Comment 24 David Lehman 2009-05-18 18:04:47 UTC

Was it unable to assemble the system, or did it just use different raid minors than you expected because of metadata in the raid superblocks that contradicts your desired configuration? They aren't the same thing.

Comment 25 Gerry Reno 2009-05-18 18:19:54 UTC

What I remember about that failure was that the RAID-5 array was able to assemble correctly, and its overlying volume group was able to start and mounted on its mount point.  There were three other RAID-1 arrays, and all of these were incorrectly assembled and misidentified.  It is probably because of the minors issue that these arrays didn't assemble properly.  They assemble fine when you do a 'mdadm -A -s', however.  The arrays were given the wrong device numbers, such as /dev/md0 was /dev/md1, etc.  These arrays are the PVs for some volume groups, and of course those failed to activate because they could not find the right UUIDs on the misassembled array PVs.

Comment 26 David Lehman 2009-05-18 18:38:57 UTC

LVM assembly is not dependent on device naming, at least not in anaconda.

What do you mean when you say that the RAID1 arrays were "incorrectly assembled"? That they were correctly assembled using the metadata contained in the member devices but with different device names than you expected?

This is going nowhere. I can't argue vague claims all afternoon. If you have logs from the failed rescue, great. Please attach them. If not, there isn't anything I can do for you. Assuming the latter to be the case, this bug will become a tracker for whatever blkid fix Eric identified.

Comment 27 Eric Sandeen 2009-05-18 18:46:33 UTC

And, well, if anaconda isn't even using blkid then whatever I identified is probably not the issue, and I regret taking the bug ;)

Comment 28 Gerry Reno 2009-05-18 18:56:39 UTC

What we saw was that druid showed the RAID-1 arrays as assembled but with the wrong device names.  All of them were named wrong.  And the volume groups that used these arrays as their PVs did not show up at all.  I had to tear down these arrays and build them back up in the window from the correct partitions, using the correct device names.  Then I had to build up the volume groups that use these arrays as their PVs.  Normally, I would have expected the installer to do all this automatically.  I suspect that the superblock minor inconsistencies caused the array naming problems, which somehow affected the ability of the volume groups to activate.

Comment 29 Bug Zapper 2009-06-09 14:26:47 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 30 Laurent Jacquot 2009-06-22 20:33:28 UTC

For the record, this also affects a fully updated Fedora 10.

[root@jack ~]# uname -a
Linux jack.lutty.net 2.6.27.24-170.2.68.fc10.i686 #1 SMP Wed May 20 23:10:16 EDT 2009 i686 i686 i386 GNU/Linux

I recently upgraded my raid array from 4 to 5 disks, and mount no longer shows LV names but /dev/dm-* style names instead.

My setup is LVM on softraid, and blkid, mount, df, etc. are all affected.

Karel Zak said to me:

  We need to backport

          commit 4271e23942bdc60e1fa6c0b26bc666a94a8b3e1d
          Author: Karel Zak <kzak>
          Date:   Mon Apr 27 15:00:57 2009 +0200

          blkid: use /dev/mapper/<name> rather than /dev/dm-<N>

 upstream patch to Fedora.

Comment 31 Eric Sandeen 2009-06-24 17:28:14 UTC

Ok, I can pull that in, thanks.  Sorry for the delay on this, it hadn't clicked for me that that's what is missing here.

Comment 32 Fedora Update System 2009-06-27 02:53:43 UTC

e2fsprogs-1.41.4-6.fc10 has been pushed to the Fedora 10 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update e2fsprogs'.
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-7005

Comment 33 Laurent Jacquot 2009-06-29 20:57:38 UTC

Works for me!

[alex@jack ~]$ uname -r
2.6.27.25-170.2.72.fc10.i686
[alex@jack ~]$ rpm -q e2fsprogs
e2fsprogs-1.41.4-6.fc10.i386
[alex@jack ~]$ mount
/dev/mapper/rootvg-rootlv on / type ext3 (rw,noatime,nodiratime,data=writeback)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /boot type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/rootvg-usrlv on /usr type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/rootvg-tmplv on /tmp type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/rootvg-varlv on /var type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/datavg-wwwlv on /var/www/html type ext3 (rw,noatime,nodiratime,data=writeback)
tmpfs on /dev/shm type tmpfs (rw)
/dev/mapper/datavg-homelv on /home type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/datavg-Imageslv on /home/alex/Images type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/datavg-mp3lv on /home/alex/Mp3 type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/datavg-vmwarelv on /home/alex/vmware type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/datavg-logicielslv on /home/alex/Logiciels type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/datavg-Filmslv on /home/alex/Films type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/datavg-Divxlv on /home/alex/Films/Divx type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/xmulevg-templv on /home/alex/.xMule type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/backupvg-wwwlv on /backup/www type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/backupvg-homelv on /backup/home type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/backupvg-Imageslv on /backup/Images type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/backupvg-mp3lv on /backup/Mp3 type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/backupvg-vmwarelv on /backup/vmware type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/backupvg-logicielslv on /backup/Logiciels type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/backupvg-Filmslv on /backup/Films type ext3 (rw,noatime,nodiratime,data=writeback)
/dev/mapper/backupvg-Divxlv on /backup/Divx type ext3 (rw,noatime,nodiratime,data=writeback)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
gvfs-fuse-daemon on /home/alex/.gvfs type fuse.gvfs-fuse-daemon (rw,nosuid,nodev,user=alex)
none on /proc/fs/vmblock/mountPoint type vmblock (rw)
[alex@jack ~]$
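
A verification like the one above can also be scripted. The following sketch assumes mount's `source on target type fstype (options)` line format as shown in the transcript, and reports any mount whose source is still a /dev/dm-N node. `kernel_named_mounts` is a hypothetical helper name.

```python
import re

# Matches lines of the form: <source> on <target> type <fstype> (<options>)
MOUNT_LINE = re.compile(
    r"^(?P<source>\S+) on (?P<target>.+) type (?P<fstype>\S+) \((?P<options>[^)]*)\)$"
)


def kernel_named_mounts(mount_text):
    """Return (source, target) pairs whose source is a /dev/dm-N node."""
    hits = []
    for line in mount_text.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match and re.fullmatch(r"/dev/dm-\d+", match.group("source")):
            hits.append((match.group("source"), match.group("target")))
    return hits


if __name__ == "__main__":
    import subprocess

    output = subprocess.run(
        ["mount"], capture_output=True, text=True, check=True
    ).stdout
    for source, target in kernel_named_mounts(output):
        print(source, "is mounted on", target)
```

An empty result means every mounted device-mapper volume is being reported under a name other than /dev/dm-N.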

Comment 34 Fedora Update System 2009-06-30 15:38:12 UTC

e2fsprogs-1.41.4-12.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/e2fsprogs-1.41.4-12.fc11

Comment 35 Fedora Update System 2009-07-02 05:53:56 UTC

e2fsprogs-1.41.4-12.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update e2fsprogs'.
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-7278

Comment 36 Fedora Update System 2009-07-23 19:11:53 UTC

e2fsprogs-1.41.4-6.fc10 has been pushed to the Fedora 10 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 37 Fedora Update System 2009-08-08 19:25:36 UTC

e2fsprogs-1.41.4-12.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.
