Bug 517894 - Encrypted system sometimes fails to boot
Summary: Encrypted system sometimes fails to boot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 12
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Peter Rajnoha
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-08-17 17:58 UTC by Tim Waugh
Modified: 2009-11-24 13:53 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-11-24 13:53:12 UTC
Type: ---
Embargoed:



Description Tim Waugh 2009-08-17 17:58:22 UTC
Description of problem:
Every so often (about once every couple of days so far) my LUKS-encrypted rawhide system fails to boot.

Version-Release number of selected component (if applicable):
lvm2-2.02.50-2.fc12.x86_64

How reproducible:
Occasional.

Steps to Reproduce:
1. I installed the entire system as encrypted, selecting a pre-existing encrypted partition for '/home'.
  
Actual results:
mdadm: No arrays found in config file or automatically
key slot 0 unlocked.
Command successful.
Setting up Logical Volume Management:   Device /dev/dm-2 not found.
  device-mapper: reload ioctl failed: Invalid argument
  1 logical volume(s) in volume group "vg_worm00" now active
  /dev/dm-2: stat failed: No such file or directory
  Path /dev/dm-2 no longer valid for device(253,2)
  /dev/block/253:2:stat failed: No such file or directory
  Path /dev/block/253:2 no longer valid for device(253,2)
  /dev/disk/by-id/dm-name-temporary-cryptsetup-664: stat failed: No such file or directory
  Path /dev/disk/by-id-dm-name-temporary-cryptsetup-664 no longer valid for device(253,2)
  1 logical volume(s) in volume group "vg_worm01" now active

Checking filesystems
desktop-x86_64-2: clean, 130055/770048 files, 2100176/3068928 blocks
/dev/sda1: clean, 65/51200 files, 127345/204800 blocks
fsck.ext4: Invalid argument while trying to open /dev/mapper/vg_worm00-LogVol00
/dev/mapper/vg_worm00-LogVol00:
The superblock could not be read or does not describe a correct ext2 filesystem.  If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>



*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
*** Warning -- SELinux is active
*** Disabling security enforcement for system recovery.
*** Run 'setenforce 1' to reenable.
Give root password for maintenance
(or type Control-D to continue): 

Expected results:
Just boot. :-)

Additional info:
It doesn't happen every time. The few times it has happened so far, I've edited /etc/fstab to comment out the /home partition, rebooted, then uncommented it and rebooted again, and all has been well. I'm not sure whether that was necessary or whether just rebooting would have worked.

Comment 1 Tim Waugh 2009-08-17 21:23:02 UTC
Just happened again.  This time I just rebooted and it was fine.  Perhaps some kind of race condition?

Comment 2 Milan Broz 2009-08-18 06:31:28 UTC
A race with udev, probably combined with another problem.

But the temporary cryptsetup device must _never_ be scanned, and here some rule even created a /dev/disk/by-id ... node.

I hope that the next rebuild of device-mapper/lvm2 in rawhide will switch to udev mode (with proper udev rules), so these types of problems disappear.

Anyway, the wrong rules should be identified; I have recently had similar reports from other distros too...

Peter, could you please check what the problem is here?

P.S.
The logic in cryptsetup is that it scans keyslots using a temporary-cryptsetup-* device. This device is internal to cryptsetup and is removed before the real mapping happens (there can be 8 temporary mappings, one for each of the 8 keyslots). I wonder how it can appear in the system _after_ cryptsetup has successfully unlocked the device. The only option I see is a race with blkid scanning from a udev rule, which keeps the device locked... (running dmsetup table when this happens would probably show the problem.)
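A minimal sketch of the check suggested in the P.S., assuming the usual dmsetup ls output in which the first column is the mapping name. The helper name leftover_temp_maps is hypothetical (not part of cryptsetup or lvm2); it only filters the listing for the temporary-cryptsetup- prefix, and on a live system each hit could then be passed to dmsetup table.

```shell
#!/bin/sh
# Hypothetical helper (not part of cryptsetup or lvm2): read `dmsetup ls`
# output on stdin and print the names of any temporary-cryptsetup-* mappings.
# Assumes the mapping name is the first whitespace-separated column.
leftover_temp_maps() {
    awk '$1 ~ /^temporary-cryptsetup-/ { print $1 }'
}

# Usage on a live system (needs root): dump the table of each leftover mapping.
#   dmsetup ls | leftover_temp_maps | while read -r name; do
#       echo "$name: $(dmsetup table "$name")"
#   done
```

If the race described above is real, a mapping should show up here only briefly during boot; an empty result after boot is expected.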

Comment 3 Peter Rajnoha 2009-08-18 12:28:59 UTC
Those /dev/disk/ symlinks are created by 95-devkit-disks.rules, I suppose; at least that is the only place I've found in my standard Fedora installation. As Milan says, temporary devices should be ignored by udev rules completely, so that nothing opens those devices (like devkit-disks-part-id, devkit-disks-dm-export and blkid) and then creates miscellaneous symlinks.

I'm just wondering why the symlinks are created on current rawhide: that part of the rules was removed from rawhide about 2 months ago (judging by the log for the DeviceKit-disks package).

So maybe the question is: when was this system last updated? If it really has the newest rawhide version of the DeviceKit-disks package, then those symlinks have to be created somewhere else. Hmm..

Anyway, the part responsible for creating /dev/disk symlinks is now part of our udev rules (it's not in rawhide yet, but it will be soon).

I've discussed this with Milan and we have concluded that the best way would be to use udev's OPTIONS+="ignore_device" when any temporary DM device is detected (we'll put that in our udev rules). Maybe we will reserve a special UUID for this.

We'll see, but we certainly have to avoid any access to such temporary devices from udev rules.

Comment 4 Tim Waugh 2009-08-18 13:22:14 UTC
It was newly installed on Wednesday (2009-08-12) and updated daily since then.

Comment 5 Peter Rajnoha 2009-08-18 14:24:20 UTC
Hmm, could you please check /lib/udev/rules.d/95-devkit-disks.rules? There is a section marked by "device-mapper" comments and ending with LABEL="device_mapper_end". Can you find anything like the following there?

SYMLINK+="disk/by-id/dm-name-$env{DKD_DM_NAME}"
ENV{DKD_DM_UUID}=="?*", SYMLINK+="disk/by-id/dm-uuid-$env{DKD_DM_UUID}"

...
...

ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"

ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*", SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}"

This was used in older versions of DeviceKit-disks, but should not be there anymore. I don't know of any other place where these symlinks are created; this is the only place I've encountered so far.

The new version has a comment there instead, like "# avoid probing if it has been done earlier". Do you have this one?

Comment 6 Tim Waugh 2009-08-18 14:36:06 UTC
(In reply to comment #5)
> Hmm, could you please check /lib/udev/rules.d/95-devkit-disks.rules. There is a
> part that is marked by comments as "device-mapper" ended by
> LABEL="device_mapper_end". Can you find anything like this (below) there?
> 
> SYMLINK+="disk/by-id/dm-name-$env{DKD_DM_NAME}"
> ENV{DKD_DM_UUID}=="?*", SYMLINK+="disk/by-id/dm-uuid-$env{DKD_DM_UUID}"
> 
> ...
> ...
> 
> ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*",
> SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"
> 
> ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*",
> SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}"

That's not in there.

> New version has a comment there instead, like "# avoid probing if it has been
> done earlier". Do you have this one?  

Yes, that's there.

Comment 7 Peter Rajnoha 2009-08-18 14:51:52 UTC
Just one question: when you create DM devices manually using dmsetup (or create LVM volumes), could you please check whether the /dev/disk/by-id/dm-name-<actual_dm_name> symlinks are created as well? That way we can see directly whether some other rule creates those symlinks; if not, I'm afraid the problem is somewhere else.

Comment 8 Tim Waugh 2009-08-19 10:11:24 UTC
I didn't create any DM devices manually, I used anaconda.

I'm not very proficient with LVM -- could you explain in more detail what test you need me to run?

Comment 9 Peter Rajnoha 2009-08-19 10:43:22 UTC
Sorry, I should have included it. For example, try this (it just creates a simple loop device and a DM device on top of it; if there is any udev rule that creates those symlinks, it should be executed now):

dd if=/dev/zero of=test bs=1M count=1
losetup /dev/loop0 test
dmsetup create test_device --table "0 8 linear /dev/loop0 0"

---> now check whether /dev/disk/by-id contains a symlink named "dm-name-test_device"

dmsetup remove test_device
losetup -d /dev/loop0
rm test

Thanks.
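The manual test above can be wrapped in a script that cleans up after itself. This is a sketch, not something from the thread: it assumes a util-linux losetup that supports --find --show (so it does not depend on /dev/loop0 being free) and uses udevadm settle to wait for udev before checking. The helper has_dm_name_link and the script name dm-link-test.sh are hypothetical; the device-touching part only runs when invoked with the argument run, as root.

```shell
#!/bin/sh
# Sketch of the loop-device test with automatic cleanup.
# Assumes util-linux losetup with --find/--show and udevadm are available.

# Hypothetical helper: succeed if a dm-name-<name> symlink exists in the
# by-id directory (default /dev/disk/by-id).
has_dm_name_link() {
    [ -L "${2:-/dev/disk/by-id}/dm-name-$1" ]
}

run_test() {
    img=$(mktemp) || return 1
    dd if=/dev/zero of="$img" bs=1M count=1 2>/dev/null
    # Let losetup pick a free loop device instead of assuming /dev/loop0.
    loopdev=$(losetup --find --show "$img") || { rm -f "$img"; return 1; }
    # Remove the DM device, detach the loop device and delete the image on exit.
    trap 'dmsetup remove test_device 2>/dev/null; losetup -d "$loopdev"; rm -f "$img"' EXIT
    dmsetup create test_device --table "0 8 linear $loopdev 0" || return 1
    udevadm settle    # wait until udev has processed the new device
    if has_dm_name_link test_device; then
        echo "dm-name-test_device created: some udev rule handles DM devices"
    else
        echo "no dm-name-test_device symlink"
    fi
}

# Only touch devices when explicitly asked to (needs root): sh dm-link-test.sh run
if [ "${1:-}" = run ]; then
    run_test
fi
```

Waiting with udevadm settle avoids checking for the symlink before udev has run its rules, which could otherwise give a false "no symlink" result.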

Comment 10 Tim Waugh 2009-08-19 11:01:04 UTC
# dd if=/dev/zero of=test bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00919161 s, 114 MB/s
# losetup /dev/loop0 test
# dmsetup create test_device --table "0 8 linear /dev/loop0 0"
# ls -l /dev/disk/by-id/dm-name-test_device 
lrwxrwxrwx. 1 root root 10 2009-08-19 11:59 /dev/disk/by-id/dm-name-test_device -> ../../dm-4
# ls -l /dev/dm-4
brw-rw----. 1 root disk 253, 4 2009-08-19 11:59 /dev/dm-4

Comment 11 Peter Rajnoha 2009-08-19 11:30:47 UTC
..and what does "grep disk/by-id/dm-name /lib/udev/rules.d/*" and "grep disk/by-id/dm-name /etc/udev/rules.d/*" show?

Comment 12 Tim Waugh 2009-08-19 11:53:06 UTC
# grep disk/by-id/dm-name /etc/udev/rules.d/*
# grep disk/by-id/dm-name /lib/udev/rules.d/*
/lib/udev/rules.d/70-anaconda.rules:SYMLINK+="disk/by-id/dm-name-$env{DM_NAME}"
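The search from comments 11 and 12 can be folded into a small helper that lists every rules file mentioning disk/by-id/dm-name. A sketch under assumptions: find_dm_name_rules is a hypothetical name, it defaults to the two directories grepped above, and it only looks at *.rules files, since those are the files udev reads.

```shell
#!/bin/sh
# Hypothetical helper: print every udev rules file that creates
# /dev/disk/by-id/dm-name-* symlinks. With no arguments it searches
# the two directories grepped in comments 11 and 12.
find_dm_name_rules() {
    [ "$#" -gt 0 ] || set -- /lib/udev/rules.d /etc/udev/rules.d
    for dir in "$@"; do
        [ -d "$dir" ] || continue            # skip directories that don't exist
        # -l prints only file names; errors (e.g. no *.rules files) are silenced
        grep -l 'disk/by-id/dm-name' "$dir"/*.rules 2>/dev/null
    done
    return 0
}

# Usage (rules files are normally world-readable):
#   find_dm_name_rules
```

On the reporter's system this would be expected to print only /lib/udev/rules.d/70-anaconda.rules, matching the grep output above.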

Comment 13 Peter Rajnoha 2009-08-19 14:20:16 UTC
The device-mapper part of 70-anaconda.rules should be managed by us rather than by anaconda. I've looked into the anaconda log: they created those rules only as a temporary solution until we have our own lvm/dm rules.

So I think this will be resolved when we release a new rawhide version and the anaconda team deletes that part from their udev rules (and we won't create symlinks for temporary cryptsetup devices). I've already notified them, so I hope we'll get this resolved then.

Anyway, thanks a lot for this bug report!

Comment 14 Peter Rajnoha 2009-08-19 15:05:52 UTC
OK, just to make sure...

Have you installed your system from Live CD?

If so, could you please try temporarily removing 70-anaconda.rules from
/lib/udev/rules.d and see whether the problem still occurs?

Comment 15 Tim Waugh 2009-08-19 15:41:45 UTC
Yes, it was installed from a Live CD.

That file is owned by anaconda so I'll just remove the anaconda package and see if the problem happens again.

Comment 16 Peter Rajnoha 2009-09-02 08:18:42 UTC
Do you still have this problem?

Comment 17 Tim Waugh 2009-09-02 08:29:09 UTC
No, it hasn't happened since then.

Comment 18 Bug Zapper 2009-11-16 11:31:01 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 19 Peter Rajnoha 2009-11-24 13:53:12 UTC
F12 includes fixes to the udev rules in both 70-anaconda.rules and 95-devkit-disks.rules so that inappropriate devices/events are not processed. These were the main problematic places we know of, so the problem described here should be resolved too.

