Bug 702650 - Concurrency problem when unlocking partition
Summary: Concurrency problem when unlocking partition
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: plymouth
Version: 15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ray Strode [halfline]
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On:
Blocks: F15Blocker, F15FinalBlocker
Reported: 2011-05-06 13:23 UTC by Kamil Páral
Modified: 2013-01-22 23:45 UTC

Fixed In Version: plymouth-0.8.4-0.20110510.2.fc15
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-16 18:59:50 UTC
Type: ---
Embargoed:


Attachments
disk layout (47.85 KB, image/png), 2011-05-06 13:23 UTC, Kamil Páral
hanged system (14.88 KB, image/png), 2011-05-06 13:24 UTC, Kamil Páral
plymouth-debug.log with ok boot (103.00 KB, text/plain), 2011-05-09 07:46 UTC, Kamil Páral
crypttab (90 bytes, text/plain), 2011-05-09 07:46 UTC, Kamil Páral
fstab (885 bytes, text/plain), 2011-05-09 07:46 UTC, Kamil Páral
plymouth-debug.log (62.97 KB, text/plain), 2011-05-11 07:20 UTC, Kamil Páral

Description Kamil Páral 2011-05-06 13:23:17 UTC
Created attachment 497348
disk layout

Description of problem:
I have created an encrypted partition mounted at /opt inside unencrypted LVM. See attachment layout.png.

When I boot the system, sometimes I can enter the password to unlock it and the system boots correctly. But sometimes (let's say in about 33% of cases) the whole system gets stuck at the point where it asks for the password. I cannot type anything in, and Esc doesn't work.

If I hit Esc on boot to switch to text boot message view, I see something like hang.png. Maybe some service is started after the password prompt and kills the input?

I'm using KVM.

Version-Release number of selected component (if applicable):
Fedora 15 TC1

How reproducible:
About 33% in my experience

Steps to Reproduce:
1. Install F15 according to layout.png, minimal install is enough
2. Boot it several times; sooner or later the boot should get stuck at the password prompt
  
Additional info:
I tried different setups with encrypted partitions (like encrypted root partition, encrypted LVM, etc.) and nothing seems to trigger this issue except the one layout I described.

Comment 1 Kamil Páral 2011-05-06 13:24:23 UTC
Created attachment 497349
hanged system

Comment 2 Kamil Páral 2011-05-06 13:28:30 UTC
Sorry, forgot:
plymouth-0.8.4-0.20110304.1.fc15.i686
plymouth-core-libs-0.8.4-0.20110304.1.fc15.i686
plymouth-graphics-libs-0.8.4-0.20110304.1.fc15.i686
plymouth-plugin-label-0.8.4-0.20110304.1.fc15.i686
plymouth-plugin-two-step-0.8.4-0.20110304.1.fc15.i686
plymouth-scripts-0.8.4-0.20110304.1.fc15.i686
plymouth-system-theme-0.8.4-0.20110304.1.fc15.i686
plymouth-theme-charge-0.8.4-0.20110304.1.fc15.i686
systemd-25-1.fc15.i686
systemd-units-25-1.fc15.i686

Comment 3 Kamil Páral 2011-05-06 13:29:36 UTC
One more thing. I just noticed that when I wait long enough with the hung system (two minutes?) it automatically switches itself to "Welcome to emergency mode". Hope that helps.

Comment 4 Kamil Páral 2011-05-06 13:31:59 UTC
Proposing as F15Blocker according to this release criterion:

"14. In most cases (see Blocker_Bug_FAQ), a system installed according to any of the above criteria (or the appropriate Beta or Final criteria, when applying this criterion to those releases) must boot to the 'firstboot' utility on the first boot after installation, without unintended user intervention. This includes correctly accessing any encrypted partitions when the correct passphrase is supplied. The firstboot utility must be able to create a working user account"

https://fedoraproject.org/wiki/Fedora_15_Alpha_Release_Criteria

Comment 5 Lennart Poettering 2011-05-06 16:54:19 UTC
So if I understand this correctly, you get the graphical Plymouth prompt, but it takes no input? 

Sounds like a Plymouth issue, tentatively reassigning to Plymouth.

Comment 6 Tim Flink 2011-05-06 18:48:26 UTC
Discussed in the 2011-05-06 blocker review meeting. We were unable to determine the status of this with existing data. The impact is bad, but this has only been reported by one user with an uncommon layout (only the /opt LV is encrypted).

Are other users with encrypted partitions seeing this? Would disabling plymouth work as a workaround?

If this can be triaged down to a root cause and proposed fix, that would help assess the potential impact and side-effects.

Comment 7 Ray Strode [halfline] 2011-05-06 19:59:38 UTC
I need a better grip on what's going on first to know where to go next.

Let's back up a bit...  Here's the last bits of the screenshot typed out:

Starting Initialize storage subsystems (RAID, LVM, etc.)...
Starting Cryptography Setup for luks-65<long uuid here>...
Starting Forward Password Requests to Plymouth...
Started Forward Password Requests to Plymouth.
Started Forward Password Requests to Plymouth.

Please enter passphrase for disk VolGroup-lv_opt (luks-65<long uuid here>) on /opt!:Started Initialize storage subsystems (RAID, LVM, etc.).
_

As I understand things, systemd is running /lib/systemd/fedora-storage-init which ultimately does:

/sbin/lvm vgchange -a y --sysinit

That goo makes the encrypted device backing /opt (luks-65<long uuid here>) show up, which is probably mentioned in /etc/crypttab. At some point earlier in boot, systemd created config files dynamically based on the contents of /etc/crypttab.

One of those dynamic rules now kicks in, and so we see "Starting Cryptography Setup for ..."

That rule ends up running systemd-cryptsetup which mucks around with 

/run/systemd/ask-password

causing the 

DirectoryNotEmpty=/run/systemd/ask-password 

directive for systemd-ask-password-plymouth to kick in.  That's why we see "Starting Forward Password Requests to Plymouth".  This causes /bin/systemd-tty-ask-password-agent --plymouth --watch to be run.  I have no idea why it says it's getting started twice. That seems bizarre.

systemd-tty-ask-password-agent then picks up the muck in /run/systemd/ask-password and asks plymouth to ask for the password based on the muck.  Plymouth does; that's the "Please enter passphrase for disk VolGroup-lv_opt (luks-65<long uuid here>) on /opt!:" part.

systemd is doing the password asking part asynchronously, though, so it continues on in the background. That's why we see "Started Initialize storage subsystems (RAID, LVM, etc.)." (since fedora-storage-init finished).
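
The trigger described above, where a request file appearing in a watched directory starts the password agent, can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, a temporary directory stands in for /run/systemd/ask-password, the request file content is a placeholder, and the loop polls, whereas systemd path units react to inotify events.

```python
import os
import tempfile
import time


def directory_not_empty(path):
    """Return True if `path` exists and contains at least one entry."""
    try:
        with os.scandir(path) as entries:
            return any(True for _ in entries)
    except FileNotFoundError:
        return False


def wait_until_not_empty(path, timeout=1.0, interval=0.05):
    """Poll `path` until it has an entry or `timeout` seconds pass.

    Hypothetical helper: polling is used only to keep the sketch short.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if directory_not_empty(path):
            return True
        time.sleep(interval)
    return directory_not_empty(path)


# Stand-in for /run/systemd/ask-password so the sketch runs unprivileged.
demo_dir = tempfile.mkdtemp()
fired_before = wait_until_not_empty(demo_dir, timeout=0.1)

# Stand-in for the request that systemd-cryptsetup leaves in the directory
# (placeholder file name and content, not the real request format).
with open(os.path.join(demo_dir, "ask.example"), "w") as f:
    f.write("placeholder password request\n")

fired_after = wait_until_not_empty(demo_dir, timeout=0.1)
print(fired_before, fired_after)
```

The point of the sketch is that the trigger depends only on the directory's contents, so whatever else is happening in the boot at that moment carries on in parallel with the prompt.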

Comment 8 Ray Strode [halfline] 2011-05-06 20:01:44 UTC
Can you add plymouth.debug to the kernel command line and post the extra output when it fails?

Also, please post

 /etc/crypttab and /etc/fstab

Comment 9 Kamil Páral 2011-05-09 07:46:05 UTC
Created attachment 497724
plymouth-debug.log with ok boot

(In reply to comment #8)
> can you add plymouth.debug to the kernel command line and post the extra output
> when it fails?
> 
> Also, please post
> 
>  /etc/crypttab and /etc/fstab

I can't hit that issue with plymouth.debug enabled. It just works. I have at least attached the debug log from a successful boot.

Please note that UUID of the partition is different, I used a different installation than before.

Comment 10 Kamil Páral 2011-05-09 07:46:22 UTC
Created attachment 497725
crypttab

Comment 11 Kamil Páral 2011-05-09 07:46:35 UTC
Created attachment 497726
fstab

Comment 12 Kamil Páral 2011-05-09 07:48:41 UTC
(In reply to comment #6)
> Would disabling plymouth
> work as a workaround?

I'd love to test that, but I can't find the plymouth kernel options documented anywhere (I tried the man page, the project page and the Fedora wiki). Links welcome.

Comment 13 Kamil Páral 2011-05-09 15:54:14 UTC
I did some more testing and the results are different now (maybe because it really is a concurrency issue).

I booted the system 5 times with default plymouth:
success rate: 0% (i.e. password prompt was stuck every time)

Then I booted the system 5 times and hit Esc on plymouth start (thus showing all the text info during boot):
success rate: 0%

I booted the system 5 times with rd.plymouth=0 kernel option:
success rate: 100% (i.e. password accepted every time)

And I booted the system 5 times with plymouth.debug kernel option (using both graphical progressbar and text boot messages):
success rate: 100%


Again, I'm using a VM. So my plymouth screen is just a graphical progressbar, no fancy Fedora logo as on bare metal (if that makes any difference).

I tried 3 different installations (2 minimal, 1 default) with the same partition setup. All of them behave the same way, as reported here.

In different partition setups (where the password is provided before the root partition is mounted) everything works great. This seems to happen only when another partition needs to be unlocked later in the boot process.

Comment 14 cornel panceac 2011-05-09 18:08:19 UTC
I've seen this on real hardware (a Dell laptop with an external USB disk attached). I've seen it on F15 and Scientific Linux 6. Sometimes the prompt works OK, and other times it doesn't. When it doesn't, if I wait long enough, keyboard control returns and I can enter the password either in plymouth or, if I pressed Escape, in text mode. I don't have the machine at hand, but tomorrow I'll do more testing.

Comment 15 James Laska 2011-05-10 14:37:17 UTC
With /opt as an encrypted btrfs partition (not a logical volume), I was unable to reproduce the reported issue.  Retesting using an encrypted logical volume ...

Comment 16 James Laska 2011-05-10 15:38:22 UTC
(In reply to comment #15)
> With /opt as an encrypted btrfs partition (not a logical volume), I was unable
> to reproduce the reported issue.  Retesting using an encrypted logical volume
> ...

Unable to reproduce in virt with /opt as an encrypted logical volume.  Retesting on bare metal ...

Comment 17 Tom "spot" Callaway 2011-05-10 17:17:12 UTC
I cannot reproduce this issue on baremetal with /opt as an encrypted logical volume formatted as ext4.

Comment 18 James Laska 2011-05-10 17:18:54 UTC
(In reply to comment #17)
> I cannot reproduce this issue on baremetal with /opt as an encrypted logical
> volume formatted as ext4.

Thanks for testing Spot.  Same here ... I'm not able to reproduce with /opt as an encrypted logical volume on bare metal.

Comment 19 Tim Flink 2011-05-10 17:25:35 UTC
This is really strange. Yesterday I could reproduce this on the order of 1 in every 3 or 4 reboots. I just rebooted the same VM (i386 default desktop F15) ~10 times and haven't seen it once. I'm not sure what I changed to fix this, though.
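
As a rough sanity check on how surprising ten clean reboots would be, the arithmetic can be sketched as below. It assumes each boot fails independently with a fixed probability, which probably does not hold for a race condition whose timing depends on system state, so the numbers are only a loose guide. The function name is hypothetical.

```python
def p_all_clean(p_fail, boots):
    """Probability that `boots` independent boots all succeed."""
    return (1.0 - p_fail) ** boots


# Failure rates taken from the "1 in 3 or 4" estimate above.
for p_fail in (1 / 3, 1 / 4):
    print(f"p_fail={p_fail:.3f}: P(10 clean boots) = {p_all_clean(p_fail, 10):.4f}")
```

Under that assumption, ten clean boots in a row would happen by chance about 1.7% of the time at a 1-in-3 failure rate and about 5.6% at 1-in-4, so a run like this suggests that something about the conditions changed rather than that the bug went away.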

Comment 20 Ray Strode [halfline] 2011-05-10 20:50:59 UTC
I've done a new build of plymouth here:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3063818

(make sure to run /usr/libexec/plymouth/plymouth-update-initrd after installing it)

It doesn't fix anything, but what it does do is add the ability to prevent debug output from going to the terminal.  Hopefully, that will make the problem more reproducible with debugging enabled.  To do that, add

plymouth:debug=stream:/dev/null

to the kernel command line. After boot up /var/log/plymouth-debug.log should still get generated.
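
To confirm which Plymouth-related options a given boot actually used, a small check along these lines may help. It is a minimal sketch: the function name and the sample command line are hypothetical, and the prefixes only cover the options mentioned in this report (plymouth.debug, plymouth:debug=stream:/dev/null and rd.plymouth=0).

```python
def plymouth_options(cmdline):
    """Return the whitespace-separated tokens that look like Plymouth options."""
    prefixes = ("plymouth.", "plymouth:", "rd.plymouth")
    return [tok for tok in cmdline.split() if tok.startswith(prefixes)]


# Hypothetical command line; on a running system, read /proc/cmdline instead.
sample = "ro root=/dev/mapper/VolGroup-lv_root rhgb quiet plymouth:debug=stream:/dev/null"
found = plymouth_options(sample)
print(found)
```

On a booted system the same function could be fed the contents of /proc/cmdline to check that the option was not lost when the boot entry was edited.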

Kamil, can you give it a go?

Comment 21 Kamil Páral 2011-05-11 07:20:21 UTC
Created attachment 498215
plymouth-debug.log

(In reply to comment #20)
> Kamil, can you give it a go?

Yes, this time I could reproduce it even with debug info enabled.

I booted the system with that option, and it got stuck on the password prompt. I tried pressing keys several times. Then I waited one or two minutes to get into emergency mode and pressed Ctrl+D to continue the boot. I was asked for the password again, and this time I could enter it (btw, the plymouth debug messages were redirected to the screen again, which made quite a mess). The rest of the boot went fine.

Debug log attached.

Comment 22 Björn Ruberg 2011-05-11 13:52:04 UTC
I can confirm this problem with 32-bit Fedora 15, with everything up to date as of today.
I installed Fedora 15 fresh on a Latitude D620 and encrypted the home partition after installation.

Now, every time I have to enter the password, the system gets stuck after that. (I see the hard disk activity LED blinking, but the password screen does not disappear.)

I can only boot successfully when I hit Escape during boot (to see the text output), hit Esc again to get back to the password screen, enter the password, and escape out of plymouth again directly after.

Comment 23 Ray Strode [halfline] 2011-05-11 14:02:34 UTC
Thanks Kamil,

That debug log was very helpful.  I see the problem and am working toward a solution now.

Comment 24 James Laska 2011-05-11 14:53:43 UTC
Thanks Ray!  Looks like a fix is available [1], and a new plymouth build is now in koji [2] for testing.

[1] http://cgit.freedesktop.org/plymouth/commit/?id=113b2e27726c5d95c31034dc5e9db1e8b985c963
[2] http://koji.fedoraproject.org/koji/buildinfo?buildID=243503

Comment 25 Fedora Update System 2011-05-11 14:54:19 UTC
plymouth-0.8.4-0.20110510.2.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/plymouth-0.8.4-0.20110510.2.fc15

Comment 26 James Laska 2011-05-11 15:59:43 UTC
I added positive karma to the update in bodhi.  While I can't consistently reproduce the failure in a virt guest with the older package, I can consistently boot successfully with the updated package (and new initrd).

Kamil ... any luck on your end?  You seem to have the best luck reproducing this issue.

Comment 27 James Laska 2011-05-11 19:07:46 UTC
Accepted as a release blocker due to the criteria stated in comment#4 and for 

   "The installer must be able to create and install to any workable partition layout using any file system offered in a default installer configuration, LVM, software, hardware or BIOS RAID, or combination of the above "

The frequency of this failure made it unclear whether it would be accepted.  However, given that a fix is available and tested, I believe we can proceed with marking this as a blocker and including it in RC1.  Nice work everyone in triage, fixing and testing.

Comment 28 Kamil Páral 2011-05-12 09:07:49 UTC
(In reply to comment #26)
> Kamil ... any luck on your end?  You seem to have the best luck reproducing
> this issue.

If you call it "luck"... :-)

Yes, now it works perfectly. Thanks Ray, seems fixed.

Comment 29 Fedora Update System 2011-05-14 03:06:43 UTC
Package plymouth-0.8.4-0.20110510.2.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing plymouth-0.8.4-0.20110510.2.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/plymouth-0.8.4-0.20110510.2.fc15
then log in and leave karma (feedback).

Comment 30 James Laska 2011-05-16 12:20:51 UTC
Moving to VERIFIED based on feedback in comment#28 and comment#26

Comment 31 Fedora Update System 2011-05-16 18:59:45 UTC
plymouth-0.8.4-0.20110510.2.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 32 Richard Körber 2011-06-06 14:43:44 UTC
I guess that I have this issue with a Fedora 15 installation. On that machine Fedora 14 was installed before, almost fully encrypted:

* "/boot" partition, ext4, unencrypted
* "/" partition, ext4, encrypted
* swap partition, encrypted
* "/home" partition, ext4, encrypted

Except for the "/home" partition, I did a fresh installation of Fedora 15. "/boot", "/" and swap were formatted. "/" and swap were encrypted, "/home" was kept encrypted. I am using the same encryption passphrase I used on the Fedora 14 system.

When I boot the system, Plymouth first asks for the passphrase (as it is expected to do). After that, the system starts booting, but then after a few seconds, the screen always turns black and booting stops.

After pressing Ctrl-Alt-Backspace and Ctrl-Alt-Return repeatedly (which is a somewhat desperate measure, but it always helped), a console appears.

In the console I am asked for the passphrase of the "/home" partition - which is exactly the passphrase I already typed in. Sometimes (but not always) the passphrase for the swap partition is asked after that, which again is the passphrase I already typed in. Now the system starts successfully.

After reading this bug and seeing the screenshots, I suppose this is the issue I am having on that system.

Comment 33 Richard Körber 2011-06-14 15:28:17 UTC
I found out that I actually have to press the Esc key in order to reach the console where the passphrase is asked a second time.

When I boot without Plymouth, the passphrase is only asked once, as it is expected.

I am not allowed to reopen this bug. Shall I create a new one?

Comment 34 Rafał Polak 2011-06-26 08:39:03 UTC
I'm using the same HDD configuration as https://bugzilla.redhat.com/show_bug.cgi?id=702650#c32 . Once in a while the system stops during the boot process and asks for the password again.

Comment 35 cornel panceac 2011-06-26 10:19:30 UTC
I have a fully updated Fedora 15 and rhgb is not starting (because of the nvidia driver). Anyway, I've found that at the (text) prompt to enter the password, the keyboard often does not respond until a few seconds later. (No *lock key works and no * is printed.)
This bug should be reopened.

Comment 36 Richard Körber 2011-11-09 22:21:40 UTC
This bug still occurs on F16 after upgrade.

Please reopen this bug!

Comment 37 Michal Schmidt 2011-11-10 14:48:17 UTC
Richard, it is not certain that you are seeing the same problem as Kamil did. After all, Kamil confirmed that the updated packages fixed the bug for him. It may be better to report it as a new bug.
Note that this BZ was resolved only after Kamil attached the debug logs Ray asked for. I'm sure Ray will need to see logs from your system too in order to debug the problem.

