526652 – Boot fails on encrypted system since Oct 1st 2009

Summary: Boot fails on encrypted system since Oct 1st 2009

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	plymouth
Sub Component:
Version:	rawhide
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Assignee:	Ray Strode [halfline]
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	F12Blocker, F12FinalBlocker

Reported:	2009-10-01 10:30 UTC by Tim Waugh
Modified:	2013-01-10 05:30 UTC (History)
CC List:	24 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-10-06 16:10:17 UTC
Type:	---
Embargoed:
Dependent Products:

Description Tim Waugh 2009-10-01 10:30:10 UTC

Description of problem:
The last koji-built kernel that boots successfully on my LUKS-encrypted system is 2.6.31.1-48.fc12.

Version-Release number of selected component (if applicable):
Fails with either of:
kernel-2.6.31.1-52.fc12.x86_64
kernel-2.6.31.1-56.fc12.x86_64
(-53 and -54 not tested)

Other packages:
device-mapper-1.02.38-2.fc12.x86_64
lvm2-2.02.53-2.fc12.x86_64
cryptsetup-luks-1.1.0-0.1.fc12.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot with kernel-2.6.31.1-52.fc12 or later
  
Actual results:
Fails.

Expected results:
Succeeds.

Additional info:
There are two SATA hard drives connected to this system.

/dev/sda1 is /boot and not encrypted
/dev/sda2 is swap
Other /dev/sda* partitions are encrypted LUKS devices containing LVM2 physical volumes.

/dev/sdb1 is an encrypted ext2 partition.

All encrypted devices use the same password.

When booting I am prompted for the encryption password, then these messages appear:

==>
device-mapper: remove ioctl failed: Device or resource busy
Key slot 0 unlocked.
error: unexpectedly disconnected from boot status daemon
WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/.

No root device found

Boot has failed, sleeping forever.
<==

but stair-stepped as though the stty settings are wrong.

Filing this against the kernel but CC'ing lvm2 maintainer.

Comment 1 Milan Broz 2009-10-01 10:58:58 UTC

I think this is not a kernel problem but probably some race or misconfiguration in dracut.

BTW it would be very useful if dracut had a debug switch that ran cryptsetup with --debug (a new option, available in F12 and rawhide). The same goes for lvm (there it is the -vvvv switch).

(Should I report an RFE for that?)

Then we can see whether the problem is with the crypto mapping or with later commands, missing modules, etc.
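
A minimal sketch of what such a debug run could look like when done by hand from a rescue shell, pending any support in dracut itself. The device path /dev/sda3 and the mapping name luks-test are placeholders, not values from this report, and the run helper with its DRY_RUN variable is purely illustrative. By default the commands are only printed, since they need root and a real LUKS device.

```shell
#!/bin/sh
# Sketch only: DEVICE and NAME are placeholders (assumptions), not from this bug.
DEVICE="${DEVICE:-/dev/sda3}"
NAME="${NAME:-luks-test}"

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unlock the LUKS device with detailed tracing of each step
run cryptsetup --debug luksOpen "$DEVICE" "$NAME"
# Activate volume groups with lvm's most verbose logging level
run lvm vgchange -ay -vvvv
```

Setting DRY_RUN=0 runs the commands for real; the output of both is long, so redirecting it to a file is probably the easiest way to attach it to a report.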

Comment 2 Milan Broz 2009-10-01 11:04:23 UTC

The message
device-mapper: remove ioctl failed: Device or resource busy

can be caused by another process scanning the temporary cryptsetup device. cryptsetup will retry, so it is a problem to solve but probably not the root cause of the boot failure (the --debug log should show that the retried operation succeeds).

Comment 3 Tim Waugh 2009-10-01 11:33:09 UTC

Changing component to dracut.

dracut-002-8.git845dd502.fc12.noarch

Comment 4 Harald Hoyer 2009-10-01 11:57:07 UTC

try dracut-002-9
and yes, adding a --debug and -vvvv switch would be useful. will do that

Comment 5 Harald Hoyer 2009-10-01 11:57:58 UTC

you might also want to revert the lvm and device-mapper packages to current rawhide

Comment 6 Tim Waugh 2009-10-01 12:39:00 UTC

They *are* current rawhide.  I updated it earlier today using PackageKit.

$ koji latest-pkg dist-f12 lvm2
Build                                     Tag                   Built by
----------------------------------------  --------------------  ----------------
lvm2-2.02.53-2.fc12                       dist-f12              agk
$ rpm -qi device-mapper | grep 'Source RPM'
Group       : System Environment/Base       Source RPM: lvm2-2.02.53-2.fc12.src.rpm

Comment 7 Tim Waugh 2009-10-01 12:42:40 UTC

(In reply to comment #4)
> try dracut-002-9

From dist-f13?

Comment 8 Tim Waugh 2009-10-01 12:44:18 UTC

Also, do I just need to 'rpm -Fvh dracut*' and reboot with the troublesome kernel, or do I need to run some command to make a new initramfs image?

Comment 9 Harald Hoyer 2009-10-01 12:48:26 UTC

you need to run dracut like you did with mkinitrd

Comment 10 Harald Hoyer 2009-10-01 12:49:47 UTC

$ koji latest-pkg dist-f12 dracut
Build                                     Tag                   Built by
----------------------------------------  --------------------  ----------------
dracut-002-8.git845dd502.fc12             dist-f12              wtogami


damn! why did dracut-002-9 not enter F-12 ???

Comment 11 Tim Waugh 2009-10-01 12:54:17 UTC

I didn't do anything with mkinitrd, I just installed F-12 from rawhide several months ago and have been upgrading with gnome-packagekit ever since.

Please tell me the command line to use.

Comment 12 Harald Hoyer 2009-10-01 13:03:15 UTC

ah, I know what's going on... preparing a patch

Comment 13 Harald Hoyer 2009-10-01 13:04:09 UTC

# dracut /boot/initramfs-<kernel version>.img <kernel version>
or if the image already exists
# dracut -f /boot/initramfs-<kernel version>.img <kernel version>
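
For illustration, the placeholders above can be filled in by a small helper. This is a sketch, not part of dracut: the function name dracut_cmd is made up here, and it only prints the command so that it can be reviewed before being run as root. The version string is the failing -56 kernel from the description.

```shell
#!/bin/sh
# Sketch: build the dracut command line for a given kernel version.
# Defaults to the running kernel; pass another version as the first argument.
dracut_cmd() {
    kver="${1:-$(uname -r)}"
    # -f overwrites an image that already exists
    echo "dracut -f /boot/initramfs-${kver}.img ${kver}"
}

dracut_cmd 2.6.31.1-56.fc12.x86_64
# prints: dracut -f /boot/initramfs-2.6.31.1-56.fc12.x86_64.img 2.6.31.1-56.fc12.x86_64
```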

Comment 14 Harald Hoyer 2009-10-01 13:09:10 UTC

Hmm... seems like the plymouth client disconnects from the daemon.. very strange

error: unexpectedly disconnected from boot status daemon

Comment 15 Harald Hoyer 2009-10-01 13:14:11 UTC

cryptsetup with --debug would not work, because this is executed by
	/bin/plymouth ask-for-password

Comment 16 Harald Hoyer 2009-10-01 13:19:30 UTC

twaugh can you add to the kernel command line "rdinfo quiet rdshell" and hit <alt+enter> as soon as you see the graphical screen

or "rdinitdebug quiet rdshell" and make a photo of the last interesting pages

Comment 17 Harald Hoyer 2009-10-01 13:37:21 UTC

oh, and you might want to remove "rhgb"

Comment 18 Tim Waugh 2009-10-01 15:36:18 UTC

I upgraded to dracut-002-9.git99fd62e3.fc13, removed the -56 kernel, and reinstalled it.

With 'rdinfo rdshell' I don't see any extra information on the screen, and it ends with:

sh: can't access tty; job control turned off
#

(but I can't type at the prompt)

With 'rdinitdebug rdshell' the output flows past too fast for me to capture, even when videoing it.

But: removing 'rhgb' from the kernel boot command line avoids the problem entirely.
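
As a sketch of that workaround, the helper below drops the rhgb token from a kernel command line string. The function name strip_rhgb and the sample command line are hypothetical; the result still has to be put into the boot loader entry by hand.

```shell
#!/bin/sh
# Sketch: remove the "rhgb" token from a kernel command line string.
# Relies on word splitting, so the input must not contain glob characters.
strip_rhgb() {
    out=""
    for word in $1; do
        # Skip the token that enables the graphical boot
        [ "$word" = "rhgb" ] && continue
        out="${out:+$out }$word"
    done
    echo "$out"
}

strip_rhgb "ro root=/dev/mapper/vg-root rhgb quiet"
# prints: ro root=/dev/mapper/vg-root quiet
```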

Comment 19 Harald Hoyer 2009-10-01 15:47:31 UTC

so plymouth seems b0rken

Comment 20 Ray Strode [halfline] 2009-10-01 19:43:07 UTC

what version of plymouth, what's the output of plymouth-set-default-theme ?

Comment 21 Gene Czarcinski 2009-10-01 20:19:39 UTC

Yes, there is a problem with plymouth.x86_64 0.8.0-0.2009.29.09.1.fc12.  I downgraded to a previous version plymouth-0.8.0-0.2009.28.09.fc12.x86_64 (yes, I keep a lot of "older" packages in a local mirror) and it works.

I am having a big problem on another bare-metal system with X and consoles dying.  I was trying to locate the problem (just which packages), so I was updating a few packages at a time and then rebooting.  When I updated plymouth, the bootup died. IIRC, there was some message from glibc about some kind of loop (or something like that).  I have not had time right now to go back and get better documentation.  There is nothing in /var/log/messages about this.

Comment 22 Tim Waugh 2009-10-02 08:57:28 UTC

$ rpm -q plymouth
plymouth-0.8.0-0.2009.29.09.1.fc12.x86_64
$ plymouth-set-default-theme 
charge

Comment 23 Gene Czarcinski 2009-10-02 12:47:16 UTC

Arrgh ... it appears that I no longer have the problem!

As I said ... I have two bare-metal systems running F12-alpha-rawhide.  The first is a dual processor AMD 4400+ (falcon) and the second is a quad processor AMD 940 (hawk).  I had applied a bunch of updates to falcon and suddenly had a big problem with X (the only way in was thru ssh).  I tried downgrading some updates but could not find the right one.  So, since I had not updated hawk yet, I started updating a few packages at a time and then rebooting.  During this process on hawk, I hit some kind of problem with plymouth and, after coming up in init level 3, downgraded plymouth.  After that, I continued updating until only plymouth was left.  I just updated plymouth and the "problem" no longer occurs!

BTW, I reinstalled F12alpha using the TC F12beta DVD on falcon and then updated a few packages at a time with a reboot.  Turns out the problem was xorg-x11-server-{common,Xorg} which had already been BZ'ed.

Sorry I cannot help on this one.  Maybe plymouth was interacting with some other package which was not updated yet.  If it is a real problem, it will return.

Comment 24 Gene Czarcinski 2009-10-02 13:05:24 UTC

For completeness:

Yes, I get:

[gc@hawk ~]$ rpm -q plymouth
plymouth-0.8.0-0.2009.29.09.1.fc12.x86_64
[gc@hawk ~]$ plymouth-set-default-theme 
charge
[gc@hawk ~]$

Also for completeness, I do not have a problem with plymouth on falcon either.

Comment 25 Adam Williamson 2009-10-02 17:19:28 UTC

Reviewed in today's beta blocker bug review meeting.

We cannot reproduce this on QA's test beds or on Jesse Keating's personal system (all of which are configured similarly). Tim definitely confirms it with today's Rawhide, however.

Tim has KMS disabled. He will test with it enabled. He confirms that disabling plymouth (removing rhgb from kernel parameters) works around the issue.

Tim will also try to get more diagnostics on the problem if he can.

We agreed this will not be promoted to beta blocker as we cannot reproduce it on other systems, and there's a usable workaround.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 26 Tim Waugh 2009-10-02 18:08:05 UTC

I *thought* I'd disabled KMS but on closer inspection it seems that KMS just doesn't work for me with the default parameters.

I just tried booting with 'vga=792' added to the command line, making the full command line the following:

ro root=/dev/mapper/vg_worm01-LogVol00 rhgb quiet SYSFONT=latarcyrheb-sun16 LANG=en_GB.UTF-8 KEYTABLE=gb rd_plytheme=charge vga=792

This boots successfully, and even with a graphical boot sequence.  So there's another work-around: add 'vga=792'.

When I don't add 'vga=792', I don't get a graphical boot sequence and haven't done for months.  So is it that plymouth is behaving badly in that situation?

Here's the information about my graphics card:

01:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B64 [FireGL V3100 (PCIE)] (rev 80)

01:00.0 0300: 1002:5b64 (rev 80) (prog-if 00 [VGA controller])
	Subsystem: 1002:0102
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at f0000000 (32-bit, prefetchable) [size=128M]
	Region 1: I/O ports at dc00 [size=256]
	Region 2: Memory at fe9e0000 (32-bit, non-prefetchable) [size=64K]
	Expansion ROM at fea00000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <128ns, L1 <2us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <128ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [100] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Kernel modules: radeon, radeonfb

Comment 27 Adam Williamson 2009-10-02 18:18:35 UTC

can you play with the 'radeon.modeset=0' (forces KMS off) and 'radeon.modeset=1' (should force KMS on) kernel parameters?

(You'll get a message that the parameters are invalid - it's wrong, ignore it.)
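
To confirm which setting a given boot actually used, the command line can be inspected after boot. A minimal sketch, assuming a Linux system with /proc/cmdline; the function name modeset_value is made up for illustration, and it reports only the first radeon.modeset value it finds.

```shell
#!/bin/sh
# Sketch: report the first radeon.modeset value on a kernel command line.
# Reads /proc/cmdline by default; pass a string to inspect that instead.
modeset_value() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    for word in $cmdline; do
        case "$word" in
            radeon.modeset=*)
                # Strip the parameter name, leaving just the value
                echo "${word#radeon.modeset=}"
                return 0
                ;;
        esac
    done
    # Parameter not given: the driver default applies
    echo "unset"
}

modeset_value "ro rhgb quiet radeon.modeset=0"
# prints: 0
```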

Comment 28 Adam Williamson 2009-10-02 19:15:50 UTC

it's interesting that the same error message crops up as in the firstboot bug we're also looking at:

error: unexpectedly disconnected from boot status daemon

Comment 29 Adam Williamson 2009-10-02 19:16:04 UTC

that's https://bugzilla.redhat.com/show_bug.cgi?id=526842 , btw.

Comment 30 Tim Waugh 2009-10-03 11:14:21 UTC

(In reply to comment #27)
> can you play with the 'radeon.modeset=0' (forces KMS off) and
> 'radeon.modeset=1' (should force KMS on) kernel parameters?

Results:

radeon.modeset=0: Same problem as normal, no difference

radeon.modeset=1: This seems to avoid the problem.  I say "seems" because although it does boot, the gdm screen never appears.  It looks like X is stopping and restarting continuously -- if I time it right I can press Ctrl-Alt-F1 and Ctrl-Alt-Delete to reboot.

Comment 31 Gene Czarcinski 2009-10-03 14:47:03 UTC

Tim --

Your results with radeon.modeset=1 look to me a lot like this problem:
https://bugzilla.redhat.com/show_bug.cgi?id=526380

I had the problem described in 526380 with an ATI video card and it was fixed with the latest rawhide update.

Do you have the latest xorg-x11-server-* updates applied?

Comment 32 Adam Williamson 2009-10-03 17:12:22 UTC

tim: that's interesting - sounds like perhaps this bug is triggered with modesetting disabled but Plymouth enabled - bad interactions there, perhaps - and you're running into an unrelated X bug with modesetting enabled.

Jesse, could you see if you can reproduce the bug if you boot with modesetting disabled on your test system? jlaska, ditto with the test systems we couldn't reproduce on?

Comment 33 Ray Strode [halfline] 2009-10-05 15:18:24 UTC

error: unexpectedly disconnected from boot status daemon

means plymouth crashed.

If modesetting isn't working then we'll fall back to the text plugin, so maybe the bug is there?

I'll try to do an install with an encrypted disk to try to reproduce this.

Comment 34 Ray Strode [halfline] 2009-10-05 21:46:29 UTC

This should be fixed in plymouth-0.8.0-0.2009.29.09.3.fc12

Comment 35 Tim Waugh 2009-10-06 11:28:02 UTC

Fix confirmed.  Thanks!

Comment 36 Adam Williamson 2009-10-06 16:10:17 UTC

Has been tagged: https://fedorahosted.org/rel-eng/ticket/2341

closing.
