Bug 524714 - F12 Virt Test Day liveCD kernel won't boot on Nehalem virtlab machines
Summary: F12 Virt Test Day liveCD kernel won't boot on Nehalem virtlab machines
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 12
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Harald Hoyer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-09-21 20:41 UTC by Don Dutile (Red Hat)
Modified: 2013-01-10 05:29 UTC

Fixed In Version: 004-4.fc12
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-01-28 00:53:31 UTC
Type: ---
Embargoed:


Attachments
Machine cpuinfo from RHEL5.4 kernel (2.50 KB, text/plain)
2009-09-21 20:41 UTC, Don Dutile (Red Hat)
lspci -vvv output from rhel5.4 kernel (18.51 KB, text/plain)
2009-09-21 20:43 UTC, Don Dutile (Red Hat)
dmidecode from rhel5.4 kernel on this machine (14.49 KB, text/plain)
2009-09-21 20:44 UTC, Don Dutile (Red Hat)
dmesg from rhel5.4 system w/intel_iommu=on (27.29 KB, text/plain)
2009-09-21 20:45 UTC, Don Dutile (Red Hat)
Boot log of failed F12 LiveCD of 20091109.15 x86_64 on Weybridge machine (75.57 KB, text/plain)
2009-11-12 16:52 UTC, Don Dutile (Red Hat)

Description Don Dutile (Red Hat) 2009-09-21 20:41:43 UTC
Created attachment 361994 [details]
Machine cpuinfo from RHEL5.4 kernel

Description of problem:
F12 Virt Test Day liveCD fails to boot on Weybridge devel machines.
This machine boots rhel5.4 successfully, and also supports
device assignment when intel_iommu=on is set.

Version-Release number of selected component (if applicable):
kernel: 2.6.31-12.fc12.x86_64 (installed from liveCD)
liveCD: desktop-x86_64-20090915.15.iso

How reproducible:
Every boot

Steps to Reproduce:
1. Insert the liveCD into the CD drive of the Weybridge machine
2. Reboot the existing OS or power cycle the machine
  
Actual results:
Fails with the following messages on the console screen:
No root device found.
Boot has failed, sleeping forever.

Expected results:
System boots to the liveCD login screen and is able to install F12 onto the machine.

Additional info:
I edited the boot cmdline to remove "quiet", add "debug", and
it gave some additional device config output, but ended in the same
failure.
Also tried with intel_iommu=off and had the same result/problem.

Comment 1 Don Dutile (Red Hat) 2009-09-21 20:43:42 UTC
Created attachment 361995 [details]
lspci -vvv output from rhel5.4 kernel

Comment 2 Don Dutile (Red Hat) 2009-09-21 20:44:11 UTC
Created attachment 361996 [details]
dmidecode from rhel5.4 kernel on this machine

Comment 3 Don Dutile (Red Hat) 2009-09-21 20:45:14 UTC
Created attachment 361997 [details]
dmesg from rhel5.4 system w/intel_iommu=on

Comment 4 Mark McLoughlin 2009-10-01 08:02:20 UTC
Don: can you get full dmesg output over serial for the failing boot? There aren't many clues to go on here.

You could also try a more recent boot.iso and see if it's still broken:

  http://download.fedoraproject.org/pub/fedora/linux//development/x86_64/os/images/boot.iso

Comment 5 Don Dutile (Red Hat) 2009-10-01 14:33:55 UTC
hmmm... title of bz is wrong.  This is about Weybridge machine, not 
virtlab Nehalem machines.

Anyhow, I was able to boot the boot.iso;
but during F12 Virt Test day, we were given LiveCD's to test with, not
boot.iso's.  Now, if liveCD's start with the same boot.iso, then this
is an improvement, since I can get to the installation screens to 
select partitions to load F12 on, etc. (which I will do shortly, once
I re-figure out which partition is rhel5 & which is fedora! ;-) ).

If I can get some time on the Nehalem virt lab machines (from cdub),
I'll try the boot.iso there as well.

Comment 6 Mark McLoughlin 2009-10-01 14:59:09 UTC
okay, that's useful data

the latest nightly live CD composes get dumped here:

  http://alt.fedoraproject.org/pub/alt/nightly-composes/desktop/

maybe you could try out the i386 one there? if that doesn't work, should capture the log and move the bug to livecd-tools

Comment 7 Don Dutile (Red Hat) 2009-10-01 19:26:02 UTC
I was able to boot w/the boot.iso listed in c#4.

Able to do an upgrade on a Fedora installation that was at f9, then f10, 
then f11.   One 'new' feature: when doing an upgrade to f12, it nuked/removed/deleted all other fedora kernels on the system.... sigh....

Anyhow, after installation, rebooted successfully.
Will try kvm guest (with & without (nic) dev assignment) next.

Re-assigning to liveCD, since it appears that multiple installations
fail with a similar error, which looks like other
dracut errors.
note: The Weybridge under test has 10 partitions, multiple LVM's, some
       partitions not in LVMs, some partitions labeled, but not all of
       them .... so quite the 'variety' of partitions & their uses.

Comment 8 Matthias Clasen 2009-10-01 20:50:24 UTC
Not sure that moving this to LiveCD will help. If anything, this appears to be either a kernel or dracut problem.

Comment 9 Matthias Clasen 2009-10-02 00:04:44 UTC
Moving to dracut for now.

Comment 10 Harald Hoyer 2009-11-05 12:26:16 UTC
is this still an issue?

Comment 11 Don Dutile (Red Hat) 2009-11-05 14:08:16 UTC
I don't know.
Haven't tried an updated liveCD of F12 on a Nehalem;
I've seen lots of chatter about anaconda trying to open
all filesystems on an install (vs update), and how that
logic is not so well liked (for other reasons like encrypted
filesystems).

I recommend a test on a Nehalem (or Tylersburg) system
(virtlab16->virtlab17) with latest F12 liveCD to see if
it's still busted.

- Don

Comment 12 Mark McLoughlin 2009-11-10 08:24:36 UTC
(In reply to comment #11)
> I recommend a test on a Nehalem (or Tylersburg) system
> (virtlab16->virtlab17) with latest F12 liveCD to see if
> it's still busted.

Please let us know when you re-test

Comment 13 Don Dutile (Red Hat) 2009-11-10 14:14:54 UTC
I tested desktop-x86_64-20091109.15.iso  late yesterday
and it still fails with 'no root device found' 
on virtlab17 (Tylersburg machines; the bz for the Tylersburg machine is
BZ 527529).
I did not get a chance to test on Weybridge machine;
I'll test that machine on Thursday.

On the Tylersburg machines, it no longer crashes in
the graphics driver, so it is better than the Virt Test day version.
It now fails more like the Weybridge machine did, but with some
new wrinkles, so I'm adding my test results here, since they
may be relevant to the F12 LiveCD problem.

I got the following (when removing "quiet", adding "debug" to kernel cmdline):

dracut: Starting plymouth daemon
      : Starting plymouth daemon
      : rd-NO-MD: removing MD RAID activation
      : rd-NO-MDIMSM: no MD RAID for imsm/isw raids
      : scanning sda2 for LVM volume groups
      : Reading all physical volumes.  This may take a while
      : Found volume group "VolGroup00" using metadata type lvm2
      : 2 logical volume(s) in volume group "VolGroup00" now active

Note that the order of this output wrt various driver init completions
varies from boot run to boot run. Also note, I'm being a bit lazy,
and all above lines are prefixed by "dracut:".

Another oddity -- the last 4 lines of the above dracut output repeat,
but the second block is often separated from the above block by other
driver init completions.
 
It is after this second block of dracut messages that the famous
"No root device found" msg comes out followed by
"Boot has failed, sleeping forever" 

I also tried booting w/rhgb removed and no difference.

I also tried with irqpoll, and that ordered the driver output a bit
differently, but same end result.

I tried with intel_iommu=off and the system never gets to the 
No root device found msg;  it always hangs after various messages
related to the mpt2sas (LSILOGIC SCSI) driver.

I tried w/iommu=pt and got the No root found failure.
I tried w/iommu=soft and hung after mpt2sas's msg of 
  'version 01.100.04.00 loaded' 

So, on the Tylersburg systems, it now appears to be an
mpt2sas driver problem.

One more test I think I'll try on those machines is with mem=2G to
see if it gets the mpt2sas driver to configure successfully.

In summary:
 -- Tylersburg machine better, but still failing
 -- need to re-test Weybridge tomorrow & report results in this bz.

Comment 14 Don Dutile (Red Hat) 2009-11-11 22:00:16 UTC
Test results on Weybridge machine:

Essentially, it is the same as for the virtlab17 machine

It always ends up with 'No root device found'

I tried (all with quiet & rhgb removed, debug set on kernel cmdline):

 (a) intel_iommu=off
 (b) iommu=pt
 (c) iommu=soft
 (d) mem=2G
 (e) (a) & (d) together
 (f) removed all the no-raid & no-luks cmdline switches

As on the Tylersburg machines, dracut saw the VolGroups on 2 different
disks on that system.  

Graphics worked as well when rhgb left on; just died with
"unable to remove a fb that we didn't own" after the 
'No root device found'

btw -- it was interesting to see that although the message says
"Boot has failed, sleeping forever", if I toggled my console-KVM,
I'd see USB unplug, plug-in messages come out.... so it could be
woken up! ;-)

So, although the Tylersburg systems may be seeing an mpt2sas
issue, there are no such things on a Weybridge, and I'm getting
similar 'No root device found' problems.

Given that a generic boot.iso works, it still points
to further dracut problems.

Still don't understand why VTd systems fail with 
liveCD, given that it fails even when it is forced off
(intel_iommu=off or iommu=soft).

Comment 15 Harald Hoyer 2009-11-12 10:19:42 UTC
try adding "rd_info rd_initdebug" and removing "rhgb quiet" to/from the kernel command line.

btw, what is your kernel command line?

Comment 16 Harald Hoyer 2009-11-12 10:20:42 UTC
(In reply to comment #15)
> try adding "rd_info rd_initdebug" and removing "rhgb quiet" to/from the kernel
> command line.

of course it should be "rdinfo rdinitdebug" ... -ENOCOFFEE

> 
> btw, what is your kernel command line?

Comment 17 Don Dutile (Red Hat) 2009-11-12 16:49:18 UTC
Attached is the boot up log with the following kernel command line:

Command line: initrd=initrd0.img root=live:CDLABEL=desktop-x86_64-20091109.15 rootfstype=auto ro liveimg debug console=ttyS0,115200 console=tty0 rdinfo rdinitdebug rd_NO_LUKS rd_NO_MD noiswmd  BOOT_IMAGE=vmlinuz0 


I removed the "quiet rhgb" and added "debug console=.... rdinitdebug"
so I could get the output on a serial console & saved as attachment.
The rest of the command line comes from the LiveCD image.

Note, I added 'debug' in order to get dracut output in boot-up log;
w/o it, no dracut output would occur, just the last two log sections
after the No root device found.

Let me know if I can provide more (testing) info.

- Don
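
Editor's sketch: the command line above can be split into its parameters to see exactly which root= form and which live-image flags are in play. This is a minimal Python illustration, not the kernel's or dracut's real parser. The helper name `parse_cmdline` is hypothetical, and the splitting rule (whitespace-separated tokens, split on the first `=`, no quoting) is an assumption.

```python
# Sketch only: a simplified splitter, not the kernel's or dracut's parser.
# Assumes whitespace-separated tokens and no quoted values.
CMDLINE = (
    "initrd=initrd0.img root=live:CDLABEL=desktop-x86_64-20091109.15 "
    "rootfstype=auto ro liveimg debug console=ttyS0,115200 console=tty0 "
    "rdinfo rdinitdebug rd_NO_LUKS rd_NO_MD noiswmd  BOOT_IMAGE=vmlinuz0"
)


def parse_cmdline(cmdline):
    """Map each parameter name to the list of values it was given.

    Bare flags such as 'liveimg' get the value None. A list is used
    because a name (for example 'console') may appear more than once.
    """
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so 'root=live:CDLABEL=x' keeps
        # 'live:CDLABEL=x' intact as the value.
        name, sep, value = token.partition("=")
        params.setdefault(name, []).append(value if sep else None)
    return params


params = parse_cmdline(CMDLINE)
print("root    :", params["root"])
print("console :", params["console"])
print("liveimg :", "liveimg" in params)
print("quiet   :", "quiet" in params)
```

Run against this command line, the sketch shows that root= carries the live:CDLABEL= form while the separate liveimg flag is also present, and that quiet and rhgb are indeed gone.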

Comment 18 Don Dutile (Red Hat) 2009-11-12 16:52:07 UTC
Created attachment 369260 [details]
Boot log of failed F12 LiveCD of 20091109.15 x86_64 on Weybridge machine

Comment 19 Harald Hoyer 2009-11-12 19:59:15 UTC
seems like it cannot find a device/partition with a filesystem LABEL "desktop-x86_64-20091109.15"

Comment 20 Harald Hoyer 2009-11-12 20:00:48 UTC
how is desktop-x86_64-20090915.15.iso presented to the system?

Comment 21 Don Dutile (Red Hat) 2009-11-13 18:44:24 UTC
What do you mean '..presented to the system?' ???

If I put the CD into my w/s, it says its label is:
"desktop-x86_64-20091109.15"

and its filesystem is iso9660.

Note, that in the boot log, it states "CDLABEL=desktop-x86_64-20091109.15"
and not "LABEL=desktop-x86_64-20091109.15"

are they equivalent ???

Comment 22 Harald Hoyer 2009-11-16 09:22:14 UTC
How is the iso image bound in the system? real CDROM? virtual CDROM?

Comment 23 Mark McLoughlin 2009-11-16 10:02:13 UTC
I think it's pretty clear - he's inserting a physical CDROM into his machine and booting from it

Comment 24 Harald Hoyer 2009-11-16 10:16:59 UTC
scsi 1:0:0:0: CD-ROM            PLEXTOR  DVDR   PX-755A   1.04 PQ: 0 ANSI: 5
 sda8sr0: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 1:0:0:0: Attached scsi CD-ROM sr0

ok, then please run with the cdrom inserted:

$ blkid /dev/cdrom

and/or

$ /lib/udev/vol_id /dev/cdrom

to check the filesystem label of the cdrom.

Comment 25 Bug Zapper 2009-11-16 12:43:52 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 26 Don Dutile (Red Hat) 2009-11-16 15:21:55 UTC
On my w/s, which uses a DVD reader....

# blkid /dev/dvd
/dev/dvd: LABEL="desktop-x86_64-20091109.15" TYPE="iso9660" 

# /lib/udev/vol_id /dev/dvd
ID_FS_USAGE=filesystem
ID_FS_TYPE=iso9660
ID_FS_VERSION=
ID_FS_UUID=
ID_FS_LABEL=desktop-x86_64-20091109.15
ID_FS_LABEL_SAFE=desktop-x86_64-20091109.15


In case there was something odd w/my Weybridge's DVD reader,
I ran the same cmd's on the CD and it generated the same
output as well.
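
Editor's sketch: the check being done by eye here (does the label on the disc match the label requested on the kernel command line?) can be written down as a small Python comparison. The vol_id text is copied from this comment and the root= value from the command line in comment 17. The helper names `parse_properties` and `requested_label` are hypothetical, and treating everything after `CDLABEL=` as the label is an assumption about how that argument is read.

```python
# Sketch only: compares the label reported by vol_id with the label
# requested via root=live:CDLABEL=... on the kernel command line.
VOL_ID_OUTPUT = """\
ID_FS_USAGE=filesystem
ID_FS_TYPE=iso9660
ID_FS_VERSION=
ID_FS_UUID=
ID_FS_LABEL=desktop-x86_64-20091109.15
ID_FS_LABEL_SAFE=desktop-x86_64-20091109.15
"""

ROOT_ARG = "live:CDLABEL=desktop-x86_64-20091109.15"


def parse_properties(text):
    """Turn KEY=VALUE lines into a dict; empty values become ''."""
    props = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def requested_label(root_arg):
    """Return the text after 'CDLABEL=', or None if that form is not used."""
    marker = "CDLABEL="
    if marker not in root_arg:
        return None
    return root_arg.split(marker, 1)[1]


props = parse_properties(VOL_ID_OUTPUT)
print("type  :", props["ID_FS_TYPE"])
print("match :", props["ID_FS_LABEL"] == requested_label(ROOT_ARG))
```

The labels agree, so the disc itself is labeled as the command line expects; the open question is whether the booted initramfs ever sees that label.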

Comment 27 Harald Hoyer 2009-11-16 17:07:11 UTC
ok, looks good, now boot with the live CD, and add to the kernel command line:

"rdinfo rdinitdebug rdshell"

and you will get a shell in case of a failed boot..

run blkid on the cdrom again
# blkid /dev/cdrom
run 
# ls /dev/disk/by-label

Comment 28 Harald Hoyer 2009-11-16 17:07:40 UTC
and please provide the output of dmesg

Comment 29 Harald Hoyer 2009-11-16 17:08:00 UTC
(In reply to comment #28)
> and please provide the output of dmesg  

scratch that

Comment 30 Don Dutile (Red Hat) 2009-11-16 19:49:01 UTC
# blkid /dev/scd0   (/dev/cdrom doesn't exist)
LABEL = "desktop-x86_64-20091109.15" TYPE="iso9660"

# ls /dev/disk/by-label
SWAP-sda3 \x2f \x2fboot \x2fboot_el5_32 \x2fboot_el5_64 \x2fboot_f9_32 \x2fguest_images \x2froot_el5_32 \x2froot_el5_64 \x2froot_f9_32
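
Editor's sketch: the `\x2f` sequences in the listing are udev's escaped form of `/`, so the entries are labels such as `/boot`. The Python below decodes the listing from this comment and checks whether the CD's label is among them. The helper name `decode_udev_name` is hypothetical. That dracut looks the CD up through /dev/disk/by-label is an assumption, suggested only by the fact that this directory was the one asked for.

```python
import re

# Listing copied from the rdshell output above (raw strings keep the
# backslashes exactly as printed).
LISTING = (
    r"SWAP-sda3 \x2f \x2fboot \x2fboot_el5_32 \x2fboot_el5_64 \x2fboot_f9_32 "
    r"\x2fguest_images \x2froot_el5_32 \x2froot_el5_64 \x2froot_f9_32"
)
WANTED = "desktop-x86_64-20091109.15"


def decode_udev_name(name):
    """Replace each \\xNN escape with the character it stands for."""
    return re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda m: chr(int(m.group(1), 16)),
        name,
    )


labels = [decode_udev_name(n) for n in LISTING.split()]
print(labels)
print("CD label present:", WANTED in labels)
```

Only hard-disk labels are present; there is no by-label entry for the CD even though blkid can read its label directly from /dev/scd0.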

Comment 31 Harald Hoyer 2009-11-17 09:23:15 UTC
 /dev/cdrom does not exist? very strange... something is wrong with udev and your CDROM. 

Workaround: specify on the kernel command line
root=/dev/scd0

Comment 32 Harald Hoyer 2009-11-17 09:31:56 UTC
(In reply to comment #31)
>  /dev/cdrom does not exist? very strange... something is wrong with udev and
> your CDROM. 
> 
> Workaround: specify on the kernel command line
" root=/dev/scd0  liveimg"

Comment 33 Harald Hoyer 2009-11-17 09:39:30 UTC
also, I think, specifying both "liveimg" _and_ "root=live:" might create a problem.

either:
"liveimg root=CDLABEL=desktop-x86_64-20091109.15"
or
"root=live:CDLABEL=desktop-x86_64-20091109.15"

Comment 34 Harald Hoyer 2009-11-17 09:41:18 UTC
(In reply to comment #30)
> # blkid /dev/scd0   (/dev/cdrom doesn't exist)
> LABEL = "desktop-x86_64-20091109.15" TYPE="iso9660"
> 
> # ls /dev/disk/by-label
> SWAP-sda3 \x2f \x2fboot \x2fboot_el5_32 \x2fboot_el5_64 \x2fboot_f9_32
> \x2fguest_images \x2froot_el5_32 \x2froot_el5_64 \x2froot_f9_32  

it would be very interesting to see the output of a boot with 
"rdudevdebug" added to the kernel command line.

Comment 35 Don Dutile (Red Hat) 2009-11-19 19:44:45 UTC
None of the recommendations in c#32 or c#33 helped.
In all cases, it ended with root device not found.

note: when trying  " root=/dev/scd0  liveimg"  
it did generate a new/extra output:
            failed: you must specify filesystem type
then the all-too-common Can't mount root filesystem

When adding 'rdudevdebug' to the kernel cmd line,
it generated so much output that it overran my multi-megabyte screen
buffer, and it took over 3 mins to get it to the boot failure.

Do you want me to reconfigure my console screen so I can catch
the full rdudevdebug output, and post it here?

Comment 36 Harald Hoyer 2009-11-20 08:08:02 UTC
Does it work if you boot with "rdshell" and mount the cdrom by hand?

"failed: you must specify filesystem type" looks like something is wrong...

Comment 37 Don Dutile (Red Hat) 2009-11-20 18:38:51 UTC
If I remove rhgb & quiet; add debug & rdshell,
it fails to boot & drops into rdshell.

at rdshell, I can mount the cdrom using
    mount /dev/scd0 /

and it comes back:
mount block device /dev/sr0 is write-protected, mounting read-only
ISO 9660 Extensions: Microsoft Joliet Level 3
ISO 9660 Extensions: RRIP_1991A

note, an ls -l of /dev/scd0 shows /dev/scd0 -> sr0

Comment 38 Harald Hoyer 2009-11-23 08:46:07 UTC
ok, before we dig deeper, did you try the final images also?

Comment 39 Don Dutile (Red Hat) 2009-11-23 22:16:14 UTC
I burned desktop_x86_64-20091122.16.iso onto a CD and tried it.

Same results: no root device found.

No /dev/cdrom device file;  only /dev/sr0

Able to mount /dev/sr0 as /

Comment 40 Harald Hoyer 2009-11-24 11:02:06 UTC
ok, then let's try to debug, why udev does not like it.

add "rdshell", boot, be dropped to a shell, run:

# ls  /etc/udev/rules.d /lib/udev/rules.d
# udevadm info --query=all --name=/dev/sr0

you can also attach photos of the screen, instead of retyping it

Comment 41 Fedora Update System 2010-01-26 10:48:43 UTC
dracut-004-4.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/dracut-004-4.fc12

Comment 42 Fedora Update System 2010-01-27 01:05:58 UTC
dracut-004-4.fc12 has been pushed to the Fedora 12 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update dracut'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2010-1088

Comment 43 Fedora Update System 2010-01-28 00:51:17 UTC
dracut-004-4.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

