Bug 568325 - Cannot boot virtual machine with root partition on iscsi with virtio network
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: mkinitrd
Version: 5.5
Hardware: All Linux
Priority: high Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Peter Jones
QA Contact: Release Test Team
Keywords: Regression
Duplicates: 547670 645719 645831
Depends On:
Blocks: 640580 Rhel5KvmTier2 Rhel5KvmTier1 690099
 
Reported: 2010-02-25 07:24 EST by Pavel Holica
Modified: 2013-01-09 17:19 EST (History)

See Also:
Fixed In Version: mkinitrd-5.1.19.6-66.el5
Doc Type: Bug Fix
Doc Text:
Previously, virtual machines using iscsi could not boot correctly after installation. With this update booting works correctly.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-01-13 19:05:51 EST


Attachments
boot process with error (22.63 KB, image/png)
2010-02-25 07:24 EST, Pavel Holica
another screenshot (27.35 KB, image/png)
2010-02-25 07:25 EST, Pavel Holica
tgz dump of boot partition after RHEL-5.6 install (3.72 MB, application/octet-stream)
2010-09-30 11:06 EDT, Eduardo Habkost

Description Pavel Holica 2010-02-25 07:24:18 EST
Created attachment 396262 [details]
boot process with error

Description of problem:
When I have /boot on a virtio drive and / on an iscsi target, boot crashes with
ERROR: Interface setup failed: pupSetupInterface failed: get link - 19: No such device.
When I use the default network model (not virtio), everything works fine and the system boots.
This happened to me on RHEL 5.4 and RHEL 5.5; it didn't happen on Fedora 12 x86_64.

Version-Release number of selected component (if applicable):
RHEL5.5-Server-20100211.0 x86_64
kvm-83-157.el5.x86_64

How reproducible:
always

Steps to Reproduce:
1. prepare iscsi target
2. start installation using command:
/usr/libexec/qemu-kvm -m 1024 -net tap -net nic,model=virtio -drive if=virtio,file=boot.img,boot=on -cdrom path_to_iso -boot d
3. when asked, add the prepared iscsi target and use the automatic partition layout
4. complete the installation and shut down the machine
5. start virtual machine using command:
/usr/libexec/qemu-kvm -m 1024 -net tap -net nic,model=virtio -drive if=virtio,file=boot.img,boot=on -cdrom path_to_iso -boot c
  
Actual results:
System fails to boot (failure screenshot in attachment).

Expected results:
System should boot successfully.

Additional info:
System installed into virtual machine was RHEL5.5-Client-20100217.0 x86_64.
This shouldn't be a problem with the installed OS, though, because, as I mentioned in the description, an identical system booted successfully on Fedora 12 x86_64.
Comment 1 Pavel Holica 2010-02-25 07:25:45 EST
Created attachment 396263 [details]
another screenshot
Comment 3 yeylon@redhat.com 2010-02-25 15:00:26 EST
Isn't it a KVM bug?
Comment 4 Pavel Holica 2010-03-03 03:14:28 EST
Yes, it seems like a KVM bug (see component set as KVM).
Comment 7 Eduardo Habkost 2010-09-28 15:09:27 EDT
I just tested it using a RHEL-5.4 guest (I am still downloading a RHEL5.5 ISO) and the guest booted successfully and mounted / on iscsi, using the following command line:
/usr/libexec/qemu-kvm -m 1024 -net tap -net nic,model=virtio -drive if=virtio,file=/mnt/common/images/rhel6-iscsi.img,boot=on -cdrom /dev/cdrom -boot dc -vnc :2

On the other hand, I could reproduce a similar boot failure easily by changing the NIC model to rtl8139:
/usr/libexec/qemu-kvm -m 1024 -net tap -net nic,model=rtl8139 -drive if=virtio,file=/mnt/common/images/rhel6-iscsi.img,boot=on -cdrom /dev/cdrom -boot dc -vnc :2

This is suspicious:
> When i use default network model (not virtio), everything works fine, and
> system boots.

You are not supposed to be able to boot the guest if you change the NIC model, as the NIC driver needs to be inside the initrd image. Are you sure the guest was not installed using the default NIC model instead of virtio?


In case you can still reproduce it, can you attach a copy of the guest initrd, so we can check if everything required to boot the guest is present?
Comment 8 Eduardo Habkost 2010-09-29 14:41:16 EDT
I could reproduce the bug installing RHEL5.5-Client-20100322.0-x86_64 as a guest.

I will check what is wrong with the initrd image installed by the RHEL5.5 guest that makes it unable to initialize the network, as the RHEL-5.4 guest worked as expected.
Comment 9 Eduardo Habkost 2010-09-29 16:33:43 EDT
Found the cause: the initrd generated by RHEL-5.5 doesn't load the virtio_ring and virtio_pci modules before loading virtio_net and initializing the network, hence the virtio_net device is not initialized.

This is the diff between the RHEL5.4 and RHEL5.5 initrd /init files:


--- i54/initrd/init     2010-09-29 17:03:12.000000000 -0300
+++ i55/initrd/init     2010-09-29 17:24:01.000000000 -0300
@@ -52,14 +52,6 @@
 insmod /lib/jbd.ko
 echo "Loading ext3.ko module"
 insmod /lib/ext3.ko
-echo "Loading virtio.ko module"
-insmod /lib/virtio.ko 
-echo "Loading virtio_ring.ko module"
-insmod /lib/virtio_ring.ko 
-echo "Loading virtio_pci.ko module"
-insmod /lib/virtio_pci.ko 
-echo "Loading virtio_blk.ko module"
-insmod /lib/virtio_blk.ko 
 echo "Loading scsi_mod.ko module"
 insmod /lib/scsi_mod.ko
 echo "Loading sd_mod.ko module"
@@ -74,6 +66,8 @@
 insmod /lib/libiscsi_tcp.ko
 echo "Loading iscsi_tcp.ko module"
 insmod /lib/iscsi_tcp.ko
+echo "Loading virtio.ko module"
+insmod /lib/virtio.ko 
 echo "Loading virtio_net.ko module"
 insmod /lib/virtio_net.ko
 echo Bringing up eth0
@@ -81,7 +75,13 @@
 network --device eth0 --bootproto dhcp
 rename /var/lib/dhclient/dhclient.leases /var/lib/dhclient/dhclient-eth0.leases
 echo Attaching to iSCSI storage
-/bin/iscsistart -t iqn.2001-04.net.raisama:my-first-iscsi-vol -i iqn.1994-05.com.rhel:01.502165             -g 1 -a 172.31.74.35                
+/bin/iscsistart -t iqn.2001-04.net.raisama:my-first-iscsi-vol -i iqn.1994-05.com.rhel:01.a262aa             -g 1 -a 172.31.74.35                
+echo "Loading virtio_ring.ko module"
+insmod /lib/virtio_ring.ko 
+echo "Loading virtio_pci.ko module"
+insmod /lib/virtio_pci.ko 
+echo "Loading virtio_blk.ko module"
+insmod /lib/virtio_blk.ko 
 echo "Loading libata.ko module"
 insmod /lib/libata.ko
 echo "Loading ata_piix.ko module"



If I add the missing virtio_ring and virtio_pci insmod lines to the RHEL5.5 initrd, iscsi is initialized successfully by initrd.

I can't see how the same guest could have worked under a Fedora 12 host (as claimed on comment #0), as the guest initrd is not loading the required modules to initialize the virtio network interface.
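
The ordering problem shown in the diff above can be checked mechanically. What follows is only a rough sketch, not something mkinitrd ships: `check_virtio_order` is a hypothetical helper name, and it assumes the init script uses plain `insmod /lib/<module>.ko` lines like the ones in the diff. It reports "ok" only when both virtio_ring and virtio_pci are loaded before virtio_net.

```shell
# Hypothetical helper (not part of mkinitrd): inspect an extracted initrd
# init script and report whether the virtio transport modules are loaded
# before virtio_net. Prints "ok", "bad-order", or "no-virtio-net".
check_virtio_order() {
    init_file=$1
    awk '
        # Remember the line number of the first insmod of each module.
        $1 == "insmod" {
            n = split($2, parts, "/")
            mod = parts[n]
            sub(/\.ko$/, "", mod)
            if (!(mod in first)) first[mod] = NR
        }
        END {
            if (!("virtio_net" in first)) { print "no-virtio-net"; exit }
            net = first["virtio_net"]
            if (("virtio_ring" in first) && ("virtio_pci" in first) &&
                first["virtio_ring"] < net && first["virtio_pci"] < net)
                print "ok"
            else
                print "bad-order"
        }
    ' "$init_file"
}

# Example use on a guest (RHEL 5 initrds are gzip-compressed cpio archives):
#   mkdir /tmp/initrd && cd /tmp/initrd
#   zcat /boot/initrd-$(uname -r).img | cpio -id
#   check_virtio_order init
```

"bad-order" here only reflects the failing sequence seen in this report; it is not a statement about what the kernel drivers formally require.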
Comment 10 Eduardo Habkost 2010-09-29 16:40:36 EDT
Being a guest bug, this piece of information is important: the RHEL ISO used to install the guest was RHEL5.5-Client-20100322.0-x86_64.

I don't know if the latest RHEL-5.6 snapshot still has the bug. I will check that soon.
Comment 11 Eduardo Habkost 2010-09-30 10:52:00 EDT
I could reproduce it on the latest RHEL-5.6 snapshot on a RHEL6 host, using the following command:

virt-install -l http://download.devel.redhat.com/rel-eng/RHEL5.6-Server-20100928.0/5/x86_64/os/ --name rhel56-iscsi --disk /root/iscsi-boot-56.img,size=1 --prompt --vnclisten=0.0.0.0 --vnc --vncport=5903 --os-type=linux --os-variant=rhel5 --network bridge=eth0,model=virtio -r 1024
Comment 12 Eduardo Habkost 2010-09-30 11:06:14 EDT
Created attachment 450783 [details]
tgz dump of boot partition after RHEL-5.6 install

Attaching initrd file generated by the RHEL-5.6 guest install.
Comment 14 Ondrej Hudlicky 2010-10-25 14:21:00 EDT
*** Bug 645831 has been marked as a duplicate of this bug. ***
Comment 16 Peter Jones 2010-10-26 13:56:15 EDT
The module dependencies for virtio_net don't show a dependency on virtio_pci:

pjones4:~/Download/tmp$ modprobe -d $PWD --set-version 2.6.18-228.el5.kpq2 --show-depends virtio_net
insmod /home/pjones/Download/tmp/lib/modules/2.6.18-228.el5.kpq2/kernel/drivers/virtio/virtio.ko 
insmod /home/pjones/Download/tmp/lib/modules/2.6.18-228.el5.kpq2/kernel/drivers/net/virtio_net.ko 
pjones4:~/Download/tmp$ modprobe -d $PWD --set-version 2.6.18-228.el5.kpq2 --show-depends virtio
insmod /home/pjones/Download/tmp/lib/modules/2.6.18-228.el5.kpq2/kernel/drivers/virtio/virtio.ko 
pjones4:~/Download/tmp$ 

Without a kernel dependency there, there is no strict ordering requirement
expressed to mkinitrd, and it has no way of knowing if one module must be
loaded before any other. If there's a dependency here, the modules must say
so.

One potential workaround is to run mkinitrd with "--with virtio_pci".
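
For anyone wanting to try that workaround, a small sketch follows. `mkinitrd_cmd` is a hypothetical helper that only prints the command instead of running it, and the image path and kernel version in the example are assumptions (the usual /boot/initrd-<version>.img naming), not values taken from this report. Whether the resulting load order is enough to boot has not been verified here.

```shell
# Hypothetical dry-run helper: build a mkinitrd command line that forces
# extra modules into the image. It only prints the command; review it and
# run it by hand on the guest.
mkinitrd_cmd() {
    kver=$1; shift
    cmd="mkinitrd -f"
    for mod in "$@"; do
        cmd="$cmd --with $mod"
    done
    echo "$cmd /boot/initrd-$kver.img $kver"
}

# Example (the kernel version is just an illustration):
#   mkinitrd_cmd 2.6.18-194.el5 virtio_pci
#   mkinitrd_cmd "$(uname -r)" virtio_pci
```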
Comment 18 Michael S. Tsirkin 2010-10-26 14:08:45 EDT
These modules can be loaded in any order.

There's no dependency on virtio-pci because virtio-blk
and virtio-net are devices on the virtio bus.

virtio-pci is normally loaded by hotplug and that
creates a virtio-blk device on the virtio bus.
Comment 19 Herbert Xu 2010-10-28 11:52:19 EDT
I agree with Michael.  virtio-pci is just like a USB host controller.  There is never going to be a dependency on it by actual device drivers.  It is up to the mkinitrd tool to deal with this by always including modules like that if it is to support booting from these classes of devices.
Comment 21 Peter Jones 2010-10-28 12:56:47 EDT
Well, then somebody needs to supply me with a programmatic way to determine when it should be loaded.  Currently for storage we check the device paths of the slaves/* devices for storage devices we're installing on to.  Something similar would work here, but obviously that hack is storage-only.
Comment 23 Peter Jones 2010-10-28 13:22:26 EDT
Can somebody please test with the packages at http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2859104 and see if this hack fixes the problem?
Comment 24 Marian Ganisin 2010-11-01 08:13:10 EDT
(In reply to comment #23)
> Can somebody please test with the packages at
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2859104 and see if this
> hack fixes the problem?

Same result, but I found the fix. The ramdisk created by this version of mkinitrd contains all the important bits, but virtio_pci (and virtio_ring before virtio_pci) has to be loaded right before virtio_net. So the 'insmod' sequence is:

echo "Loading virtio.ko module"
insmod /lib/virtio.ko 
echo "Loading virtio_ring.ko module"
insmod /lib/virtio_ring.ko 
echo "Loading virtio_pci.ko module"
insmod /lib/virtio_pci.ko 
echo "Loading virtio_net.ko module"
insmod /lib/virtio_net.ko
....
(network configuration is starting here)
Comment 25 Eduardo Habkost 2010-11-01 14:42:57 EDT
(In reply to comment #21)
> Well, then somebody needs to supply me with a programmatic way to determine
> when it should be loaded.  Currently for storage we check the device paths of
> the slaves/* devices for storage devices we're installing on to.  Something
> similar would work here, but obviously that hack is storage-only.

I went to check the /sys/ structure for virtio-pci and it looks like there is no information there indicating that the devices inside /sys/devices/virtio-pci actually correspond to PCI devices that are handled by the virtio-pci module.

So it looks like all we can do with the current ABI is to check if the device backing the network interface is in /sys/devices/virtio-pci, and manually add virtio-pci to the list of required modules. Something like the pseudocode below should work:

require_device(dev)
{
  driver=readlink("$dev/driver")
  module=readlink("$driver/module")
  required_modules = $module;
  if (dev ~= "devices/virtio-pci/.*")
    required_modules += "virtio-pci";
  return $required_modules;
}


interface_for_iscsi=eth0
netdev=readlink("/sys/class/net/$interface_for_iscsi/device")
modules_for_iscsi += require_device($netdev)
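
The pseudocode above could be turned into shell roughly as follows. This is a sketch under assumptions, not what mkinitrd ended up doing: the `SYSFS` variable and the `modules_for_iface` name are hypothetical (SYSFS exists only so the logic can be pointed at a fake tree), and the module is written as virtio_pci, the name used in the insmod lines earlier in this report.

```shell
# SYSFS is a hypothetical override so the logic can be tried on a fake tree.
SYSFS=${SYSFS:-/sys}

# Print the modules needed for one resolved sysfs device directory,
# e.g. $SYSFS/devices/virtio-pci/virtio0.
require_device() {
    dev=$1
    modules=""
    # The driver's "module" link points at the kernel module that owns it.
    if [ -L "$dev/driver/module" ]; then
        modules=$(basename "$(readlink "$dev/driver/module")")
    fi
    # Devices living under devices/virtio-pci also need the transport driver.
    case "$dev" in
        */devices/virtio-pci/*) modules="$modules virtio_pci" ;;
    esac
    # Unquoted on purpose: collapses stray whitespace between names.
    echo $modules
}

# Resolve the device behind a network interface and list its modules.
modules_for_iface() {
    iface=$1
    netdev=$(cd "$SYSFS/class/net/$iface/device" 2>/dev/null && pwd -P) || return 1
    require_device "$netdev"
}

# Example: modules_for_iface eth0
```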
Comment 26 Peter Jones 2010-11-16 11:53:54 EST
What's "readlink /sys/class/net/$device/device/bus" going to show me here?
Comment 27 Pavel Holica 2010-11-16 13:05:33 EST
In both anaconda and on installed system:
../../../bus/virtio
Comment 28 Peter Jones 2010-11-16 14:02:17 EST
Alright, that should work then.  Let's give this another go with the packages built at:

https://brewweb.devel.redhat.com/taskinfo?taskID=2898028
Comment 29 Eduardo Habkost 2010-11-18 10:03:10 EST
I just tested it, and it didn't work because mkinitrd enters the "if [ -f /sys/class/net/$device/device/modalias ]" branch.

The following additional change, however, works:


--- /sbin/mkinitrd      2010-11-16 14:00:12.000000000 -0500
+++ /sbin/mkinitrd.hack 2010-11-18 10:48:17.000000000 -0500
@@ -479,12 +479,12 @@
            done
        elif [ "$(basename $(readlink /sys/class/net/$device/device/bus) 2>/dev/null)" = "xen" ]; then
            findmodule xennet # FIXME: hack for xennet sucking
-       elif [ "$(basename $(readlink /sys/class/net/$device/device/bus) 2>/dev/null)" = "virtio" ]; then
-            findmodule virtio_pci # of course virtio sucks the same way xennet
-            findmodule virtio_net # does...
         else
            findmodule $(ethtool -i $device | awk '/^driver:/ { print $2 }')
        fi
+       if [ "$(basename $(readlink /sys/class/net/$device/device/bus) 2>/dev/null)" = "virtio" ]; then
+            findmodule virtio_pci # of course virtio sucks the same way xennet
+        fi
     done
 }
Comment 30 Pavel Holica 2010-11-18 10:43:25 EST
While reproducing the bug and testing the fix from comment 29, we experienced random behaviour.

We were sometimes able to boot:
RHEL-5-Server-U5 x86_64 (physical machine)
RHEL5.6-Server-20101110.0 x86_64 (virtual machine)
with root on iscsi where initrd contained correct sequence:
grep virtio init | grep -v ^echo
insmod /lib/virtio.ko
insmod /lib/virtio_ring.ko
insmod /lib/virtio_pci.ko
insmod /lib/virtio_blk.ko
insmod /lib/virtio_net.ko

But most of the time not:
RHEL-5-Server-U5 x86_64 (physical machine)
RHEL5.6-Server-20101110.0 x86_64 (virtual machine)
with root on iscsi where initrd contained this sequence (virtio_net loaded before virtio_ring and virtio_pci):
grep virtio init | grep -v ^echo
insmod /lib/virtio.ko
insmod /lib/virtio_net.ko
insmod /lib/virtio_ring.ko
insmod /lib/virtio_pci.ko
insmod /lib/virtio_blk.ko

This behaviour was observed on same physical machines with same (both kickstart and manual) installations.
Comment 31 Eduardo Habkost 2010-11-18 11:55:57 EST
(In reply to comment #30)

Both sequences are supposed to work. If loading virtio_pci after virtio_net doesn't work, it is either a virtio_net or virtio_pci bug.

However, if loading virtio_pci first is safer, doing it looks better. Diff replacing the one on comment #29 is below.

I didn't get any random behavior by mkinitrd after doing the change below, and virtio_pci seems to be always added before virtio_net. After applying the change below, I am always seeing this on the mkinitrd -v output:

Adding module virtio
Adding module virtio_ring
Adding module virtio_pci
Adding module virtio_net


--- /sbin/mkinitrd      2010-11-16 14:00:12.000000000 -0500
+++ /sbin/mkinitrd.hack 2010-11-18 12:41:29.000000000 -0500
@@ -471,6 +471,9 @@
                 continue ;;
             *) handleddevices="$handleddevices $device" ;;
         esac
+       if [ "$(basename $(readlink /sys/class/net/$device/device/bus) 2>/dev/null)" = "virtio" ]; then
+            findmodule virtio_pci # of course virtio sucks the same way xennet
+        fi
         if [ -f /sys/class/net/$device/device/modalias ]; then
             modalias=$(cat /sys/class/net/$device/device/modalias)
            moduledep $modalias
@@ -479,9 +482,6 @@
            done
        elif [ "$(basename $(readlink /sys/class/net/$device/device/bus) 2>/dev/null)" = "xen" ]; then
            findmodule xennet # FIXME: hack for xennet sucking
-       elif [ "$(basename $(readlink /sys/class/net/$device/device/bus) 2>/dev/null)" = "virtio" ]; then
-            findmodule virtio_pci # of course virtio sucks the same way xennet
-            findmodule virtio_net # does...
         else
            findmodule $(ethtool -i $device | awk '/^driver:/ { print $2 }')
        fi
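
The effect of moving the virtio check ahead of the modalias branch can be illustrated with a toy model. Everything here is a simplification: `findmodule` is a stand-in that only records names in order (the real one in mkinitrd also resolves dependencies, which is presumably where virtio and virtio_ring in the -v output come from), and `queue_net_modules` is a hypothetical name for the per-device logic.

```shell
MODULES=""

# Simplified stand-in for mkinitrd's findmodule: record each module once,
# preserving the order in which it was first requested.
findmodule() {
    case " $MODULES " in
        *" $1 "*) ;;                              # already queued
        *) MODULES="${MODULES:+$MODULES }$1" ;;
    esac
}

# Toy version of the patched per-device logic: the bus check runs first,
# so virtio_pci is queued ahead of the NIC driver itself.
queue_net_modules() {
    bus=$1
    driver=$2
    if [ "$bus" = "virtio" ]; then
        findmodule virtio_pci
    fi
    findmodule "$driver"
}

# Example:
#   queue_net_modules virtio virtio_net; echo "$MODULES"
```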
Comment 32 Peter Jones 2010-11-18 13:15:26 EST
Should be fixed in mkinitrd-5.1.19.6-66.el5 then.  Here's the brew link: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2902249 .
Comment 33 Pavel Holica 2010-11-19 04:18:41 EST
Thanks, packages provided from comment 32 seem to fix this issue.

I've tried several installations with these packages and all have successfully booted.
Comment 35 Alexander Todorov 2010-11-30 03:06:27 EST
mkinitrd-5.1.19.6-66.el5 from comment #32 is included in snapshot #3 (-1124.1). Comment #33 says the bug is fixed. Moving to VERIFIED.
Comment 36 Pavel Holica 2010-11-30 03:31:03 EST
Bug hasn't been properly verified. I'll verify it today.
Comment 37 Pavel Holica 2010-11-30 05:11:50 EST
Ok, performed several more installations of x86_64 RHEL5.6-Server-20101124.1 and didn't hit this bug.

Moving to verified.
Comment 38 Florian Nadge 2011-01-03 10:47:53 EST
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, virtual machines using iscsi could not boot correctly after installation. With this update booting works correctly.
Comment 40 errata-xmlrpc 2011-01-13 19:05:51 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0110.html
Comment 41 Michael S. Tsirkin 2011-01-17 06:01:05 EST
*** Bug 645719 has been marked as a duplicate of this bug. ***
Comment 42 Markus Armbruster 2011-03-18 03:58:54 EDT
*** Bug 547670 has been marked as a duplicate of this bug. ***
Comment 43 Miroslav Rezanina 2011-03-23 07:16:18 EDT
*** Bug 547670 has been marked as a duplicate of this bug. ***
