Bug 645719

Summary: [NetApp 5.7 bug] SANboot fails with virtio NIC on guest vm
Product: Red Hat Enterprise Linux 5
Component: kvm
Version: 5.6
Target Release: 5.7
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Status: CLOSED DUPLICATE
Keywords: OtherQA
Reporter: prashant singh <prashant.s>
Assignee: Michael S. Tsirkin <mst>
QA Contact: Virtualization Bugs <virt-bugs>
CC: andriusb, coughlan, ehabkost, mkenneth, mst, prashant.s, tburke, virt-maint, xdl-redhat-bugzilla, ykaul
Doc Type: Bug Fix
Last Closed: 2011-01-17 11:01:05 UTC
Bug Blocks: 580949
Attachments:
  tcpdump of vnet0 (flags: none)
  initrd image of guest os (flags: none)

Description prashant singh 2010-10-22 09:55:03 UTC
Description of problem:
When installing a RHEL 5 guest VM on a KVM hypervisor, I configure a single NIC, of type virtio.
If I then do a SAN boot installation (i.e. root on multipath) over the iSCSI protocol, the install succeeds, but the guest hangs while rebooting: it fails to set up the NIC interface and therefore cannot log in to the target to access the root file system.
It works fine with RHEL 6 guests.

Version-Release number of selected component (if applicable):
Hypervisor: KVM (RHEL 5.5 or RHEL 6)
Guest: RHEL 5.5 (or RHEL 5.4)

How reproducible:
Every time

Steps to Reproduce:
1. Install RHEL 5.5. During the install, configure a virtio NIC and iSCSI storage.
2. Install the root file system on the iSCSI storage.
3. Complete the install as usual and reboot.
  
Actual results:
The guest hangs while trying to set up the interface and therefore fails to log in to the iSCSI target.

Expected results:
The guest should log in to the target successfully.

Additional info:

Comment 1 Andrius Benokraitis 2010-10-26 13:49:21 UTC
NetApp: Since you are seeing this on both the RHEL 6 and RHEL 5 hypervisors, I am going to set this to RHEL 6; if it needs to be ported to RHEL 5, we can decide that then.

Comment 7 Andrius Benokraitis 2010-11-11 23:32:13 UTC
NetApp: is this blocking any 5.6 certs? We are already pretty late in the 5.6 devel cycle.

Comment 9 Michael S. Tsirkin 2010-11-15 13:03:18 UTC
Need the following info:
1. is the system set up with NAT?
   If yes how is dnsmasq configured?
   Pls supply output of
   ps -ef | grep dnsmasq
2. Does the problem happen in a transparent bridge setup
   (no NAT)?
3. please supply output of
   tcpdump -i tap0
   on host where tap0 is replaced with tap device for the guest

Comment 10 Michael S. Tsirkin 2010-11-16 11:42:14 UTC
additionally please supply the output of
'brctl show' and  'brctl showstp  <bridgename>'
where <bridgename> is the bridge connected to the specific
guest.

Comment 12 Bill Burns 2010-11-17 20:49:46 UTC
Please answer requests made in comments 9 & 10.

Comment 13 prashant singh 2010-11-18 11:31:45 UTC
Created attachment 461275 [details]
tcpdump of vnet0

(In reply to comment #9)
> Need the following info:
> 1. is the system set up with NAT?
>    If yes how is dnsmasq configured?
>    Pls supply output of
>    ps -ef | grep dnsmasq
No, I am using bridged networking for the guests. I haven't tried a NAT setup yet.
> 2. Does the problem happen in a transparent bridge setup
>    (no NAT)?
Yes.
> 3. please supply output of
>    tcpdump -i tap0
>    on host where tap0 is replaced with tap device for the guest
Since I am using bridged networking, there is a corresponding vnet<x> device for the guest.
Please find attached the output of `tcpdump -i vnet0`.

(In reply to comment #10)
> additionally please supply the output of
> 'brctl show' and  'brctl showstp  <bridgename>'
> where <bridgename> is the bridge connected to the specific
> guest.
Output of `brctl show`
# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.001a64e5fb98       no              eth0
                                                        vnet2
virbr0          8000.000000000000       yes

Output of `brctl showstp br0`
# brctl showstp br0
br0
 bridge id              8000.001a64e5fb98
 designated root        8000.001a64e5fb98
 root port                 0                    path cost                  0
 max age                  19.99                 bridge max age            19.99
 hello time                1.99                 bridge hello time          1.99
 forward delay            14.99                 bridge forward delay      14.99
 ageing time             299.95
 hello timer               0.75                 tcn timer                  0.00
 topology change timer     0.00                 gc timer                   1.75
 hash elasticity           4                    hash max                 512
 mc last member count      2                    mc init query count        2
 mc router                 1                    mc snooping                1
 mc last member timer      0.99                 mc membership timer      259.96
 mc querier timer        254.96                 mc query interval        124.98
 mc response interval      9.99                 mc init query interval    31.24
 flags


eth0 (0)
 port id                0000                    state                forwarding
 designated root        8000.001a64e5fb98       path cost                 19
 designated bridge      8000.001a64e5fb98       message age timer          0.00
 designated port        8001                    forward delay timer        0.00
 designated cost           0                    hold timer                 0.00
 mc router                 1
 flags

vnet2 (0)
 port id                0000                    state                forwarding
 designated root        8000.001a64e5fb98       path cost                100
 designated bridge      8000.001a64e5fb98       message age timer          0.00
 designated port        8004                    forward delay timer        0.00
 designated cost           0                    hold timer                 0.00
 mc router                 1
 flags

(In reply to comment #12)
> Please answer requests made in comments 9 & 10.

Done :)

Also, I would like to mention again that the issue is seen only when the guest network driver is configured as virtio.

Comment 14 Michael S. Tsirkin 2010-11-18 11:45:04 UTC
I think I see the problem here:

You have a forward delay enabled on the bridge (about 15 seconds in the
brctl showstp output above), so it takes time before the bridge starts
forwarding the guest's packets. Meanwhile, the iSCSI login times out.

On my setup created by libvirt I see:
 forward delay             0.00                 bridge forward delay       0.00

How did you create the bridge?
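
As a quick check on the host, the forward delay can also be cleared at runtime; a minimal sketch, assuming bridge-utils and the br0 bridge shown in the output above:

  brctl setfd br0 0                         # set the bridge forward delay to 0 seconds
  brctl showstp br0 | grep "forward delay"  # confirm the new value

For a persistent change, the usual place is DELAY=0 in the bridge's ifcfg file (see comment 16).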

Comment 15 prashant singh 2010-11-18 11:56:14 UTC
(In reply to comment #14)
> I think I see the problem here:
> 
> You have a forward delay enabled on the bridge (about 15 seconds in the
> brctl showstp output above), so it takes time before the bridge starts
> forwarding the guest's packets. Meanwhile, the iSCSI login times out.
> 
> On my setup created by libvirt I see:
>  forward delay             0.00                 bridge forward delay       0.00
> 
> How did you create the bridge?

I created the bridge by editing the files under /etc/sysconfig/network-scripts/.
Is there any recommended way of creating a bridge?

But if the forwarding delay on the bridge is the cause, then I should be seeing this issue regardless of which network driver I use on the guest; please correct me if I am wrong.

Comment 16 Michael S. Tsirkin 2010-11-18 15:36:22 UTC
> I created the bridge by editing the files under
> /etc/sysconfig/network-scripts/.
> Is there any recommended way of creating a bridge?

There are some hints on the libvirt wiki:
http://wiki.libvirt.org/page/Networking#Bridged_networking_.28aka_.22shared_physical_device.22.29

> But if the forwarding delay on the bridge is the cause, then I should be
> seeing this issue regardless of which network driver I use on the guest;
> please correct me if I am wrong.

Some other issue could be masking this.
Could you try with DELAY=0, please?
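
For reference, a minimal sketch of the ifcfg files involved, assuming the br0 bridge and eth0 NIC from the output above (BOOTPROTO and any addressing details are placeholders):

  # /etc/sysconfig/network-scripts/ifcfg-br0
  DEVICE=br0
  TYPE=Bridge
  BOOTPROTO=dhcp
  ONBOOT=yes
  DELAY=0

  # /etc/sysconfig/network-scripts/ifcfg-eth0
  DEVICE=eth0
  BRIDGE=br0
  ONBOOT=yes

After editing, `service network restart` applies the change.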

Comment 17 Michael S. Tsirkin 2010-11-18 15:48:54 UTC
Also let's verify initrd loads virtio.
Could you attach the initrd file that got generated please?

Comment 18 Michael S. Tsirkin 2010-11-18 15:52:27 UTC
This is to verify that we are not seeing a duplicate of
https://bugzilla.redhat.com/show_bug.cgi?id=568325

Comment 19 prashant singh 2010-11-19 14:48:35 UTC
(In reply to comment #16)
> > I created the bridge by editing the files under
> > /etc/sysconfig/network-scripts/.
> > Is there any recommended way of creating a bridge?
> 
> There are some hints on the libvirt wiki:
> http://wiki.libvirt.org/page/Networking#Bridged_networking_.28aka_.22shared_physical_device.22.29
> 
> > But if the forwarding delay on the bridge is the cause, then I should be
> > seeing this issue regardless of which network driver I use on the guest;
> > please correct me if I am wrong.
> 
> Some other issue could be masking this.
> Could you try with DELAY=0, please?

I tried with DELAY=0, but I am hitting the issue again.

(In reply to comment #17)
> Also let's verify initrd loads virtio.
> Could you attach the initrd file that got generated please?

How do I do this?
The guest hangs while booting, so I can't access the initrd through the guest.
Also, the guest's boot partition resides in a file on the hypervisor, and since
that file was partitioned from inside the guest, I can't mount it directly on
the hypervisor. So I can't access the initrd through the hypervisor either.
Is there some way to access the initrd?
If not, I will install the guest OS again, but this time without creating a
separate partition for /boot.
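
For reference, a partitioned raw guest image can usually be mapped on the hypervisor without booting the guest; a minimal sketch, assuming a raw image and the kpartx tool (the image path and partition number are placeholders):

  kpartx -av /var/lib/libvirt/images/guest.img   # maps partitions as /dev/mapper/loopNpM
  mount /dev/mapper/loop0p1 /mnt                 # assuming /boot is the first partition
  ls /mnt/initrd-*
  umount /mnt
  kpartx -dv /var/lib/libvirt/images/guest.img   # remove the mappings again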

Comment 20 prashant singh 2010-11-22 11:32:54 UTC
Created attachment 461978 [details]
initrd image of guest os

Please find attached the initrd image of the guest operating system.

Comment 21 Andrius Benokraitis 2010-11-23 15:47:01 UTC
At this point this won't make it into RHEL 5.6. Still debugging.

NetApp: please keep working on narrowing this issue down in the meantime.

Comment 22 Michael S. Tsirkin 2011-01-17 11:01:05 UTC
Yes, the initrd lacks virtio-pci.ko and virtio-ring.ko
So it's a duplicate of 
https://bugzilla.redhat.com/show_bug.cgi?id=568325
Closing.
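
For reference, the missing modules can be confirmed by listing the initrd contents, and an initrd that includes them can be rebuilt inside the guest; a minimal sketch, assuming the stock RHEL 5 mkinitrd (the kernel version below is a placeholder):

  zcat /boot/initrd-2.6.18-194.el5.img | cpio -t | grep virtio   # list any virtio modules already present
  mkinitrd -f --with=virtio_pci --with=virtio_ring --with=virtio_net \
      /boot/initrd-2.6.18-194.el5.img 2.6.18-194.el5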

*** This bug has been marked as a duplicate of bug 568325 ***