Bug 760883

Summary: Failed to install a guest with pxe method
Product: Red Hat Enterprise Linux 6
Reporter: Alex Jia <ajia>
Component: libvirt
Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 6.3
CC: acathrow, dallan, dyuan, mshao, mzhan, rwu, ydu, zhpeng
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: libvirt-0.9.9-1.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 06:37:59 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  guest installation screen (flags: none)

Description Alex Jia 2011-12-07 09:29:31 UTC
Description of problem:
Installing a guest with the pxe method fails on libvirt-0.9.8-0rc2.el6.x86_64 but works on libvirt-0.9.4-23.el6.x86_64. For details, please see the following steps and the attachment.

Version-Release number of selected component (if applicable):
# uname -r
2.6.32-220.el6.x86_64

# rpm -q libvirt
libvirt-0.9.8-0rc2.el6.x86_64

# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.209.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. yum install tftp tftp-server
2. setup a dhcp server
3. setup a tftp server
4. start to install a guest
5. check installation process by virt-viewer or virt-manager

1) create a tftpbr1 bridge and activate it
# virsh net-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes       
tftpbr1              active     no        

# virsh net-dumpxml tftpbr1
<network>
  <name>tftpbr1</name>
  <uuid>9a407d07-eda5-3a84-e30e-b68b6a1aa69e</uuid>
  <forward mode='nat'/>
  <bridge name='br1' stp='off' delay='1' />
  <mac address='52:54:00:01:F9:72'/>
  <ip address='192.168.100.1' netmask='255.255.255.0'>
    <tftp root='/var/lib/tftpboot' />
    <dhcp>
      <range start='192.168.100.2' end='192.168.100.254' />
      <bootp file='pxelinux.0' />
    </dhcp>
  </ip>
</network>

2) point the guest at the tftpbr1 network
# virsh dumpxml vr-rhel6u2-x86_64-kvm
<domain type='kvm'>
  <name>vr-rhel6u2-x86_64-kvm</name>
  <uuid>1bb040ab-2adb-880f-3740-64919c652447</uuid>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.2.0'>hvm</type>
    <boot dev='network'/>
  </os>
  ......
    <interface type='network'>
      <mac address='52:54:00:4c:66:91'/>
      <source network='tftpbr1'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
  ......

3) setup a tftp server
# wget -P /var/lib/tftpboot http://fileshare.englab.nay.redhat.com/pub/redhat/rhel/rel-eng/RHEL-6.2/RHEL6.2-Snapshot-1/x86_64/os/images/pxeboot/vmlinuz

# wget -P /var/lib/tftpboot http://fileshare.englab.nay.redhat.com/pub/redhat/rhel/rel-eng/RHEL-6.2/RHEL6.2-Snapshot-1/x86_64/os/images/pxeboot/initrd.img

# cat /var/lib/tftpboot/pxelinux.cfg/default
DISPLAY boot.txt
DEFAULT vr-rhel6u2-x86_64-kvm
LABEL vr-rhel6u2-x86_64-kvm
     kernel vmlinuz
     append initrd=initrd.img ks=http://home.englab.nay.redhat.com/~nzhang/http/ks-rhel6u2-x86_64.cfg

PROMPT 1
TIMEOUT 10


4) start to install guest
# virsh start vr-rhel6u2-x86_64-kvm


5) check guest installation screen
# virt-viewer vr-rhel6u2-x86_64-kvm
  
Actual results:
The guest can't get an IP address.

Expected results:
The guest installs successfully with the pxe method.

Additional info:
The result is the same with upstream: comparing tags v0.9.7 and v0.9.8-rc2, v0.9.7 works and v0.9.8-rc2 fails, so this should be a regression.
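
When comparing a working and a failing libvirt build, it can help to first pin down what the tftpbr1 definition from step 1 actually requests. The following is a minimal Python sketch, not part of the original report, that extracts the bridge and PXE settings from a trimmed copy of that XML. The function name summarize_network is hypothetical, and the XML is shortened (uuid, forward, mac and the DHCP range are left out).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tftpbr1 definition shown in step 1 of the report.
NETWORK_XML = """
<network>
  <name>tftpbr1</name>
  <bridge name='br1' stp='off' delay='1' />
  <ip address='192.168.100.1' netmask='255.255.255.0'>
    <tftp root='/var/lib/tftpboot' />
    <dhcp>
      <bootp file='pxelinux.0' />
    </dhcp>
  </ip>
</network>
"""


def summarize_network(xml_text):
    """Return the bridge and PXE settings requested by a libvirt network XML."""
    root = ET.fromstring(xml_text)
    bridge = root.find("bridge")
    ip = root.find("ip")
    return {
        "bridge": bridge.get("name"),
        "stp": bridge.get("stp"),        # requested STP state
        "delay": bridge.get("delay"),    # requested forward delay
        "tftp_root": ip.find("tftp").get("root"),
        "bootp_file": ip.find("dhcp/bootp").get("file"),
    }


if __name__ == "__main__":
    for key, value in summarize_network(NETWORK_XML).items():
        print(f"{key}: {value}")
```

The requested values can then be compared by hand against what the host reports for the bridge once the network has been started.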

Comment 1 Alex Jia 2011-12-07 09:32:58 UTC
Created attachment 541773 [details]
guest installation screen

Comment 3 Jiri Denemark 2011-12-08 16:19:44 UTC
There is a regression somewhere in the code used for starting the virtual network. If the network is started with 0.9.8, PXE boot doesn't work. If the network was started with 0.9.7 and then libvirtd was upgraded and restarted, PXE works just fine. Options passed to dnsmasq are exactly the same. Iptables rules as well.

Comment 4 Alex Jia 2011-12-08 16:37:22 UTC
(In reply to comment #3)
> PXE works just fine. Options passed to dnsmasq are exactly the same. Iptables
> rules as well.

Yeah, I also compared the dnsmasq process arguments and the iptables rules, and I didn't find any differences either.

Comment 5 Jiri Denemark 2011-12-09 11:21:28 UTC
The culprit is

c1df2c14b590b3d68b707aa4f3a570f95a6bc548 is the first bad commit
commit c1df2c14b590b3d68b707aa4f3a570f95a6bc548
Author: Daniel P. Berrange <berrange>
Date:   Wed Nov 2 13:05:27 2011 +0000

    Remove usage of brctl command line tool
    
    Convert the virNetDevBridgeSetSTP and virNetDevBridgeSetSTPDelay
    to use ioctls instead of spawning brctl.
    
    Implement the virNetDevBridgeGetSTP and virNetDevBridgeGetSTPDelay
    methods which were declared in the header but never existed
    
    * src/util/bridge.c: Convert to use bridge ioctls instead of brctl

There is a difference between a bridge created before and after this commit. With this commit, libvirt creates the bridge with STP on.

Comment 6 Jiri Denemark 2011-12-09 11:46:16 UTC
Actually, it's the STP forward delay that is wrong...

wrong config:
virbr1
 bridge id		8000.fe540016c61a
 designated root	8000.fe540016c61a
 root port		   0			path cost		   0
 max age		  19.99			bridge max age		  19.99
 hello time		   1.99			bridge hello time	   1.99
 forward delay		  14.99			bridge forward delay	  14.99
 ageing time		 299.98
 hello timer		   0.77			tcn timer		   0.00
 topology change timer	   0.00			gc timer		 223.54
 flags			

good config:

virbr1
 bridge id		8000.fe540016c61a
 designated root	8000.fe540016c61a
 root port		   0			path cost		   0
 max age		  19.99			bridge max age		  19.99
 hello time		   1.99			bridge hello time	   1.99
 forward delay		   0.00			bridge forward delay	   0.00
 ageing time		 299.98
 hello timer		   1.42			tcn timer		   0.00
 topology change timer	   0.00			gc timer		 160.69
 flags
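
The two dumps above differ only in the forward delay once the running timers are ignored. A small Python sketch of that comparison follows. It assumes the tab-separated two-column layout shown above. The names parse_showstp, diff_showstp and VOLATILE are illustrative, and the sample strings are shortened excerpts of the two dumps.

```python
import re

# Running timers change between any two samples, so they are not compared.
VOLATILE = {"hello timer", "tcn timer", "topology change timer", "gc timer"}


def parse_showstp(text):
    """Parse showstp-style text into a {field: value} dict."""
    fields = {}
    for line in text.splitlines():
        parts = [p for p in re.split(r"\s{2,}|\t", line.strip()) if p]
        # Each line holds up to two "name value" pairs side by side.
        # Single-token lines (bridge name, empty flags) yield no pairs.
        for name, value in zip(parts[0::2], parts[1::2]):
            fields[name] = value
    return fields


def diff_showstp(bad, good):
    """Return {field: (bad_value, good_value)} for non-volatile differences."""
    a, b = parse_showstp(bad), parse_showstp(good)
    return {
        name: (a[name], b.get(name))
        for name in a
        if name not in VOLATILE and a[name] != b.get(name)
    }


BAD = (
    "virbr1\n"
    " max age\t\t  19.99\t\t\tbridge max age\t\t  19.99\n"
    " forward delay\t\t  14.99\t\t\tbridge forward delay\t  14.99\n"
    " hello timer\t\t   0.77\t\t\ttcn timer\t\t   0.00\n"
    " flags\t\t\t\n"
)
GOOD = (
    "virbr1\n"
    " max age\t\t  19.99\t\t\tbridge max age\t\t  19.99\n"
    " forward delay\t\t   0.00\t\t\tbridge forward delay\t   0.00\n"
    " hello timer\t\t   1.42\t\t\ttcn timer\t\t   0.00\n"
    " flags\n"
)

if __name__ == "__main__":
    for name, (bad, good) in diff_showstp(BAD, GOOD).items():
        print(f"{name}: bad={bad} good={good}")
```

On these excerpts only "forward delay" and "bridge forward delay" are reported, which matches the observation in this comment.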

Comment 7 Jiri Denemark 2011-12-09 12:08:34 UTC
Patch sent upstream: https://www.redhat.com/archives/libvir-list/2011-December/msg00445.html

Comment 8 Jiri Denemark 2011-12-09 13:10:03 UTC
Patch committed upstream as v0.9.8-20-g2d5046d:

commit 2d5046d31f4f5c961fc4aa6b415a00bb9eadae2b
Author: Jiri Denemark <jdenemar>
Date:   Fri Dec 9 13:04:14 2011 +0100

    bridge: Fix forward delay APIs
    
    Due to copy&paste error in c1df2c14b590b3d68b707aa4f3a570f95a6bc548,
    virNetDevBridge[SG]etSTPDelay APIs were accessing wrong file.
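
Because the bug was in which file the forward delay APIs accessed, a check that reads the kernel's bridge attributes directly can confirm the result independently of libvirt. Below is a minimal Python sketch, not libvirt code. It assumes the Linux bridge sysfs layout /sys/class/net/<bridge>/bridge/ with stp_state and forward_delay files, and it assumes forward_delay is reported in hundredths of a second. The function names are hypothetical, and the commit message above does not say which file was accessed by mistake.

```python
import os
import sys


def read_bridge_attr(bridge, attr, sysfs_root="/sys/class/net"):
    """Read one integer attribute from <sysfs_root>/<bridge>/bridge/<attr>."""
    path = os.path.join(sysfs_root, bridge, "bridge", attr)
    with open(path) as handle:
        return int(handle.read().strip())


def bridge_stp_summary(bridge, sysfs_root="/sys/class/net"):
    """Return the STP state and forward delay (seconds) of a bridge."""
    stp_state = read_bridge_attr(bridge, "stp_state", sysfs_root)
    # Assumption: the kernel reports forward_delay in 1/100 s units.
    delay = read_bridge_attr(bridge, "forward_delay", sysfs_root) / 100.0
    return {"stp_enabled": stp_state != 0, "forward_delay_s": delay}


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "virbr1"
    try:
        print(name, bridge_stp_summary(name))
    except OSError as err:
        print(f"cannot read bridge attributes for {name}: {err}")
```

The sysfs_root parameter exists only so the functions can be pointed at a fake directory tree for testing. On a real host the default applies.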

Comment 9 zhpeng 2011-12-13 10:00:33 UTC
I can reproduce this with libvirt-0.9.8-1.el6.x86_64

Comment 11 xhu 2012-01-10 08:34:51 UTC
Reproduced with libvirt-0.9.8-0rc2.el6.x86_64.
Verified with libvirt-0.9.9-1.el6; it passed.

Comment 13 errata-xmlrpc 2012-06-20 06:37:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0748.html