Bug 533684 - PXE booting of KVM VMs doesn't work
Summary: PXE booting of KVM VMs doesn't work
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: etherboot
Version: 5.4
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Glauber Costa
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: Rhel5KvmTier2
Reported: 2009-11-08 13:58 UTC by Gordan Bobic
Modified: 2013-02-11 14:10 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-11-25 15:32:52 UTC
Target Upstream Version:
Embargoed:



Description Gordan Bobic 2009-11-08 13:58:55 UTC
Description of problem:
PXE booting of KVM VMs doesn't work in any mode (private, NAT or bridged), with any of the emulated NICs (RTL, Intel, PCNET, virtio).

DHCP fails with "No IP address" message.

However, once the machine is set up, DHCP works fine on it, so the problem appears to be related to PXE.

Version-Release number of selected component (if applicable):

etherboot-zroms-kvm-5.4.4-10.el5

(also tried:
etherboot-zroms-kvm-5.4.4-10.el5.0.sl
etherboot-zroms-kvm-5.4.4-10.el5.centos
just to make sure)

kmod-kvm-83-105.el5_4.9
kvm-83-105.el5_4.9
libvirt-python-0.6.3-20.1.el5_4
virt-manager-0.6.1-8.el5
python-virtinst-0.400.3-5.el5
libvirt-0.6.3-20.1.el5_4

How reproducible:

Every time.

Steps to Reproduce:
1. Set up a new virtual machine image in virt-manager
2. Set it to PXE boot
3. It will fail to obtain the IP address from an external DHCP server on the network.

Actual results:

PXE booting fails at DHCP address acquisition stage with "No IP address".

Expected results:

PXE booting acquires IP address via DHCP.

Additional info:

DHCP server is another RHEL5 machine (not the host machine).
I saw a similar bug report for FC11, but the FC11 etherboot package source rpm doesn't build on RHEL5.
The problem seems specific to PXE booting as DHCP does work on the guest machine when accessed from the guest OS.

Comment 1 Dor Laor 2009-12-13 08:38:13 UTC
Can you please send the relevant tcpdump of the tap device?

Comment 2 Gordan Bobic 2009-12-30 14:15:20 UTC
I'm not sure what you are referring to here. No tap device is used. I'm using bridged networking and there are no tap interfaces.

tcpdumps from the host:

# tcpdump -i eth0 | grep -i dhcp
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
13:56:51.115525 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 54:52:00:55:b9:47 (oui Unknown), length: 548
13:56:51.116693 IP sentinel.internal.net.bootps > 10.2.252.210.bootpc: BOOTP/DHCP, Reply, length: 300
13:56:53.571498 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 54:52:00:55:b9:47 (oui Unknown), length: 548
13:56:53.572388 IP sentinel.internal.net.bootps > 10.2.252.210.bootpc: BOOTP/DHCP, Reply, length: 300


# tcpdump -i br0 | grep -i dhcp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br0, link-type EN10MB (Ethernet), capture size 96 bytes
14:02:22.976581 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 54:52:00:55:b9:47 (oui Unknown), length: 548
14:02:22.997652 IP sentinel.internal.net.bootps > 10.2.252.210.bootpc: BOOTP/DHCP, Reply, length: 300
14:02:25.299680 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 54:52:00:55:b9:47 (oui Unknown), length: 548
14:02:25.300533 IP sentinel.internal.net.bootps > 10.2.252.210.bootpc: BOOTP/DHCP, Reply, length: 300

# tcpdump -i vnet0 | grep -i dhcp
tcpdump: WARNING: vnet0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vnet0, link-type EN10MB (Ethernet), capture size 96 bytes
14:06:11.098615 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 54:52:00:55:b9:47 (oui Unknown), length: 548
14:06:11.997217 IP sentinel.internal.net.bootps > 10.2.252.210.bootpc: BOOTP/DHCP, Reply, length: 300
14:06:13.370596 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 54:52:00:55:b9:47 (oui Unknown), length: 548
14:06:13.371924 IP sentinel.internal.net.bootps > 10.2.252.210.bootpc: BOOTP/DHCP, Reply, length: 300

The following lines are present in host's sysctl.conf:
net.ipv4.ip_forward = 1
net.ipv4.conf.default.proxy_arp = 1
net.ipv4.conf.br0.proxy_arp = 1

When the guest VM is set to PXE boot via DHCP, this doesn't work. However, if the VM boots off a local disk image, DHCP requests for an IP address do work, and the VM's interface gets an IP address assigned as expected.
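
The captures above use the 96-byte capture size and filter with grep, so the DHCP options in each packet are cut off. A capture filter on the BOOTP/DHCP ports with full-size packets may be easier to compare hop by hop. The following is a minimal sketch, assuming the interface names from this comment (eth0, br0, vnet0); the capture_dhcp helper and the DRY_RUN variable are illustrative names, not part of tcpdump.

```shell
#!/bin/sh
# Sketch: capture only BOOTP/DHCP traffic (UDP ports 67 and 68) on one interface.
# capture_dhcp and DRY_RUN are hypothetical names used for illustration.

capture_dhcp() {
    iface=$1
    # -n: no name resolution, -e: print MAC addresses,
    # -v: decode more of the payload, -s 0: capture whole packets
    set -- tcpdump -n -e -v -s 0 -i "$iface" 'udp and (port 67 or port 68)'
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"      # show the command instead of running it
    else
        "$@"
    fi
}

# Usage (as root), once per hop between the guest and the DHCP server:
#   capture_dhcp vnet0
#   capture_dhcp br0
#   capture_dhcp eth0
```

Running the same filter on the tap device, the bridge and the physical interface shows at which hop the request or the reply stops.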

Comment 3 Dor Laor 2009-12-30 14:39:52 UTC
What's the qemu cmdline?
What's vnet0? Is it the tap device?

Can you turn STP off for the bridge (brctl stp BRIDGE_NAME off) and set the forwarding delay to 0.1 (brctl setfd BRIDGE_NAME 0.1)?

Comment 4 Gordan Bobic 2009-12-31 05:14:36 UTC
qemu command line:
/usr/libexec/qemu-kvm -S -M pc -m 512 -smp 1 -name OpenVZ-OSR1 -uuid 455dc246-42ac-bd20-08e7-301f2fbd24cb -no-kvm-pit-reinjection -monitor pty -pidfile /var/run/libvirt/qemu//OpenVZ-OSR1.pid -boot n -drive file=/var/lib/libvirt/images/OpenVZ-OSR1.img,if=ide,index=0 -drive file=/var/lib/libvirt/images/OpenVZ-OSR.img,if=ide,index=1 -drive file=,if=ide,media=cdrom,index=2 -net nic,macaddr=54:52:00:55:b9:47,vlan=0,model=e1000 -net tap,fd=15,script=,vlan=0,ifname=vnet0 -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-gb

According to that, vnet0 does, indeed, appear to be a tap device.

After setting:

# brctl stp br0 off
# brctl setfd br0 0.1

there is no difference in behaviour. The VM still doesn't PXE boot. It just gets stuck indefinitely looking for an IP:

Search for server (DHCP)....No IP address
.No IP address
.No IP address
.No IP address

Comment 5 Dor Laor 2010-01-03 10:56:17 UTC
That's weird since it does work for others.
What's the version of the pxe server?
Can you retry it using a legitimate MAC address, or at least a 'locally administered address', meaning the first byte should be, for example, '02' (02:52:00:55:b9:47)?

Comment 6 Gordan Bobic 2010-02-03 09:25:51 UTC
Changing the MAC address made no difference.

When you say PXE server, do you mean DHCP server?
dhcp-3.0.5-21.el5_4.1

Comment 8 Lucas Meneghel Rodrigues 2010-02-11 19:11:05 UTC
The bug originator reports that user mode networking PXE boot doesn't work, which I found strange, since it's been working for me under RHEL 5.X flawlessly.

I have found a similar issue (PXE boot failing) on upstream kvm builds, but it's restricted to TAP networking.

I haven't tested PXE boot with TAP networking under RHEL5.X, so this bug might proceed. But with user mode, I am pretty sure it works.

Comment 9 Lucas Meneghel Rodrigues 2010-02-11 19:12:53 UTC
I forgot to mention: Since TAP mode is what most people will want to do when using KVM in the field, it's potentially a serious bug.

Comment 10 Egon Kastelijn 2010-02-23 15:18:12 UTC
Hi,

I was having the exact same problem.
I found a work-around.

My KVM XML definition showed that the guest's network-interface was using the virtio driver.

Here is what I did:
* I copied the guest's XML file.
* Undefined the guest using 'virsh undefine'.
* Replaced the virtio interface in the XML-file with the 'rtl8139' driver.
* Defined the guest using 'virsh define <xmlfile>'.
* Started the guest using 'virsh start'.
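
The steps above can be sketched as a small script. This is only an illustration under assumptions: it uses 'virsh dumpxml' to obtain the XML instead of copying the file by hand, it assumes the interface is declared as <model type='virtio'/>, and the helper names and the /tmp scratch path are hypothetical.

```shell
#!/bin/sh
# Sketch of the workaround above. switch_guest_nic and swap_nic_model are
# hypothetical helper names; the guest name is supplied by the caller.

# Rewrite a virtio NIC model element to rtl8139 in libvirt domain XML
# (reads stdin, writes stdout).
swap_nic_model() {
    sed "s|<model type='virtio'/>|<model type='rtl8139'/>|"
}

switch_guest_nic() {
    guest=$1
    orig=/tmp/$guest.orig.xml     # assumed scratch location, kept as a backup
    new=/tmp/$guest.xml
    virsh dumpxml "$guest" > "$orig" || return 1
    swap_nic_model < "$orig" > "$new" || return 1
    virsh undefine "$guest" || return 1
    virsh define "$new" || return 1
    virsh start "$guest"
}

# Usage, with the guest shut down: switch_guest_nic <guest-name>
```

The original XML is saved before the guest is undefined, so the old definition can be restored with 'virsh define' if the new one does not work.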

I hope this information is useful to you.

kind regards,

  Egon

Comment 11 Roman Valls 2010-03-30 15:54:20 UTC
Hi, I hit the same issue. This worked for me:

iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT

As can be seen in:

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Virtualization_Guide/sect-Virtualization-Network_Configuration-Bridged_networking_with_libvirt.html

Comment 13 RHEL Program Management 2010-08-09 18:29:02 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 14 Lucas Meneghel Rodrigues 2010-08-09 18:44:19 UTC
We've had a similar problem, and ended up discovering it was caused by the way the bridge was set up. We no longer see PXE boot problems when the bridge is set up like this:

/usr/sbin/brctl setfd [bridge-name] 0
/usr/sbin/brctl stp [bridge-name] off

Would the originator try setting up his bridge like that and let us know the results?

Comment 15 Gordan Bobic 2010-08-09 20:38:27 UTC
How would you set that in the ifcfg file?

Comment 16 Lucas Meneghel Rodrigues 2010-08-09 21:34:14 UTC
Whatever script creates your bridge, you have to make sure it executes the brctl commands right after the bridge is created. In comment 4, you mention that you executed those commands, except that you used 0.1 instead of 0 in the 'brctl setfd' command. Try again with 0; this might solve the problem.
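
As a rough illustration of that advice, a bridge-creation script could call a helper like the one below right after the bridge comes up. This is a sketch, not a tested fix: fix_bridge, run and DRY_RUN are hypothetical names, and br0 is assumed as the bridge name.

```shell
#!/bin/sh
# Sketch: apply the bridge settings suggested in this thread.
# fix_bridge, run and DRY_RUN are hypothetical names used for illustration.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"        # print the command instead of executing it
    else
        "$@"
    fi
}

fix_bridge() {
    bridge=$1
    # Forwarding delay of 0 seconds (not 0.1)
    run brctl setfd "$bridge" 0 || return 1
    # Disable spanning tree on this bridge
    run brctl stp "$bridge" off
}

# Usage (as root, right after the bridge is created): fix_bridge br0
```

Setting DRY_RUN=1 prints the two brctl commands without touching the bridge, which is handy for checking the script before wiring it into the network setup.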

Comment 17 Didier 2010-11-22 08:33:52 UTC
Reproducible with RHEL6Server 6.0 (qemu-kvm-0.12.1.2-2.113.el6_0.3.x86_64) :

"brctl setfd bridge_intra 0" is required to PXEboot.


Additional info :

- no DHCP request is received on the DHCP server, hence no IP address is assigned to the DomU;
- when executing 'dhcp' at the gPXE prompt within the DomU, a DHCP request is received and an IP address is subsequently assigned to the DomU.

Comment 18 Dor Laor 2010-11-25 13:00:24 UTC
Is it working with setfd 0? If that is the case, this is not a bug at all.

Comment 19 Gordan Bobic 2010-11-25 14:02:13 UTC
I am reasonably confident now that this is actually a duplicate of this Fedora bug:

https://bugzilla.redhat.com/show_bug.cgi?id=586324

Setting DELAY=0 in the bridge's ifcfg configuration file cures the problem.
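
For the question in comment 15, a bridge configuration file along these lines is one possibility. This is a minimal sketch: only DELAY=0 comes from this thread, STP=off mirrors the 'brctl stp ... off' advice from comment 14, and the device name, boot protocol and remaining values are assumptions to adjust for the host.

```shell
# /etc/sysconfig/network-scripts/ifcfg-br0 (illustrative; adjust to the host)
DEVICE=br0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=dhcp
# Forwarding delay in seconds, equivalent to 'brctl setfd br0 0'
DELAY=0
# Equivalent to 'brctl stp br0 off'
STP=off
```

Restarting the network service (or the bridge interface) is needed before the new values take effect.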

Comment 20 Dor Laor 2010-11-25 15:04:06 UTC
If that is the case, it is not a bug, since bridges have a forwarding delay and packets are paused/dropped until it passes. Can you please close the bug as worksforme?
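
To check what delay a bridge is currently using, the value can usually be read from sysfs, where the kernel reports it in hundredths of a second (the Linux bridge default is 15 seconds, shown as 1500). The sketch below assumes the kernel exposes /sys/class/net/<bridge>/bridge/forward_delay; fd_to_seconds and show_forward_delay are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: report a bridge's forwarding delay from sysfs.
# fd_to_seconds and show_forward_delay are hypothetical helper names.

# Convert a sysfs forward_delay value (hundredths of a second) to seconds.
fd_to_seconds() {
    printf '%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
}

show_forward_delay() {
    f=/sys/class/net/$1/bridge/forward_delay
    [ -r "$f" ] || { echo "$1: no bridge forward_delay in sysfs" >&2; return 1; }
    fd_to_seconds "$(cat "$f")"
}

# Usage: show_forward_delay br0
```

'brctl showstp <bridge>' reports the same setting along with the per-port state, which shows whether a freshly attached guest interface is already forwarding.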

Comment 21 Gordan Bobic 2010-11-25 15:23:31 UTC
I don't think I have permissions to close bugs. The bridge delay issue on VM hosts is quite real - it should at least be documented in big red letters, since it stops PXE booting working in KVM. But there are different bugs open for that. :)

Comment 22 wesley 2013-02-11 14:10:33 UTC
worked for me! thanks

brctl setfd br0 0
brctl stp br0 off

