Bug 1214891 - VM instances do not get DHCP addr on boot
Summary: VM instances do not get DHCP addr on boot
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-foreman-installer
Version: 6.0 (Juno)
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Sub Component: Installer
Assignee: Jason Guiditta
QA Contact: Shai Revivo
URL:
Whiteboard:
Depends On:
Blocks: 1171850
 
Reported: 2015-04-23 19:29 UTC by Chris Dearborn
Modified: 2016-08-04 15:23 UTC (History)
18 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-04 15:23:38 UTC
Target Upstream Version:
Embargoed:



Description Chris Dearborn 2015-04-23 19:29:51 UTC
Description of problem:
VM instances do not get a DHCP address when they are created or rebooted on a compute node that has not been rebooted following initial installation.

If the compute node is physical hardware, it appears that rebooting the compute node once causes the VMs on that compute node to start getting DHCP addresses.
If the compute node is a VM (VMware Workstation), then several reboots are required before the VM instances start getting DHCP addresses.

Version-Release number of selected component (if applicable):
OSP 6

How reproducible:
1. On an OSP cluster, add a new compute node, running "puppet agent -t" to install it.
2. Try to bring up a VM instance on that compute node. The VM instance comes up, but fails to get a DHCP address.
3. Reboot the compute node one or more times; eventually the VM instances will get DHCP addresses when they boot.

Actual results:
VM instances do not get a DHCP IP on creation/reboot.

Expected results:
VM instances should get a DHCP IP on creation/reboot.

Additional info:
I have tried using tcpdump to see where the DHCP response is lost.  It makes it back to br-tenant, but no further than that, so it looks like OVS is dropping it on the floor.
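For anyone retracing this, the capture described above can be sketched as follows (bridge names taken from this deployment; the commands need root, so the sketch prints a note instead when unprivileged):

```shell
# Watch for DHCP traffic (ports 67/68) on each hop; -e prints link-level
# headers so you can see which MAC the reply is addressed to.
for IFACE in br-tenant br-int; do
  echo "== $IFACE =="
  timeout 10 tcpdump -eni "$IFACE" 'port 67 or port 68' 2>/dev/null \
    || echo "skipped (needs root and interface $IFACE)"
done
```

On a healthy node the reply should show up on both bridges; in this report it stopped at br-tenant.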

Comment 1 Jason Guiditta 2015-04-29 13:58:57 UTC
Can we get some direction on how to debug this issue?

Comment 6 Chris Dearborn 2015-05-05 14:30:26 UTC
Internally, there was some tribal knowledge that claimed that adding PROMISC=yes to the ifcfg file for br-tenant on the compute nodes and then rebooting the compute nodes fixed the problem when running on hardware.  I have worked a fair amount on trying to debug this issue on VMs running in VMware Workstation, and here is what I've found:

- Adding PROMISC=yes to an ifcfg file is completely ignored; the interface is not put into promiscuous mode.
- Manually forcing br-tenant into promiscuous mode via ifconfig does not fix the problem.
- Rebooting the compute node after adding PROMISC=yes did not work.
- Shutting down the compute node, rebooting the controllers (verifying with clustercheck and pcs status), then booting the compute node back up did not work.
- After several reboots of the compute node, DHCP randomly started to work.
- Three more reboots of the compute node: DHCP continued to work.
- Removed PROMISC=yes from br-tenant and rebooted: DHCP continued to work.
- Two more reboots: DHCP continued to work.
- I was unable to break it again.

Once the problem corrects itself, it appears to remain fixed.  It is easy to reproduce on both hardware and on VMs following the above instructions.  On hardware, a simple reboot of the compute nodes fixes the problem.  On VMware Workstation VMs, it appears that a random number of reboots of the compute nodes are required to fix the problem.
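For reference, the two promiscuous-mode approaches described above look roughly like this; the iproute2 form is my own sketch rather than something from the report, and per the findings above neither approach actually fixes the issue:

```shell
# The ifcfg line that comment 6 says is ignored (would go in
# /etc/sysconfig/network-scripts/ifcfg-br-tenant):
CFG="PROMISC=yes"
# Runtime equivalent using iproute2 rather than legacy ifconfig (needs root):
ip link set dev br-tenant promisc on 2>/dev/null \
  || echo "skipped: needs root and an existing br-tenant"
# Check whether the flag actually took; PROMISC appears in the flags line:
ip link show br-tenant 2>/dev/null | grep -o PROMISC \
  || echo "not promiscuous, or no such device"
```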

Comment 7 Chris Dearborn 2015-05-12 15:46:44 UTC
I verified that this problem still exists in A3.

Comment 9 Chris Dearborn 2015-06-11 15:42:20 UTC
We were able to reproduce this using A2 bits and Dell 13g hardware.

Reboot of the compute nodes allowed the VMs to start getting DHCP addresses.

Comment 10 Crag Wolfe 2015-06-11 18:21:49 UTC
Please attach the yaml for the controllers and compute nodes, along with the nova and neutron commands used; this will help us properly reproduce the issue.  Thanks in advance.

Comment 11 Assaf Muller 2015-06-11 18:23:13 UTC
Can we get the contents of:

/var/lib/neutron/*
/etc/neutron/*
/var/log/neutron/*

from all of your controllers, and from an affected compute node (before a reboot that resolves the issue).

Also, on an affected compute, the output of:
ovs-vsctl show
ovs-ofctl dump-flows br-int
ovs-ofctl dump-flows br-tenant (I understand you're using VLAN segmentation)

neutron net-show <network the VMs are connected to>
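A minimal sketch for gathering the requested files in one shot; the output filename scheme is my own choice, and absent directories are skipped so the script degrades gracefully on a host without Neutron:

```shell
# Paths requested above
DIRS="/var/lib/neutron /etc/neutron /var/log/neutron"
OUT="neutron-debug-$(uname -n)-$(date +%Y%m%d%H%M%S).tar.gz"
EXISTING=""
for d in $DIRS; do
  # Skip paths that are absent so tar does not abort on a non-Neutron host
  [ -d "$d" ] && EXISTING="$EXISTING $d"
done
if [ -n "$EXISTING" ]; then
  tar czf "$OUT" $EXISTING 2>/dev/null
  echo "wrote $OUT"
else
  echo "no neutron directories found on this host"
fi
```

Run this on each controller and on the affected compute node, then attach the tarballs.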


> I have tried using tcpdump to see where the DHCP response is lost.  It makes it back to br-tenant, but no further than that, so it looks like OVS is dropping it on the floor.

One additional thing you can try is 'ip link set dev %s up', where %s is the name of the VM's tap device. Then tcpdump it and see if it's seeing DHCP responses.
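A concrete version of that suggestion, with one assumption spelled out: stock Neutron/OVS agents name the VIF "tap" plus the first 11 characters of the port UUID (the UUID below is a made-up example; get the real one from neutron port-list on the controller):

```shell
# Hypothetical port UUID for illustration; look up the real one with
# "neutron port-list".
PORT_ID="123e4567-e89b-12d3-a456-426614174000"
# Assumption: the OVS agent names the VIF "tap" + first 11 chars of the UUID.
TAP="tap$(echo "$PORT_ID" | cut -c1-11)"
echo "$TAP"
# Then, per the suggestion above (needs root on the compute node):
#   ip link set dev "$TAP" up
#   tcpdump -eni "$TAP" 'port 67 or port 68'
```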

Comment 12 Chris Dearborn 2015-06-23 15:47:16 UTC
Handing this needinfo off to Wayne, since he is in the process of doing a fresh install and will hopefully be able to supply the requested information.

Comment 13 Wayne Allen 2015-07-10 16:19:57 UTC
I have not been able to reproduce this problem. All the VMs seem able to get addresses without restarting anything after install. If we see the problem again, I will try to provide more data.

Comment 15 Jason Guiditta 2016-08-04 15:23:38 UTC
As this has not been reproduced for over a year, I am going to mark it NOTABUG and chalk it up to a config issue or something unrelated that caused this as a side effect.

