Bug 1034822 - nova instances not acquiring dhcp IP address during launch
Summary: nova instances not acquiring dhcp IP address during launch
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z2
Target Release: 4.0
Assignee: Terry Wilson
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-11-26 14:55 UTC by Richard Smith
Modified: 2016-04-26 20:17 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-13 05:34:31 UTC
Target Upstream Version:


Attachments:
Packstack answer file (13.80 KB, text/plain)
2013-12-02 23:24 UTC, Richard Smith

Description Richard Smith 2013-11-26 14:55:30 UTC
Description of problem:
Launching nova instances on a neutron subnet with DHCP enabled results in the instances coming up without an IP address assigned to their single interface. A subsequent ifup <interface> fixes the problem.

Version-Release number of selected component (if applicable):
RHEL OpenStack havana, namely:
  rhel openstack-neutron-2013.2-5
  rhel openstack-nova-compute-2013.2-4

How reproducible:
almost always


Steps to Reproduce:
1. Packstack installation of one physical controller and two physical nova hosts, with neutron networking
2. Upload cirros 0.3.0.x86_64 image into glance
3. Launch an instance with m1.tiny, cirros image
4. Log in to the new instance via console; ifconfig eth0 shows no IP assigned
5. Run ifup eth0 to get an IP; pinging the gateway then works
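
The check in step 4 can be sketched in Python for anyone collecting console output from many instances. This is only an illustration: the function name ipv4_address and the sample outputs are hypothetical, and the parsing assumes the busybox-style "inet addr:" line that ifconfig prints on cirros.

```python
import re

def ipv4_address(ifconfig_output):
    """Return the IPv4 address found in ifconfig output, or None if absent."""
    # Matches "inet addr:10.0.0.3" (busybox/net-tools style) or "inet 10.0.0.3".
    match = re.search(r"inet (?:addr:)?(\d+\.\d+\.\d+\.\d+)", ifconfig_output)
    return match.group(1) if match else None

# Hypothetical sample outputs, not taken from the affected systems.
with_ip = (
    "eth0      Link encap:Ethernet  HWaddr FA:16:3E:00:00:01\n"
    "          inet addr:10.0.0.3  Bcast:10.0.0.255  Mask:255.255.255.0\n"
)
without_ip = (
    "eth0      Link encap:Ethernet  HWaddr FA:16:3E:00:00:01\n"
    "          inet6 addr: fe80::f816:3eff:fe00:1/64 Scope:Link\n"
)

print(ipv4_address(with_ip))     # 10.0.0.3
print(ipv4_address(without_ip))  # None
```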

Actual results:
ifconfig eth0 shows no IP address assigned to the interface

Expected results:
Upon booting, the nova instance should acquire an IP address for its internal interface from the DHCP agent


Additional info:

Comment 2 Aleksandr Brezhnev 2013-11-26 15:46:43 UTC
This issue happens when the new VM is the first VM on the virtual network.
Neutron's dhcp-agent does not start the dnsmasq daemon for a network if the network has no DHCP clients. However, when the first VM appears, it looks like the VM may sometimes be up and sending DHCP requests before the dnsmasq daemon has started. As a result, nobody answers its DHCP requests.

It looks like a race condition to me.
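
The suspected race can be illustrated with a small Python model. This is a sketch under stated assumptions, not a description of how dnsmasq or the cirros DHCP client actually behave: the function name first_answered_attempt and all timings are hypothetical, and the client is assumed to send a fixed number of requests and then give up.

```python
def first_answered_attempt(server_ready_at, attempt_times):
    """Return the time of the first DHCP attempt sent once the server is up.

    Returns None if every attempt was sent before the server was ready,
    which models an instance that boots without an address.
    """
    for t in attempt_times:
        if t >= server_ready_at:
            return t
    return None

# Hypothetical timings in seconds since the instance started its DHCP client.
client_attempts = [0.0, 1.0, 2.0]  # assumed: three requests, then give up

print(first_answered_attempt(0.5, client_attempts))  # 1.0
print(first_answered_attempt(5.0, client_attempts))  # None
```

In this model a later manual ifup simply adds a new attempt after the server is ready, which is consistent with the workaround in the description.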

Comment 3 Terry Wilson 2013-11-26 15:49:43 UTC
Richard, if I remember correctly, there are issues with the cirros 0.3.0 image re: DHCP on RHEL. Please try the 0.3.1 image ( http://download.cirros-cloud.net/0.3.1/cirros-0.3.1-x86_64-disk.img ) and report back. If that doesn't fix the issue, please describe your setup (vlan, gre, etc.) and post your packstack answer file. Thanks.

Comment 4 Richard Smith 2013-12-02 23:24:47 UTC
Created attachment 831838
Packstack answer file

This is the original answer file used to create this Openstack cloud.

Comment 5 Richard Smith 2013-12-02 23:26:18 UTC
I downloaded the cirros 0.3.1 image from the link provided and retried the process above, with essentially the same results: I created a new private network, launched 20 m1.tiny instances, and logged in on the instance consoles; about 50% of the nodes had no IP assigned to their eth0 interface. I deleted the instances and tried again, launching only 10 instances, and still got the same results, with a large portion of nodes getting no assigned IPs.

neutron-dhcp-agent reports dhcpoffers to all the instances.

Attaching our packstack answer-file which describes a gre setup, one controller, three nova hosts.

Comment 6 Maru Newby 2013-12-09 09:03:28 UTC
I'm wondering if the problem you are reporting with the 0.3.1 image is related to another bug (https://bugzilla.redhat.com/show_bug.cgi?id=1034822) also related to dhcp agent reliability.

Can you replicate the reported problem when booting a single VM?  

What is the load on the controller node, and more importantly, what is the CPU usage of the neutron service, when you are attempting to boot multiple VMs?

Comment 7 Maru Newby 2013-12-09 09:13:00 UTC
The correct link to the other bug is https://bugzilla.redhat.com/show_bug.cgi?id=1023818

Comment 8 Terry Wilson 2014-02-12 22:21:55 UTC
Using cirros 0.3.1 and a packstack-based install of 2013.2.1-4 with gre and two compute hosts, every instance ends up getting a dhcp address each time I boot a large number of them. I just haven't been able to reproduce this.

Is it possible that there is an interaction with VLANs going on as well in this setup, similar to what was just reported in https://bugzilla.redhat.com/show_bug.cgi?id=1064109? I don't see any information that would point me to that in the answer file provided, but maybe some manual configuration was done later? Or can you try to reproduce with 2013.2.1-4 and cirros 0.3.1? Thanks.

Comment 9 Richard Smith 2014-02-12 23:37:00 UTC
The original problem occurred at a client site which I do not have access to. The tenant network was attached manually to a vlan-tagged bonded interface, although we were using GRE across it. At one point during that engagement, the problem disappeared and I was unable to reproduce it. The circumstances were not unlike 1064109, though I cannot confirm that; the client has not reported the problem since.

Comment 10 Terry Wilson 2014-02-13 05:34:31 UTC
Going ahead and closing this as WORKSFORME. Sounds like there is at least a decent chance that this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1064109, so we can just continue to track it there since that bug has a lot of information. Thanks!

