| Summary: | nova instances not acquiring dhcp IP address during launch | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Richard Smith <rismith> |
| Component: | openstack-neutron | Assignee: | Terry Wilson <twilson> |
| Status: | CLOSED WORKSFORME | QA Contact: | Ofer Blaut <oblaut> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.0 | CC: | chrisw, kburres, lpeer, mnewby, rismith, twilson, yeylon |
| Target Milestone: | z2 | Keywords: | Unconfirmed, ZStream |
| Target Release: | 4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-02-13 05:34:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: |
Description
Richard Smith
2013-11-26 14:55:30 UTC
This issue happens when the new VM is the first VM on the virtual network. Neutron's dhcp-agent does not start the dnsmasq daemon for a network that has no DHCP clients. However, when the first VM appears, it looks like it may sometimes start faster than the dnsmasq daemon. As a result, nobody answers its DHCP requests. It looks like a race condition to me.

Richard, if I remember correctly, there are issues with the cirros 0.3.0 image re: DHCP on RHEL. Please try the 0.3.1 image ( http://download.cirros-cloud.net/0.3.1/cirros-0.3.1-x86_64-disk.img ) and report back. If that doesn't fix the issue, please describe your setup (vlan, gre, etc.) and post your packstack answer file. Thanks.

Created attachment 831838
Packstack answer file
This is the original answer file used to create this OpenStack cloud.
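
The race suspected in the first comment can be made concrete with a toy timing model. The sketch below is only an illustration, not Neutron or cirros code: the function name, the retry parameters, and all timings are hypothetical, and it assumes the guest's DHCP client gives up after a fixed number of evenly spaced DHCPDISCOVER attempts.

```python
def first_answered_attempt(dnsmasq_ready_at, first_discover_at,
                           retry_interval, max_attempts):
    """Return the 1-based index of the first DHCPDISCOVER that is sent once
    dnsmasq is listening, or None if every attempt is sent too early.

    All arguments are hypothetical timings in seconds, measured from the
    moment the instance's port is created.
    """
    for attempt in range(max_attempts):
        sent_at = first_discover_at + attempt * retry_interval
        if sent_at >= dnsmasq_ready_at:
            return attempt + 1
    return None


# Made-up numbers: dnsmasq needs 5 s to come up, the guest starts asking at 2 s.
gives_up_early = first_answered_attempt(5.0, 2.0, 1.0, max_attempts=3)
keeps_retrying = first_answered_attempt(5.0, 2.0, 1.0, max_attempts=5)
print(gives_up_early, keeps_retrying)
```

With these invented numbers, a client that stops after three attempts never gets an answer, while one that keeps retrying is answered on a later attempt. That would be consistent with the failure being intermittent and dependent on the guest image.
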

I downloaded the cirros 0.3.1 image from the link provided and retried the process above, with essentially the same results: create a new private network, launch 20 m1.tiny instances, log in on each instance console; about 50% of the nodes had no IP assigned to their eth0 interface. I deleted the instances and tried again, launching only 10 instances, and still got the same results, with a large portion of nodes getting no assigned IP. neutron-dhcp-agent reports DHCPOFFERs to all the instances. Attaching our packstack answer file, which describes a gre setup with one controller and three nova hosts.

I'm wondering if the problem you are reporting with the 0.3.1 image is related to another bug (https://bugzilla.redhat.com/show_bug.cgi?id=1034822) that also concerns dhcp agent reliability. Can you replicate the reported problem when booting a single VM? What is the load on the controller node, and more importantly, what is the CPU usage of the neutron service, when you are attempting to boot multiple VMs?

The correct link to the other bug is https://bugzilla.redhat.com/show_bug.cgi?id=1023818

Using cirros 0.3.1 and a packstack-based install of 2013.2.1-4 with gre and two compute hosts, every instance ends up getting a dhcp address each time I boot a large number of instances. I just haven't been able to reproduce this. Is it possible that there is an interaction with VLANs going on as well in this setup, similar to what was just reported in https://bugzilla.redhat.com/show_bug.cgi?id=1064109? I don't see any information that would point me to that in the answer file provided, but maybe some manual configuration was done later? Or can you try to reproduce with 2013.2.1-4 and cirros 0.3.1? Thanks.

The original problem occurred at a client site which I do not have access to. The tenant network was attached manually to a vlan-tagged bonded interface, although we were using GRE across it. At one point during that engagement, the problem disappeared and I was unable to reproduce it.
The circumstances were not unlike 1064109. I cannot confirm the connection, but the client has not reported the problem since.

Going ahead and closing this as WORKSFORME. Sounds like there is at least a decent chance that this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1064109, so we can just continue to track this there, since that bug has a lot of information. Thanks!
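
Since neutron-dhcp-agent reportedly logged DHCPOFFERs for every instance, one way to narrow a case like this down would be to check which MAC addresses were offered a lease but never acknowledged. The following is a minimal sketch under assumptions: it assumes dnsmasq's usual syslog line shape (`dnsmasq-dhcp[PID]: DHCPOFFER(iface) ip mac`), the helper name is hypothetical, and the sample lines are invented for illustration rather than taken from this report.

```python
import re
from collections import defaultdict

# Assumed dnsmasq log shape; DHCPDISCOVER lines may carry no IP address.
LINE_RE = re.compile(
    r"dnsmasq-dhcp\[\d+\]:\s+(DHCP[A-Z]+)\(([^)]+)\)\s+"
    r"(?:(\d{1,3}(?:\.\d{1,3}){3})\s+)?"
    r"((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def offered_but_not_acked(lines):
    """Return the sorted MACs that saw a DHCPOFFER but no DHCPACK."""
    seen = defaultdict(set)  # MAC address -> set of DHCP message types
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            seen[match.group(4).lower()].add(match.group(1).upper())
    return sorted(
        mac for mac, messages in seen.items()
        if "DHCPOFFER" in messages and "DHCPACK" not in messages
    )


# Invented sample lines, not real logs from this bug.
sample = [
    "dnsmasq-dhcp[1234]: DHCPDISCOVER(tap0) fa:16:3e:aa:bb:01",
    "dnsmasq-dhcp[1234]: DHCPOFFER(tap0) 10.0.0.3 fa:16:3e:aa:bb:01",
    "dnsmasq-dhcp[1234]: DHCPREQUEST(tap0) 10.0.0.3 fa:16:3e:aa:bb:01",
    "dnsmasq-dhcp[1234]: DHCPACK(tap0) 10.0.0.3 fa:16:3e:aa:bb:01 host-10-0-0-3",
    "dnsmasq-dhcp[1234]: DHCPDISCOVER(tap0) fa:16:3e:aa:bb:02",
    "dnsmasq-dhcp[1234]: DHCPOFFER(tap0) 10.0.0.4 fa:16:3e:aa:bb:02",
]
stuck = offered_but_not_acked(sample)
print(stuck)
```

A MAC that shows up in this list was offered an address by dnsmasq but never completed the exchange, which would suggest the offer did not reach the guest or the guest did not respond to it, rather than dnsmasq failing to answer at all.
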