Bug 1214891
Summary: | VM instances do not get DHCP addr on boot | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Chris Dearborn <christopher_dearborn> |
Component: | openstack-foreman-installer | Assignee: | Jason Guiditta <jguiditt> |
Status: | CLOSED NOTABUG | QA Contact: | Shai Revivo <srevivo> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.0 (Juno) | CC: | amuller, arkady_kanevsky, bthurber, cdevine, christopher_dearborn, cwolfe, fdeutsch, jjarvis, John_walsh, kbader, kurt_hey, mburns, morazi, randy_perryman, rhos-maint, sreichar, srevivo, wayne_allen |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | Installer | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-08-04 15:23:38 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1171850 |
Description
Chris Dearborn
2015-04-23 19:29:51 UTC
Can we get some direction on how to debug this issue?

Internally, there was some tribal knowledge that claimed that adding PROMISC=yes to the ifcfg file for br-tenant on the compute nodes and then rebooting the compute nodes fixed the problem when running on hardware.

I have worked a fair amount on trying to debug this issue on VMs running in VMware Workstation, and here is what I've found:
- Adding PROMISC=yes to an ifcfg file is completely ignored. The interface is not put into promiscuous mode.
- Manually forcing br-tenant into promiscuous mode via ifconfig does not fix the problem.
- I tried rebooting the compute node after adding PROMISC=yes, and that didn't work.
- I tried shutting down the compute, then rebooting the controllers, clustercheck, pcs status, then booting up the compute. Didn't work.
- After several reboots of the compute node, DHCP randomly started to work.
- I tried 3 reboots of the compute, and DHCP continued to work.
- I removed PROMISC=yes from br-tenant and rebooted. Continued to work.
- Tried 2 more reboots. Continued to work.
- Unable to break it. Once the problem corrects itself, it appears to remain fixed.

It is easy to reproduce on both hardware and on VMs following the above instructions. On hardware, a simple reboot of the compute nodes fixes the problem. On VMware Workstation VMs, it appears that a random number of reboots of the compute nodes is required to fix the problem.

I verified that this problem still exists in A3.

We were able to reproduce this using A2 bits and Dell 13g hardware. A reboot of the compute nodes allowed the VMs to start getting DHCP addresses.

Please attach the yaml for the controllers and compute nodes, along with the nova and neutron commands used, which will help us properly reproduce the issue. Thanks in advance.

Can we get contents of:
/var/lib/neutron/*
/etc/neutron/*
/var/log/neutron/*
On all of your controllers, and on an affected compute node (Before a reboot that resolves the issue).
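One way to gather the requested directories is to bundle them into a single archive per host before rebooting. The following is only a sketch: the function name `collect_neutron_files`, the `NEUTRON_DIRS` override, and the output path are assumptions made for illustration and are not part of the request above.

```shell
#!/bin/sh
# Sketch: archive the Neutron directories requested above.
# NEUTRON_DIRS can be overridden; the default is the three requested paths.
NEUTRON_DIRS="${NEUTRON_DIRS:-/var/lib/neutron /etc/neutron /var/log/neutron}"

# collect_neutron_files OUTPUT_ARCHIVE
# Archives every directory in NEUTRON_DIRS that exists on this host and
# prints the archive path on success. (The function name is hypothetical.)
collect_neutron_files() {
    out="$1"
    existing=""
    for dir in $NEUTRON_DIRS; do
        if [ -d "$dir" ]; then
            existing="$existing $dir"
        fi
    done
    if [ -z "$existing" ]; then
        echo "no neutron directories found on this host" >&2
        return 1
    fi
    # Word splitting on $existing is intentional: one argument per directory.
    # shellcheck disable=SC2086
    tar -czf "$out" $existing && echo "$out"
}
```

For example, `collect_neutron_files "/tmp/neutron-$(uname -n).tar.gz"` run as root on each controller and on the affected compute node would produce one archive per host to attach to the bug.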
Also, on an affected compute, the output of:
ovs-vsctl show
ovs-ofctl dump-flows br-int
ovs-ofctl dump-flows br-tenant (I understand you're using VLAN segmentation)
neutron net-show <network the VMs are connected to>
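The command output requested above could be captured to files so that it can be attached alongside the logs. This is a rough sketch, not a prescribed procedure: the helper names `run_and_log` and `collect_ovs_state` and the output file names are made up for illustration, and `neutron net-show` is assumed to run in a shell where the OpenStack credentials are already sourced.

```shell
#!/bin/sh
# run_and_log OUT_DIR NAME CMD [ARGS...]
# Runs CMD and saves the command line, its stdout and stderr, and its exit
# status to OUT_DIR/NAME.txt. (Hypothetical helper.)
run_and_log() {
    log_dir="$1"
    log_name="$2"
    shift 2
    {
        printf '%s\n' "\$ $*"
        status=0
        "$@" 2>&1 || status=$?
        echo "exit status: $status"
    } > "$log_dir/$log_name.txt"
}

# collect_ovs_state OUT_DIR NETWORK
# Captures the OVS and Neutron state listed above for one compute node.
# NETWORK is the name or ID of the network the VMs are connected to.
collect_ovs_state() {
    state_dir="$1"
    network="$2"
    mkdir -p "$state_dir" || return 1
    run_and_log "$state_dir" ovs-vsctl-show ovs-vsctl show
    run_and_log "$state_dir" flows-br-int ovs-ofctl dump-flows br-int
    run_and_log "$state_dir" flows-br-tenant ovs-ofctl dump-flows br-tenant
    run_and_log "$state_dir" net-show neutron net-show "$network"
}
```

Running `collect_ovs_state /tmp/ovs-state <network>` once before the reboot that resolves the issue and once after it, into two different directories, would make it possible to diff the flow tables between the broken and working states.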
> I have tried using tcpdump to see where the DHCP response is lost. It makes it back to br-tenant, but no further than that, so it looks like OVS is dropping it on the floor.
One additional thing you can try is to 'ip link set dev %s up' where %s is the name of the tap device of the VM. Then, tcpdump it and see if it's seeing DHCP responses.
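The suggestion above might look roughly like the following when scripted. This is a sketch under assumptions: the function names are hypothetical, the tap device name has to be looked up for the affected VM, and the packet count limit is an arbitrary choice so that the capture terminates.

```shell
#!/bin/sh
# dhcp_filter: prints a capture filter matching DHCP traffic
# (UDP ports 67 and 68).
dhcp_filter() {
    printf '%s\n' 'udp and (port 67 or port 68)'
}

# watch_dhcp_on_tap TAP_DEVICE [PACKET_COUNT]
# Brings the VM's tap device up, then captures DHCP packets on it.
# Must be run as root on the compute node hosting the VM.
watch_dhcp_on_tap() {
    tap="$1"
    count="${2:-10}"
    if [ -z "$tap" ]; then
        echo "usage: watch_dhcp_on_tap TAP_DEVICE [PACKET_COUNT]" >&2
        return 2
    fi
    ip link set dev "$tap" up || return 1
    # -n: no name resolution, -e: print MAC addresses,
    # -c: stop after COUNT packets
    tcpdump -n -e -i "$tap" -c "$count" "$(dhcp_filter)"
}
```

Running the same capture on br-tenant at the same time gives a point of comparison: if DHCP replies show up on br-tenant but never on the tap device, that is consistent with the earlier observation that the response is lost inside OVS.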
Handing this needinfo off to Wayne, since he is in the process of doing a fresh install and will hopefully be able to supply the requested information.

I have not been able to reproduce this problem. All the VMs seem able to get addresses without restarting anything after install. If we see the problem again I will try to provide more data.

As this has not been reproduced for over a year, I am going to mark it NOTABUG and chalk it up to a config issue or something unrelated that caused this as a side effect.