Bug 1723935

Summary: dhcp agent dnsmasq process mgmt race condition between launch and operations
Product: Red Hat OpenStack Reporter: Jeremy <jmelvin>
Component: openstack-neutron Assignee: Brian Haley <bhaley>
Status: CLOSED ERRATA QA Contact: Candido Campos <ccamposr>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 13.0 (Queens) CC: amuller, bhaley, chrisw, dhill, scohen, skaplons
Target Milestone: z8 Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-neutron-12.0.6-6.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1728866 Environment:
Last Closed: 2019-09-03 16:53:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1728866    

Description Jeremy 2019-06-25 19:24:24 UTC
Description of problem:
Downstream copy of https://bugs.launchpad.net/neutron/+bug/1824802

There may be a race condition involving dnsmasq startup and port operations. What appears to happen is that dnsmasq has been started, but its pid file is not yet available when a port change occurs. The DHCP agent then attempts to start a new dnsmasq instance even though the previous one is still in the process of being loaded.
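
The race described above can be illustrated with a minimal sketch. This is not the neutron DHCP agent code and not the fix shipped in the advisory; `ProcessGuard`, `read_pid`, `wait_for_pid`, and the timeout values are hypothetical names and numbers chosen only for illustration. The idea shown is that a missing pid file is not treated as "not running" while an earlier launch is still in flight.

```python
import os
import tempfile
import time


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or incomplete."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def wait_for_pid(pid_file, timeout=2.0, interval=0.05):
    """Poll for the pid file until the timeout expires; return the PID or None."""
    deadline = time.monotonic() + timeout
    while True:
        pid = read_pid(pid_file)
        if pid is not None or time.monotonic() >= deadline:
            return pid
        time.sleep(interval)


class ProcessGuard:
    """Hypothetical guard that remembers when a launch is already in flight."""

    def __init__(self, pid_file, spawn, timeout=2.0):
        self.pid_file = pid_file
        self.spawn = spawn        # callable that starts the daemon (assumption)
        self.timeout = timeout
        self.launching = False

    def ensure_running(self):
        if read_pid(self.pid_file) is not None:
            self.launching = False
            return "running"
        # A launch was already issued: wait for its pid file instead of respawning.
        if self.launching and wait_for_pid(self.pid_file, self.timeout) is not None:
            self.launching = False
            return "running"
        # Nothing launched yet, or the earlier launch never produced a pid file.
        self.launching = True
        self.spawn()
        return "spawned"


if __name__ == "__main__":
    pid_path = os.path.join(tempfile.mkdtemp(), "dnsmasq.pid")
    guard = ProcessGuard(pid_path, spawn=lambda: None, timeout=0.2)
    print(guard.ensure_running())
```

The sketch is single-threaded; a real agent handling port events concurrently would also need a lock around `ensure_running`.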


Version-Release number of selected component (if applicable):
osp13



Additional info:
We managed to identify the issue: the DHCP agent was handling too many ports. One thread of the DHCP agent process was stuck in a loop that kept updating ports.

There were so many ports that the process kept timing out and restarting, each time with even more ports to handle. We had to temporarily disable DHCP on all CCI networks so that the DHCP agents could come up and update their ports properly.

The next step, which is ongoing, is to gradually re-enable DHCP on every network.
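
The gradual re-enable can be sketched as a batched loop with a pause between batches. This is only an illustration under assumptions: `enable_in_batches` and its `enable` callable are hypothetical (the callable would wrap whatever command or API call turns DHCP back on for one network), and the batch size and pause are arbitrary values, not the ones used here.

```python
import time


def enable_in_batches(network_ids, enable, batch_size=5, pause=30.0, sleep=time.sleep):
    """Call enable(network_id) for each network, pausing between batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    done = []
    for start in range(0, len(network_ids), batch_size):
        if start:
            # Give the DHCP agents time to process the previous batch of ports.
            sleep(pause)
        for net_id in network_ids[start:start + batch_size]:
            enable(net_id)
            done.append(net_id)
    return done
```

Injecting `sleep` keeps the pause testable and makes it easy to swap in a check of agent health before moving on to the next batch.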

Comment 21 errata-xmlrpc 2019-09-03 16:53:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2629