Bug 1827276

Summary: Sync script for sidecar containers can't spawn dnsmasq processes for all networks
Product: Red Hat OpenStack
Reporter: Slawek Kaplonski <skaplons>
Component: openstack-tripleo-heat-templates
Assignee: Alex Katz <akatz>
Status: CLOSED ERRATA
QA Contact: David Rosenfeld <drosenfe>
Severity: urgent
Priority: urgent
Version: 16.1 (Train)
CC: akatz, bcafarel, bdobreli, mburns, mkrcmari
Target Milestone: beta
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200513033425.a90c03e.el8ost
Doc Type: No Doc Update
Last Closed: 2020-07-29 07:52:07 UTC
Type: Bug
Bug Blocks: 1824397

Description Slawek Kaplonski 2020-04-23 14:13:04 UTC
I found this while investigating a failure of the Tobiko tests: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-network-neutron-16_director-rhel-virthost-3cont_2comp-ipv4-vxlan-tobiko/24/testReport/junit/tobiko.tests.faults.agents.test_neutron_agents/DHCPAgentTest/test_dhcp_lease_served_when_dhcp_agent_down/

It seems that neutron-dhcp-agent actually spawns a dnsmasq process (sidecar container) for only one network. For all other networks, no dnsmasq process is started at all.

How to reproduce:
1. Create at least 2 networks (I had 3 in my environment), each with a DHCP-enabled subnet.
2. Restart the neutron-dhcp-agent process on the controller.
3. Check the dnsmasq processes running on the host: there is only one, but there should be one per network.
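Step 3 can be scripted. The following is a minimal sketch, not part of the original report: it lists dnsmasq processes from the host's /proc and reports which networks have none. It assumes that each neutron dnsmasq command line contains its network ID (for example in per-network pid-file or hosts-file paths) and that you supply the list of network IDs yourself; the function names are hypothetical.

```python
import os


def dnsmasq_cmdlines(proc_root="/proc"):
    """Return the command lines of running dnsmasq processes (Linux only)."""
    cmdlines = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, pid, "cmdline"), "rb") as f:
                argv = [a for a in f.read().split(b"\0") if a]
        except OSError:
            continue  # the process exited or is not readable
        if argv and os.path.basename(argv[0]) == b"dnsmasq":
            cmdlines.append(b" ".join(argv).decode(errors="replace"))
    return cmdlines


def networks_without_dnsmasq(network_ids, cmdlines):
    """Return the network IDs that no dnsmasq command line mentions.

    Assumption: the network ID appears somewhere in each per-network
    dnsmasq command line.
    """
    return [net for net in network_ids if not any(net in c for c in cmdlines)]
```

With the bug present and three DHCP-enabled networks, `networks_without_dnsmasq(ids, dnsmasq_cmdlines())` would be expected to return two IDs; with the fix it should return an empty list.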

Comment 1 Bogdan Dobrelya 2020-04-24 11:07:57 UTC
The race with merging events likely has something to do with https://github.com/systemd/systemd/issues/5770
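The suspected failure mode can be illustrated with a toy model. This is a minimal sketch, not systemd code and not the actual sync script: it assumes a path-style trigger behaves like a single pending flag, so several changes made before the handler runs collapse into one activation. A hypothetical handler that starts one sidecar per activation then loses requests, while one that compares every requested sidecar against the running set does not. All class and function names are illustrative.

```python
class CoalescingTrigger:
    """Toy trigger: repeated notifications before take() merge into one."""

    def __init__(self):
        self.pending = False

    def notify(self):
        self.pending = True  # an already-pending event absorbs this one

    def take(self):
        fired, self.pending = self.pending, False
        return fired


def queue_requests(networks):
    """Record one request and one notification per network before any handler runs."""
    trigger, requested = CoalescingTrigger(), []
    for net in networks:
        requested.append(net)
        trigger.notify()
    return trigger, requested


def one_per_activation(trigger, requested):
    """Hypothetical handler that starts only the newest request per activation."""
    started = []
    while trigger.take():
        started.append(requested[-1])
    return started


def reconcile(trigger, requested, running):
    """Hypothetical handler that starts every requested sidecar not yet running."""
    started = []
    while trigger.take():
        for net in requested:
            if net not in running:
                running.add(net)
                started.append(net)
    return started
```

With three queued networks, `one_per_activation` starts only the last one, matching the single dnsmasq process seen in the report, while `reconcile` starts all three. Per Comment 3, the adopted fix took a different route (reverting to the systemd wrappers).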

Comment 2 Brent Eagles 2020-04-29 12:25:25 UTC
There are two facets to this bug, each of which may deserve its own bug report:

- The sync and wrapper scripts are not using the same lock file, resulting in a race on the shared "processes" file. The solution for this is straightforward.

- The sync process is triggered through systemd.path notifications, which do not appear to be queued individually. As a result, some sidecars may never be launched.
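The first facet (separate lock files) is a classic lost-update race. A minimal sketch of the straightforward fix, assuming a line-oriented "processes" file and a hypothetical lock path, is to have every writer take an exclusive flock on one shared lock file around the whole read-modify-write. The file layout, paths, and function names are illustrative assumptions, not the actual TripleO scripts.

```python
import fcntl
import os
import tempfile
import threading


def add_process_entry(processes_path, lock_path, entry):
    """Append one entry to the shared processes file under an exclusive lock.

    The wrapper and the sync script must pass the SAME lock_path; with two
    different lock files, both could read the old contents and one update
    would be lost.
    """
    with open(lock_path, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # block until no other holder remains
        try:
            entries = []
            if os.path.exists(processes_path):
                with open(processes_path) as f:
                    entries = f.read().splitlines()
            entries.append(entry)
            tmp_path = processes_path + ".tmp"
            with open(tmp_path, "w") as f:
                f.write("\n".join(entries) + "\n")
            os.replace(tmp_path, processes_path)  # atomic rename on POSIX
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


def demo(entries_per_writer=50):
    """Run two concurrent writers (stand-ins for wrapper and sync) and count lines."""
    workdir = tempfile.mkdtemp()
    processes = os.path.join(workdir, "processes")
    lock = os.path.join(workdir, "processes.lock")

    def writer(name):
        for i in range(entries_per_writer):
            add_process_entry(processes, lock, f"{name}-{i}")

    threads = [threading.Thread(target=writer, args=(n,)) for n in ("wrapper", "sync")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with open(processes) as f:
        return len(f.read().splitlines())
```

`demo()` returns 100: each separately opened lock file descriptor contends for the same flock, so no entry is dropped. Giving each writer its own lock path would remove that guarantee.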

Bogdan's patch looks promising (see https://review.opendev.org/#/c/723373/) but will need extensive testing.

Comment 3 Brent Eagles 2020-05-07 14:24:33 UTC
The decision was to revert the patches that removed the old sidecar mechanism, and to modify the neutron templates to use the systemd wrappers.

Patch to master is here: https://review.opendev.org/#/c/725162/
After it merges, we will backport it to Train so that it is available for 16.1.

Comment 4 Bernard Cafarelli 2020-05-13 08:01:26 UTC
Train backport merged and package built for 16.1

Comment 10 Alex McLeod 2020-06-16 12:29:32 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.

Comment 12 errata-xmlrpc 2020-07-29 07:52:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148