Bug 1421022

Summary: Neutron-linuxbridge-agent startup bridge configuration
Product: [Community] RDO Reporter: kalle.happonen
Component: openstack-neutron Assignee: Assaf Muller <amuller>
Status: CLOSED EOL QA Contact: Ofer Blaut <oblaut>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: unspecified CC: chrisw, ihrachys, srevivo, ykarel
Target Milestone: --- Keywords: Reopened
Target Release: trunk   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-01-13 07:13:18 UTC Type: Bug

Description kalle.happonen 2017-02-10 06:49:48 UTC
Description of problem:
In a recent update (Liberty), the systemd unit file for neutron-linuxbridge-agent was changed in the openstack-neutron-linuxbridge rpm.

/usr/lib/systemd/system/neutron-linuxbridge-agent.service

This line was added to the file.
ExecStartPre=/usr/bin/neutron-enable-bridge-firewall.sh

This is great for compute nodes. However, if you are running a linuxbridge-only setup (no OVS), the neutron-linuxbridge-agent also runs on the network nodes (in a standard l3 agent setup).

The network nodes have nothing from OpenStack's side controlling the iptables rules for the bridge interfaces. The change in packaging explicitly turns on firewalling for the bridge interfaces.
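
A quick way to see whether a node is affected is to read the bridge-nf-call toggles that the script sets (the keys are the ones quoted in comment 3 below). This is only a rough sketch: the function name and its optional directory argument are made up for illustration and are not part of any package.

```shell
#!/bin/sh
# Print the current bridge netfilter toggles, one "key=value" per line.
# The optional first argument (a directory) is a hypothetical convenience so
# the function can be tried against a copy of the files; by default it reads
# the live kernel values under /proc/sys/net/bridge.
bridge_nf_status() {
    dir="${1:-/proc/sys/net/bridge}"
    for proto in arp ip ip6; do
        key="bridge-nf-call-${proto}tables"
        if [ -r "$dir/$key" ]; then
            echo "$key=$(cat "$dir/$key")"
        else
            # The file is missing when the bridge netfilter code is not loaded.
            echo "$key=unavailable"
        fi
    done
}
```

Running `bridge_nf_status` with no argument before and after restarting the agent should show whether the values flipped to 1.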

In our case a package update, and a restart of the service, killed all customer network traffic, and we had no idea why. Luckily we did this on only one network node, so we could migrate the routers away and cause only a moderate customer impact.

Forcing iptables on the bridges should probably not be done on the network nodes, since there is no standard way of handling bridge traffic filtering on them, and in the default scenario of no firewall control for them, it blocks all traffic.

Version-Release number of selected component (if applicable):
Liberty->Master

How reproducible:
Easy

Steps to Reproduce:
1. Set up a linuxbridge-only network node with standard iptables rules
2. Start neutron-linuxbridge-agent
3. Witness the lack of customer network traffic

Actual results:
Customer traffic going through the network node stops.

Expected results:
No impact on customer traffic.

Additional info:

Comment 2 Ihar Hrachyshka 2017-05-24 21:53:14 UTC
Whatever your networking configuration is, I don't see a reason why it should depend on which kernel modules are loaded. You may need to define additional iptables rules that would allow the traffic you need on bridges you expose to customer traffic. What we do in the new service file is enable the ability to define such rules. It's up to consumers of the tables (l2 agent, or your host networking configuration) to set the actual rules up.

Unless I completely misunderstood your problem, it seems to me that the lack of iptables configuration is the problem here, not the change in the service unit file. If you believe otherwise (meaning you can't solve it with explicit iptables rules), then please reopen the bug.

Comment 3 kalle.happonen 2017-05-29 05:04:39 UTC
The module loading itself is not a big issue. This part of the script is.

for proto in arp ip ip6; do
    /usr/sbin/sysctl -w net.bridge.bridge-nf-call-${proto}tables=1
done

It explicitly sets calling of iptables on bridge devices. This is a change from the previous packaging, and this is very hard to override in a nice way. I had to prevent this from running on the network nodes by overriding the calling of the script.

/etc/systemd/system/neutron-linuxbridge-agent.service.d/exec.conf
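
The contents of that drop-in are not quoted here. A minimal sketch of what such an override might look like follows, assuming the firewall script is the only ExecStartPre= entry in the packaged unit; an empty assignment clears the whole list, so any other entries would have to be re-added. The helper name and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Write a drop-in that clears every ExecStartPre= inherited from the packaged unit.
# Assumption: the firewall script is the only ExecStartPre= entry worth dropping;
# an empty assignment resets the whole list, so re-add any others still needed.
write_dropin() {
    dropin_dir="${1:-/etc/systemd/system/neutron-linuxbridge-agent.service.d}"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/exec.conf" <<'EOF'
[Service]
ExecStartPre=
EOF
}
```

After the file is written, `systemctl daemon-reload` followed by a restart of the agent picks up the change. Values already set to 1 by an earlier run of the script stay that way until something sets them back.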

The problem is that on compute nodes, the iptables rules are managed automatically for dynamically created devices. On the network node nothing manages the rules, so you have to specify them by hand.

So the end result is that you would have to somehow come up with firewall rules that work on dynamically created and removed bridge devices with arbitrary IP ranges. This is very hard to do, especially without interfering with your other traffic at the same time. And the rule you would be trying to create is "allow all traffic on the bridge interfaces, since it's filtered elsewhere".

Setting net.bridge.bridge-nf-call-${proto}tables=0 accomplishes the same thing without the associated problems. Of course it can't be made the default either, since it would break security groups on compute nodes.

The end result is that I think this should be admin-definable depending on the node types.
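
As a rough illustration of what "admin-definable" could mean, the loop from the script could take its value from a per-node setting instead of hard-coding 1. Everything below is a sketch: `BRIDGE_NF_CALL` and the function name are hypothetical and do not exist in the packaged script.

```shell
#!/bin/sh
# Emit the bridge-nf-call settings for a node, one "key=value" per line.
# BRIDGE_NF_CALL is a hypothetical knob (not part of the packaged script):
# 1 keeps the packaged behaviour, 0 leaves bridge traffic unfiltered.
bridge_nf_settings() {
    value="${1:-${BRIDGE_NF_CALL:-1}}"
    case "$value" in
        0|1) ;;
        *) echo "bridge_nf_settings: expected 0 or 1, got '$value'" >&2
           return 1 ;;
    esac
    for proto in arp ip ip6; do
        echo "net.bridge.bridge-nf-call-${proto}tables=$value"
    done
}
```

On a network node an admin could then run, as root, `for s in $(bridge_nf_settings 0); do /usr/sbin/sysctl -w "$s"; done`, while compute nodes keep the default of 1.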

I have reopened this bug now. If, after this explanation, you still do not consider this a bug for a pure linuxbridge installation, feel free to close it again.

Comment 4 Yatin Karel 2021-01-13 07:13:18 UTC
Liberty went EOL long ago, so I am closing this bug as EOL. If you feel it is still applicable to a supported release, please reopen it for that release so it can be fixed.