Bug 2269219

Summary: tripleo_iptables doesn't apply rules idempotently
Product: Red Hat OpenStack
Reporter: Alex Stupnikov <astupnik>
Component: tripleo-ansible
Assignee: Brendan Shephard <bshephar>
Status: CLOSED ERRATA
QA Contact: David Rosenfeld <drosenfe>
Severity: high
Priority: medium
Version: 17.1 (Wallaby)
CC: bshephar, enothen, jpretori, knoha, mariel, mburns, pkomarov, tkuroda
Target Milestone: z4
Keywords: Triaged
Target Release: 17.1
Hardware: All
OS: All
Fixed In Version: openstack-tripleo-heat-templates-14.3.1-17.1.20240919130751.e7c7ce3.el9ost tripleo-ansible-3.3.1-17.1.20240918100824.8debef3.el9ost
Last Closed: 2024-11-21 09:30:02 UTC
Type: Bug

Description Alex Stupnikov 2024-03-12 17:28:36 UTC
Description of problem:
TripleO provides an interface to customize iptables rules on overcloud nodes using role-specific ExtraFirewallRules definitions. One of our customers reported concerning behavior, which is easily reproducible in the lab: rules are applied properly when ExtraFirewallRules is defined before the initial overcloud deployment, but the iptables rules on overcloud nodes end up out of order when ExtraFirewallRules is applied to an existing environment.

tripleo_iptables gets an ordered list of rules required by TripleO, parses them, translates them into the format required by the built-in Ansible iptables module, and passes them to that module so that they are applied on overcloud nodes. If a rule already exists on the overcloud node, the built-in iptables module skips it; the remaining rules are either inserted (so the rule becomes the first in the chain) or appended (so the rule becomes the last in the chain).
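The behavior described above can be illustrated with a toy model. This is only a sketch, not the actual tripleo_iptables code: the function name `apply_rules`, the rule names, and the per-rule insert/append actions are hypothetical, and new inserted rules are assumed to keep their relative order among themselves, as reported in the actual results of this bug.

```python
def apply_rules(chain, desired):
    """Toy model of the reported behavior (not the real tripleo_iptables code).

    chain:   rule names currently on the node, in evaluation order.
    desired: ordered list of (rule_name, action) pairs, where action is
             "insert" or "append".
    Rules already present are skipped; new rules land at the head or the
    tail of the chain, regardless of their rule numbers.
    """
    present = set(chain)
    inserted = [name for name, action in desired
                if name not in present and action == "insert"]
    appended = [name for name, action in desired
                if name not in present and action == "append"]
    return inserted + list(chain) + appended


# Hypothetical rule set; the numeric prefixes stand in for TripleO rule numbers.
desired = [
    ("000 accept established", "insert"),
    ("001 drop bad net", "insert"),
    ("003 accept ssh", "insert"),
    ("500 accept custom", "append"),
    ("999 drop all", "append"),
]

# Initial deployment: the chain is empty, so the order matches the rule numbers.
fresh = apply_rules([], desired)

# Redeployment: three rules already exist, two are new.
existing = ["000 accept established", "003 accept ssh", "999 drop all"]
redeployed = apply_rules(existing, desired)

print(fresh)
print(redeployed)
```

In this model the redeployed chain starts with rule 001 ahead of rule 000 and ends with rule 500 behind rule 999, which is the kind of ordering the report describes.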

By default, the last iptables rule drops all traffic, so additional rules appended after it during a subsequent deployment have no effect: no traffic is left for them to match.
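The reason an appended rule is ineffective can be sketched with a simplified first-match evaluation, which is how iptables treats terminating targets such as ACCEPT and DROP. The chain contents, the packet representation, and the function name `first_match` are assumptions made for illustration only.

```python
def first_match(chain, packet):
    """Return (rule_name, target) of the first rule that matches the packet.

    chain is a list of (rule_name, predicate, target) tuples evaluated top
    to bottom; evaluation stops at the first matching terminating target.
    """
    for name, matches, target in chain:
        if matches(packet):
            return name, target
    return None, "chain policy"


# Hypothetical chain after a redeployment: the custom accept rule was
# appended after the catch-all drop rule.
chain = [
    ("003 accept ssh", lambda p: p["dport"] == 22, "ACCEPT"),
    ("999 drop all", lambda p: True, "DROP"),
    ("500 accept custom", lambda p: p["dport"] == 8080, "ACCEPT"),
]

print(first_match(chain, {"dport": 22}))
print(first_match(chain, {"dport": 8080}))
```

A packet for port 8080 is caught by the drop rule before the appended accept rule is ever evaluated.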

Inserting all new rules at the beginning of a chain is not that bad when the goal is to drop some specific traffic before the default rules. It is unpleasant that rule numbers are not honored and that the rule accepting traffic for established connections is no longer the first one (this creates a small overhead), but that is not a big deal after all.

However, this approach makes it impossible to insert meaningful rules after the default ones during subsequent deployments: neither insert nor append allows this.

From my perspective, this situation comes from the design of the tripleo_firewall role and may not have a simple workaround in the existing TripleO architecture. Nonetheless, I am reporting this bug to get feedback from engineering.

Version-Release number of selected component (if applicable):
RHOSP 17.1

How reproducible:
Deploy a RHOSP 17.1 overcloud, then try to apply consistent iptables rules using ExtraFirewallRules.

Actual results:
Rule-number order is preserved only within the new batch of inserted rules; pre-existing iptables rules and appended rules do not follow the relative order of their rule numbers.

Expected results:
New rules are applied according to their rule numbers, and the relative order between new and pre-existing rules is preserved.
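The expected result can be expressed as a simple check. This is a minimal sketch that assumes every rule name starts with its numeric rule number; the helper names `rule_number` and `follows_rule_numbers` and the sample rule names are hypothetical.

```python
def rule_number(name):
    """Extract the leading rule number from a name like '500 accept custom'."""
    return int(name.split()[0])


def follows_rule_numbers(chain):
    """True if the rules in the chain appear in non-decreasing rule-number order."""
    numbers = [rule_number(name) for name in chain]
    return numbers == sorted(numbers)


# Chain order expected after redeployment.
expected = ["000 accept established", "001 drop bad net",
            "003 accept ssh", "500 accept custom", "999 drop all"]

# Chain order of the kind described in the actual results.
actual = ["001 drop bad net", "000 accept established",
          "003 accept ssh", "999 drop all", "500 accept custom"]

print(follows_rule_numbers(expected))
print(follows_rule_numbers(actual))
```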

Comment 1 Brendan Shephard 2024-03-13 23:38:39 UTC
Hey, so the firewall rules should be the same for each deployment, regardless of whether it's the first or the fifteenth deployment, because we set all of them based on this fact:
https://github.com/openstack-archive/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_firewall/tasks/main.yml#L77-L88

Then we apply the rules from the fact {{ firewall_rules_sorted }}:
https://github.com/openstack-archive/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_firewall/tasks/main.yml#L116


So I'm just trying to clarify what you mean here: "pre-existing iptables rules and appended rules will not follow relative order in rule numbers"

What are we referring to as "pre-existing iptables rules"?

Comment 2 Alex Stupnikov 2024-03-14 08:38:27 UTC
Hi Brendan.

Thank you for taking a look. Indeed, this part works fine: the ordered list of rules is passed to tripleo_iptables, which parses them [1] and applies them one by one via the built-in iptables module [2], so the rules are passed in the correct order. The problem is that when the built-in iptables module applies a rule that already exists, it completes as a no-op, while new rules are either inserted or appended. As a result, when someone adds new rules via ExtraFirewallRules in an existing deployment, the new rules are inserted before or appended after the existing ones, and the relative order between existing and new rules is not preserved. An example outcome is provided in 0010-firewall.yaml and 0020-iptables.txt attached to case 03759681; 0060-overcloud-deploy2.tar.gz has Ansible debug logs and sosreports from controller nodes illustrating the points I have mentioned.

[1]
https://github.com/openstack-archive/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/ansible_plugins/action/tripleo_iptables.py#L169-L289

[2]
https://github.com/openstack-archive/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/ansible_plugins/action/tripleo_iptables.py#L329-L333

Comment 3 Brendan Shephard 2024-03-19 11:42:09 UTC
It looks like a similar issue has already been raised here. James appears to have spent some time on it, with the verdict being that the easiest way to clean it up is to remove the existing rules:
https://bugzilla.redhat.com/show_bug.cgi?id=2242069#c3

The other options I thought of were flushing the rules before applying, or writing all of the rules out to the iptables-save location and then reloading from those rules. But those options are problematic because of Neutron's dynamic iptables rules, so we can't go that route.

James' solution proposed there is probably the path of least resistance at the moment.


We could probably look into providing support for inserting at line numbers, but that would still be a mess unless each and every rule we create has an explicit line number.
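One way to picture insertion at line numbers is sketched below. It is a hypothetical illustration, not a proposed patch: it derives a 1-based position (the form `iptables -I` takes) from a numeric prefix in each rule name, and it assumes every rule in the chain carries such a prefix. Rules without one, for example rules managed dynamically by Neutron, would break this assumption, which is the kind of mess mentioned above.

```python
def rule_number(name):
    """Extract the leading rule number from a name like '500 accept custom'."""
    return int(name.split()[0])


def insert_position(chain, new_rule):
    """Return the 1-based position at which new_rule would be inserted.

    The rule goes before the first existing rule with a higher rule number,
    or at the end of the chain if there is none.
    """
    new_number = rule_number(new_rule)
    for position, existing in enumerate(chain, start=1):
        if rule_number(existing) > new_number:
            return position
    return len(chain) + 1


def apply_with_positions(chain, desired):
    """Skip rules that already exist; insert missing ones at a computed position."""
    result = list(chain)
    for rule in desired:
        if rule in result:
            continue
        result.insert(insert_position(result, rule) - 1, rule)
    return result


existing = ["000 accept established", "003 accept ssh", "999 drop all"]
desired = ["000 accept established", "001 drop bad net",
           "003 accept ssh", "500 accept custom", "999 drop all"]

print(apply_with_positions(existing, desired))
```

In this model the redeployed chain matches the desired order, and running it again leaves the chain unchanged.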

And we can't really change the underlying iptables module in Ansible without forking it at this point. We would need to avoid doing that though.

Comment 4 Alex Stupnikov 2024-03-19 12:50:45 UTC
Hi Brendan. Thank you for taking a detailed look. I also tried to evaluate some of the options you proposed, and indeed they don't look like a perfect solution to me. Adding an option to insert rules at a specific line number may be useful to a certain extent: operators could use it to put a new rule in a specific place and then remove the line number from the specification. But overall the whole concept is very fragile and error-prone.

I am wondering if we could consider returning to Puppet? It more or less worked for us and had fewer problems.

Comment 5 Alex Stupnikov 2024-03-19 16:59:04 UTC
UPD. The recommendations from James didn't work in the customer's deployment. I investigated this a bit, and it looks like they are not going to work with the current state of the tripleo_iptables lib. More information: https://bugzilla.redhat.com/show_bug.cgi?id=2242069#c4

Comment 7 Brendan Shephard 2024-04-03 09:06:02 UTC
We have a few bugs going for the same thing it seems. There is more context on the other BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=2242069

Do we agree that we're talking about the same problem in both of these BZs? If so, we should consolidate the conversation over there to avoid confusing threads of similar information.

Comment 8 Alex Stupnikov 2024-04-03 09:26:38 UTC
Hi Brendan.

From my perspective, all of these bugs report slightly different problems:
- this bug is about the idempotency of rules applied by tripleo_iptables: it is not related to the FFU procedure and is solely about applying different sets of rules in the same overcloud consistently.
- bug #2242069 is about pre-existing RHOSP 16 rules and the ability of tripleo_iptables to manage them. This is related to the FFU procedure and has little to do with the previous point, because the root cause seems to be https://bugzilla.redhat.com/show_bug.cgi?id=2242069#c4
- bug #2269002 may be related to this bug, but only to a certain extent: a solution for this bug may solve bug #2269002 as well, but it also may not. So it depends on engineering.

Best Regards, Alex.

Comment 34 errata-xmlrpc 2024-11-21 09:30:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHOSP 17.1.4 (openstack-tripleo-heat-templates) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:9978