Bug 2033570
| Summary: | [OSP17] Iptables rules on undercloud are missing after reboot | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Mikolaj Ciecierski <mciecier> |
| Component: | tripleo-ansible | Assignee: | Brendan Shephard <bshephar> |
| Status: | CLOSED ERRATA | QA Contact: | Jason Grosso <jgrosso> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 17.0 (Wallaby) | CC: | bshephar, hjensas, jkreger, jpretori, mburns |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | tripleo-ansible-3.3.1-0.20220326002748.9efbca4.el8ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-09-21 12:18:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Mikolaj Ciecierski
2021-12-17 09:36:05 UTC
I'm not able to reproduce this on OSP17 deployed with infrared:
[stack@undercloud-0 ~]$ sudo systemctl restart iptables
[stack@undercloud-0 ~]$ sudo iptables-save -t nat
# Generated by iptables-save v1.8.4 on Sun Dec 19 11:30:03 2021
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [32:1920]
:OUTPUT ACCEPT [32:1920]
-A POSTROUTING -s 192.168.24.0/24 -d 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24 ipv4" -j RETURN
-A POSTROUTING -s 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "138 routed_network masquerade 192.168.24.0/24 ipv4" -j MASQUERADE
COMMIT
# Completed on Sun Dec 19 11:30:03 2021
[stack@undercloud-0 ~]$ sudo reboot
[stack@undercloud-0 ~]$ sudo iptables-save -t nat
# Generated by iptables-save v1.8.4 on Sun Dec 19 11:31:09 2021
*nat
:PREROUTING ACCEPT [3:728]
:INPUT ACCEPT [1:60]
:POSTROUTING ACCEPT [139:9155]
:OUTPUT ACCEPT [139:9155]
-A POSTROUTING -s 192.168.24.0/24 -d 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24 ipv4" -j RETURN
-A POSTROUTING -s 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "138 routed_network masquerade 192.168.24.0/24 ipv4" -j MASQUERADE
COMMIT
# Completed on Sun Dec 19 11:31:09 2021
[stack@undercloud-0 ~]$ uptime
11:31:45 up 1 min, 1 user, load average: 2.30, 0.81, 0.29
I'll take a look at the Jenkins job tomorrow Morning and see if there is something that stands out.
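The before/after comparison above is done by eye; a minimal sketch of automating it follows. It assumes the two `iptables-save -t nat` dumps have already been captured as text. The function name `nat_rules` and the abbreviated sample dumps are hypothetical, not taken from the deployment.

```python
def nat_rules(iptables_save_output: str) -> list[str]:
    """Return only the rule lines (-A ...) from iptables-save output."""
    rules = []
    for line in iptables_save_output.splitlines():
        line = line.strip()
        # Comments (timestamps), the *nat header, chain policies with
        # packet counters, and COMMIT all change between runs or carry
        # no rule, so only "-A" lines are kept for the comparison.
        if line.startswith("-A "):
            rules.append(line)
    return rules


# Abbreviated, hypothetical dumps in the same shape as the output above.
before = """\
# Generated by iptables-save v1.8.4 on Sun Dec 19 11:30:03 2021
*nat
:POSTROUTING ACCEPT [32:1920]
-A POSTROUTING -s 192.168.24.0/24 -j MASQUERADE
COMMIT
"""
after = """\
# Generated by iptables-save v1.8.4 on Sun Dec 19 11:31:09 2021
*nat
:POSTROUTING ACCEPT [139:9155]
-A POSTROUTING -s 192.168.24.0/24 -j MASQUERADE
COMMIT
"""

missing = [rule for rule in nat_rules(before) if rule not in nat_rules(after)]
print("rules lost across reboot:", missing)
```

An empty list means every rule present before the reboot is still loaded afterwards, which matches what this reproduction attempt showed.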
Hmm, this shouldn't still be empty:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-upgrades-updates-17.0-from-passed_phase1-HA_no_ceph-ipv4/23/undercloud-0/etc/sysconfig/iptables.gz
Which happens here:
❯ grep "Create empty ruleset" undercloud_install.log
2021-12-16 12:55:19.361381 | 52540094-4a29-fbb2-c5fb-0000000001e7 | TASK | Create empty ruleset in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables
2021-12-16 12:55:19.814658 | 52540094-4a29-fbb2-c5fb-0000000001e7 | CHANGED | Create empty ruleset in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables | undercloud-0 | item=/etc/sysconfig/iptables
2021-12-16 12:55:19.816327 | 52540094-4a29-fbb2-c5fb-0000000001e7 | TIMING | tripleo_bootstrap : Create empty ruleset in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables | undercloud-0 | 0:00:22.476563 | 0.45s
2021-12-16 12:55:20.163430 | 52540094-4a29-fbb2-c5fb-0000000001e7 | CHANGED | Create empty ruleset in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables | undercloud-0 | item=/etc/sysconfig/ip6tables
2021-12-16 12:55:20.164396 | 52540094-4a29-fbb2-c5fb-0000000001e7 | TIMING | tripleo_bootstrap : Create empty ruleset in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables | undercloud-0 | 0:00:22.824637 | 0.80s
2021-12-16 12:55:20.166099 | 52540094-4a29-fbb2-c5fb-0000000001e7 | TIMING | tripleo_bootstrap : Create empty ruleset in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables | undercloud-0 | 0:00:22.826348 | 0.80s
Masquerade rules created here:
❯ grep 'routed_network masquerade' undercloud_install.log
"<13>Dec 16 12:59:32 puppet-user: Notice: /Stage[main]/Tripleo::Masquerade_networks/Tripleo::Firewall::Rule[138 routed_network masquerade 192.168.24.0/24]/Firewall[138 routed_network masquerade 192.168.24.0/24 ipv4]/ensure: created",
"<13>Dec 16 12:59:32 puppet-user: Notice: /Stage[main]/Tripleo::Masquerade_networks/Tripleo::Firewall::Rule[138 routed_network masquerade 192.168.24.0/24]/Firewall[138 routed_network masquerade 192.168.24.0/24 ipv4]/ensure: created",
"<13>Dec 16 12:59:32 puppet-user: Notice: /Stage[main]/Tripleo::Masquerade_networks/Tripleo::Firewall::Rule[138 routed_network masquerade 192.168.24.0/24]/Firewall[138 routed_network masquerade 192.168.24.0/24 ipv4]/ensure: created",
"<13>Dec 16 12:59:32 puppet-user: Notice: /Stage[main]/Tripleo::Masquerade_networks/Tripleo::Firewall::Rule[138 routed_network masquerade 192.168.24.0/24]/Firewall[138 routed_network masquerade 192.168.24.0/24 ipv4]/ensure: created",
A new /etc/sysconfig/iptables file is created as part of this puppet run ^^. Example:
(venv) [stack@undercloud-0 undercloud]$ sudo mv /etc/sysconfig/iptables{,-backup}
(venv) [stack@undercloud-0 undercloud]$ sudo iptables -t nat -v -L POSTROUTING --line-number
Chain POSTROUTING (policy ACCEPT 182K packets, 11M bytes)
num pkts bytes target prot opt in out source destination
1 168K 10M RETURN all -- any any 192.168.24.0/24 192.168.24.0/24 state NEW,RELATED,ESTABLISHED /* 137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24 ipv4 */
2 47 3552 MASQUERADE all -- any any 192.168.24.0/24 anywhere state NEW,RELATED,ESTABLISHED /* 138 routed_network masquerade 192.168.24.0/24 ipv4 */
(venv) [stack@undercloud-0 undercloud]$ sudo iptables -t nat -D POSTROUTING 2
(venv) [stack@undercloud-0 undercloud]$ sudo iptables -t nat -D POSTROUTING 1
(venv) [stack@undercloud-0 undercloud]$ sudo iptables -t nat -v -L POSTROUTING --line-number
Chain POSTROUTING (policy ACCEPT 182K packets, 11M bytes)
num pkts bytes target prot opt in out source destination
(venv) [stack@undercloud-0 undercloud]$ cat puppet_apply
puppet apply -vvv \
--modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules \
--detailed-exitcodes \
--summarize \
--color=true \
/var/lib/tripleo-config/puppet_step_config.pp
(venv) [stack@undercloud-0 undercloud]$ sudo bash puppet_apply
Notice: Compiled catalog for undercloud-0.redhat.local in environment production in 0.26 seconds
Info: Applying configuration version '1639960881'
Notice: /Stage[main]/Tripleo::Masquerade_networks/Tripleo::Firewall::Rule[137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24]/Firewall[137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24 ipv4]/ensure: created
Notice: /Stage[main]/Tripleo::Masquerade_networks/Tripleo::Firewall::Rule[138 routed_network masquerade 192.168.24.0/24]/Firewall[138 routed_network masquerade 192.168.24.0/24 ipv4]/ensure: created
(venv) [stack@undercloud-0 undercloud]$ sudo iptables -t nat -vL POSTROUTING
Chain POSTROUTING (policy ACCEPT 182K packets, 11M bytes)
pkts bytes target prot opt in out source destination
270 16204 RETURN all -- any any 192.168.24.0/24 192.168.24.0/24 state NEW,RELATED,ESTABLISHED /* 137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24 ipv4 */
0 0 MASQUERADE all -- any any 192.168.24.0/24 anywhere state NEW,RELATED,ESTABLISHED /* 138 routed_network masquerade 192.168.24.0/24 ipv4 */
(venv) [stack@undercloud-0 undercloud]$ sudo grep -i nat /etc/sysconfig/iptables
-A FORWARD -d 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "140 routed_network forward destinations 192.168.24.0/24 ipv4" -j ACCEPT
*nat
(venv) [stack@undercloud-0 undercloud]$ sudo iptables-save -t nat
# Generated by iptables-save v1.8.4 on Mon Dec 20 00:44:07 2021
*nat
:PREROUTING ACCEPT [50:4280]
:INPUT ACCEPT [1:60]
:POSTROUTING ACCEPT [182507:10833591]
:OUTPUT ACCEPT [182519:10834311]
-A POSTROUTING -s 192.168.24.0/24 -d 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24 ipv4" -j RETURN
-A POSTROUTING -s 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "138 routed_network masquerade 192.168.24.0/24 ipv4" -j MASQUERADE
COMMIT
# Completed on Mon Dec 20 00:44:07 2021
All that to say that I think there is something weird going on with that particular deployment. Are you able to reproduce that error every time?
The issue here is specific to the undercloud update process. Since it isn't executing puppet tasks, it isn't applying the masquerade firewall rules again.
Re-running puppet restores the rules:
puppet apply -vvv \
--modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules \
--detailed-exitcodes \
--summarize \
--color=true \
/var/lib/tripleo-config/puppet_step_config.pp
Maybe the solution is to consolidate all of these firewall rules into tripleo-ansible and ensure they are all restored during the update_steps. Having a combination of Puppet and Ansible adding and changing firewall rules probably isn't ideal anyway.
Oh yeah, I see what you mean:
(undercloud) [stack@tripleo-director ~]$ sudo cat /etc/sysconfig/iptables
# empty ruleset created by deployed-server bootstrap(undercloud) [stack@tripleo-director ~]$
(undercloud) [stack@tripleo-director ~]$ sudo puppet apply -vvv --modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules --detailed-exitcodes --summarize --color=true /var/lib/tripleo-config/puppet_step_config.pp
[...]
Info: Applying configuration version '1641640001'
Notice: Applied catalog in 1.24 seconds
Changes:
Events:
Resources:
Total: 22
Time:
Filebucket: 0.00
Schedule: 0.00
Package: 0.00
Firewall: 0.00
Exec: 0.01
Augeas: 0.03
File: 0.06
Service: 0.12
Config retrieval: 0.88
Transaction evaluation: 1.23
Catalog application: 1.24
Last run: 1641640003
Total: 1.25
Version:
Config: 1641640001
Puppet: 7.8.0
(undercloud) [stack@tripleo-director ~]$ sudo cat /etc/sysconfig/iptables
# empty ruleset created by deployed-server bootstrap(undercloud) [stack@tripleo-director ~]$
The iptables-save happens in firewall.pp, but firewall.pp isn't included in /var/lib/tripleo-config/puppet_step_config.pp, and we have manage_firewall set to false here:
https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/tripleo-firewall/tripleo-firewall-baremetal-ansible.yaml#L58
So it must be managed by the tripleo_firewall Ansible role:
https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_firewall/tasks/main.yml#L61-L70
Looks like this would find all of the rules that are in mem:
https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_firewall/tasks/main.yml#L56-L59
And then determine that no changes are required, so it would never execute that block:
https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_firewall/tasks/main.yml#L62-L63
Which we should be able to verify in the logs:
2022-01-08 21:15:29,902 p=71145 u=root n=ansible | 2022-01-08 21:15:29.901446 | 566f14f3-0016-3ca3-7e4e-0000000006c2 | TASK | Save firewall rules ipv4
2022-01-08 21:15:29,950 p=71145 u=root n=ansible | 2022-01-08 21:15:29.948514 | 566f14f3-0016-3ca3-7e4e-0000000006c2 | SKIPPED | Save firewall rules ipv4 | tripleo-director
So the problem that needs fixing is the when statement that determines whether or not that block needs to be executed.
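To make the reasoning above concrete, here is a rough sketch of the kind of guard that would avoid the skip. It is not the code from the tripleo_firewall role or from any proposed patch. The function names are hypothetical, and treating a file with only comments or blank lines as "empty" is an assumption based on the placeholder content shown above.

```python
def is_placeholder_ruleset(text: str) -> bool:
    """True when the file holds no rules: only blank lines or # comments."""
    return all(
        not line.strip() or line.strip().startswith("#")
        for line in text.splitlines()
    )


def should_save_rules(rules_changed: bool, saved_ruleset: str) -> bool:
    """Save when rules changed, or when the on-disk file is still empty."""
    return rules_changed or is_placeholder_ruleset(saved_ruleset)


# Content of /etc/sysconfig/iptables as shown in the transcript above.
placeholder = "# empty ruleset created by deployed-server bootstrap"
# Hypothetical, abbreviated example of a file that already holds rules.
populated = "*nat\n-A POSTROUTING -s 192.168.24.0/24 -j MASQUERADE\nCOMMIT\n"

print(should_save_rules(False, placeholder))  # True: nothing changed, but file is empty
print(should_save_rules(False, populated))    # False: nothing changed, file has rules
```

With a condition that only looks at "did the rules change", the first case is skipped, which is the behaviour seen in the log lines above.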
Adding dfg:hardprov as well.
I'm sure there is a more elegant solution. But if this is going to become a blocker for anything, then this should fix the issue in the interim:
https://review.opendev.org/c/openstack/tripleo-ansible/+/823893
Would still appreciate some additional feedback from dfg:upgrades and dfg:hardprov
Results after that change:
2022-01-08 23:05:30.325631 | 566f14f3-0016-7e76-8d0f-0000000006c0 | TASK | Manage firewall rules
2022-01-08 23:05:54.119043 | 566f14f3-0016-7e76-8d0f-0000000006c0 | OK | Manage firewall rules | tripleo-director
2022-01-08 23:05:54.121579 | 566f14f3-0016-7e76-8d0f-0000000006c0 | TIMING | tripleo_firewall : Manage firewall rules | tripleo-director | 0:01:46.348490 | 23.79s
2022-01-08 23:05:54.159581 | 566f14f3-0016-7e76-8d0f-0000000006c1 | TASK | Check that /etc/sysconfig/iptables isn't empty
2022-01-08 23:05:54.835567 | 566f14f3-0016-7e76-8d0f-0000000006c1 | CHANGED | Check that /etc/sysconfig/iptables isn't empty | tripleo-director
2022-01-08 23:05:54.838318 | 566f14f3-0016-7e76-8d0f-0000000006c1 | TIMING | tripleo_firewall : Check that /etc/sysconfig/iptables isn't empty | tripleo-director | 0:01:47.065235 | 0.68s
2022-01-08 23:05:54.882352 | 566f14f3-0016-7e76-8d0f-0000000006c3 | TASK | Save firewall rules ipv4
2022-01-08 23:05:55.326388 | 566f14f3-0016-7e76-8d0f-0000000006c3 | CHANGED | Save firewall rules ipv4 | tripleo-director
2022-01-08 23:05:55.328749 | 566f14f3-0016-7e76-8d0f-0000000006c3 | TIMING | tripleo_firewall : Save firewall rules ipv4 | tripleo-director | 0:01:47.555666 | 0.44s
2022-01-08 23:05:55.363950 | 566f14f3-0016-7e76-8d0f-0000000006c4 | TASK | Save firewall rules ipv6
2022-01-08 23:05:55.786483 | 566f14f3-0016-7e76-8d0f-0000000006c4 | CHANGED | Save firewall rules ipv6 | tripleo-director
(In reply to Brendan Shephard from comment #9)
> I'm sure there is a more elegant solution. But if this is going to become a
> blocker for anything, then this should fix the issue in the interim:
> https://review.opendev.org/c/openstack/tripleo-ansible/+/823893
>
> Would still appreciate some additional feedback from dfg:upgrades and
> dfg:hardprov
Undercloud as a router for the overcloud nodes is a bad idea ... This functionality is there to allow test/dev environments to use the undercloud as a router. IMO, we should deprecate and remove this functionality instead of spending resources on refactoring it. With the uncertain? role of ansible in future tripleo, spending resources on re-implementing this in ansible does not make sense.
If the proposed patch works, let's roll with it. And if we re-factor firewalling in tripleo to not use ansible we can re-visit.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2022:6543
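For anyone re-checking a deployment against this report, the task results quoted in the comments can be pulled out of the install log mechanically. The sketch below assumes the log keeps the ` | `-separated layout seen in this report, with the task name directly after the status field. The function name `task_results` is hypothetical, and only the result statuses that appear in this report (OK, CHANGED, SKIPPED) are recognised.

```python
RESULT_STATUSES = {"OK", "CHANGED", "SKIPPED"}  # only the statuses seen in this report


def task_results(log_text: str) -> dict[str, str]:
    """Map each task name to the last result status logged for it."""
    results: dict[str, str] = {}
    for line in log_text.splitlines():
        fields = [field.strip() for field in line.split(" | ")]
        # The task name is the field right after the status field;
        # TASK and TIMING lines carry no result and are ignored.
        for index, field in enumerate(fields[:-1]):
            if field in RESULT_STATUSES:
                results[fields[index + 1]] = field
                break
    return results


# Log lines copied from the comments in this report.
before_patch = """\
2022-01-08 21:15:29,902 p=71145 u=root n=ansible | 2022-01-08 21:15:29.901446 | 566f14f3-0016-3ca3-7e4e-0000000006c2 | TASK | Save firewall rules ipv4
2022-01-08 21:15:29,950 p=71145 u=root n=ansible | 2022-01-08 21:15:29.948514 | 566f14f3-0016-3ca3-7e4e-0000000006c2 | SKIPPED | Save firewall rules ipv4 | tripleo-director
"""
after_patch = """\
2022-01-08 23:05:54.835567 | 566f14f3-0016-7e76-8d0f-0000000006c1 | CHANGED | Check that /etc/sysconfig/iptables isn't empty | tripleo-director
2022-01-08 23:05:55.326388 | 566f14f3-0016-7e76-8d0f-0000000006c3 | CHANGED | Save firewall rules ipv4 | tripleo-director
"""

print(task_results(before_patch)["Save firewall rules ipv4"])  # SKIPPED
print(task_results(after_patch)["Save firewall rules ipv4"])   # CHANGED
```

A SKIPPED result for the save task while /etc/sysconfig/iptables still holds only the bootstrap placeholder is the symptom described in this bug.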