Description of problem:
iptables NAT rules are missing after an undercloud reboot, which prevents the compute nodes from reaching the external network.

Version-Release number of selected component (if applicable):

How reproducible:
always

Steps to Reproduce:
1. Deploy the undercloud.
2. Check connectivity to the external network from a compute node. It should be able to reach the external network.
3. Reboot the undercloud.
4. Check connectivity to the external network from the compute node again. There is no access to the external network.

Actual results:
The nat table is empty after the undercloud reboot:

undercloud-0 ~]# iptables-save -t nat
# Generated by iptables-save v1.8.4 on Fri Dec 17 09:28:47 2021
*nat
COMMIT
# Completed on Fri Dec 17 09:28:47 2021

Expected results:
After an undercloud reboot, the nat table has the rules needed to reach the external network, e.g. for the compute nodes. The nat rules should be the same as before the reboot:

undercloud-0 ~]# iptables-save -t nat
# Generated by iptables-save v1.8.4 on Thu Dec 16 14:32:04 2021
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:CNI-HOSTPORT-SETMARK - [0:0]
:CNI-HOSTPORT-MASQ - [0:0]
:CNI-HOSTPORT-DNAT - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j CNI-HOSTPORT-DNAT
-A POSTROUTING -m comment --comment "CNI portfwd requiring masquerade" -j CNI-HOSTPORT-MASQ
-A POSTROUTING -s 192.168.24.0/24 -d 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24 ipv4" -j RETURN
-A POSTROUTING -s 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "138 routed_network masquerade 192.168.24.0/24 ipv4" -j MASQUERADE
-A OUTPUT -m addrtype --dst-type LOCAL -j CNI-HOSTPORT-DNAT
-A CNI-HOSTPORT-SETMARK -m comment --comment "CNI portfwd masquerade mark" -j MARK --set-xmark 0x2000/0x2000
-A CNI-HOSTPORT-MASQ -m mark --mark 0x2000/0x2000 -j MASQUERADE
COMMIT
# Completed on Thu Dec 16 14:32:04 2021

Additional info:
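To check for this symptom without eyeballing the full dump, one option is to look for the routed_network masquerade rule in the running nat table. The sketch below is a hypothetical helper (the names nat_rules and has_masquerade_rule are illustrative and not part of TripleO); it only parses iptables-save output read from stdin, so it can be tried on saved output without root.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of TripleO): parse `iptables-save` output read on stdin.

# Print only the rule lines (-A ...) that belong to the *nat table.
nat_rules() {
  awk '
    /^\*/     { in_nat = ($0 == "*nat"); next }  # table header such as *nat or *filter
    /^COMMIT/ { in_nat = 0; next }               # end of the current table
    in_nat && /^-A / { print }
  '
}

# Exit 0 if the "routed_network masquerade" rule is present in the nat table, 1 otherwise.
has_masquerade_rule() {
  nat_rules | grep -q 'routed_network masquerade'
}

# Example (on the undercloud):
#   sudo iptables-save -t nat | has_masquerade_rule || echo "nat masquerade rule missing"
```

With the empty nat table from "Actual results" the check fails; with the "Expected results" dump it succeeds.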
I'm not able to reproduce this on OSP17 deployed with infrared:

[stack@undercloud-0 ~]$ sudo systemctl restart iptables
[stack@undercloud-0 ~]$ sudo iptables-save -t nat
# Generated by iptables-save v1.8.4 on Sun Dec 19 11:30:03 2021
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [32:1920]
:OUTPUT ACCEPT [32:1920]
-A POSTROUTING -s 192.168.24.0/24 -d 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24 ipv4" -j RETURN
-A POSTROUTING -s 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "138 routed_network masquerade 192.168.24.0/24 ipv4" -j MASQUERADE
COMMIT
# Completed on Sun Dec 19 11:30:03 2021
[stack@undercloud-0 ~]$ sudo reboot
[stack@undercloud-0 ~]$ sudo iptables-save -t nat
# Generated by iptables-save v1.8.4 on Sun Dec 19 11:31:09 2021
*nat
:PREROUTING ACCEPT [3:728]
:INPUT ACCEPT [1:60]
:POSTROUTING ACCEPT [139:9155]
:OUTPUT ACCEPT [139:9155]
-A POSTROUTING -s 192.168.24.0/24 -d 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24 ipv4" -j RETURN
-A POSTROUTING -s 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "138 routed_network masquerade 192.168.24.0/24 ipv4" -j MASQUERADE
COMMIT
# Completed on Sun Dec 19 11:31:09 2021
[stack@undercloud-0 ~]$ uptime
 11:31:45 up 1 min, 1 user, load average: 2.30, 0.81, 0.29

I'll take a look at the Jenkins job tomorrow morning and see if anything stands out.
Hmm, this shouldn't still be empty:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-upgrades-updates-17.0-from-passed_phase1-HA_no_ceph-ipv4/23/undercloud-0/etc/sysconfig/iptables.gz

The empty ruleset is written here:

❯ grep "Create empty ruleset" undercloud_install.log
2021-12-16 12:55:19.361381 | 52540094-4a29-fbb2-c5fb-0000000001e7 | TASK | Create empty ruleset in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables
2021-12-16 12:55:19.814658 | 52540094-4a29-fbb2-c5fb-0000000001e7 | CHANGED | Create empty ruleset in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables | undercloud-0 | item=/etc/sysconfig/iptables
2021-12-16 12:55:19.816327 | 52540094-4a29-fbb2-c5fb-0000000001e7 | TIMING | tripleo_bootstrap : Create empty ruleset in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables | undercloud-0 | 0:00:22.476563 | 0.45s
2021-12-16 12:55:20.163430 | 52540094-4a29-fbb2-c5fb-0000000001e7 | CHANGED | Create empty ruleset in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables | undercloud-0 | item=/etc/sysconfig/ip6tables
2021-12-16 12:55:20.164396 | 52540094-4a29-fbb2-c5fb-0000000001e7 | TIMING | tripleo_bootstrap : Create empty ruleset in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables | undercloud-0 | 0:00:22.824637 | 0.80s
2021-12-16 12:55:20.166099 | 52540094-4a29-fbb2-c5fb-0000000001e7 | TIMING | tripleo_bootstrap : Create empty ruleset in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables | undercloud-0 | 0:00:22.826348 | 0.80s

The masquerade rules are created here:

❯ grep 'routed_network masquerade' undercloud_install.log
"<13>Dec 16 12:59:32 puppet-user: Notice: /Stage[main]/Tripleo::Masquerade_networks/Tripleo::Firewall::Rule[138 routed_network masquerade 192.168.24.0/24]/Firewall[138 routed_network masquerade 192.168.24.0/24 ipv4]/ensure: created",
"<13>Dec 16 12:59:32 puppet-user: Notice: /Stage[main]/Tripleo::Masquerade_networks/Tripleo::Firewall::Rule[138 routed_network masquerade 192.168.24.0/24]/Firewall[138 routed_network masquerade 192.168.24.0/24 ipv4]/ensure: created",
"<13>Dec 16 12:59:32 puppet-user: Notice: /Stage[main]/Tripleo::Masquerade_networks/Tripleo::Firewall::Rule[138 routed_network masquerade 192.168.24.0/24]/Firewall[138 routed_network masquerade 192.168.24.0/24 ipv4]/ensure: created",
"<13>Dec 16 12:59:32 puppet-user: Notice: /Stage[main]/Tripleo::Masquerade_networks/Tripleo::Firewall::Rule[138 routed_network masquerade 192.168.24.0/24]/Firewall[138 routed_network masquerade 192.168.24.0/24 ipv4]/ensure: created",

A new /etc/sysconfig/iptables file is created as part of this puppet run ^^. Example:

(venv) [stack@undercloud-0 undercloud]$ sudo mv /etc/sysconfig/iptables{,-backup}
(venv) [stack@undercloud-0 undercloud]$ sudo iptables -t nat -v -L POSTROUTING --line-number
Chain POSTROUTING (policy ACCEPT 182K packets, 11M bytes)
num pkts bytes target prot opt in out source destination
1 168K 10M RETURN all -- any any 192.168.24.0/24 192.168.24.0/24 state NEW,RELATED,ESTABLISHED /* 137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24 ipv4 */
2 47 3552 MASQUERADE all -- any any 192.168.24.0/24 anywhere state NEW,RELATED,ESTABLISHED /* 138 routed_network masquerade 192.168.24.0/24 ipv4 */
(venv) [stack@undercloud-0 undercloud]$ sudo iptables -t nat -D POSTROUTING 2
(venv) [stack@undercloud-0 undercloud]$ sudo iptables -t nat -D POSTROUTING 1
(venv) [stack@undercloud-0 undercloud]$ sudo iptables -t nat -v -L POSTROUTING --line-number
Chain POSTROUTING (policy ACCEPT 182K packets, 11M bytes)
num pkts bytes target prot opt in out source destination
(venv) [stack@undercloud-0 undercloud]$ cat puppet_apply
puppet apply -vvv \
  --modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules \
  --detailed-exitcodes \
  --summarize \
  --color=true \
  /var/lib/tripleo-config/puppet_step_config.pp
(venv) [stack@undercloud-0 undercloud]$ sudo bash puppet_apply
Notice: Compiled catalog for undercloud-0.redhat.local in environment production in 0.26 seconds
Info: Applying configuration version '1639960881'
Notice: /Stage[main]/Tripleo::Masquerade_networks/Tripleo::Firewall::Rule[137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24]/Firewall[137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24 ipv4]/ensure: created
Notice: /Stage[main]/Tripleo::Masquerade_networks/Tripleo::Firewall::Rule[138 routed_network masquerade 192.168.24.0/24]/Firewall[138 routed_network masquerade 192.168.24.0/24 ipv4]/ensure: created
(venv) [stack@undercloud-0 undercloud]$ sudo iptables -t nat -vL POSTROUTING
Chain POSTROUTING (policy ACCEPT 182K packets, 11M bytes)
pkts bytes target prot opt in out source destination
270 16204 RETURN all -- any any 192.168.24.0/24 192.168.24.0/24 state NEW,RELATED,ESTABLISHED /* 137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24 ipv4 */
0 0 MASQUERADE all -- any any 192.168.24.0/24 anywhere state NEW,RELATED,ESTABLISHED /* 138 routed_network masquerade 192.168.24.0/24 ipv4 */
(venv) [stack@undercloud-0 undercloud]$ sudo grep -i nat /etc/sysconfig/iptables
-A FORWARD -d 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "140 routed_network forward destinations 192.168.24.0/24 ipv4" -j ACCEPT
*nat
(venv) [stack@undercloud-0 undercloud]$ sudo iptables-save -t nat
# Generated by iptables-save v1.8.4 on Mon Dec 20 00:44:07 2021
*nat
:PREROUTING ACCEPT [50:4280]
:INPUT ACCEPT [1:60]
:POSTROUTING ACCEPT [182507:10833591]
:OUTPUT ACCEPT [182519:10834311]
-A POSTROUTING -s 192.168.24.0/24 -d 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "137 routed_network return src 192.168.24.0/24 dest 192.168.24.0/24 ipv4" -j RETURN
-A POSTROUTING -s 192.168.24.0/24 -m state --state NEW,RELATED,ESTABLISHED -m comment --comment "138 routed_network masquerade 192.168.24.0/24 ipv4" -j MASQUERADE
COMMIT
# Completed on Mon Dec 20 00:44:07 2021

All that to say, I think there is something odd going on with that particular deployment. Are you able to reproduce that error every time?
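The session above shows the two routed_network rules back in memory after the puppet run, while the saved /etc/sysconfig/iptables still has a *nat section with no rules in it. A minimal sketch for spotting that drift, assuming both rulesets are first dumped to files; the function name unsaved_nat_rules is hypothetical and not part of TripleO.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of TripleO): list nat-table rules that are present in
# the running ruleset but missing from the saved ruleset file.
#   $1 = file with the running rules (e.g. output of `iptables-save`)
#   $2 = file with the saved rules   (e.g. a copy of /etc/sysconfig/iptables)
unsaved_nat_rules() {
  local running=$1 saved=$2
  # awk program: keep only the -A lines inside the *nat table.
  local extract='
    /^\*/     { in_nat = ($0 == "*nat"); next }
    /^COMMIT/ { in_nat = 0; next }
    in_nat && /^-A / { print }
  '
  # comm needs sorted input; -23 prints lines found only in the first (running) set.
  comm -23 <(awk "$extract" "$running" | sort) <(awk "$extract" "$saved" | sort)
}

# Example (the saved file may only be readable by root, hence sudo cat):
#   sudo iptables-save > /tmp/running.rules
#   sudo cat /etc/sysconfig/iptables > /tmp/saved.rules
#   unsaved_nat_rules /tmp/running.rules /tmp/saved.rules
```

Any line printed is a rule that would be lost on the next reboot or iptables service restart, since the service restores from the saved file.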
The issue here is specific to the undercloud update process. Since the update doesn't execute the puppet tasks, it doesn't apply the masquerade firewall rules again. Re-running puppet restores the rules:

puppet apply -vvv \
  --modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules \
  --detailed-exitcodes \
  --summarize \
  --color=true \
  /var/lib/tripleo-config/puppet_step_config.pp

Maybe the solution is to consolidate all of these firewall rules into tripleo-ansible and ensure they are all restored during the update_steps. Having a mix of Puppet and Ansible adding and changing firewall rules probably isn't ideal anyway.
Oh yeah, I see what you mean:

(undercloud) [stack@tripleo-director ~]$ sudo cat /etc/sysconfig/iptables
# empty ruleset created by deployed-server bootstrap
(undercloud) [stack@tripleo-director ~]$
(undercloud) [stack@tripleo-director ~]$ sudo puppet apply -vvv --modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules --detailed-exitcodes --summarize --color=true /var/lib/tripleo-config/puppet_step_config.pp
[...]
Info: Applying configuration version '1641640001'
Notice: Applied catalog in 1.24 seconds
Changes:
Events:
Resources:
   Total: 22
Time:
   Filebucket: 0.00
   Schedule: 0.00
   Package: 0.00
   Firewall: 0.00
   Exec: 0.01
   Augeas: 0.03
   File: 0.06
   Service: 0.12
   Config retrieval: 0.88
   Transaction evaluation: 1.23
   Catalog application: 1.24
   Last run: 1641640003
   Total: 1.25
Version:
   Config: 1641640001
   Puppet: 7.8.0
(undercloud) [stack@tripleo-director ~]$ sudo cat /etc/sysconfig/iptables
# empty ruleset created by deployed-server bootstrap
(undercloud) [stack@tripleo-director ~]$

The iptables-save happens in firewall.pp, but firewall.pp isn't included in /var/lib/tripleo-config/puppet_step_config.pp, and we have manage_firewall set to false here:
https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/tripleo-firewall/tripleo-firewall-baremetal-ansible.yaml#L58

So the saved rules must be managed by the tripleo_firewall Ansible role:
https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_firewall/tasks/main.yml#L61-L70

It looks like this would find all of the rules that are already in memory:
https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_firewall/tasks/main.yml#L56-L59

and then determine that no changes are required, so it would never execute that block:
https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_firewall/tasks/main.yml#L62-L63

We can verify that in the logs:

2022-01-08 21:15:29,902 p=71145 u=root n=ansible | 2022-01-08 21:15:29.901446 | 566f14f3-0016-3ca3-7e4e-0000000006c2 | TASK | Save firewall rules ipv4
2022-01-08 21:15:29,950 p=71145 u=root n=ansible | 2022-01-08 21:15:29.948514 | 566f14f3-0016-3ca3-7e4e-0000000006c2 | SKIPPED | Save firewall rules ipv4 | tripleo-director

So the problem that needs fixing is the when statement that determines whether or not that block needs to be executed. Adding dfg:hardprov as well.
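The condition the save block needs can be sketched as: save when the role computed rule changes, or when the saved file contains no rules at all (for example, only the "# empty ruleset created by deployed-server bootstrap" comment). This is a hypothetical shell illustration of that logic, not the tripleo_firewall role's actual when expression; the function names ruleset_has_rules and should_save_rules are made up for the example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the actual tripleo_firewall task): decide whether the
# in-memory rules should be written back to the saved rules file.

# Exit 0 if the file has at least one rule line (-A ...).
# A missing file or a comment-only file counts as having no rules.
ruleset_has_rules() {
  [ -f "$1" ] && grep -q '^-A ' "$1"
}

# $1 = number of rule changes computed for this run, $2 = saved rules file.
# Save when something changed OR when the saved file has no rules (the case in this bug).
should_save_rules() {
  local changes=$1 saved_file=$2
  [ "$changes" -gt 0 ] || ! ruleset_has_rules "$saved_file"
}

# Example (run as root, since the saved file may not be readable by other users):
#   should_save_rules 0 /etc/sysconfig/iptables && echo "saved rules need to be rewritten"
```

With only a "no changes" check, a bootstrap-created empty file is never rewritten because the in-memory rules already match the desired state; the extra emptiness check covers that case.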
I'm sure there is a more elegant solution, but if this is going to become a blocker for anything, then this should fix the issue in the interim:
https://review.opendev.org/c/openstack/tripleo-ansible/+/823893

I would still appreciate some additional feedback from dfg:upgrades and dfg:hardprov.
Results after that change:

2022-01-08 23:05:30.325631 | 566f14f3-0016-7e76-8d0f-0000000006c0 | TASK | Manage firewall rules
2022-01-08 23:05:54.119043 | 566f14f3-0016-7e76-8d0f-0000000006c0 | OK | Manage firewall rules | tripleo-director
2022-01-08 23:05:54.121579 | 566f14f3-0016-7e76-8d0f-0000000006c0 | TIMING | tripleo_firewall : Manage firewall rules | tripleo-director | 0:01:46.348490 | 23.79s
2022-01-08 23:05:54.159581 | 566f14f3-0016-7e76-8d0f-0000000006c1 | TASK | Check that /etc/sysconfig/iptables isn't empty
2022-01-08 23:05:54.835567 | 566f14f3-0016-7e76-8d0f-0000000006c1 | CHANGED | Check that /etc/sysconfig/iptables isn't empty | tripleo-director
2022-01-08 23:05:54.838318 | 566f14f3-0016-7e76-8d0f-0000000006c1 | TIMING | tripleo_firewall : Check that /etc/sysconfig/iptables isn't empty | tripleo-director | 0:01:47.065235 | 0.68s
2022-01-08 23:05:54.882352 | 566f14f3-0016-7e76-8d0f-0000000006c3 | TASK | Save firewall rules ipv4
2022-01-08 23:05:55.326388 | 566f14f3-0016-7e76-8d0f-0000000006c3 | CHANGED | Save firewall rules ipv4 | tripleo-director
2022-01-08 23:05:55.328749 | 566f14f3-0016-7e76-8d0f-0000000006c3 | TIMING | tripleo_firewall : Save firewall rules ipv4 | tripleo-director | 0:01:47.555666 | 0.44s
2022-01-08 23:05:55.363950 | 566f14f3-0016-7e76-8d0f-0000000006c4 | TASK | Save firewall rules ipv6
2022-01-08 23:05:55.786483 | 566f14f3-0016-7e76-8d0f-0000000006c4 | CHANGED | Save firewall rules ipv6 | tripleo-director
(In reply to Brendan Shephard from comment #9)
> I'm sure there is a more elegant solution. But if this is going to become a
> blocker for anything, than this should fix the issue in the interim:
> https://review.opendev.org/c/openstack/tripleo-ansible/+/823893
>
> Would still appreciate some additional feedback from dfg:upgrades and
> dfg:hardprov

Using the undercloud as a router for the overcloud nodes is a bad idea... This functionality exists to allow test/dev environments to use the undercloud as a router. IMO, we should deprecate and remove this functionality instead of spending resources on refactoring it. Given the uncertain(?) role of Ansible in future TripleO, spending resources on re-implementing this in Ansible does not make sense.

If the proposed patch works, let's roll with it. And if we refactor firewalling in TripleO to not use Ansible, we can revisit.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543