Bug 1810316 - OPENSHIFT-FIREWALL iptables rules are not restored after accidental deletion
Summary: OPENSHIFT-FIREWALL iptables rules are not restored after accidental deletion
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Ben Bennett
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-04 23:17 UTC by Ross Brattain
Modified: 2020-07-13 17:18 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:18:18 UTC
Target Upstream Version:
Embargoed:


Attachments
sdn pod logs at V4 (70.17 KB, text/plain), 2020-03-09 15:19 UTC, Anurag saxena


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2020:2409  0        None      None    None     2020-07-13 17:18:44 UTC

Description Ross Brattain 2020-03-04 23:17:12 UTC
Description of problem:

openshift-sdn iptables rules are not restored on worker nodes after accidental deletion

Version-Release number of selected component (if applicable):

4.4.0-0.nightly-2020-03-04-103509

How reproducible:

Always

Steps to Reproduce:
1. oc debug node/<node> -- chroot /host iptables -F
2. wait 35 seconds
3. iptables-save -t filter


Actual results:

Multiple OPENSHIFT-FIREWALL-FORWARD and -ALLOW rules are missing

the VXLAN dport 4789 rule is missing
the tun0 rule is missing

The following rules are absent:
-A OPENSHIFT-FIREWALL-FORWARD -d 10.128.0.0/14 -m comment --comment "forward traffic from SDN" -j ACCEPT
-A OPENSHIFT-FIREWALL-FORWARD -s 10.128.0.0/14 -m comment --comment "forward traffic to SDN" -j ACCEPT
-A OPENSHIFT-FIREWALL-ALLOW -p udp -m udp --dport 4789 -m comment --comment "VXLAN incoming" -j ACCEPT
-A OPENSHIFT-FIREWALL-ALLOW -i tun0 -m comment --comment "from SDN to localhost" -j ACCEPT


Expected results:

The following rules are present:
-A OPENSHIFT-FIREWALL-FORWARD -d 10.128.0.0/14 -m comment --comment "forward traffic from SDN" -j ACCEPT
-A OPENSHIFT-FIREWALL-FORWARD -s 10.128.0.0/14 -m comment --comment "forward traffic to SDN" -j ACCEPT
-A OPENSHIFT-FIREWALL-ALLOW -p udp -m udp --dport 4789 -m comment --comment "VXLAN incoming" -j ACCEPT
-A OPENSHIFT-FIREWALL-ALLOW -i tun0 -m comment --comment "from SDN to localhost" -j ACCEPT

Additional info:
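As a rough aid for step 3, the check for the four rules listed above could be scripted along these lines. This is only a sketch: the function name missing_sdn_rules is made up for illustration, the rule text is copied from this report, and the 10.128.0.0/14 range is what this cluster used, so it may need adjusting elsewhere.

```shell
#!/bin/sh
# Sketch: report which of the four rules from this bug are absent.
# Reads `iptables-save -t filter` output on stdin and prints every
# expected rule that is not found. Prints nothing if all are present.
# missing_sdn_rules is a hypothetical helper name, not part of any tool.
missing_sdn_rules() {
    dump=$(cat)
    while IFS= read -r rule; do
        # -F: match as a fixed string, -q: quiet,
        # --: needed because each rule begins with "-A"
        printf '%s\n' "$dump" | grep -Fq -- "$rule" || printf '%s\n' "$rule"
    done <<'EOF'
-A OPENSHIFT-FIREWALL-FORWARD -d 10.128.0.0/14 -m comment --comment "forward traffic from SDN" -j ACCEPT
-A OPENSHIFT-FIREWALL-FORWARD -s 10.128.0.0/14 -m comment --comment "forward traffic to SDN" -j ACCEPT
-A OPENSHIFT-FIREWALL-ALLOW -p udp -m udp --dport 4789 -m comment --comment "VXLAN incoming" -j ACCEPT
-A OPENSHIFT-FIREWALL-ALLOW -i tun0 -m comment --comment "from SDN to localhost" -j ACCEPT
EOF
}
```

For example, piping the step 3 output through it (oc debug node/<node> -- chroot /host iptables-save -t filter | missing_sdn_rules) would print nothing once all four rules are back.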

Comment 2 Anurag saxena 2020-03-05 00:49:50 UTC
One point that needs further investigation: we can still pass UDP/TCP traffic across the nodes even though the "VXLAN incoming" rule is absent.

Comment 3 Ben Bennett 2020-03-05 14:02:35 UTC
You can recover by restarting the sdn... but ... don't delete the rules?

Comment 4 Anurag saxena 2020-03-05 14:20:19 UTC
Thanks Ben. The rules can be recovered if we restart the sdn pods, but the way we test on 4.x is:

oc rsh <sdn_pod>
# iptables --flush

Doing the above is what causes this issue.
Is flush expected to delete all rules in one chain, or in all chains?

Comment 5 Anurag saxena 2020-03-05 14:24:26 UTC
*Is flush expected to delete all rules in one chain, or in all chains? And are all chains expected to be recovered?

Comment 6 Ross Brattain 2020-03-05 14:28:17 UTC
To clarify, this is a regression; the test used to pass. I will attempt to reproduce on 4.3.

Comment 7 Ross Brattain 2020-03-05 16:22:04 UTC
This is a regression from 4.3.0-0.nightly-2020-03-04-235307

On 4.3.0-0.nightly-2020-03-04-235307, after an iptables -F, the iptables rules are fully restored in ~31 seconds.

Did the iptables sync period change from 4.3 to 4.4 maybe?
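Since the recovery time is what is in question here (35 seconds in the reproducer, ~31 seconds observed on 4.3), a test could poll for the rules instead of sleeping for a fixed period. A minimal sketch, assuming a helper named wait_for (hypothetical, not an existing tool) that retries any check command once per second:

```shell
#!/bin/sh
# Sketch: retry a check once per second until it succeeds or the
# timeout (in seconds) runs out. Returns 0 on success, 1 on timeout.
# wait_for is a hypothetical helper name used only for illustration.
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        # Give up once the allotted time has been used.
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A call such as "wait_for 35 rules_restored", where rules_restored stands for whatever check the test already uses, would also show roughly how long recovery took if the caller records the time before and after.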

Comment 9 Dan Winship 2020-03-05 18:43:55 UTC
(In reply to Anurag saxena from comment #4)
> Thanks Ben. The rules can be recovered if we restart sdn pods but the way we
> test on 4.x is 
> 
> oc rsh <sdn_pod>
> # iptables --flush
> 
> Doing above is causing this issue. 
> We expected flush to delete all rules in chain or all chains?

Right, so the current behavior is that Kubernetes will recover from "systemctl restart iptables" or "firewall-cmd --reload", but it doesn't recover from arbitrary deletion of rules because we never actually cared about recovering from arbitrary deletion of rules, we only cared about recovering from firewall reloads.

I think RHCOS has iptables.service? So you could do

    oc debug node/NODENAME

      chroot /host
      systemctl restart iptables

if not, then the way to emulate it is:

    for table in filter mangle nat raw security; do
      iptables --flush -t $table
    done

(i.e., you have to actually flush all of the tables, or it won't necessarily notice)

Note that this change is being backported all the way to 3.11 so you will need to update the tests on older releases too.

Comment 10 Anurag saxena 2020-03-06 21:43:58 UTC
Thanks, Dan, for the suggestions. I performed the following steps on the latest 4.4 nightly, but the rules still do not fully reappear, especially the ones Ross listed in the description.

On RHCOS I get "Unit iptables.service not found", and firewall-cmd is also absent, so I used the emulated approach.

Steps taken

1) oc debug node/<NODE>

2) Checked rules by `iptables -S`

3) for table in filter mangle nat raw security; do iptables --flush -t $table;done

4) Checked rules again by `iptables -S` (missing those 4 rules mentioned in bug description)

Comment 11 Dan Winship 2020-03-09 14:17:57 UTC
(In reply to Anurag saxena from comment #10)
> 4) Checked rules again by `iptables -S` (missing those 4 rules mentioned in
> bug description)

I assume you gave it 10 seconds or so to recover before checking?

Also, to clarify, in the initial report Ross also said:

> the VXLAN dport 4789 rule is missing
> the tun0 rule is missing

are those still missing as well?

Can you get sdn logs from trying to reproduce it using the new reproducer? (Maybe at log level 4?)

Comment 13 Anurag saxena 2020-03-09 15:19:26 UTC
Created attachment 1668681 [details]
sdn pod logs at V4

Comment 14 Dan Winship 2020-03-09 17:10:35 UTC
> 6) oc logs <sdn_pod> --loglevel=4 > sdn_logs_log_level_4 (attached)

er, no, I meant the logs of the SDN pod while the SDN pod is running at --loglevel=4

Comment 17 Dan Winship 2020-03-09 21:47:22 UTC
Sorry, my bad. You need to both flush the rules and delete the chains in each table. So:

    for table in filter mangle nat raw security; do
      iptables -t $table -F
      iptables -t $table -X
    done

*THEN* it will notice and reload the rules.
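For test scripts, the loop above could be wrapped in a function so the exact commands can be previewed before running them on a node. This is only a sketch: flush_all_tables and the IPTABLES override variable are names invented here, not part of iptables or openshift-sdn.

```shell
#!/bin/sh
# Sketch: flush every rule and delete every user-defined chain in each
# table, as in the reproducer above.
# IPTABLES is a hypothetical override; setting IPTABLES="echo iptables"
# prints the commands instead of executing them.
flush_all_tables() {
    for table in filter mangle nat raw security; do
        # -F with no chain argument flushes all chains in the table.
        ${IPTABLES:-iptables} -t "$table" -F
        # -X with no chain argument deletes all user-defined chains.
        ${IPTABLES:-iptables} -t "$table" -X
    done
}
```

With IPTABLES="echo iptables" set, calling flush_all_tables only prints the ten commands, which is a cheap way to confirm that both -F and -X are issued for all five tables.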

Comment 18 zhaozhanqi 2020-03-10 11:57:49 UTC
I tried the steps in comment 17 above; they work well for me.

Comment 19 zhaozhanqi 2020-03-12 05:52:10 UTC
Verified this bug

Comment 22 errata-xmlrpc 2020-07-13 17:18:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

