Description of problem:
openshift-sdn iptables rules are not restored on worker nodes after accidental deletion.

Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-03-04-103509

How reproducible:
Always

Steps to Reproduce:
1. oc debug node/<node> -- chroot /host iptables -F
2. wait 35 seconds
3. iptables-save -t filter

Actual results:
Multiple OPENSHIFT-FIREWALL-FORWARD and OPENSHIFT-FIREWALL-ALLOW rules are missing:
the VXLAN dport 4789 rule is missing
the tun0 rule is missing

The following rules are absent:
-A OPENSHIFT-FIREWALL-FORWARD -d 10.128.0.0/14 -m comment --comment "forward traffic from SDN" -j ACCEPT
-A OPENSHIFT-FIREWALL-FORWARD -s 10.128.0.0/14 -m comment --comment "forward traffic to SDN" -j ACCEPT
-A OPENSHIFT-FIREWALL-ALLOW -p udp -m udp --dport 4789 -m comment --comment "VXLAN incoming" -j ACCEPT
-A OPENSHIFT-FIREWALL-ALLOW -i tun0 -m comment --comment "from SDN to localhost" -j ACCEPT

Expected results:
The following rules are present:
-A OPENSHIFT-FIREWALL-FORWARD -d 10.128.0.0/14 -m comment --comment "forward traffic from SDN" -j ACCEPT
-A OPENSHIFT-FIREWALL-FORWARD -s 10.128.0.0/14 -m comment --comment "forward traffic to SDN" -j ACCEPT
-A OPENSHIFT-FIREWALL-ALLOW -p udp -m udp --dport 4789 -m comment --comment "VXLAN incoming" -j ACCEPT
-A OPENSHIFT-FIREWALL-ALLOW -i tun0 -m comment --comment "from SDN to localhost" -j ACCEPT

Additional info:
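The expected/actual comparison above can be made repeatable with a small helper that reads an iptables-save dump on stdin and reports which of the four rules are missing. This is a minimal bash sketch, not part of any OpenShift tooling: check_sdn_rules is a hypothetical name, and it assumes the default cluster network 10.128.0.0/14, VXLAN port 4789, and that iptables-save prints each rule exactly as listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `iptables-save -t filter` output on stdin and
# prints a MISSING line for each expected openshift-sdn rule not found.
# Assumes 10.128.0.0/14 as the cluster network and the exact rule text from
# the bug report. Returns 0 only if all four rules are present.
check_sdn_rules() {
  local dump rule missing=0
  dump=$(cat)
  local -a expected=(
    '-A OPENSHIFT-FIREWALL-FORWARD -d 10.128.0.0/14 -m comment --comment "forward traffic from SDN" -j ACCEPT'
    '-A OPENSHIFT-FIREWALL-FORWARD -s 10.128.0.0/14 -m comment --comment "forward traffic to SDN" -j ACCEPT'
    '-A OPENSHIFT-FIREWALL-ALLOW -p udp -m udp --dport 4789 -m comment --comment "VXLAN incoming" -j ACCEPT'
    '-A OPENSHIFT-FIREWALL-ALLOW -i tun0 -m comment --comment "from SDN to localhost" -j ACCEPT'
  )
  for rule in "${expected[@]}"; do
    # -x: whole-line match, -F: fixed string; "--" because rules start with "-A"
    if ! grep -qxF -- "$rule" <<<"$dump"; then
      echo "MISSING: $rule"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, assuming oc debug passes the command's stdout through unchanged: oc debug node/<node> -- chroot /host iptables-save -t filter | check_sdn_rules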
On further investigation, UDP/TCP traffic still passes between the nodes even with the "VXLAN incoming" rule absent.
You can recover by restarting the sdn... but ... don't delete the rules?
Thanks Ben. The rules can be recovered by restarting the SDN pods, but the way we test on 4.x is:

oc rsh <sdn_pod>
# iptables --flush

Doing the above causes this issue. Is flush expected to delete all rules in one chain, or in all chains?
*Correction: is flush expected to delete all rules in one chain or in all chains, and are all chains expected to be recovered afterwards?
To clarify, this is a regression; the test used to pass. Will attempt to reproduce on 4.3.
This is a regression from 4.3: on 4.3.0-0.nightly-2020-03-04-235307, after an iptables -F the iptables rules are fully restored in ~31 seconds. Did the iptables sync period change from 4.3 to 4.4, maybe?
(In reply to Anurag saxena from comment #4)
> Thanks Ben. The rules can be recovered if we restart sdn pods but the way we
> test on 4.x is
>
> oc rsh <sdn_pod>
> # iptables --flush
>
> Doing above is causing this issue.
> We expected flush to delete all rules in chain or all chains?

Right, so the current behavior is that Kubernetes will recover from "systemctl restart iptables" or "firewall-cmd --reload", but it doesn't recover from arbitrary deletion of rules, because we never actually cared about recovering from arbitrary deletion of rules; we only cared about recovering from firewall reloads.

I think RHCOS has iptables.service? So you could do

  oc debug node/NODENAME -- chroot /host systemctl restart iptables

If not, then the way to emulate it is:

  for table in filter mangle nat raw security; do
      iptables --flush -t $table
  done

(i.e., you have to actually flush all of the tables, or it won't necessarily notice)

Note that this change is being backported all the way to 3.11, so you will need to update the tests on older releases too.
Thanks Dan Winship for the suggestions. I performed the following steps on the latest 4.4 nightly, but the rules still do not fully reappear, especially the ones Ross mentioned in the description. On RHCOS, "Unit iptables.service not found" is reported and firewall-cmd is also absent, so I took the emulated approach.

Steps taken:
1) oc debug node/<NODE>
2) Checked rules with `iptables -S`
3) for table in filter mangle nat raw security; do iptables --flush -t $table; done
4) Checked rules again with `iptables -S` (the 4 rules mentioned in the bug description are missing)
(In reply to Anurag saxena from comment #10)
> 4) Checked rules again by `iptables -S` (missing those 4 rules mentioned in
> bug description)

I assume you gave it 10 seconds or so to recover before checking?

Also, to clarify, in the initial report Ross also said:

> the VXLAN dport 4789 rule is missing
> the tun0 rule is missing

Are those still missing as well? Can you get sdn logs from trying to reproduce it using the new reproducer? (Maybe at log level 4?)
Created attachment 1668681: sdn pod logs at V4
> 6) oc logs <sdn_pod> --loglevel=4 > sdn_logs_log_level_4 (attached)

Er, no, I meant the logs of the SDN pod while the SDN pod itself is running at --loglevel=4 (passing --loglevel=4 to "oc logs" only changes the verbosity of the oc client).
Sorry, my bad. You need to both flush the tables and delete their chains. So:

  for table in filter mangle nat raw security; do
      iptables -t $table -F
      iptables -t $table -X
  done

*THEN* it will notice and reload the rules.
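The same reproducer can be wrapped as a bash function to paste into a root shell on the node (for example after oc debug node/<node> and chroot /host). This is only a sketch: reset_all_iptables_tables and the DRY_RUN variable are hypothetical names added for illustration, not iptables or oc options. With DRY_RUN=1 it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the flush-and-delete reproducer.
# For each table: -F flushes every chain, then -X deletes the user-defined
# chains (which is only possible once no rule references them).
# DRY_RUN=1 (a made-up convenience variable) prints instead of executing.
reset_all_iptables_tables() {
  local table
  for table in filter mangle nat raw security; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "iptables -t $table -F"
      echo "iptables -t $table -X"
    else
      iptables -t "$table" -F
      iptables -t "$table" -X
    fi
  done
}
```

After a real run, give openshift-sdn time to resync (the comments above mention anywhere from about 10 to ~31 seconds) before checking iptables -S again.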
Tried the steps in comment 17 above; it works well for me.
Verified this bug
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409