Description of problem:
Customer has discovered that traffic with a source IP from the SDN subnet is being sent out of the cluster without being masqueraded:

# tcpdump -s 200 -i eth0 -ln 'src net 10.254.0.0/16'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 200 bytes
17:45:36.740056 IP 10.254.6.20.52740 > 151.187.222.121.15110: Flags [R], seq 2942827236, win 0, length 0
17:48:57.663736 IP 10.254.6.20.54302 > 151.187.222.121.15110: Flags [R], seq 608540959, win 0, length 0
17:50:54.297114 IP 10.254.6.16.33072 > 10.0.26.73.14080: Flags [F.], seq 1163097179, ack 2581733031, win 232, options [nop,nop,TS val 811390342 ecr 1789995216], length 0
17:50:54.497678 IP 10.254.6.16.33072 > 10.0.26.73.14080: Flags [F.], seq 0, ack 1, win 232, options [nop,nop,TS val 811390543 ecr 1789995216], length 0
17:50:54.698725 IP 10.254.6.16.33072 > 10.0.26.73.14080: Flags [F.], seq 0, ack 1, win 232, options [nop,nop,TS val 811390744 ecr 1789995216], length 0
17:50:55.101658 IP 10.254.6.16.33072 > 10.0.26.73.14080: Flags [F.], seq 0, ack 1, win 232, options [nop,nop,TS val 811391147 ecr 1789995216], length 0

Version-Release number of selected component (if applicable):
3.4

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
It was assumed that this might be an effect of the following issues:
1. Netfilter's connection tracking of half-closed tcp connections is not working
   https://access.redhat.com/solutions/1427963
2. Bug 1215927 - Incomplete connection tracking of half-closed tcp connections
   https://bugzilla.redhat.com/show_bug.cgi?id=1215927

The unexpected traffic was blocked by inserting the following iptables rule:
# iptables -I FORWARD 1 -s 10.254.0.0/16 -m conntrack --ctstate INVALID -j DROP

It might be reasonable to have such a rule by default, so that invalid traffic is never sent out of the cluster.
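For anyone who wants to check a longer capture for the same symptom, here is a minimal Python sketch that flags tcpdump lines whose source address is still inside the SDN CIDR. The function name `leaked_sources` and the regular expression are illustrative assumptions, not part of any OpenShift tooling, and the sketch assumes IPv4 lines with numeric ports as produced by `tcpdump -n`.

```python
import ipaddress
import re

# Matches the "IP src.port > dst.port:" part of a tcpdump -n line (IPv4, numeric ports).
_PKT = re.compile(r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):")


def leaked_sources(lines, sdn_cidr="10.254.0.0/16"):
    """Return (src, dst) pairs whose source address is still inside the SDN CIDR."""
    net = ipaddress.ip_network(sdn_cidr)
    leaks = []
    for line in lines:
        m = _PKT.search(line)
        if not m:
            continue  # tcpdump header lines or lines in another format
        src, dst = m.group(1), m.group(3)
        if ipaddress.ip_address(src) in net:
            leaks.append((src, dst))
    return leaks


# Two lines taken from the capture in the description.
capture = [
    "17:45:36.740056 IP 10.254.6.20.52740 > 151.187.222.121.15110: Flags [R], seq 2942827236, win 0, length 0",
    "17:50:54.297114 IP 10.254.6.16.33072 > 10.0.26.73.14080: Flags [F.], seq 1163097179, ack 2581733031, win 232, length 0",
]
for src, dst in leaked_sources(capture):
    print(f"unmasqueraded packet: {src} -> {dst}")
```

Any hit on the node's external interface means a pod address reached the wire unchanged, which is the behaviour reported here.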
From https://access.redhat.com/support/cases/#/case/01807669, the customer did the following: "We have changed the default SDN network CIDR accordingly: osm_cluster_network_cidr=10.254.0.0/16"
According to https://github.com/danwinship/openshift-docs/blob/d5f85deae2c227459f8146f4cbc16c28cbef7851/install_config/configuring_sdn.adoc#renumbering-the-sdn-network, the master and node services need to be restarted after changing osm_cluster_network_cidr:
systemctl restart atomic-openshift-master
systemctl restart atomic-openshift-node
I did not see the restart steps in https://access.redhat.com/support/cases/#/case/01807669.
Hello Weibin,
This case is not about reconfiguring the cluster. It is up and running. The issue is that certain traffic leaving a node is not being masqueraded. Could you please clarify how restarting the node service would resolve this?
Just to clarify the situation again:
- The masquerade rule is in place:
  # iptables -L POSTROUTING -nv -t nat
  ...
  12M 712M MASQUERADE all -- * * 10.254.0.0/16 0.0.0.0/0
  ...
- At the same time, traffic with a source address in 10.254.0.0/16 is being sent out through the node's physical interface (see the trace in the case description).
- Adding a rule to drop packets with an invalid conntrack state resolves the issue:
  # iptables -I FORWARD 1 -s 10.254.0.0/16 -m conntrack --ctstate INVALID -j DROP
- All of this could mean that the host machine cleared out the connection information for some reason. The entry might be timing out if the container does not send any data for a certain period. This needs to be checked.
The fact is that internal subnet information is exposed to the outside, which is clearly a bug and might be a security issue.
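One way to read these observations: netfilter consults the nat table only for the packet that creates a conntrack entry, and later packets reuse the stored mapping. A packet that conntrack classifies as INVALID has no entry, so it is forwarded with its original source address even though the MASQUERADE rule exists. The toy model below sketches that reasoning in Python. It is not the kernel's code path, and `Packet`, `forwarded_source` and the node address 192.0.2.10 are hypothetical names and values chosen for illustration.

```python
import ipaddress
from dataclasses import dataclass

SDN = ipaddress.ip_network("10.254.0.0/16")


@dataclass
class Packet:
    src: str
    ctstate: str  # "NEW", "ESTABLISHED" or "INVALID", as conntrack would classify it


def forwarded_source(pkt, node_ip="192.0.2.10", drop_invalid=False):
    """Return the source address seen on the wire, or None if the packet is dropped."""
    in_sdn = ipaddress.ip_address(pkt.src) in SDN
    if drop_invalid and in_sdn and pkt.ctstate == "INVALID":
        # Models: -s 10.254.0.0/16 -m conntrack --ctstate INVALID -j DROP
        return None
    if in_sdn and pkt.ctstate in ("NEW", "ESTABLISHED"):
        # A conntrack entry exists (or is being created), so the NAT mapping applies.
        return node_ip
    # No conntrack entry: nothing rewrites the source address.
    return pkt.src


late_reset = Packet(src="10.254.6.20", ctstate="INVALID")
print(forwarded_source(late_reset))                     # leaks the pod address
print(forwarded_source(late_reset, drop_invalid=True))  # dropped by the extra rule
```

Under this reading, the RST and FIN retransmissions in the capture would be packets sent after the node had already discarded the connection's conntrack entry.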
Hi Alexander,
Thank you for the helpful information. I can now reproduce the SDN traffic leak in my environment, even when I follow the steps in https://github.com/danwinship/openshift-docs/blob/d5f85deae2c227459f8146f4cbc16c28cbef7851/install_config/configuring_sdn.adoc#renumbering-the-sdn-network.
####
[root@ip-172-18-8-121 ~]# tcpdump -i eth0 -nv host 20.128.0.5
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
11:52:37.259913 IP (tos 0x0, ttl 63, id 13932, offset 0, flags [DF], proto TCP (6), length 40)
    20.128.0.5.49490 > 173.222.212.251.https: Flags [R], cksum 0x5f03 (correct), seq 3410766628, win 0, length 0
11:52:37.456406 IP (tos 0x0, ttl 63, id 14077, offset 0, flags [DF], proto TCP
####
20.128.0.5 is the pod's IP address:
sh-4.2# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if31: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UP
    link/ether 1e:d8:13:5b:65:a6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 20.128.0.5/23 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::1cd8:13ff:fe5b:65a6/64 scope link
       valid_lft forever preferred_lft forever
sh-4.2#
PR 13680 is out for review https://github.com/openshift/origin/pull/13680
Let's get a PR open to nuke the offending rule (per eparis and danw's comments on the above PR) and see if it passes extended network testing. Then we can merge it first thing on Monday if it is clean.
Commit pushed to master at https://github.com/openshift/origin
https://github.com/openshift/origin/commit/2d9a8e38ee15b85670db51557bad0b7bc2a9f516

sdn traffic leaking out of the cluster

Customer has discovered that traffic sourced with ip from SDN subnet is being sent out of the cluster non-masqueraded.
tcp --ctstate INVALID packets are escaping from the SDN
This change adds a FORWARD rule to DROP these packets.

filter chain FORWARD
-s n.clusterNetworkCIDR -m conntrack --ctstate INVALID -j DROP

bug 1438762
https://bugzilla.redhat.com/show_bug.cgi?id=1438762
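The rule in the commit message is parameterised on the cluster network CIDR. As a rough illustration only (the actual change lives in the Go code of openshift/origin and is not reproduced here), the Python sketch below builds the same rule arguments from a CIDR string. The helper name `invalid_drop_rule` is hypothetical.

```python
import ipaddress


def invalid_drop_rule(cluster_network_cidr):
    """Build iptables arguments that drop INVALID packets sourced from the SDN."""
    # Validate the CIDR first; ip_network raises ValueError on malformed input
    # or when host bits are set (strict=True).
    net = ipaddress.ip_network(cluster_network_cidr, strict=True)
    return ["-s", str(net), "-m", "conntrack", "--ctstate", "INVALID", "-j", "DROP"]


# Print the equivalent of the manual workaround from the description.
args = invalid_drop_rule("10.254.0.0/16")
print("iptables -I FORWARD 1 " + " ".join(args))
```

Validating the CIDR before building the rule keeps a typo in the configured network from producing a rule that matches nothing or fails to load.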
Verified in atomic-openshift-3.6.109-1.git.0.378bacd.el7.x86_64: no traffic with a pod source IP was seen leaving the node, and the following iptables rule has been added:
-A OPENSHIFT-FIREWALL-FORWARD -s 10.128.0.0/14 -m comment --comment "attempted resend after connection close" -m conntrack --ctstate INVALID -j DROP
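The rule check in this verification could be scripted. The sketch below is a minimal Python example, assuming the rules are supplied as text lines in the format printed by `iptables -S` or `iptables-save`. The helper names `_opt` and `has_invalid_drop` are hypothetical.

```python
import shlex


def _opt(tokens, flag):
    """Return the token following `flag`, or None if the flag is absent."""
    if flag in tokens[:-1]:
        return tokens[tokens.index(flag) + 1]
    return None


def has_invalid_drop(rules, cidr):
    """True if any rule drops INVALID-state packets sourced from `cidr`."""
    for rule in rules:
        tokens = shlex.split(rule)  # keeps the quoted --comment text as one token
        if (_opt(tokens, "-s") == cidr
                and _opt(tokens, "--ctstate") == "INVALID"
                and _opt(tokens, "-j") == "DROP"):
            return True
    return False


# Rule text copied from the verification comment.
rules = [
    '-A OPENSHIFT-FIREWALL-FORWARD -s 10.128.0.0/14 -m comment --comment '
    '"attempted resend after connection close" -m conntrack --ctstate INVALID -j DROP',
]
print(has_invalid_drop(rules, "10.128.0.0/14"))
```

The comparison is a plain string match on the CIDR, so the value passed in has to be written the same way iptables prints it.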
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716