Bug 1734321
| Summary: | iptables/nft compat mode does not handle reject rules correctly | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Ricardo Carrillo Cruz <ricarril> |
| Component: | iptables | Assignee: | Phil Sutter <psutter> |
| Status: | CLOSED NOTABUG | QA Contact: | qe-baseos-daemons |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 7.6 | CC: | dcbw, iptables-maint-list, todoleza |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-10-29 14:33:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
We saw this on troubleshooting https://bugzilla.redhat.com/show_bug.cgi?id=1711538 (we are from OpenShift SDN team).

Hi Ricardo,

(In reply to Ricardo Carrillo Cruz from comment #2)
> We saw this on troubleshooting
> https://bugzilla.redhat.com/show_bug.cgi?id=1711538 (we are from OpenShift
> SDN team).

You are reporting this for RHEL8, right? RHEL7 doesn't ship with iptables-nft.

I can't reproduce the behaviour with iptables-1.8.2-14.el8.x86_64:

# iptables -N foo
# iptables -A foo -m comment --comment "ricky test" -p tcp --destination 1.1.1.1 -j REJECT
# iptables -L foo
Chain foo (0 references)
target prot opt source destination
REJECT tcp -- anywhere one.one.one.one /* ricky test */ reject-with icmp-port-unreachable
# nft list ruleset | grep 1.1.1.1
meta l4proto tcp ip daddr 1.1.1.1 counter packets 0 bytes 0 reject

Even if you don't see 'reject' verdict in nft listing, this is likely just a display issue. If 'iptables -L' shows REJECT target, you're good. In order to find out why your traffic doesn't hit that rule, you could check iptables counters (iptables -vnL).

sh-4.2# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)

(In reply to Ricardo Carrillo Cruz from comment #4)
> sh-4.2# cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.6 (Maipo)

Weird. What does 'rpm -q iptables' print?

Here it goes:

<snip>
sh-4.2# rpm -q iptables
iptables-1.4.21-28.el7.x86_64
</snip>

(In reply to Ricardo Carrillo Cruz from comment #6)
> Here it goes:
>
> <snip>
>
> sh-4.2# rpm -q iptables
> iptables-1.4.21-28.el7.x86_64
>
> </snip>

Thanks. This completely mismatches the pasted version output:

> sh-4.2# iptables --version
> iptables v1.8.2 (nf_tables)

One (hopefully) last request: Please paste the output of 'which iptables'.

sh-4.2# which iptables
/usr/sbin/iptables

Ok, so looking at other BZs, it seems the mismatch is due to https://bugzilla.redhat.com/show_bug.cgi?id=1691439.
sh-4.2# cat /host/etc/redhat-release
Red Hat Enterprise Linux CoreOS release 4.2
sh-4.2# uname -a
Linux ip-10-0-142-204 4.18.0-80.4.2.el8_0.x86_64 #1 SMP Fri Jun 14 13:20:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

(In reply to Ricardo Carrillo Cruz from comment #9)
> Ok, so looking at other BZs, it seems the mismatch is due to
> https://bugzilla.redhat.com/show_bug.cgi?id=1691439.
>
> sh-4.2# cat /host/etc/redhat-release
> Red Hat Enterprise Linux CoreOS release 4.2
> sh-4.2# uname -a
> Linux ip-10-0-142-204 4.18.0-80.4.2.el8_0.x86_64 #1 SMP Fri Jun 14 13:20:24
> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

So you're running a RHEL7 container on a RHEL8 host but still (somehow) call host's iptables binary? Could you please explain what this setup *exactly* looks like?

(In reply to Phil Sutter from comment #10)
> (In reply to Ricardo Carrillo Cruz from comment #9)
> > Ok, so looking at other BZs, it seems the mismatch is due to
> > https://bugzilla.redhat.com/show_bug.cgi?id=1691439.
> >
> > sh-4.2# cat /host/etc/redhat-release
> > Red Hat Enterprise Linux CoreOS release 4.2
> > sh-4.2# uname -a
> > Linux ip-10-0-142-204 4.18.0-80.4.2.el8_0.x86_64 #1 SMP Fri Jun 14 13:20:24
> > UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>
> So you're running a RHEL7 container on a RHEL8 host but still (somehow) call
> host's iptables binary? Could you please explain how this setup *exactly*
> looks like?

Yes, this is exactly the setup in OpenShift 4.x. The hosts are RHCOS 8 but the container images are built on RHEL7 for now.

Since RHEL7 doesn't ship iptables-nft we have to use the host-installed versions to make sure we're using the same legacy/nft setup as the host is. This isn't inside a network namespace (otherwise we wouldn't care that much); the container is only used for process and filesystem isolation. So we bind-mount the host's bindir into the container's filesystem (at a different location) and then have some wrapper scripts that actually exec the host-mounted iptables-nft binaries.

Hi Dan,

(In reply to Dan Williams from comment #11)
> (In reply to Phil Sutter from comment #10)
[...]
> > So you're running a RHEL7 container on a RHEL8 host but still (somehow) call
> > host's iptables binary? Could you please explain how this setup *exactly*
> > looks like?
>
> Yes, this is exactly the setup in OpenShift 4.x. The hosts are RHCOS 8 but
> the container images are built on RHEL7 for now.
>
> Since RHEL7 doesn't ship iptables-nft we have to use the host-installed
> versions to make sure we're using the same legacy/nft setup as the host is.
> This isn't inside a network namespace (otherwise we wouldn't care that
> much); the container is only used for process and filesystem isolation. So
> we bind-mount the hosts bindir into the container's filesystem (at a
> different location) and then have some wrapper scripts that actually exec
> the host-mounted iptables-nft binaries.

Thanks for the insight! Unrelated to this ticket but worth noting: In Fedora, /usr/sbin/iptables symlinks to /etc/alternatives/iptables. So there one needs to bind-mount more than just bindir and use something like chroot to make sure symlinks won't point to outside places.

Back to topic: Assuming the functional problem (packets won't hit the rule they are expected to) is unrelated to iptables itself (but merely a matter of broken setup) and the purely cosmetic problem of nft not displaying xtables reject verdict doesn't happen with recent RHEL8 iptables package, I'm closing the ticket. Feel free to reopen in case you disagree.

Cheers, Phil

Hi there, sorry for the delay, I was out on vacation.
The rule is indeed matched; earlier pastes are bogus since the iptables/nft commands were not run
on the node hosting the pod running the wget command:
[ricky@ricky-laptop ~]$ cat /tmp/test-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
[ricky@ricky-laptop ~]$ oc create -f /tmp/test-service.yaml
service/my-service created
[ricky@ricky-laptop ~]$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-130-235.ec2.internal Ready worker 20m v1.14.0+f667219f4
ip-10-0-134-73.ec2.internal Ready master 25m v1.14.0+f667219f4
ip-10-0-135-156.ec2.internal Ready master 24m v1.14.0+f667219f4
ip-10-0-140-105.ec2.internal Ready worker 20m v1.14.0+f667219f4
ip-10-0-144-208.ec2.internal Ready master 24m v1.14.0+f667219f4
ip-10-0-150-179.ec2.internal Ready worker 20m v1.14.0+f667219f4
[ricky@ricky-laptop ~]$ oc create deployment hello-node --image=gcr.io/hello-minikube-zero-install/hello-node
deployment.apps/hello-node created
[ricky@ricky-laptop ~]$ oc describe pod hello-node-78cd77d68f-zmr67 | grep Node
Node: ip-10-0-130-235.ec2.internal/10.0.130.235
Node-Selectors: <none>
[ricky@ricky-laptop ~]$ oc -n openshift-sdn get pods -l app=sdn --field-selector spec.nodeName=ip-10-0-130-235.ec2.internal
NAME READY STATUS RESTARTS AGE
sdn-zlqcf 1/1 Running 0 19m
Now open one session on the hello-node pod and another on sdn-zlqcf:
hello-node
----------
# wget my-service
converted 'http://my-service' (ANSI_X3.4-1968) -> 'http://my-service' (UTF-8)
--2019-08-20 08:29:51-- http://my-service/
Resolving my-service (my-service)... 172.30.121.3
Connecting to my-service (my-service)|172.30.121.3|:80...
sdn-zlqcf
---------
Chain KUBE-SERVICES (3 references)
pkts bytes target prot opt in out source destination
3 180 REJECT tcp -- any any anywhere ip-172-30-121-3.ec2.internal /* default/my-service: has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable
Running a tcpdump from hello-node during the wget shows only SYN packets outgoing, no SYN-ACK reply:
# tcpdump host 172.30.121.3
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
08:33:13.323465 IP hello-node-78cd77d68f-zmr67.47536 > my-service.default.svc.cluster.local.http: Flags [S], seq 2652938099, win 26733, options [mss 8911,sackOK,TS val 1564345239 ecr 0,nop,wscale 7], length 0
08:33:50.715119 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733, options [mss 8911,sackOK,TS val 1564382631 ecr 0,nop,wscale 7], length 0
08:33:51.723454 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733, options [mss 8911,sackOK,TS val 1564383640 ecr 0,nop,wscale 7], length 0
08:33:53.771453 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733, options [mss 8911,sackOK,TS val 1564385688 ecr 0,nop,wscale 7], length 0
08:33:57.803449 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733, options [mss 8911,sackOK,TS val 1564389720 ecr 0,nop,wscale 7], length 0
08:34:06.059459 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733, options [mss 8911,sackOK,TS val 1564397976 ecr 0,nop,wscale 7], length 0
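The capture above can also be summarized with a short script. The following is only a sketch: the function name `syn_retransmit_gaps`, the regular expression, and the abbreviated `CAPTURE` list (TCP options trimmed from the lines above) are illustrative assumptions, not part of any tool used in this report. It groups bare SYN packets by source, destination and sequence number, then prints the delay between successive transmissions. Steadily growing delays for one sequence number suggest the sender is retransmitting because no reply of any kind reached it.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the start of a tcpdump TCP line such as:
# 08:33:50.715119 IP src.port > dst.port: Flags [S], seq 3364694714, ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]+)\], seq (?P<seq>\d+)"
)

# Abbreviated copy of the capture (options field trimmed for readability).
CAPTURE = [
    "08:33:50.715119 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733",
    "08:33:51.723454 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733",
    "08:33:53.771453 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733",
    "08:33:57.803449 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733",
    "08:34:06.059459 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733",
]


def syn_retransmit_gaps(lines):
    """Group bare SYN packets by (src, dst, seq) and return, per group,
    the gaps in seconds between consecutive transmissions.

    Timestamps carry no date, so a capture crossing midnight is not handled.
    """
    stamps = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("flags") != "S":
            continue  # skip unparsable lines and anything that is not a bare SYN
        ts = datetime.strptime(m.group("ts"), "%H:%M:%S.%f")
        stamps[(m.group("src"), m.group("dst"), int(m.group("seq")))].append(ts)
    return {
        key: [(later - earlier).total_seconds()
              for earlier, later in zip(times, times[1:])]
        for key, times in stamps.items()
    }


if __name__ == "__main__":
    for (src, dst, seq), gaps in syn_retransmit_gaps(CAPTURE).items():
        # Rounded gaps for the capture above: [1, 2, 4, 8]
        print(src, "->", dst, "seq", seq, [round(g) for g in gaps])
```

For the second connection attempt in the capture, the gaps are roughly 1, 2, 4 and 8 seconds, which matches the usual doubling of SYN retransmission timeouts.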
Hi Ricardo,

Please note that you filed this ticket against iptables component, but the output you pasted does not indicate a problem in that area. Did I miss something? Can you maybe come up with a minimal reproducer exposing the problem in iptables?

Cheers, Phil

I can try to make a minimal reproducer based off the RHEL used in OCP.

However, if you disregard the output from openshift commands, you can see iptables commands output.
You can see that there's no reject statement. You say it is likely a cosmetic issue, but as you can see in my
troubleshooting the rule in question is hit, yet IPTables does not return back anything, just SYN packets
from sender.

Ricardo,

(In reply to Ricardo Carrillo Cruz from comment #15)
> I can try to take a minimal reproducer based off the RHEL used in OCP.
>
> However, if you disregard the output from openshift commands, you can see
> iptables commands output.
> You can see that there's no reject statement. You say is likely a cosmetic
> issue, but as you can see in my
> troubleshooting the rule in question is hit, yet IPTables does not return
> back anything, just SYN packets
> from sender.

I am really having a hard time trying to help you. I don't have the slightest idea of what your setup looks like, all you told me is there is a reject rule and you don't see the ICMP replies it should cause. Is there a possibility for me to observe the problem live?

Cheers, Phil

Let me better try getting a minimal reproducer, cos as I mentioned our clusters are pruned periodically and it would be a pain to just use that.

Hi Phil
I'm really puzzled by this one.
Like you, I cannot reproduce this with iptables 1.8.2 on a Fedora 30 box:
<snip>
[root@localhost ~]# iptables-nft -A INPUT -m comment --comment "ricky test2" -p tcp --destination 1.1.1.1 -j REJECT
[root@localhost ~]# iptables-nft -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
REJECT tcp -- anywhere one.one.one.one /* ricky test2 */ reject-with icmp-port-unreachable
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
# Warning: iptables-legacy tables present, use iptables-legacy to see them
[root@localhost ~]# nft list ruleset
table ip filter {
	chain INPUT {
		type filter hook input priority filter; policy accept;
		meta l4proto tcp ip daddr 1.1.1.1 counter packets 0 bytes 0 reject comment "ricky test2"
	}
	chain FORWARD {
		type filter hook forward priority filter; policy accept;
	}
	chain OUTPUT {
		type filter hook output priority filter; policy accept;
	}
}
[root@localhost ~]# iptables-nft --version
iptables v1.8.2 (nf_tables)
</snip>
I don't know why if using the same iptables-nft version in Fedora the reject statement is correctly added, whereas in OCP is not.
You mentioned that it is most likely cosmetic, and that if iptables shows REJECT then it should be fine. However, do you know of a bug or commit
covering this cosmetic issue? Even so, I would have expected to see the same behaviour in both environments since the same iptables-nft is used.
Any directions you can give to me to debug this further for you would be great.
Hi Ricardo,

(In reply to Ricardo Carrillo Cruz from comment #18)
> I don't know why if using the same iptables-nft version in Fedora the reject
> statement is correctly added, whereas in OCP is not.

It is correctly added in both versions. The difference is in listing the ruleset with nft command which on OCP doesn't support xtables match printing. That was fixed with nftables-0.9.0-3.el8 by calling configure with '--with-xtables' flag.

Cheers, Phil

Ricardo,

(In reply to Ricardo Carrillo Cruz from comment #15)
> I can try to take a minimal reproducer based off the RHEL used in OCP.
>
> However, if you disregard the output from openshift commands, you can see
> iptables commands output.
> You can see that there's no reject statement. You say is likely a cosmetic
> issue, but as you can see in my
> troubleshooting the rule in question is hit, yet IPTables does not return
> back anything, just SYN packets
> from sender.

Any update here? If a REJECT rule is hit but you don't see respective packets, maybe there's a routing issue?

Cheers, Phil

Hey Phil

Apologies for the delay. Since you confirmed it's not an issue with iptables/nft, I'm closing. Need to circle back cos in K8S upstream they are hitting something similar, need to reach out to them for more feedback.

Thanks

(In reply to Ricardo Carrillo Cruz from comment #21)
> Apologies for the delay.

No problem, Ricardo. Thanks for clarifying!
Description of problem:
iptables/nft compat mode does not handle REJECT rules correctly

Version-Release number of selected component (if applicable):
sh-4.2# iptables --version
iptables v1.8.2 (nf_tables)

How reproducible:
Always

Steps to Reproduce:
sh-4.2# iptables -t filter -A KUBE-SERVICES -m comment --comment "ricky test" -p tcp --destination 1.1.1.1 -j REJECT
sh-4.2# nft list ruleset |grep 1.1.1.1
meta l4proto tcp ip daddr 1.1.1.1 counter packets 0 bytes 0

Actual results:
The rule in nftables is added without reject.

Expected results:
The rule in nftables is added with a reject statement.

Additional info:
Creating a drop rule with iptables shows the correct rule in nft:
sh-4.2# iptables -t filter -A KUBE-SERVICES -m comment --comment "ricky test" -p tcp --destination 1.1.1.2 -j DROP
sh-4.2# nft list ruleset |grep 1.1.1.2
meta l4proto tcp ip daddr 1.1.1.2 counter packets 0 bytes 0 drop
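The comparison in this description (one listed rule ends in a verdict, the other does not) can be automated with a small check. This is a rough sketch: `has_verdict` and the `VERDICTS` set are hypothetical names introduced here, and the keyword list is an assumption about what appears in `nft list ruleset` output. As noted in the comments on this bug, an nft build without xtables support may simply not print the REJECT target, so a missing verdict in the listing only means the rule should be cross-checked with `iptables -L`, not that the rule is broken.

```python
import shlex

# Keywords that end or redirect rule evaluation in an nft listing
# (assumed list for this sketch).
VERDICTS = {"accept", "drop", "reject", "return", "jump", "goto", "queue", "continue"}


def has_verdict(rule: str) -> bool:
    """Return True if a rule line from `nft list ruleset` shows a verdict.

    The token following `comment` is skipped so that a comment text such as
    "drop" is not mistaken for a verdict. Lines with unbalanced quotes make
    shlex raise ValueError; this sketch does not handle that case.
    """
    skip_next = False
    for token in shlex.split(rule):
        if skip_next:
            skip_next = False
            continue
        if token == "comment":
            skip_next = True
            continue
        if token in VERDICTS:
            return True
    return False


if __name__ == "__main__":
    listed = [
        "meta l4proto tcp ip daddr 1.1.1.1 counter packets 0 bytes 0",
        "meta l4proto tcp ip daddr 1.1.1.2 counter packets 0 bytes 0 drop",
    ]
    for rule in listed:
        status = "verdict shown" if has_verdict(rule) else "no verdict shown, check iptables -L"
        print(f"{rule!r}: {status}")
```

Run against the two lines from the description, the first is flagged for a cross-check and the second is reported as showing its verdict.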