Bug 1734321 - iptables/nft compat mode does not handle reject rules correctly
Summary: iptables/nft compat mode does not handle reject rules correctly
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: iptables
Version: 7.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Phil Sutter
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-30 08:55 UTC by Ricardo Carrillo Cruz
Modified: 2019-11-12 11:30 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-29 14:33:59 UTC
Target Upstream Version:
Embargoed:


Description Ricardo Carrillo Cruz 2019-07-30 08:55:57 UTC
Description of problem:

iptables/nft compat mode does not handle REJECT rules correctly

Version-Release number of selected component (if applicable):

sh-4.2# iptables --version
iptables v1.8.2 (nf_tables)

How reproducible:

Always

Steps to Reproduce:
sh-4.2# iptables -t filter -A KUBE-SERVICES -m comment --comment "ricky test" -p tcp --destination 1.1.1.1 -j REJECT
sh-4.2# nft list ruleset |grep 1.1.1.1
meta l4proto tcp ip daddr 1.1.1.1 counter packets 0 bytes 0

Actual results:

The rule in nftables is added without reject.

Expected results:

The rule in nftables is added with a reject statement.


Additional info:

Creating a drop rule with iptables shows the correct rule in nft:

sh-4.2# iptables -t filter -A KUBE-SERVICES -m comment --comment "ricky test" -p tcp --destination 1.1.1.2 -j DROP  
sh-4.2# nft list ruleset |grep 1.1.1.2
                meta l4proto tcp ip daddr 1.1.1.2 counter packets 0 bytes 0 drop
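
A minimal sketch of how the two listings above could be compared automatically, assuming a single rule line has already been captured from `nft list ruleset`. The function name `verdict_of` is made up for illustration. It only recognises a bare trailing verdict keyword, optionally followed by a comment; longer forms such as `reject with ...` are not handled.

```shell
# Print the trailing verdict of one `nft list ruleset` rule line, or "none".
# Assumption: the verdict, if present, is the last word of the rule once an
# optional trailing `comment "..."` has been stripped.
verdict_of() {
    line=$(printf '%s\n' "$1" |
        sed -e 's/[[:space:]]*comment "[^"]*"[[:space:]]*$//' \
            -e 's/[[:space:]]*$//')
    case "${line##* }" in
        accept|drop|reject|return|continue) printf '%s\n' "${line##* }" ;;
        *) printf 'none\n' ;;
    esac
}
```

It could be called as `verdict_of "$(nft list ruleset | grep 1.1.1.1)"`, provided the grep returns exactly one rule line.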

Comment 2 Ricardo Carrillo Cruz 2019-07-30 09:02:44 UTC
We saw this while troubleshooting https://bugzilla.redhat.com/show_bug.cgi?id=1711538 (we are from the OpenShift SDN team).

Comment 3 Phil Sutter 2019-07-30 09:47:10 UTC
Hi Ricardo,

(In reply to Ricardo Carrillo Cruz from comment #2)
> We saw this while troubleshooting
> https://bugzilla.redhat.com/show_bug.cgi?id=1711538 (we are from the OpenShift
> SDN team).

You are reporting this for RHEL8, right? RHEL7 doesn't ship with iptables-nft.

I can't reproduce the behaviour with iptables-1.8.2-14.el8.x86_64:

# iptables -N foo
# iptables -A foo -m comment --comment "ricky test" -p tcp --destination 1.1.1.1 -j REJECT
# iptables -L foo
Chain foo (0 references)
target     prot opt source               destination         
REJECT     tcp  --  anywhere             one.one.one.one      /* ricky test */ reject-with icmp-port-unreachable
# nft list ruleset | grep 1.1.1.1
		meta l4proto tcp ip daddr 1.1.1.1  counter packets 0 bytes 0 reject

Even if you don't see the 'reject' verdict in the nft listing, this is likely just a display issue. If 'iptables -L' shows the REJECT target, you're good.

In order to find out why your traffic doesn't hit that rule, you could check iptables counters (iptables -vnL).
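
A small sketch of the counter check suggested above, assuming the text of `iptables -vnL <chain>` is piped in on stdin. The helper name `reject_counters` is hypothetical; it relies on the usual verbose column order (pkts, bytes, target, ...).

```shell
# Read `iptables -vnL <chain>` output on stdin and print "pkts bytes" for
# every rule whose target column is REJECT. Header and "Chain ..." lines
# are skipped because their third field is never "REJECT".
reject_counters() {
    awk '$3 == "REJECT" { print $1, $2 }'
}
```

A possible invocation is `iptables -vnL KUBE-SERVICES | reject_counters`; counters that stay at zero while traffic is being sent would suggest the rule is not being hit.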

Comment 4 Ricardo Carrillo Cruz 2019-07-30 09:51:18 UTC
sh-4.2# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)

Comment 5 Phil Sutter 2019-07-30 09:54:12 UTC
(In reply to Ricardo Carrillo Cruz from comment #4)
> sh-4.2# cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.6 (Maipo)

Weird. What does 'rpm -q iptables' print?

Comment 6 Ricardo Carrillo Cruz 2019-07-30 09:56:07 UTC
Here it goes:

<snip>

sh-4.2# rpm -q iptables
iptables-1.4.21-28.el7.x86_64

</snip>

Comment 7 Phil Sutter 2019-07-30 10:02:04 UTC
(In reply to Ricardo Carrillo Cruz from comment #6)
> Here it goes:
> 
> <snip>
> 
> sh-4.2# rpm -q iptables
> iptables-1.4.21-28.el7.x86_64
> 
> </snip>

Thanks. This completely mismatches the pasted version output:

> sh-4.2# iptables --version
> iptables v1.8.2 (nf_tables)

One (hopefully) last request: Please paste the output of 'which iptables'.
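
The mismatch noted above can be spotted mechanically. This is only a sketch: the function names are invented, and the parsing assumes the two output formats shown in this ticket (`iptables vX.Y.Z (...)` and `iptables-X.Y.Z-release.arch`).

```shell
# Extract the upstream version from `iptables --version` output,
# e.g. "iptables v1.8.2 (nf_tables)" -> "1.8.2".
binary_version() {
    printf '%s\n' "$1" | sed -n 's/^iptables v\([0-9][0-9.]*\).*/\1/p'
}

# Extract the upstream version from `rpm -q iptables` output,
# e.g. "iptables-1.4.21-28.el7.x86_64" -> "1.4.21".
package_version() {
    printf '%s\n' "$1" | sed -n 's/^iptables-\([0-9][0-9.]*\)-.*/\1/p'
}

# Succeed only if both strings parse and name the same upstream version.
versions_match() {
    bin=$(binary_version "$1")
    pkg=$(package_version "$2")
    [ -n "$bin" ] && [ "$bin" = "$pkg" ]
}
```

For instance, `versions_match "$(iptables --version)" "$(rpm -q iptables)"` fails for the two outputs pasted in this ticket, which hints that the binary being executed does not come from the installed package.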

Comment 8 Ricardo Carrillo Cruz 2019-07-30 10:03:37 UTC
sh-4.2# which iptables
/usr/sbin/iptables

Comment 9 Ricardo Carrillo Cruz 2019-07-30 10:06:42 UTC
Ok, so looking at other BZs, it seems the mismatch is due to https://bugzilla.redhat.com/show_bug.cgi?id=1691439.

sh-4.2# cat /host/etc/redhat-release 
Red Hat Enterprise Linux CoreOS release 4.2
sh-4.2# uname -a
Linux ip-10-0-142-204 4.18.0-80.4.2.el8_0.x86_64 #1 SMP Fri Jun 14 13:20:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Comment 10 Phil Sutter 2019-07-30 10:17:15 UTC
(In reply to Ricardo Carrillo Cruz from comment #9)
> Ok, so looking at other BZs, it seems the mismatch is due to
> https://bugzilla.redhat.com/show_bug.cgi?id=1691439.
> 
> sh-4.2# cat /host/etc/redhat-release 
> Red Hat Enterprise Linux CoreOS release 4.2
> sh-4.2# uname -a
> Linux ip-10-0-142-204 4.18.0-80.4.2.el8_0.x86_64 #1 SMP Fri Jun 14 13:20:24
> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

So you're running a RHEL7 container on a RHEL8 host but still (somehow) call the host's iptables binary? Could you please explain what this setup looks like *exactly*?

Comment 11 Dan Williams 2019-08-05 18:58:34 UTC
(In reply to Phil Sutter from comment #10)
> (In reply to Ricardo Carrillo Cruz from comment #9)
> > Ok, so looking at other BZs, it seems the mismatch is due to
> > https://bugzilla.redhat.com/show_bug.cgi?id=1691439.
> > 
> > sh-4.2# cat /host/etc/redhat-release 
> > Red Hat Enterprise Linux CoreOS release 4.2
> > sh-4.2# uname -a
> > Linux ip-10-0-142-204 4.18.0-80.4.2.el8_0.x86_64 #1 SMP Fri Jun 14 13:20:24
> > UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> 
> So you're running a RHEL7 container on a RHEL8 host but still (somehow) call
> the host's iptables binary? Could you please explain what this setup looks
> like *exactly*?

Yes, this is exactly the setup in OpenShift 4.x. The hosts are RHCOS 8 but the container images are built on RHEL7 for now.

Since RHEL7 doesn't ship iptables-nft, we have to use the host-installed versions to make sure we're using the same legacy/nft setup as the host. This isn't inside a network namespace (otherwise we wouldn't care that much); the container is only used for process and filesystem isolation. So we bind-mount the host's bindir into the container's filesystem (at a different location) and then have some wrapper scripts that actually exec the host-mounted iptables-nft binaries.
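
A rough sketch of what such a wrapper might look like. This is not the actual OpenShift script: the mount point `/host/usr/sbin`, the `HOST_BINDIR` variable, and the `run_host_tool` name are all assumptions made for illustration.

```shell
# Hypothetical mount point for the host's bindir inside the container; the
# thread does not say where it is really mounted.
HOST_BINDIR="${HOST_BINDIR:-/host/usr/sbin}"

# Run the host-mounted copy of a tool with all arguments passed through.
# A standalone wrapper script would typically `exec` the binary instead.
run_host_tool() {
    tool=$1
    shift
    if [ ! -x "$HOST_BINDIR/$tool" ]; then
        echo "run_host_tool: $HOST_BINDIR/$tool is missing or not executable" >&2
        return 127
    fi
    "$HOST_BINDIR/$tool" "$@"
}
```

With this in place, `run_host_tool iptables -L` would run the host's binary rather than whatever the container image ships, which would explain why `rpm -q iptables` inside the container reports a different version.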

Comment 12 Phil Sutter 2019-08-08 12:40:51 UTC
Hi Dan,

(In reply to Dan Williams from comment #11)
> (In reply to Phil Sutter from comment #10)
[...]
> > So you're running a RHEL7 container on a RHEL8 host but still (somehow) call
> > the host's iptables binary? Could you please explain what this setup looks
> > like *exactly*?
> 
> Yes, this is exactly the setup in OpenShift 4.x. The hosts are RHCOS 8 but
> the container images are built on RHEL7 for now.
> 
> Since RHEL7 doesn't ship iptables-nft, we have to use the host-installed
> versions to make sure we're using the same legacy/nft setup as the host.
> This isn't inside a network namespace (otherwise we wouldn't care that
> much); the container is only used for process and filesystem isolation. So
> we bind-mount the host's bindir into the container's filesystem (at a
> different location) and then have some wrapper scripts that actually exec
> the host-mounted iptables-nft binaries.

Thanks for the insight! Unrelated to this ticket but worth noting: In Fedora,
/usr/sbin/iptables symlinks to /etc/alternatives/iptables. So there one needs
to bind-mount more than just bindir and use something like chroot to make sure
symlinks won't point to outside places.

Back to topic:

Assuming the functional problem (packets won't hit the rule they are expected
to) is unrelated to iptables itself (but merely a matter of broken setup) and
the purely cosmetic problem of nft not displaying the xtables reject verdict
doesn't happen with the recent RHEL8 iptables package, I'm closing the ticket.

Feel free to reopen in case you disagree.

Cheers, Phil
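
The symlink caveat mentioned in this comment can be checked with a short helper. This is a sketch only: `stays_inside` is an invented name, and it assumes a `readlink` that supports `-f` (as GNU coreutils does).

```shell
# Succeed if the fully resolved path of $1 still lies under directory $2.
# An absolute symlink such as /usr/sbin/iptables -> /etc/alternatives/iptables
# is resolved against the current root, so it can escape a bind mount.
stays_inside() {
    target=$(readlink -f "$1") || return 1
    root=$(readlink -f "$2") || return 1
    case "$target" in
        "$root"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running it against a bind-mounted binary and its mount point shows whether the link resolves to a path outside the mounted directory, in which case something like chroot would be needed, as noted above.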

Comment 13 Ricardo Carrillo Cruz 2019-08-20 08:44:53 UTC
Hi there, sorry for the delay, I was out on vacation.
The rule is indeed matched; the earlier pastes are bogus since the iptables/nft commands were not run
on the node hosting the pod running the wget command:

[ricky@ricky-laptop ~]$ cat /tmp/test-service.yaml 
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
[ricky@ricky-laptop ~]$ oc create -f /tmp/test-service.yaml 
service/my-service created

[ricky@ricky-laptop ~]$ oc get nodes
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-130-235.ec2.internal   Ready    worker   20m   v1.14.0+f667219f4
ip-10-0-134-73.ec2.internal    Ready    master   25m   v1.14.0+f667219f4
ip-10-0-135-156.ec2.internal   Ready    master   24m   v1.14.0+f667219f4
ip-10-0-140-105.ec2.internal   Ready    worker   20m   v1.14.0+f667219f4
ip-10-0-144-208.ec2.internal   Ready    master   24m   v1.14.0+f667219f4
ip-10-0-150-179.ec2.internal   Ready    worker   20m   v1.14.0+f667219f4

[ricky@ricky-laptop ~]$ oc create deployment hello-node --image=gcr.io/hello-minikube-zero-install/hello-node
deployment.apps/hello-node created

[ricky@ricky-laptop ~]$ oc describe pod hello-node-78cd77d68f-zmr67 | grep Node      
Node:               ip-10-0-130-235.ec2.internal/10.0.130.235
Node-Selectors:  <none>   

[ricky@ricky-laptop ~]$ oc -n openshift-sdn get pods -l app=sdn --field-selector spec.nodeName=ip-10-0-130-235.ec2.internal                                                                                                                                  
NAME        READY   STATUS    RESTARTS   AGE
sdn-zlqcf   1/1     Running   0          19m

Now, open one session on the hello-node pod and another on sdn-zlqcf:

hello-node
----------

# wget my-service
converted 'http://my-service' (ANSI_X3.4-1968) -> 'http://my-service' (UTF-8)
--2019-08-20 08:29:51--  http://my-service/
Resolving my-service (my-service)... 172.30.121.3
Connecting to my-service (my-service)|172.30.121.3|:80...


sdn-zlqcf
---------

Chain KUBE-SERVICES (3 references)                                                                                                                                                                                                                           
 pkts bytes target     prot opt in     out     source               destination                                                                                                                                                                              
    3   180 REJECT     tcp  --  any    any     anywhere             ip-172-30-121-3.ec2.internal  /* default/my-service: has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable   


Running a tcpdump from hello-node during the wget shows only SYN packets outgoing, no SYN-ACK reply:

# tcpdump host 172.30.121.3
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
08:33:13.323465 IP hello-node-78cd77d68f-zmr67.47536 > my-service.default.svc.cluster.local.http: Flags [S], seq 2652938099, win 26733, options [mss 8911,sackOK,TS val 1564345239 ecr 0,nop,wscale 7], length 0                                             
08:33:50.715119 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733, options [mss 8911,sackOK,TS val 1564382631 ecr 0,nop,wscale 7], length 0                                             
08:33:51.723454 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733, options [mss 8911,sackOK,TS val 1564383640 ecr 0,nop,wscale 7], length 0                                             
08:33:53.771453 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733, options [mss 8911,sackOK,TS val 1564385688 ecr 0,nop,wscale 7], length 0                                             
08:33:57.803449 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733, options [mss 8911,sackOK,TS val 1564389720 ecr 0,nop,wscale 7], length 0                                             
08:34:06.059459 IP hello-node-78cd77d68f-zmr67.48526 > my-service.default.svc.cluster.local.http: Flags [S], seq 3364694714, win 26733, options [mss 8911,sackOK,TS val 1564397976 ecr 0,nop,wscale 7], length 0
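
To summarise a capture like the one above without reading it line by line, something along these lines may help. It is a sketch under assumptions: `reply_summary` is a made-up name, and it expects tcpdump's default one-line-per-packet text format on stdin.

```shell
# Read tcpdump text output on stdin and print three counts:
#   <pure SYN segments> <other TCP segments> <ICMP packets>
# A SYN-ACK shows up as "Flags [S.]" and is counted as "other TCP".
reply_summary() {
    awk '
        / ICMP /         { icmp++; next }
        /Flags \[S\],/   { syn++; next }
        /Flags \[/       { other++ }
        END              { print syn + 0, other + 0, icmp + 0 }
    '
}
```

A result with a non-zero first count and zeros elsewhere matches what is described above: SYNs leave the pod and nothing comes back within the capture filter.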

Comment 14 Phil Sutter 2019-08-21 09:42:22 UTC
Hi Ricardo,

Please note that you filed this ticket against iptables component, but the output
you pasted does not indicate a problem in that area. Did I miss something? Can
you maybe come up with a minimal reproducer exposing the problem in iptables?

Cheers, Phil

Comment 15 Ricardo Carrillo Cruz 2019-08-21 09:54:46 UTC
I can try to put together a minimal reproducer based on the RHEL used in OCP.

However, if you disregard the output from the openshift commands, you can see the iptables command output.
You can see that there's no reject statement. You say it is likely a cosmetic issue, but as you can see in my
troubleshooting, the rule in question is hit, yet iptables does not send anything back; there are just SYN packets
from the sender.

Comment 16 Phil Sutter 2019-08-21 10:17:00 UTC
Ricardo,

(In reply to Ricardo Carrillo Cruz from comment #15)
> I can try to put together a minimal reproducer based on the RHEL used in OCP.
>
> However, if you disregard the output from the openshift commands, you can see
> the iptables command output.
> You can see that there's no reject statement. You say it is likely a cosmetic
> issue, but as you can see in my troubleshooting, the rule in question is hit,
> yet iptables does not send anything back; there are just SYN packets from the
> sender.

I am really having a hard time trying to help you. I don't have the slightest
idea what your setup looks like; all you told me is that there is a reject rule
and you don't see the ICMP replies it should cause. Is there a possibility for
me to observe the problem live?

Cheers, Phil

Comment 17 Ricardo Carrillo Cruz 2019-08-21 10:23:59 UTC
I'd rather try to get a minimal reproducer, because as I mentioned our clusters are
pruned periodically and it would be a pain to use one of them.

Comment 18 Ricardo Carrillo Cruz 2019-08-23 11:28:55 UTC
Hi Phil

I'm really puzzled on this one.
Like you, I cannot reproduce this with iptables 1.8.2 on a Fedora 30 box:

<snip>

[root@localhost ~]# iptables-nft -A INPUT -m comment --comment "ricky test2" -p tcp --destination 1.1.1.1 -j REJECT
[root@localhost ~]# iptables-nft -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
REJECT     tcp  --  anywhere             one.one.one.one      /* ricky test2 */ reject-with icmp-port-unreachable

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
# Warning: iptables-legacy tables present, use iptables-legacy to see them
[root@localhost ~]# nft list ruleset
table ip filter {
        chain INPUT {
                type filter hook input priority filter; policy accept;
                meta l4proto tcp ip daddr 1.1.1.1 counter packets 0 bytes 0 reject comment "ricky test2"
        }

        chain FORWARD {
                type filter hook forward priority filter; policy accept;
        }

        chain OUTPUT {
                type filter hook output priority filter; policy accept;
        }
}
[root@localhost ~]# iptables-nft --version
iptables v1.8.2 (nf_tables)

</snip>

I don't know why the reject statement is correctly added in Fedora but not in OCP, given that both use the same iptables-nft version.
You mentioned that it is most likely cosmetic, and that if iptables shows REJECT then it should be fine. However, is there a bug or commit
that you know of covering this cosmetic issue? Even so, I would have expected to see the same behaviour in both environments since the same iptables-nft is used.

Any directions you can give me to debug this further would be great.

Comment 19 Phil Sutter 2019-08-23 12:04:20 UTC
Hi Ricardo,

(In reply to Ricardo Carrillo Cruz from comment #18)
> I don't know why the reject statement is correctly added in Fedora but not
> in OCP, given that both use the same iptables-nft version.

It is correctly added in both environments. The difference is in listing the
ruleset with the nft command, which on OCP doesn't support xtables match printing.
That was fixed in nftables-0.9.0-3.el8 by calling configure with the
'--with-xtables' flag.

Cheers, Phil
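
A quick way to tell whether an installed el8 nftables package is at least the build named above is sketched below. The helper names are hypothetical, and `sort -V` (GNU coreutils) only approximates rpm's own version comparison, so treat the result as a hint.

```shell
# Strip the package name and architecture from `rpm -q nftables` output,
# e.g. "nftables-0.9.0-8.el8.x86_64" -> "0.9.0-8.el8".
nft_version_release() {
    printf '%s\n' "$1" | sed -e 's/^nftables-//' -e 's/\.[^.]*$//'
}

# Succeed if version-release $1 sorts at or after $2 under `sort -V`.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

One possible use is `version_at_least "$(nft_version_release "$(rpm -q nftables)")" 0.9.0-3.el8`. This comparison is only meaningful for el8 builds; other distributions number their packages differently.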

Comment 20 Phil Sutter 2019-10-22 08:43:42 UTC
Ricardo,

(In reply to Ricardo Carrillo Cruz from comment #15)
> I can try to put together a minimal reproducer based on the RHEL used in OCP.
>
> However, if you disregard the output from the openshift commands, you can see
> the iptables command output.
> You can see that there's no reject statement. You say it is likely a cosmetic
> issue, but as you can see in my troubleshooting, the rule in question is hit,
> yet iptables does not send anything back; there are just SYN packets from the
> sender.

Any update here? If a REJECT rule is hit but you don't see respective packets,
maybe there's a routing issue?

Cheers, Phil

Comment 21 Ricardo Carrillo Cruz 2019-10-29 14:33:32 UTC
Hey Phil

Apologies for the delay.
Since you confirmed it's not an issue with iptables/nft, I'm closing.
I need to circle back, because K8s upstream is hitting something similar and I need to reach out to them
for more feedback.

Thanks

Comment 22 Phil Sutter 2019-10-29 17:45:02 UTC
(In reply to Ricardo Carrillo Cruz from comment #21)
> Apologies for the delay.

No problem, Ricardo. Thanks for clarifying!

