kube-proxy also uses one bit of the packet mark to indicate when a packet will need to be masqueraded. However, it shouldn't be setting that mark on packets that are destined for an egress IP. Could they try to figure out which iptables rule is getting hit that results in the packet being sent to "-j KUBE-MARK-MASQ"?
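One way to narrow that down (a sketch; the chain names below are made-up placeholders, and the heredoc stands in for real node output) is to look at the per-rule packet counters on the jumps to KUBE-MARK-MASQ:

```shell
# Sketch: find which rules jump to KUBE-MARK-MASQ and which of them have
# non-zero packet counters. The heredoc is placeholder data; on the node
# itself you would pipe "iptables-save -t nat -c" instead.
dump=$(cat <<'EOF'
[12:3456] -A KUBE-SVC-EXAMPLE ! -s 10.128.0.0/14 -d 172.30.0.1/32 -j KUBE-MARK-MASQ
[0:0] -A KUBE-SEP-EXAMPLE -s 10.128.2.4/32 -j KUBE-MARK-MASQ
EOF
)
# Show only the jumps that have actually matched packets:
echo "$dump" | grep -- '-j KUBE-MARK-MASQ' | grep -v '^\[0:0\]'
```

Running that on the real dump (and correlating counters with test traffic from the affected pod) should point at the rule in question.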
I don't think that rule is marking the packet. I understand you're referring to this rule:

$ head -2 iptables_-t_nat_-nvL ; grep 'MARK or 0x1' iptables_-t_nat_-nvL
Chain PREROUTING (policy ACCEPT 10M packets, 622M bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0            MARK or 0x1

But it has 0 matches. I don't see any rule that could possibly cause this behavior:

$ grep -Pi '0x(.*)?[13579BDF]' iptables_-*
iptables_-t_filter_-nvL:    0     0 ACCEPT      all  --  *  *  0.0.0.0/0  0.0.0.0/0  /* kubernetes forwarding rules */ mark match 0x1/0x1
iptables_-t_nat_-nvL:       0     0 MARK        all  --  *  *  0.0.0.0/0  0.0.0.0/0  MARK or 0x1
iptables_-t_nat_-nvL:       0     0 MASQUERADE  all  --  *  *  0.0.0.0/0  0.0.0.0/0  /* kubernetes service traffic requiring SNAT */ mark match 0x1/0x1
iptables_-vnxL:             0     0 ACCEPT      all  --  *  *  0.0.0.0/0  0.0.0.0/0  /* kubernetes forwarding rules */ mark match 0x1/0x1

Nor do I see anywhere in openvswitch the packet being marked in a way that could set the first bit to 1:

$ cat ovs-ofctl_-O_OpenFlow13_dump-flows_br0 | grep pkt_mark
 cookie=0x0, duration=668061.037s, table=100, n_packets=20084153, n_bytes=6475945226, priority=100,ip,reg0=0xfd592 actions=set_field:12:98:03:84:35:80->eth_dst,set_field:0xfd592->pkt_mark,goto_table:101

Could you please have a look at attachment 1577623 [details]? I'm not so familiar with iptables marks, but I don't see a rule that could cause this. In any case, it makes sense to me that the 1st bit should be set to 1 rather than the value being increased by one, so I will ask the customer to modify the VNID of the netnamespace to an odd number.
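As a quick sanity check (a small sketch using the pkt_mark value from the flow dump above): the kube-proxy rules all use "mark match 0x1/0x1", i.e. they test only bit 0, and 0xfd592 is even, so that bit is clear:

```shell
# 0xfd592 is the pkt_mark set by the OVS flow above. kube-proxy's
# MASQUERADE/ACCEPT rules match "0x1/0x1", i.e. only the lowest bit.
mark=0xfd592
if [ $(( mark & 0x1 )) -eq 0 ]; then
    printf 'low bit clear: 0x%x cannot match mark 0x1/0x1\n' "$mark"
else
    printf 'low bit set: 0x%x matches mark 0x1/0x1\n' "$mark"
fi
```

So with this mark value, none of the 0x1/0x1 rules above should be matching the packet.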
> I will ask the customer to modify the VNID of the netnamespace to an odd number.

Note that the expected result of doing that is that the ovs flow will set pkt_mark to (VNID - 1) + 0x1000000. E.g., if you set the VNID to 1037715, you'll get a pkt_mark of 0x10fd592, not 0xfd593. (Because OpenShift wants to avoid using the bit that kube-proxy is using.)
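The arithmetic can be checked directly (a sketch of the (VNID - 1) + 0x1000000 formula described above, with the example VNID from this comment):

```shell
# Expected pkt_mark for an odd VNID, per the formula above:
# pkt_mark = (VNID - 1) + 0x1000000
vnid=1037715                          # example odd VNID (0xfd592 + 1)
mark=$(( (vnid - 1) + 0x1000000 ))
printf '0x%x\n' "$mark"               # prints 0x10fd592, not 0xfd593
```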
Sorry for the delay in getting back to this. It seems that there are some incompatibilities between the iptables rules created for LoadBalancer/Ingress services and the rules created for egress IPs. This means pods in namespaces using egress IPs may not be able to connect to load balancer IPs. However, load balancer IPs are intended for use from outside the cluster anyway; pods inside the cluster can just connect to the service by its internal service IP, which will also result in bouncing the packet across the network fewer times. (The packet will go client-node -> server-node, rather than client-node -> egress-node -> load-balancer -> server-node.)
> please note that this is also being followed via BZ#1762580

Ok, we do not need two bugs actively tracking this. Closing this one as a dup; when a fix is available we can open new bugs for backports.

*** This bug has been marked as a duplicate of bug 1762580 ***