Can we get the routes info from before the change? Or at least know which version was in use before, so we can extract those? @adubey I'm mostly a bit confused about the ask here: is it about additive default-route behavior? (i.e. keeping a default route on both the eth0 and net1 interfaces?)
Hello Miguel, Thank you so much for looking into this. They were using v4.8.36, where the routing table already had the entries needed for NodePort traffic. It's about the missing routing entry: when the customer manually added the entry, traffic started to be received. It might be that in v4.8.36 the source IP address for NodePort traffic was within the pod network, and hence the routing table already had entries covering it. Let me know if any other information is needed.
Thanks. Let's see what Doug comes up with. (mostly just clearing out the `needinfo` flag).
Hi Team, Is there an update on this issue? The customer is expecting a response from our end. Can we please prioritize this?
After some review of the bridge CNI between versions -- I don't believe there is a significant change there that would cause this. I think we'll need some analysis from the OVN-K team to see whether a change to NodePort services caused this issue. Thanks.
Hi Team, Is there an update we can share with the customer? They are really waiting for a response from us.
Hi @mduarted @surya, I am attaching the route info as shared by the customer today.

Output of `ip route` in the pod, OCP v4.10:
~~~
default via 199.219.44.1 dev net1
172.18.0.0/15 via 172.18.19.1 dev eth0
172.18.19.0/25 dev eth0 proto kernel scope link src 172.18.19.12
192.168.48.0/20 via 172.18.19.1 dev eth0
199.219.44.0/24 dev net1 proto kernel scope link src 199.219.44.15
~~~
That is for a pod whose second IP address is 199.219.44.15. The pod network is 172.18.0.0/15 and the service IP network is 192.168.48.0/20.

They also deployed an OCP v4.8.36 cluster to share the routing table info below:
~~~
default via 140.223.56.1 dev net1
140.223.56.0/24 dev net1 proto kernel scope link src 140.223.56.206
172.18.0.0/15 via 172.18.9.1 dev eth0
172.18.9.0/25 dev eth0 proto kernel scope link src 172.18.9.64
192.168.48.0/20 via 172.18.9.1 dev eth0
~~~
That is for a pod whose second IP address is 140.223.56.206. The pod network is 172.18.0.0/15 and the service IP network is 192.168.48.0/20.

This is what they said when sharing these details:
~~~
I don't recall there being anything much different in the IP routing tables between the two versions. The problem is that the IP address of the traffic coming into the pod changed (to a 100.64 ip address...prior to 4.10 it was an ip address from the pod subnet). However the IP routing table doesn't have an entry for this 100.64.x.x range like it does for the pod subnet (and the service ip subnet).
~~~
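For completeness: the ticket doesn't record the exact entry the customer added manually, but based on the v4.10 table above it would be something of this shape (illustrative values only; the prefix length and gateway are assumptions, not confirmed by the customer):
~~~
# Illustrative only: send replies to the 100.64.x.x source range back out eth0
# instead of following the net1 default route. 172.18.19.1 is the eth0 gateway
# taken from the v4.10 routing table above.
ip route add 100.64.0.0/16 via 172.18.19.1 dev eth0
~~~
Per the customer, once an entry of this kind was in place the NodePort traffic started to be received again.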
The 100.64.x.x range is how we started routing service traffic for shared gateway mode from OCP 4.8 onward. We adopted the same strategy for LGW in 4.10, so that we no longer have to use the DGP and to bring both modes closer together. So before those versions, this topology, where the secondary interface on the pod is made the default route, worked by pure coincidence or luck. It was never intended to work that way: one is consciously choosing which default route the traffic goes out through, and in the previous versions the topology just happened to allow it to work. Speaking with my team lead, we have agreed that this was never supported from the start. If this is needed, please open an RFE; closing this bug since there is nothing more we can do here.
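For anyone who wants to confirm they are hitting the same behavior, a quick check (a sketch only; assumes tcpdump is available inside the pod or its network namespace) is to watch the cluster-network interface for the 100.64.x.x source range:
~~~
# Watch NodePort traffic arriving on the cluster-network interface (eth0) and
# check its source address; with this topology on >= 4.10 it arrives from the
# 100.64.x.x range, which the pod's routing table (default via net1) cannot answer.
tcpdump -nni eth0 'src net 100.64.0.0/16'
~~~
If NodePort traffic shows up there with a 100.64.x.x source while the pod's default route points out net1, you are in the situation described in this bug.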
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days