Bug 1757777 - hostIP and podIP differs with hostNetwork: true
Summary: hostIP and podIP differs with hostNetwork: true
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.1.z
Hardware: x86_64
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.3.0
Assignee: Dan Winship
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-02 12:29 UTC by Radim Vansa
Modified: 2020-05-13 21:26 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-13 21:26:26 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:0062 0 None None None 2020-05-13 21:26:27 UTC

Description Radim Vansa 2019-10-02 12:29:48 UTC
Description of problem:

OCP 4.1.14 UPI on bare metal with multiple NICs. By setting up reverse hostname lookup in DNS properly, I can make all nodes use the correct NICs (host IPs are in my 192.168.0.x range). However, a pod with `hostNetwork: true` seems to get a PodIP that matches the default route (at least that's how it looks) instead of matching the HostIP. This is incorrect: the default route goes to the public internet, and the public IPs on some nodes are not reachable from other nodes. That causes issues, e.g. the openshift-monitoring/node-exporter pods cannot be scraped. The public IPs should be used exclusively for reaching the outside world (downloading images...); internally, OCP should use the interface bound to the node IP.

Example info:
```
> oc get po -o wide
NAME                                           READY   STATUS    RESTARTS   AGE   IP             NODE           NOMINATED NODE   READINESS GATES
node-exporter-87t54                            2/2     Running   0          12s   192.168.0.22   master2        <none>           <none>
node-exporter-j76g8                            2/2     Running   0          11s   10.1.184.117   benchserver4   <none>           <none>
node-exporter-mbwt8                            2/2     Running   0          13s   10.1.184.154   benchserver6   <none>           <none>
node-exporter-ml7pr                            2/2     Running   0          11s   192.168.0.23   master3        <none>           <none>
node-exporter-zbcz7                            2/2     Running   0          5s    192.168.0.21   master1        <none>           <none>
...
```

```
> oc get node -o wide
NAME           STATUS   ROLES    AGE   VERSION             INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION               CONTAINER-RUNTIME
benchserver4   Ready    worker   9d    v1.13.4+3bd346709   192.168.0.80    <none>        OpenShift Enterprise                                       3.10.0-1062.1.1.el7.x86_64   cri-o://1.13.11-0.1.dev.rhaos4.1.git59b6bdb.el7-dev
benchserver6   Ready    worker   13d   v1.13.4+3bd346709   192.168.0.100   <none>        Red Hat Enterprise Linux CoreOS 410.8.20190830.0 (Ootpa)   4.18.0-80.7.2.el8_0.x86_64   cri-o://1.13.11-0.4.dev.rhaos4.1.git59b6bdb.el8-dev
master1        Ready    master   55d   v1.13.4+3bd346709   192.168.0.21    <none>        Red Hat Enterprise Linux CoreOS 410.8.20190830.0 (Ootpa)   4.18.0-80.7.2.el8_0.x86_64   cri-o://1.13.11-0.4.dev.rhaos4.1.git59b6bdb.el8-dev
master2        Ready    master   55d   v1.13.4+3bd346709   192.168.0.22    <none>        Red Hat Enterprise Linux CoreOS 410.8.20190830.0 (Ootpa)   4.18.0-80.7.2.el8_0.x86_64   cri-o://1.13.11-0.4.dev.rhaos4.1.git59b6bdb.el8-dev
master3        Ready    master   55d   v1.13.4+3bd346709   192.168.0.23    <none>        Red Hat Enterprise Linux CoreOS 410.8.20190830.0 (Ootpa)   4.18.0-80.7.2.el8_0.x86_64   cri-o://1.13.11-0.4.dev.rhaos4.1.git59b6bdb.el8-dev
```

Version-Release number of selected component (if applicable):

4.1.14

Actual results:

hostNetwork pods on benchserver4 and benchserver6 get IPs from the 10.1.184.x range.

Expected results:

hostNetwork pods on benchserver4 and benchserver6 should get the node IPs 192.168.0.80 and 192.168.0.100, respectively.
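
A quick way to check a cluster for this symptom is to compare `status.hostIP` and `status.podIP` for every pod that sets `spec.hostNetwork: true`. The following is only a sketch, not part of the original report: the function name and the `SAMPLE` data are made up for illustration. The sample reuses two node-exporter pods and the node INTERNAL-IP values from the output above (assuming `status.hostIP` equals the node's INTERNAL-IP), plus one hypothetical non-hostNetwork pod to show that it is skipped.

```python
import json
import sys
from typing import Any, Dict, List, Tuple


def find_mismatched_host_network_pods(
    pod_list: Dict[str, Any],
) -> List[Tuple[str, str, str, str]]:
    """Return (namespace, name, hostIP, podIP) for hostNetwork pods whose IPs differ.

    `pod_list` is expected to look like the output of `oc get pods -o json`.
    """
    mismatches = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        status = pod.get("status", {})
        # Only pods sharing the node's network namespace are of interest.
        if not spec.get("hostNetwork", False):
            continue
        host_ip = status.get("hostIP")
        pod_ip = status.get("podIP")
        # Skip pods that have not been assigned IPs yet.
        if host_ip and pod_ip and host_ip != pod_ip:
            meta = pod.get("metadata", {})
            mismatches.append(
                (meta.get("namespace", ""), meta.get("name", ""), host_ip, pod_ip)
            )
    return mismatches


# Illustrative data only. The first two entries mirror the report; the third
# pod ("example-app") is hypothetical and uses the regular pod network.
SAMPLE = {
    "items": [
        {
            "metadata": {"namespace": "openshift-monitoring", "name": "node-exporter-87t54"},
            "spec": {"hostNetwork": True},
            "status": {"hostIP": "192.168.0.22", "podIP": "192.168.0.22"},
        },
        {
            "metadata": {"namespace": "openshift-monitoring", "name": "node-exporter-j76g8"},
            "spec": {"hostNetwork": True},
            "status": {"hostIP": "192.168.0.80", "podIP": "10.1.184.117"},
        },
        {
            "metadata": {"namespace": "default", "name": "example-app"},
            "spec": {},
            "status": {"hostIP": "192.168.0.80", "podIP": "10.128.0.5"},
        },
    ]
}


if __name__ == "__main__":
    # Read real data from stdin when piped in, otherwise fall back to SAMPLE.
    data = SAMPLE if sys.stdin.isatty() else json.load(sys.stdin)
    for namespace, name, host_ip, pod_ip in find_mismatched_host_network_pods(data):
        print(f"{namespace}/{name}: hostIP={host_ip} podIP={pod_ip}")
```

Saved under a name of your choice (say `check_hostnet.py`), it could be fed live data with `oc get pods --all-namespaces -o json | python3 check_hostnet.py`. On an affected cluster it should list pods such as the node-exporters on benchserver4 and benchserver6; on a fixed cluster it should print nothing.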

Comment 1 Ben Bennett 2019-10-02 12:56:27 UTC
This is not a regression in 4.2.0 (and I suspect it's behaved like this forever).  Pushing to 4.3.0 to consider the solution, and then we can decide if it merits a backport.

Comment 2 Casey Callendrello 2019-11-13 13:00:59 UTC
Dan, I think you fixed this with the latest CRI-O changes for IPv6, right?

Comment 3 Peter Hunt 2019-11-13 13:47:05 UTC
Yes, this has been fixed with the changes that recently got merged into 1.16

Comment 5 zhaozhanqi 2019-11-14 05:15:56 UTC
Verified this bug on 4.3.0-0.nightly-2019-11-13-233341 

<none>
openshift-monitoring   node-exporter-6k6dh   2/2   Running   0   161m   192.168.0.18   <none>   <none>
openshift-monitoring   node-exporter-9kmdk   2/2   Running   0   74m    192.168.0.29   <none>   <none>
openshift-monitoring   node-exporter-dgjxm   2/2   Running   0   161m   192.168.0.16   <none>   <none>
openshift-monitoring   node-exporter-h9nr4   2/2   Running   0   161m   192.168.0.13   <none>   <none>
openshift-monitoring   node-exporter-hnwtt   2/2   Running   0   161m   192.168.0.22   <none>   <none>
openshift-monitoring   node-exporter-kxc8c   2/2   Running   0   161m   192.168.0.20   <none>   <none>
openshift-monitoring   node-exporter-x6mvr   2/2   Running   0   161m   192.168.0.14   <none>   <none>

Comment 7 errata-xmlrpc 2020-05-13 21:26:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

