Description of problem:

OCP 4.1.14 UPI on bare metal with multiple NICs. By setting up reverse hostname lookup in DNS properly I can make all nodes use the correct NICs (host IPs are in my 192.168.0.x range). However, a pod with `hostNetwork: true` appears to get a PodIP that matches the default route instead of the HostIP. This is incorrect: the default route goes to the public internet, and the public IPs on some nodes are not reachable from other nodes. That causes issues, e.g. the node-exporter pods in openshift-monitoring cannot be scraped. The public IPs should be used exclusively for reaching the outside world (downloading images...); internally OCP should use the interface bound to the node IP.

Example info:

```
> oc get po -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP             NODE           NOMINATED NODE   READINESS GATES
node-exporter-87t54   2/2     Running   0          12s   192.168.0.22   master2        <none>           <none>
node-exporter-j76g8   2/2     Running   0          11s   10.1.184.117   benchserver4   <none>           <none>
node-exporter-mbwt8   2/2     Running   0          13s   10.1.184.154   benchserver6   <none>           <none>
node-exporter-ml7pr   2/2     Running   0          11s   192.168.0.23   master3        <none>           <none>
node-exporter-zbcz7   2/2     Running   0          5s    192.168.0.21   master1        <none>           <none>
...
```

```
> oc get node -o wide
NAME           STATUS   ROLES    AGE   VERSION             INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION                CONTAINER-RUNTIME
benchserver4   Ready    worker   9d    v1.13.4+3bd346709   192.168.0.80    <none>        OpenShift Enterprise                                       3.10.0-1062.1.1.el7.x86_64    cri-o://1.13.11-0.1.dev.rhaos4.1.git59b6bdb.el7-dev
benchserver6   Ready    worker   13d   v1.13.4+3bd346709   192.168.0.100   <none>        Red Hat Enterprise Linux CoreOS 410.8.20190830.0 (Ootpa)   4.18.0-80.7.2.el8_0.x86_64    cri-o://1.13.11-0.4.dev.rhaos4.1.git59b6bdb.el8-dev
master1        Ready    master   55d   v1.13.4+3bd346709   192.168.0.21    <none>        Red Hat Enterprise Linux CoreOS 410.8.20190830.0 (Ootpa)   4.18.0-80.7.2.el8_0.x86_64    cri-o://1.13.11-0.4.dev.rhaos4.1.git59b6bdb.el8-dev
master2        Ready    master   55d   v1.13.4+3bd346709   192.168.0.22    <none>        Red Hat Enterprise Linux CoreOS 410.8.20190830.0 (Ootpa)   4.18.0-80.7.2.el8_0.x86_64    cri-o://1.13.11-0.4.dev.rhaos4.1.git59b6bdb.el8-dev
master3        Ready    master   55d   v1.13.4+3bd346709   192.168.0.23    <none>        Red Hat Enterprise Linux CoreOS 410.8.20190830.0 (Ootpa)   4.18.0-80.7.2.el8_0.x86_64    cri-o://1.13.11-0.4.dev.rhaos4.1.git59b6bdb.el8-dev
```

Version-Release number of selected component (if applicable):
4.1.14

Actual results:
Pods with `hostNetwork: true` on benchserver4 and benchserver6 get IPs from the range 10.1.184.x.

Expected results:
Pods with `hostNetwork: true` on benchserver4 and benchserver6 should get IPs 192.168.0.80 and 192.168.0.100, respectively.
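For anyone who wants to check whether their cluster is affected, here is a minimal sketch in Python that flags `hostNetwork: true` pods whose `status.podIP` differs from `status.hostIP`. The function name and the sample data are hypothetical; the sample is hand-written from the two listings above, under the assumption that each pod's `hostIP` equals the INTERNAL-IP of its node. It is not captured cluster output.

```python
def mismatched_host_network_pods(pod_list):
    """Return (pod name, podIP, hostIP) for hostNetwork pods whose IPs differ."""
    mismatches = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        status = pod.get("status", {})
        # Only host-network pods are expected to share the node's IP.
        if not spec.get("hostNetwork", False):
            continue
        pod_ip = status.get("podIP")
        host_ip = status.get("hostIP")
        if pod_ip and host_ip and pod_ip != host_ip:
            mismatches.append((pod["metadata"]["name"], pod_ip, host_ip))
    return mismatches


def _pod(name, pod_ip, host_ip):
    # Helper that builds a pod object with only the fields the check reads.
    return {
        "metadata": {"name": name},
        "spec": {"hostNetwork": True},
        "status": {"podIP": pod_ip, "hostIP": host_ip},
    }


# Hand-written sample (assumption: hostIP == node INTERNAL-IP from the report).
SAMPLE = {
    "items": [
        _pod("node-exporter-87t54", "192.168.0.22", "192.168.0.22"),
        _pod("node-exporter-j76g8", "10.1.184.117", "192.168.0.80"),
        _pod("node-exporter-mbwt8", "10.1.184.154", "192.168.0.100"),
        _pod("node-exporter-ml7pr", "192.168.0.23", "192.168.0.23"),
        _pod("node-exporter-zbcz7", "192.168.0.21", "192.168.0.21"),
    ]
}

if __name__ == "__main__":
    for name, pod_ip, host_ip in mismatched_host_network_pods(SAMPLE):
        print(f"{name}: podIP={pod_ip} hostIP={host_ip}")
```

On a real cluster the same function could presumably be fed the parsed JSON from `oc get pods -n openshift-monitoring -o json` (via `json.load`) instead of the sample.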
This is not a regression in 4.2.0 (and I suspect it's behaved like this forever). Pushing to 4.3.0 to consider the solution, and then we can decide if it merits a backport.
Dan, I think you fixed this with the latest CRI-O changes for IPv6, right?
Yes, this has been fixed with the changes that recently got merged into 1.16.
Verified this bug on 4.3.0-0.nightly-2019-11-13-233341

<none>
openshift-monitoring   node-exporter-6k6dh   2/2   Running   0   161m   192.168.0.18   <none>   <none>
openshift-monitoring   node-exporter-9kmdk   2/2   Running   0   74m    192.168.0.29   <none>   <none>
openshift-monitoring   node-exporter-dgjxm   2/2   Running   0   161m   192.168.0.16   <none>   <none>
openshift-monitoring   node-exporter-h9nr4   2/2   Running   0   161m   192.168.0.13   <none>   <none>
openshift-monitoring   node-exporter-hnwtt   2/2   Running   0   161m   192.168.0.22   <none>   <none>
openshift-monitoring   node-exporter-kxc8c   2/2   Running   0   161m   192.168.0.20   <none>   <none>
openshift-monitoring   node-exporter-x6mvr   2/2   Running   0   161m   192.168.0.14   <none>   <none>
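A quick way to sanity-check a verification listing like this one is to confirm that every reported pod IP falls inside the node network. The sketch below uses Python's standard `ipaddress` module. The `/24` prefix is an assumption (the report only says "192.168.0.x"), and `outside_network` is a hypothetical helper name.

```python
import ipaddress

# Assumption: the node network is 192.168.0.0/24; the report only gives "192.168.0.x".
NODE_NETWORK = ipaddress.ip_network("192.168.0.0/24")

# Pod IPs copied from the verification comment.
VERIFIED_POD_IPS = [
    "192.168.0.18",
    "192.168.0.29",
    "192.168.0.16",
    "192.168.0.13",
    "192.168.0.22",
    "192.168.0.20",
    "192.168.0.14",
]


def outside_network(ips, network):
    """Return the IPs that do not belong to the given network."""
    return [ip for ip in ips if ipaddress.ip_address(ip) not in network]


if __name__ == "__main__":
    stray = outside_network(VERIFIED_POD_IPS, NODE_NETWORK)
    print("all pod IPs on node network" if not stray else f"stray IPs: {stray}")
```

Running the same helper over the IPs from the original report would single out the two 10.1.184.x addresses.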
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062