Description of problem:
Compute instances on GCP are not passing load balancer health checks because the route table in the VM is not getting modified properly

Version-Release number of selected component (if applicable):
GCP image: rhcos-420-8-20190611-0

How reproducible:
Always

Steps to Reproduce:
1. Create a compute instance on GCP that listens on a specific port (6443 in the case of the bootstrap machine).
2. Create a load balancer that points to the instance with a TCP health check for that port.
3. Wait for the load balancer to report the instance as healthy.

Actual results:
The instance never reports as healthy

Expected results:
The instance reports as healthy because the port is open.

Additional info:
On the machine, this is the status of the gcp-routes systemd unit:

# systemctl status gcp-routes
● gcp-routes.service - Update GCP routes for forwarded IPs.
   Loaded: loaded (/usr/lib/systemd/system/gcp-routes.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2019-07-12 16:29:38 UTC; 11min ago
 Main PID: 1215 (bash)
    Tasks: 2 (limit: 26213)
   Memory: 3.5M
   CGroup: /system.slice/gcp-routes.service
           ├─1215 /bin/bash /sbin/gcp-routes.sh
           └─6697 sleep 30

Jul 12 16:39:10 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Error: any valid prefix is expected rather than "10.0.0.634.74.167.161".
Jul 12 16:39:40 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 10.0.0.6
Jul 12 16:39:40 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 34.74.167.161
Jul 12 16:39:40 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Error: any valid prefix is expected rather than "10.0.0.634.74.167.161".
Jul 12 16:40:10 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 10.0.0.6
Jul 12 16:40:10 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 34.74.167.161
Jul 12 16:40:10 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Error: any valid prefix is expected rather than "10.0.0.634.74.167.161".
Jul 12 16:40:40 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 10.0.0.6
Jul 12 16:40:40 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 34.74.167.161
Jul 12 16:40:40 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Error: any valid prefix is expected rather than "10.0.0.634.74.167.161".

The route table does not show the external ip:

# ip route list table local
local 10.0.0.5 dev ens4 proto kernel scope host src 10.0.0.5
broadcast 10.0.0.5 dev ens4 proto kernel scope link src 10.0.0.5
broadcast 10.88.0.0 dev cni0 proto kernel scope link src 10.88.0.1 linkdown
local 10.88.0.1 dev cni0 proto kernel scope host src 10.88.0.1
broadcast 10.88.255.255 dev cni0 proto kernel scope link src 10.88.0.1 linkdown
broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1

This is a run of /sbin/gcp-routes.sh with bash debugging:

# bash -x /sbin/gcp-routes.sh
+ declare -A routes
+ :
+ run
+ net_path=network-interfaces/
++ curler network-interfaces/
++ curl --silent -L -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/
+ for vif in $(curler ${net_path})
++ curler network-interfaces/0/mac
++ curl --silent -L -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/mac
+ hw_addr=42:01:0a:00:00:05
+ fwip_path=network-interfaces/0/forwarded-ips/
++ get_ifname 42:01:0a:00:00:05
++ sysfs_path=/sys/class/net
+++ find /sys/class/net -maxdepth 1 -mindepth 1
++ for dev in $(find ${sysfs_path} -maxdepth 1 -mindepth 1)
++ local mac=42:01:0a:00:00:05
+++ basename /sys/class/net/ens4
++ local name=ens4
++ '[' 42:01:0a:00:00:05 == 42:01:0a:00:00:05 ']'
++ echo ens4
++ return
+ dev_name=ens4
++ curler network-interfaces/0/forwarded-ips/
++ curl --silent -L -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/forwarded-ips/
+ for level in $(curler ${fwip_path})
++ curler network-interfaces/0/forwarded-ips//0
++ curl --silent -L -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/forwarded-ips//0
+ for fwip in $(curler ${fwip_path}/${level})
+ echo 'Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 10.0.0.6'
Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 10.0.0.6
+ routes[$dev_name]+=10.0.0.6
+ for level in $(curler ${fwip_path})
++ curler network-interfaces/0/forwarded-ips//1
++ curl --silent -L -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/forwarded-ips//1
+ for fwip in $(curler ${fwip_path}/${level})
+ echo 'Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 34.74.167.161'
Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 34.74.167.161
+ routes[$dev_name]+=34.74.167.161
+ set_routes ens4
+ local dev=ens4
+ read -a dev_routes
++ ip route show dev ens4 table local proto 66
++ awk '{print$2}'
+ for route in ${dev_routes[@]}
+ ip route replace to local 10.0.0.634.74.167.161 dev ens4 proto 66
Error: any valid prefix is expected rather than "10.0.0.634.74.167.161".
+ unset dev_routes
+ routes[$dev_name]=
+ unset hw_addr
+ unset fwip_path
+ unset dev_name
+ sleep 30
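
For anyone reading the trace: it shows `routes[$dev_name]+=10.0.0.6` followed by `routes[$dev_name]+=34.74.167.161`, and `ip route replace` then receives `10.0.0.634.74.167.161`, which suggests the two forwarded IPs are being appended with no separator. The bash sketch below reproduces that append in isolation. The names `routes_buggy`, `routes_fixed`, `buggy_list` and `fixed_list` are made up for illustration, and the trailing-space variant is only one possible fix, not necessarily what the shipped script does.

```shell
#!/bin/bash
# Sketch only: mirrors the append seen in the trace, not the shipped gcp-routes.sh.
declare -A routes_buggy routes_fixed
dev_name=ens4   # interface name taken from the trace

for fwip in 10.0.0.6 34.74.167.161; do
    # Append exactly as the trace shows: nothing separates the entries.
    routes_buggy[$dev_name]+="${fwip}"
    # Hypothetical fix: a trailing space keeps the entries distinct.
    routes_fixed[$dev_name]+="${fwip} "
done

# Split each value on whitespace, as an unquoted `for route in ...` loop would.
read -r -a buggy_list <<< "${routes_buggy[$dev_name]}"
read -r -a fixed_list <<< "${routes_fixed[$dev_name]}"

echo "buggy (${#buggy_list[@]}): ${buggy_list[*]}"
echo "fixed (${#fixed_list[@]}): ${fixed_list[*]}"
```

With the separator in place, each forwarded IP would reach `ip route replace` as its own prefix instead of one fused string.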
gcp-routes.service came from https://gitlab.cee.redhat.com/coreos/redhat-coreos/merge_requests/363/diffs
This is blocking progress on GCP.
*** Bug 1729575 has been marked as a duplicate of this bug. ***
Verified with 4.2.0-0.nightly-2019-08-01-111413.

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-08-01-111413   True        False         70m     Cluster version is 4.2.0-0.nightly-2019-08-01-111413

$ oc get nodes
NAME                                                    STATUS   ROLES    AGE   VERSION
miabbo-xbrjk-m-0.c.openshift-gce-devel.internal         Ready    master   88m   v1.14.0+c569285e9
miabbo-xbrjk-m-1.c.openshift-gce-devel.internal         Ready    master   89m   v1.14.0+c569285e9
miabbo-xbrjk-m-2.c.openshift-gce-devel.internal         Ready    master   88m   v1.14.0+c569285e9
miabbo-xbrjk-w-a-qmzf2.c.openshift-gce-devel.internal   Ready    worker   77m   v1.14.0+c569285e9
miabbo-xbrjk-w-b-v9dsh.c.openshift-gce-devel.internal   Ready    worker   77m   v1.14.0+c569285e9
miabbo-xbrjk-w-c-cx464.c.openshift-gce-devel.internal   Ready    worker   77m   v1.14.0+c569285e9

$ oc debug node/miabbo-xbrjk-m-0.c.openshift-gce-devel.internal
Starting pod/miabbo-xbrjk-m-0copenshift-gce-develinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.3
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:672656723e9695788c04d90e235430769dae90d38cafed0521fb0415a2e89bcc
              CustomOrigin: Managed by machine-config-operator
                   Version: 42.80.20190731.2 (2019-07-31T13:52:59Z)

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7e57683aef2630a24a7fef421f148135ff0bc22cbb1465801fa2ecce703687a5
              CustomOrigin: Image generated via coreos-assembler
                   Version: 42.80.20190725.1 (2019-07-25T13:53:07Z)

sh-4.4# systemctl status gcp-routes
● gcp-routes.service - Update GCP routes for forwarded IPs.
   Loaded: loaded (/usr/lib/systemd/system/gcp-routes.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-08-01 16:28:02 UTC; 1h 34min ago
 Main PID: 1038 (bash)
    Tasks: 2 (limit: 26213)
   Memory: 3.9M
      CPU: 22.701s
   CGroup: /system.slice/gcp-routes.service
           ├─ 1038 /bin/bash /usr/sbin/gcp-routes.sh
           └─65268 sleep 30

Aug 01 17:57:53 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 17:58:24 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 17:58:54 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 17:59:24 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 17:59:54 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 18:00:24 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 18:00:54 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 18:01:24 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 18:01:54 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 18:02:25 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124

sh-4.4# ip route list table local
local 10.0.0.3 dev ens4 proto kernel scope host src 10.0.0.3
broadcast 10.0.0.3 dev ens4 proto kernel scope link src 10.0.0.3
broadcast 10.128.0.0 dev tun0 proto kernel scope link src 10.128.0.1
local 10.128.0.1 dev tun0 proto kernel scope host src 10.128.0.1
broadcast 10.128.1.255 dev tun0 proto kernel scope link src 10.128.0.1
local 34.68.45.124 dev ens4 proto 66 scope host
broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1
```

Since the original problem stated that this was preventing a cluster install from happening and now I have a successful cluster...LGTM.
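
The route-table check above can also be scripted rather than read by eye. This is a minimal sketch, assuming the forwarded IP shows up as a `local <ip> ... proto 66` entry as it does in the output above. `has_forwarded_route` is a hypothetical helper name, and the sample text is copied from this verification rather than read from a live machine.

```shell
# Sketch only: succeeds if the given IP appears as a "proto 66" local route.
# Reads `ip route list table local` style output on stdin, one route per line.
has_forwarded_route() {
    awk -v ip="$1" '
        $1 == "local" && $2 == ip {
            # Look for the "proto 66" pair anywhere after the address.
            for (i = 3; i < NF; i++)
                if ($i == "proto" && $(i + 1) == "66") found = 1
        }
        END { exit !found }
    '
}

# Two sample lines taken from the verification output.
sample='local 10.0.0.3 dev ens4 proto kernel scope host src 10.0.0.3
local 34.68.45.124 dev ens4 proto 66 scope host'

if printf '%s\n' "$sample" | has_forwarded_route 34.68.45.124; then
    echo "forwarded IP route present"
else
    echo "forwarded IP route missing"
fi
```

On a node, the same helper could be fed from `ip route list table local | has_forwarded_route <ip>`.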
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922