Bug 1729576 - RHCOS L7 Routing Support on GCP not working [NEEDINFO]
Summary: RHCOS L7 Routing Support on GCP not working
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.2.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.2.0
Assignee: Ben Howard
QA Contact: Micah Abbott
URL:
Whiteboard:
Duplicates: 1729575
Depends On:
Blocks:
Reported: 2019-07-12 17:06 UTC by Cesar Wong
Modified: 2019-10-16 06:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:29:43 UTC
Target Upstream Version:
smilner: needinfo? (behoward)




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2922 None None None 2019-10-16 06:29:58 UTC

Description Cesar Wong 2019-07-12 17:06:20 UTC
Description of problem:
Compute instances on GCP fail load balancer health checks because the gcp-routes service does not add the forwarded IPs to the local route table in the VM.

Version-Release number of selected component (if applicable):
GCP image: rhcos-420-8-20190611-0


How reproducible:
Always

Steps to Reproduce:
1. Create a compute instance on GCP that listens on a specific port (6443 in the case of the bootstrap machine). 
2. Create a load balancer that points to the instance with a TCP health check for that port.
3. Wait for the load balancer to report the instance as healthy. 

Actual results:
The instance never reports as healthy.

Expected results:
The instance reports as healthy because the port is open.

Additional info:

On the machine, this is the status of the gcp-routes systemd unit:

# systemctl status gcp-routes
● gcp-routes.service - Update GCP routes for forwarded IPs.
   Loaded: loaded (/usr/lib/systemd/system/gcp-routes.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2019-07-12 16:29:38 UTC; 11min ago
 Main PID: 1215 (bash)
    Tasks: 2 (limit: 26213)
   Memory: 3.5M
   CGroup: /system.slice/gcp-routes.service
           ├─1215 /bin/bash /sbin/gcp-routes.sh
           └─6697 sleep 30

Jul 12 16:39:10 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Error: any valid prefix is expected rather than "10.0.0.634.74.167.161".
Jul 12 16:39:40 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 10.0.0.6
Jul 12 16:39:40 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 34.74.167.161
Jul 12 16:39:40 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Error: any valid prefix is expected rather than "10.0.0.634.74.167.161".
Jul 12 16:40:10 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 10.0.0.6
Jul 12 16:40:10 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 34.74.167.161
Jul 12 16:40:10 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Error: any valid prefix is expected rather than "10.0.0.634.74.167.161".
Jul 12 16:40:40 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 10.0.0.6
Jul 12 16:40:40 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 34.74.167.161
Jul 12 16:40:40 cewong-n2lhl-bootstrap.c.openshift-dev-installer.internal bash[1215]: Error: any valid prefix is expected rather than "10.0.0.634.74.167.161".


The local route table does not show the external IP (34.74.167.161) or the other forwarded IP (10.0.0.6):

# ip route list table local
local 10.0.0.5 dev ens4 proto kernel scope host src 10.0.0.5
broadcast 10.0.0.5 dev ens4 proto kernel scope link src 10.0.0.5
broadcast 10.88.0.0 dev cni0 proto kernel scope link src 10.88.0.1 linkdown
local 10.88.0.1 dev cni0 proto kernel scope host src 10.88.0.1
broadcast 10.88.255.255 dev cni0 proto kernel scope link src 10.88.0.1 linkdown
broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1
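
For anyone triaging a similar machine, the check above can be scripted. This is only a minimal sketch: the function name has_local_route and the sample variables are hypothetical, and the sample text is copied from route listings in this report rather than read from a live system.

```bash
#!/usr/bin/env bash
# has_local_route TABLE_TEXT IP
# Succeeds if TABLE_TEXT (output of `ip route list table local`) contains
# a "local <IP> ..." entry. Hypothetical helper, not part of gcp-routes.sh.
has_local_route() {
    local table_text=$1 ip=$2
    awk -v ip="$ip" '$1 == "local" && $2 == ip { found = 1 } END { exit !found }' \
        <<< "$table_text"
}

# Excerpt of the listing from the affected bootstrap machine.
sample_broken='local 10.0.0.5 dev ens4 proto kernel scope host src 10.0.0.5
broadcast 10.0.0.5 dev ens4 proto kernel scope link src 10.0.0.5'

# Excerpt of a listing where the forwarded IP has been added with proto 66.
sample_fixed='local 10.0.0.3 dev ens4 proto kernel scope host src 10.0.0.3
local 34.68.45.124 dev ens4 proto 66 scope host'

if has_local_route "$sample_broken" 34.74.167.161; then
    echo "forwarded IP present"
else
    echo "forwarded IP missing"
fi
```

On a live node the first argument would be "$(ip route list table local)" instead of the sample text.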

This is a run of /sbin/gcp-routes.sh with bash debugging:
# bash -x /sbin/gcp-routes.sh
+ declare -A routes
+ :
+ run
+ net_path=network-interfaces/
++ curler network-interfaces/
++ curl --silent -L -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/
+ for vif in $(curler ${net_path})
++ curler network-interfaces/0/mac
++ curl --silent -L -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/mac
+ hw_addr=42:01:0a:00:00:05
+ fwip_path=network-interfaces/0/forwarded-ips/
++ get_ifname 42:01:0a:00:00:05
++ sysfs_path=/sys/class/net
+++ find /sys/class/net -maxdepth 1 -mindepth 1
++ for dev in $(find ${sysfs_path} -maxdepth 1  -mindepth 1)
++ local mac=42:01:0a:00:00:05
+++ basename /sys/class/net/ens4
++ local name=ens4
++ '[' 42:01:0a:00:00:05 == 42:01:0a:00:00:05 ']'
++ echo ens4
++ return
+ dev_name=ens4
++ curler network-interfaces/0/forwarded-ips/
++ curl --silent -L -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/forwarded-ips/
+ for level in $(curler ${fwip_path})
++ curler network-interfaces/0/forwarded-ips//0
++ curl --silent -L -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/forwarded-ips//0
+ for fwip in $(curler ${fwip_path}/${level})
+ echo 'Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 10.0.0.6'
Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 10.0.0.6
+ routes[$dev_name]+=10.0.0.6
+ for level in $(curler ${fwip_path})
++ curler network-interfaces/0/forwarded-ips//1
++ curl --silent -L -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/forwarded-ips//1
+ for fwip in $(curler ${fwip_path}/${level})
+ echo 'Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 34.74.167.161'
Processing route for NIC 0/42:01:0a:00:00:05 as ens4 for 34.74.167.161
+ routes[$dev_name]+=34.74.167.161
+ set_routes ens4
+ local dev=ens4
+ read -a dev_routes
++ ip route show dev ens4 table local proto 66
++ awk '{print$2}'
+ for route in ${dev_routes[@]}
+ ip route replace to local 10.0.0.634.74.167.161 dev ens4 proto 66
Error: any valid prefix is expected rather than "10.0.0.634.74.167.161".
+ unset dev_routes
+ routes[$dev_name]=
+ unset hw_addr
+ unset fwip_path
+ unset dev_name
+ sleep 30
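
The trace suggests the two forwarded IPs are appended to routes[$dev_name] with no separator, so `ip route replace` receives a single malformed prefix. The following is a minimal sketch reproducing that behaviour outside the service, with values copied from the trace. The second half shows one possible way to keep the addresses apart; it is an assumption for illustration and not necessarily the fix that shipped.

```bash
#!/usr/bin/env bash
# Reproduces the string concatenation seen in the trace above.
declare -A routes
dev_name=ens4

# += on a string element joins values directly, with nothing in between.
for fwip in 10.0.0.6 34.74.167.161; do
    routes[$dev_name]+="${fwip}"
done
broken="${routes[$dev_name]}"
echo "without separator: ${broken}"

# Possible alternative (assumption): append a trailing space so the
# value can later be split back into individual addresses.
routes[$dev_name]=""
for fwip in 10.0.0.6 34.74.167.161; do
    routes[$dev_name]+="${fwip} "
done
read -r -a separated <<< "${routes[$dev_name]}"
printf 'with separator: %s\n' "${separated[@]}"
```

With the addresses kept separate, each one could be passed to its own `ip route replace to local <addr> dev ens4 proto 66` call, which is the command form shown in the trace.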

Comment 1 Colin Walters 2019-07-12 17:25:34 UTC
gcp-routes.service came from https://gitlab.cee.redhat.com/coreos/redhat-coreos/merge_requests/363/diffs

Comment 2 Abhinav Dahiya 2019-07-12 22:26:51 UTC
This is blocking progress on GCP.

Comment 4 Steve Milner 2019-07-15 17:15:14 UTC
*** Bug 1729575 has been marked as a duplicate of this bug. ***

Comment 7 Micah Abbott 2019-08-01 18:05:05 UTC
Verified with 4.2.0-0.nightly-2019-08-01-111413.

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-08-01-111413   True        False         70m     Cluster version is 4.2.0-0.nightly-2019-08-01-111413

$ oc get nodes
NAME                                                    STATUS   ROLES    AGE   VERSION
miabbo-xbrjk-m-0.c.openshift-gce-devel.internal         Ready    master   88m   v1.14.0+c569285e9
miabbo-xbrjk-m-1.c.openshift-gce-devel.internal         Ready    master   89m   v1.14.0+c569285e9
miabbo-xbrjk-m-2.c.openshift-gce-devel.internal         Ready    master   88m   v1.14.0+c569285e9
miabbo-xbrjk-w-a-qmzf2.c.openshift-gce-devel.internal   Ready    worker   77m   v1.14.0+c569285e9
miabbo-xbrjk-w-b-v9dsh.c.openshift-gce-devel.internal   Ready    worker   77m   v1.14.0+c569285e9
miabbo-xbrjk-w-c-cx464.c.openshift-gce-devel.internal   Ready    worker   77m   v1.14.0+c569285e9

$ oc debug node/miabbo-xbrjk-m-0.c.openshift-gce-devel.internal
Starting pod/miabbo-xbrjk-m-0copenshift-gce-develinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.3
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:672656723e9695788c04d90e235430769dae90d38cafed0521fb0415a2e89bcc
              CustomOrigin: Managed by machine-config-operator
                   Version: 42.80.20190731.2 (2019-07-31T13:52:59Z)

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7e57683aef2630a24a7fef421f148135ff0bc22cbb1465801fa2ecce703687a5
              CustomOrigin: Image generated via coreos-assembler
                   Version: 42.80.20190725.1 (2019-07-25T13:53:07Z)
sh-4.4# systemctl status gcp-routes
● gcp-routes.service - Update GCP routes for forwarded IPs.
   Loaded: loaded (/usr/lib/systemd/system/gcp-routes.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-08-01 16:28:02 UTC; 1h 34min ago
 Main PID: 1038 (bash)
    Tasks: 2 (limit: 26213)
   Memory: 3.9M
      CPU: 22.701s
   CGroup: /system.slice/gcp-routes.service
           ├─ 1038 /bin/bash /usr/sbin/gcp-routes.sh
           └─65268 sleep 30

Aug 01 17:57:53 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 17:58:24 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 17:58:54 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 17:59:24 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 17:59:54 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 18:00:24 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 18:00:54 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 18:01:24 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 18:01:54 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124
Aug 01 18:02:25 miabbo-xbrjk-m-0.c.openshift-gce-devel.internal bash[1038]: Processing route for NIC 0/42:01:0a:00:00:03 as ens4 for 34.68.45.124

sh-4.4# ip route list table local
local 10.0.0.3 dev ens4 proto kernel scope host src 10.0.0.3 
broadcast 10.0.0.3 dev ens4 proto kernel scope link src 10.0.0.3 
broadcast 10.128.0.0 dev tun0 proto kernel scope link src 10.128.0.1 
local 10.128.0.1 dev tun0 proto kernel scope host src 10.128.0.1 
broadcast 10.128.1.255 dev tun0 proto kernel scope link src 10.128.0.1 
local 34.68.45.124 dev ens4 proto 66 scope host 
broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1 
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1 
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1 
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1 
```

The original report said this problem was blocking cluster installs. The install now succeeds, and the forwarded IP (34.68.45.124) appears in the local route table with proto 66, so LGTM.

Comment 8 errata-xmlrpc 2019-10-16 06:29:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

