Bug 1846647
| Summary: | gcp-routes service too slow to not route traffic into GCP SDN | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
| Component: | Machine Config Operator | Assignee: | Antonio Murdaca <amurdaca> |
| Status: | CLOSED ERRATA | QA Contact: | Xingxing Xia <xxia> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.5 | CC: | kgarriso |
| Target Milestone: | --- | ||
| Target Release: | 4.5.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-07-13 17:43:59 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1845903 | ||
| Bug Blocks: | |||
Description
OpenShift BugZilla Robot
2020-06-12 12:59:06 UTC
Checked in a 4.5.0-0.nightly-2020-06-18-210518 env (mine is IPI on GCP). On the masters, I checked the file: the sleep time in the loop that detects the down status is now 1 second, as in the PR.
[root@xxia0619dr2-mj6qh-master-0 /]# vi /opt/libexec/openshift-gcp-routes.sh
...
sleep_or_watch() {
    ...
    for i in {0..5}; do
        for vip in "${!vips[@]}"; do
            if [[ "${vips[${vip}]}" != down ]] && [[ -e "${RUN_DIR}/${vip}.down" ]]; then
                echo "new downfile detected"
                break 2
            elif [[ "${vips[${vip}]}" = down ]] && ! [[ -e "${RUN_DIR}/${vip}.down" ]]; then
                echo "downfile disappeared"
                break 2
            fi
        done
        sleep 1 # keep this small enough to not make gcp-routes slower than LBs on recovery
    done
    ...
}
...
Checked https://thedataguy.in/where-gcp-internal-load-balancer-fails/ and understood that GCP routes a local request to the internal LB back to the same local node. From comment 0, and as Stefan helped clarify in Slack, gcp-routes must notice the down status change quickly enough that it "puts an iptables rule for traffic redirection in place such that local clients do not send traffic" to the internal LB. Based on all this info and the file content above, moving to VERIFIED.
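To make the detection logic easier to try outside a cluster, here is a minimal bash sketch modeled on the excerpt above. It is not the shipped script: the function name `detect_change`, its two parameters (number of polls and interval in seconds), and the sample VIP used in the usage note are assumptions for illustration only. `RUN_DIR` and the `vips` map mirror the names in the excerpt, but here the caller sets them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the down-file polling loop; not the real
# /opt/libexec/openshift-gcp-routes.sh. RUN_DIR must be set by the caller.

# Maps each VIP to its last known state ("down" or anything else).
declare -A vips

# detect_change TRIES INTERVAL
# Polls up to TRIES times, sleeping INTERVAL seconds between polls, and
# prints the first state change found, or "no change" if there is none.
detect_change() {
    local tries=$1 interval=$2 i vip
    for ((i = 0; i < tries; i++)); do
        for vip in "${!vips[@]}"; do
            if [[ "${vips[${vip}]}" != down ]] && [[ -e "${RUN_DIR}/${vip}.down" ]]; then
                # VIP was considered up, but a marker file now exists.
                echo "new downfile detected: ${vip}"
                return 0
            elif [[ "${vips[${vip}]}" = down ]] && ! [[ -e "${RUN_DIR}/${vip}.down" ]]; then
                # VIP was considered down, but the marker file is gone.
                echo "downfile disappeared: ${vip}"
                return 0
            fi
        done
        sleep "${interval}"
    done
    echo "no change"
}
```

For example, with `RUN_DIR` pointing at a scratch directory and `vips["10.0.0.2"]=up`, creating `${RUN_DIR}/10.0.0.2.down` and then calling `detect_change 6 1` should report the new down file on the first poll. With a 1 second interval, a change is noticed at most about one second after the file appears or disappears, which is the property this bug is about.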
Also checked the service, as auxiliary info:
[root@xxia0619dr2-mj6qh-master-1 /]# systemctl list-unit-files | grep gcp-routes
gcp-routes.service enabled
openshift-gcp-routes.service enabled
[root@xxia0619dr2-mj6qh-master-1 /]# systemctl status openshift-gcp-routes.service
● openshift-gcp-routes.service - Update GCP routes for forwarded IPs.
   Loaded: loaded (/etc/systemd/system/openshift-gcp-routes.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2020-06-19 02:37:06 UTC; 8h ago
...
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409