Red Hat Bugzilla – Bug 1462675
Router is not synchronized
Last modified: 2017-06-21 12:13:45 EDT
Created attachment 1289051
Description of problem:
The openshift-router process is not updating the HAProxy files and appears to be stalled somewhere in its loop.
The logs and files have not been updated for weeks (since May 31):
-rwxrwxrwx. 1 root root 517 May 31 13:01 cert_config.map
-rw-r--r--. 1 root root 2035 Apr 11 15:00 default_pub_keys.pem
-rw-r--r--. 1 root root 3278 Apr 11 15:00 error-page-503.http
-rw-r--r--. 1 root root 31863 Apr 11 15:00 haproxy-config.template
-rwxrwxrwx. 1 root root 36469 May 31 13:01 haproxy.config
-rwxrwxrwx. 1 root root 279 May 31 13:01 os_edge_http_be.map
-rwxrwxrwx. 1 root root 2114 May 31 13:01 os_http_be.map
-rwxrwxrwx. 1 root root 512 May 31 13:01 os_reencrypt.map
-rwxrwxrwx. 1 root root 64 May 31 13:01 os_route_http_expose.map
-rwxrwxrwx. 1 root root 362 May 31 13:01 os_route_http_redirect.map
-rwxrwxrwx. 1 root root 275 May 31 13:01 os_sni_passthrough.map
-rwxrwxrwx. 1 root root 783 May 31 13:01 os_tcp_be.map
-rwxrwxrwx. 1 root root 2 May 31 13:01 os_wildcard_domain.map
Running oc get routes from within the pod works. There are two router pods, and the other one is working properly.
The stalled pod is not even performing its full refresh every 10 minutes.
Version-Release number of selected component (if applicable):
How reproducible:
Only seen once.
Steps to Reproduce:
1. Create a route.
Actual results:
Configuration files and logs are only updated on one of the pods.
Expected results:
Both pods should be updated with the latest configuration.
More information about the environment:
- Non containerized
- Mixed deployment with Azure and Openstack
- network plugin: ovs-multitenant
The culprit was probably https://bugzilla.redhat.com/show_bug.cgi?id=1415112 (fixed in 184.108.40.206, I believe).
*** This bug has been marked as a duplicate of bug 1415112 ***
It is either https://bugzilla.redhat.com/show_bug.cgi?id=1415112 or https://bugzilla.redhat.com/show_bug.cgi?id=1429823 (fixed in 220.127.116.11).
Both bugs identified problems with the EventQueue we were using, and both can lead to router lock-ups. The symptoms of a router that stops updating its config file better match the notes in https://bugzilla.redhat.com/show_bug.cgi?id=1429823, but in our investigation we found that https://bugzilla.redhat.com/show_bug.cgi?id=1415112 can lead to the same symptom. The event queue is easiest to lock up by changing labels, but rapid route insertions and deletions were later found to expose the same bug.
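The two EventQueue bugs are not described in detail here, so the following is only an illustration of the kind of structure involved; it is not the OpenShift code and does not reproduce either bug. It is a hypothetical Go sketch of a keyed queue that coalesces events per key: an Added followed by a Deleted before the consumer pops cancels out, an Added followed by a Modified stays Added, and a Deleted followed by an Added becomes Modified. What it shows is that the FIFO key list and the per-key event map must be updated together on every merge, and rapid insert/delete sequences are exactly what exercise those merge paths. All names (coalescingQueue, Push, Pop, the EventType constants) are invented for this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// EventType is a hypothetical event kind for this sketch.
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
)

// coalescingQueue is a hypothetical keyed queue: at most one pending event
// per key, delivered in the order keys were first queued.
type coalescingQueue struct {
	mu      sync.Mutex
	cond    *sync.Cond
	order   []string             // keys with a pending event, FIFO
	pending map[string]EventType // pending event per key
}

func newCoalescingQueue() *coalescingQueue {
	q := &coalescingQueue{pending: map[string]EventType{}}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Push records an event for key, merging it with any event still pending.
func (q *coalescingQueue) Push(key string, ev EventType) {
	q.mu.Lock()
	defer q.mu.Unlock()
	prev, ok := q.pending[key]
	if !ok {
		q.pending[key] = ev
		q.order = append(q.order, key)
		q.cond.Signal() // wake one consumer blocked in Pop
		return
	}
	switch {
	case prev == Added && ev == Deleted:
		// The consumer never saw the object: drop the key from BOTH
		// structures so they stay consistent.
		delete(q.pending, key)
		q.removeKey(key)
	case prev == Added && ev == Modified:
		// Still new from the consumer's point of view; keep Added.
	case prev == Deleted && ev == Added:
		// The consumer knows the old object, so this is an update.
		q.pending[key] = Modified
	default:
		q.pending[key] = ev
	}
}

// removeKey deletes key from the FIFO order; caller must hold q.mu.
func (q *coalescingQueue) removeKey(key string) {
	for i, k := range q.order {
		if k == key {
			q.order = append(q.order[:i], q.order[i+1:]...)
			return
		}
	}
}

// Pop blocks until an event is pending, then returns the oldest one.
func (q *coalescingQueue) Pop() (string, EventType) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.order) == 0 {
		q.cond.Wait()
	}
	key := q.order[0]
	q.order = q.order[1:]
	ev := q.pending[key]
	delete(q.pending, key)
	return key, ev
}

func main() {
	q := newCoalescingQueue()
	q.Push("route-a", Added)
	q.Push("route-b", Added)
	q.Push("route-a", Deleted)  // cancels route-a's pending Added
	q.Push("route-b", Modified) // merged into route-b's pending Added
	key, ev := q.Pop()
	fmt.Println(key, ev) // route-b ADDED
}
```

A production queue would also need resync and replace handling, which this sketch deliberately omits.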
*** This bug has been marked as a duplicate of bug 1429823 ***