Bug 1462675 - Router is not synchronized
Status: CLOSED DUPLICATE of bug 1429823
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 3.5.0
Hardware: Unspecified Unspecified
Priority: unspecified  Severity: high
Assigned To: Ben Bennett
QA Contact: zhaozhanqi
Keywords: Reopened
Reported: 2017-06-19 05:05 EDT by Ruben Romero Montes
Modified: 2017-06-21 12:13 EDT (History)
Last Closed: 2017-06-21 10:04:55 EDT
Type: Bug

Attachments
dc router (6.39 KB, text/plain)
2017-06-19 05:05 EDT, Ruben Romero Montes
Description Ruben Romero Montes 2017-06-19 05:05:53 EDT
Created attachment 1289051
dc router

Description of problem:
The openshift-router process is not updating the HAProxy configuration files and seems to be stalled somewhere in its processing loop.

Logs and files have not been updated for weeks (since May 31st)

-rwxrwxrwx. 1 root root   517 May 31 13:01 cert_config.map
-rw-r--r--. 1 root root  2035 Apr 11 15:00 default_pub_keys.pem
-rw-r--r--. 1 root root  3278 Apr 11 15:00 error-page-503.http
-rw-r--r--. 1 root root 31863 Apr 11 15:00 haproxy-config.template
-rwxrwxrwx. 1 root root 36469 May 31 13:01 haproxy.config
-rwxrwxrwx. 1 root root   279 May 31 13:01 os_edge_http_be.map
-rwxrwxrwx. 1 root root  2114 May 31 13:01 os_http_be.map
-rwxrwxrwx. 1 root root   512 May 31 13:01 os_reencrypt.map
-rwxrwxrwx. 1 root root    64 May 31 13:01 os_route_http_expose.map
-rwxrwxrwx. 1 root root   362 May 31 13:01 os_route_http_redirect.map
-rwxrwxrwx. 1 root root   275 May 31 13:01 os_sni_passthrough.map
-rwxrwxrwx. 1 root root   783 May 31 13:01 os_tcp_be.map
-rwxrwxrwx. 1 root root     2 May 31 13:01 os_wildcard_domain.map

oc get routes works within the pod. There are two pods and the other one is working properly.

The stalled router is not even performing the full refresh that should happen every 10 minutes.
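
The staleness shown in the listing above can be checked mechanically. The following is a minimal sketch, not part of the router itself: it flags generated files whose modification time is older than a threshold. The function name, the file patterns, and the idea that a healthy router rewrites these files at least once per refresh cycle are assumptions made for illustration; static files such as haproxy-config.template are deliberately excluded because they are expected to be old.

```python
import time
from pathlib import Path

# The 10-minute full refresh interval mentioned in the report.
RESYNC_INTERVAL_SECONDS = 10 * 60

# Assumption: these are the files the router regenerates, based on the
# names and timestamps in the directory listing above.
GENERATED_PATTERNS = ("haproxy.config", "*.map")


def find_stale_files(config_dir, max_age_seconds, now=None,
                     patterns=GENERATED_PATTERNS):
    """Return (file name, age in seconds) for generated files older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for pattern in patterns:
        for path in sorted(Path(config_dir).glob(pattern)):
            if not path.is_file():
                continue
            age = now - path.stat().st_mtime
            if age > max_age_seconds:
                stale.append((path.name, age))
    return stale
```

Calling find_stale_files(directory, 3 * RESYNC_INTERVAL_SECONDS) inside each router pod, with directory pointing at wherever the files listed above live, would be expected to return an empty list for the healthy pod and every generated file for the stalled one. The factor of 3 is an arbitrary safety margin, not a documented value.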

Version-Release number of selected component (if applicable):
OCP 3.5.5.5
registry.access.redhat.com/openshift3/ose-haproxy-router         v3.5.5.5

How reproducible:
Only seen once

Steps to Reproduce:
1. Create a route

Actual results:
Configuration files and logs are only updated on one of the pods

Expected results:
Both pods should be updated with the latest configuration.

Additional info:
Comment 2 Ruben Romero Montes 2017-06-19 05:09:19 EDT
More information about the environment:
 - Non containerized
 - Mixed deployment with Azure and Openstack
 - network plugin: ovs-multitenant
Comment 3 Ben Bennett 2017-06-19 13:24:28 EDT
The culprit was probably https://bugzilla.redhat.com/show_bug.cgi?id=1415112 (fixed in 3.5.5.7 I believe).

*** This bug has been marked as a duplicate of bug 1415112 ***
Comment 5 Ben Bennett 2017-06-21 10:04:55 EDT
It is either https://bugzilla.redhat.com/show_bug.cgi?id=1415112 or https://bugzilla.redhat.com/show_bug.cgi?id=1429823 (fixed in 3.5.5.8).

Both bugs identified problems with the EventQueue that we were using, and both can lead to router lock-ups.  The symptom here, a router that stops updating the config file, better matches the notes in https://bugzilla.redhat.com/show_bug.cgi?id=1429823, but in our investigation we found that https://bugzilla.redhat.com/show_bug.cgi?id=1415112 can lead to the same symptom.  It is easier to make the event queue lock up by changing the labels, but rapid route insertions and deletions were later found to expose the same bug.

*** This bug has been marked as a duplicate of bug 1429823 ***
