Bug 1462675 - Router is not synchronized
Summary: Router is not synchronized
Keywords:
Status: CLOSED DUPLICATE of bug 1429823
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ben Bennett
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-06-19 09:05 UTC by Ruben Romero Montes
Modified: 2022-08-04 22:20 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-21 14:04:55 UTC
Target Upstream Version:
Embargoed:


Attachments
dc router (6.39 KB, text/plain)
2017-06-19 09:05 UTC, Ruben Romero Montes

Description Ruben Romero Montes 2017-06-19 09:05:53 UTC
Created attachment 1289051
dc router

Description of problem:
The openshift-router process is not updating the HAProxy configuration files and appears to be stalled somewhere in its processing loop.

Logs and generated files have not been updated for weeks (since May 31st):

-rwxrwxrwx. 1 root root   517 May 31 13:01 cert_config.map
-rw-r--r--. 1 root root  2035 Apr 11 15:00 default_pub_keys.pem
-rw-r--r--. 1 root root  3278 Apr 11 15:00 error-page-503.http
-rw-r--r--. 1 root root 31863 Apr 11 15:00 haproxy-config.template
-rwxrwxrwx. 1 root root 36469 May 31 13:01 haproxy.config
-rwxrwxrwx. 1 root root   279 May 31 13:01 os_edge_http_be.map
-rwxrwxrwx. 1 root root  2114 May 31 13:01 os_http_be.map
-rwxrwxrwx. 1 root root   512 May 31 13:01 os_reencrypt.map
-rwxrwxrwx. 1 root root    64 May 31 13:01 os_route_http_expose.map
-rwxrwxrwx. 1 root root   362 May 31 13:01 os_route_http_redirect.map
-rwxrwxrwx. 1 root root   275 May 31 13:01 os_sni_passthrough.map
-rwxrwxrwx. 1 root root   783 May 31 13:01 os_tcp_be.map
-rwxrwxrwx. 1 root root     2 May 31 13:01 os_wildcard_domain.map

Running "oc get routes" from within the affected pod works. There are two router pods, and the other one is updating properly.

The affected pod is not even performing the full refresh that should happen every 10 minutes.
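
The staleness seen in the listing above could be checked automatically. The following is only a minimal sketch, not part of the router: the helper name stale_files, the choice of file names, and the 600-second threshold (taken from the 10-minute full refresh mentioned above) are illustrative assumptions.

```python
import os
import time


def stale_files(directory, names, max_age_seconds, now=None):
    """Return the entries of `names` that exist in `directory` and whose
    modification time is more than `max_age_seconds` in the past.

    Hypothetical helper for illustration; files that do not exist are skipped.
    """
    now = time.time() if now is None else now
    stale = []
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # missing files are not reported by this sketch
        age = now - os.path.getmtime(path)
        if age > max_age_seconds:
            stale.append(name)
    return stale
```

It would be called with only the generated files (for example haproxy.config and the os_*.map files), because static files such as haproxy-config.template and error-page-503.http are expected to keep their original timestamps. A generated file reported as stale only suggests a stalled router if routes have actually changed in the meantime.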

Version-Release number of selected component (if applicable):
OCP 3.5.5.5
registry.access.redhat.com/openshift3/ose-haproxy-router         v3.5.5.5

How reproducible:
Only seen once

Steps to Reproduce:
1. Create a route

Actual results:
Configuration files and logs are updated on only one of the two pods.

Expected results:
Both pods should be updated with the latest configuration.

Additional info:

Comment 2 Ruben Romero Montes 2017-06-19 09:09:19 UTC
More information about the environment:
 - Non-containerized
 - Mixed deployment with Azure and OpenStack
 - network plugin: ovs-multitenant

Comment 3 Ben Bennett 2017-06-19 17:24:28 UTC
The culprit was probably https://bugzilla.redhat.com/show_bug.cgi?id=1415112 (fixed in 3.5.5.7 I believe).

*** This bug has been marked as a duplicate of bug 1415112 ***

Comment 5 Ben Bennett 2017-06-21 14:04:55 UTC
It is either https://bugzilla.redhat.com/show_bug.cgi?id=1415112 or https://bugzilla.redhat.com/show_bug.cgi?id=1429823 (fixed in 3.5.5.8).

Both bugs identified problems with the EventQueue that we were using, and both can lead to router lock-ups. The symptoms here (a router that stops updating the config file) better match the notes in https://bugzilla.redhat.com/show_bug.cgi?id=1429823, but in our investigation we found that https://bugzilla.redhat.com/show_bug.cgi?id=1415112 can lead to the same symptom. It is easier to make the event queue lock up by changing the labels, but rapid route insertions and deletions were later found to expose the same bug.

*** This bug has been marked as a duplicate of bug 1429823 ***

