Description of problem:
The router (HAProxy) is not updating routes immediately after a rollout of the latest deployment config.

Commands used in the pipeline to watch the rollout status (transactions-1-0 is the app name):

  oc rollout latest dc/transactions-1-0
  oc rollout status dc/transactions-1-0 --watch=true

Result (this has been tested with multiple apps):
- Curl using the pod IP/name immediately returns response code 200 OK.
- Curl using the service IP/name immediately returns response code 200 OK.
- Curl using the external URL takes time to respond. It looks like the router is not updating routes immediately:
  - A few times it takes around 15 seconds to respond. During that time the services are not available, which is not desirable (refer to script_output.txt for results).
  - A few times it takes between 60 and 300 seconds to respond. During that time the services are not available, which is not desirable. This also breaks the builds.
  - Very rarely it responds immediately.
(I have attached the output; we captured these results using a custom script.)

Environment: There are no resource-level constraints on the infra nodes and we are using two routers.

Required logs and config output, as requested:

1) oc exec -it $(oc get pods | grep router | awk '{print $1}' | head -n 1) -- /bin/bash
   cat /var/lib/containers/router/routes.json
   Ans: Two files attached, taken immediately before and after the dc rollout (router-before-rollout.json and router-after-rollout-1.json)
2) try curling svcname:port/your-health-check (oc get svc -n namespace)
   Ans: output attached in script_output.txt
3) try curling serviceip:port/your-health-check (oc get svc -n namespace)
   Ans: output attached in script_output.txt
4) try curling endpoint-ip:port/your-health-check (oc get ep -n namespace)
   Ans: output attached in script_output.txt
5) oc get route <route-name> -o yaml
   Attached file route_app_trasactions.yaml
6) oc logs dc/router -n default
   Attached files router_logs_router-4-3pk65.logs and router_logs_router-4-npdn2.logs
7) oc get dc/router -o yaml -n default
   Attached file router_dc.yaml
8) oc get route <NAME_OF_ROUTE> -n <PROJECT>
   Attached file commands_output
9) oc get endpoints --all-namespaces
   Attached file commands_output
10) oc exec -it $ROUTER_POD -- ls -la
    Attached file script_output.txt (has both before and after dc rollout)
11) oc exec -it $ROUTER_POD -- find /var/lib/haproxy -regex ".*\(.map\|config.*\|.json\)" -print -exec cat {} \; > haproxy_configs_and_maps
    Attached files:
    haproxy_configs_and_maps_router-4-3pk65_before_rollout
    haproxy_configs_and_maps_router-4-npdn2_before_rollout
    haproxy_configs_and_maps_router-4-npdn2_1 (after rollout)
    haproxy_configs_and_maps_router-4-3pk65_1 (after rollout)
12) oc get svc svc-which-present-in-the-route -o yaml
    Attached file service_app_trasactions.yaml
13) oc get dc dc-of-pod -o yaml
    Attached file pod_app_trasactions.yaml

Please let me know if you require anything else on this. The router config files are updated every time a new dc rollout is done, so no strace of the router process was captured.

Version-Release number of selected component (if applicable):
OCP 3.5

How reproducible:
Always on the customer end

Steps to Reproduce:
1. oc rollout latest dc/transactions-1-0
2. oc rollout status dc/transactions-1-0 --watch=true
3. Immediately curl the app via pod IP, service IP, and the external route URL.

Actual results:
Curl via the pod IP and the service IP responds immediately with 200 OK; curl via the external route URL takes roughly 15 to 300 seconds before it responds, and the service is unavailable through the route during that window.

Expected results:
The route responds immediately after the rollout completes, the same as the pod IP and service IP.

Additional info:
See script_output.txt for the captured curl output; a sketch of the kind of probe loop used is shown below.
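For reference, here is a minimal sketch of the kind of probe loop described above (the actual script is attached as script_output.txt; the route host and health path below are placeholder values, not taken from the attachments):

  #!/bin/bash
  # Measure how long the external route takes to return 200 after a rollout.
  # ROUTE_HOST and HEALTH_PATH are hypothetical placeholders.
  ROUTE_HOST="transactions-1-0.apps.example.com"
  HEALTH_PATH="/health"

  oc rollout latest dc/transactions-1-0
  oc rollout status dc/transactions-1-0 --watch=true

  start=$(date +%s)
  while true; do
      code=$(curl -s -o /dev/null -w '%{http_code}' "http://${ROUTE_HOST}${HEALTH_PATH}")
      now=$(date +%s)
      echo "$(date -u +%H:%M:%S) HTTP ${code} (+$((now - start))s after rollout reported complete)"
      [ "${code}" = "200" ] && break
      sleep 1
  done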
As discussed today, I am providing the following details:
- tcpdump from the two infra nodes where the router/HAProxy is installed
- tcpdump from the node where the application pod is deployed
- Output of the commands "ovs-ofctl -O OpenFlow13 dump-flows br0" and "ovs-dpctl show" from the above three nodes
- ARP cache from the above three nodes
- The script used to capture the curl output

Observation: I changed the script to capture the ARP cache from all three nodes, to check whether the pod IP is present in the cache with the correct MAC address (this time the script was run from our Ansible host). The observation is that whenever curl fails, the new pod's IP still has a stale ARP entry (an old MAC address) in the ARP cache of the infra nodes. It takes some time for these ARP entries to be updated; as soon as they are refreshed with the pod's latest MAC, curl works perfectly fine. Please refer to the script output for this; a sketch of the ARP check is shown below.
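For reference, a minimal sketch of the kind of ARP check added to the script (the node names, namespace, and label selector below are assumptions for illustration, not taken from the attached output):

  #!/bin/bash
  # Compare the ARP cache entry for the new pod's IP on each infra node.
  # A stale entry still shows the previous pod's MAC until the cache refreshes.
  INFRA_NODES="infra-node-1 infra-node-2"
  POD_IP=$(oc get pods -n myproject -l deploymentconfig=transactions-1-0 \
           -o jsonpath='{.items[0].status.podIP}')

  for node in ${INFRA_NODES}; do
      echo "== ${node} =="
      ssh "${node}" "ip neigh show ${POD_IP}"
  done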
Apparent duplicate of this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1451854
*** This bug has been marked as a duplicate of bug 1451854 ***