Bug 1688390 - OCP haproxy router creating new session on old PIDs after FINISH signal has been sent.
Summary: OCP haproxy router creating new session on old PIDs after FINISH signal has b...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 3.11.z
Assignee: Dan Mace
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks: 1690146
TreeView+ depends on / blocked
 
Reported: 2019-03-13 16:39 UTC by Ryan Howe
Modified: 2022-08-04 22:20 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1690146 (view as bug list)
Environment:
Last Closed: 2019-10-23 14:47:27 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Ryan Howe 2019-03-13 16:39:26 UTC
Description of problem:

503 errors are seen coming from router when making a connection. The OpenShift router with nbthreads configured is routing to old endpoints that no longer exist, due to creating new sessions linked to old pids that were send the FINISH (SIGTTOU and SIGUSR1) signal. 


At this time this is only seen when nbthread is configured. 


Version-Release number of selected component (if applicable):
OpenShift 3.9 -> 3.11 haproxy router 
Haproxy 1.8.8

Additional info:

- PID 51938 in pod:
    51938   Mar05  250:12  /usr/sbin/haproxy
     |
     v
- Links to pid 4104 on the host
1000000+   4104  2.9  0.1 355740 60468 ?        Ssl  Mar05 250:10 /usr/sbin/haproxy -f /var/lib/haproxy/conf/haproxy.config -p /var/lib/haproxy/run/haproxy.pid -x /var/lib/haproxy/run/haproxy.sock -sf 21158 51857
     |
     v
- lsof has the following established connections
haproxy     4104 10000000   15u     IPv4         1005375260         0t0        TCP y.y.y.18:https->x.x.x.14:40834 (ESTABLISHED)
haproxy     4104 10000000   19u     IPv4         1005375226         0t0        TCP y.y.y.18:https->x.x.x.13:47778 (ESTABLISHED)
haproxy     4104 10000000   21u     IPv4         1005365865         0t0        TCP y.y.y.18:https->x.x.x.13:46924 (ESTABLISHED)
haproxy     4104 10000000   27u     IPv4         1005379692         0t0        TCP y.y.y.18:https->x.x.x.14:40676 (ESTABLISHED)
haproxy     4104 10000000   29u     IPv4         1005366039         0t0        TCP y.y.y.18:https->x.x.x.13:46978 (ESTABLISHED)
haproxy     4104 10000000   52u     IPv4         1005345463         0t0        TCP y.y.y.18:https->x.x.x.14:59186 (ESTABLISHED)


- The PID for 51938 was sent finish signal  (likely by another pid but remains in list). The finish signal was sent at or before Mar 9th:
 
   1000000+  41881  2.7  0.1 355488 56384 ?        Ssl  Mar09  73:43 /usr/sbin/haproxy -f /var/lib/haproxy/conf/haproxy.config -p /var/lib/haproxy/run/haproxy.pid -x /var/lib/haproxy/run/haproxy.sock -sf 21158 51938 65730

- Haproxy logs show new sessions with this old pid:

  Mar 11 15:49:23 localhost haproxy[51938]: x.x.x.14:42972 [11/Mar/2019:15:49:23.872] fe_sni~ be_edge_http:myapp2:myapp2-bg/<NOSRV> 0/-1/-1/-1/0 503 3278 - - SCNN 1/1/0/0/0 0/0 "HEAD /lae2cae.html HTTP/1.1"
  Mar 11 15:49:19 localhost haproxy[51938]: x.x.x.13:50820 [11/Mar/2019:15:49:19.651] fe_sni~ be_edge_http:myapp:myapp-test/<NOSRV> 0/-1/-1/-1/0 503 3278 - - SCNN 2/1/0/0/0 0/0 "HEAD /lae2cae.html HTTP/1.1"


- Info from haproxy 
$ echo "show info" | socat /var/lib/haproxy/run/haproxy.sock stdio
Name: HAProxy
Version: 1.8.8
Release_date: 2018/04/19
Nbthread: 4
Nbproc: 1
Process_num: 1
Pid: 44561

Comment 2 Dan Mace 2019-03-14 21:45:37 UTC
Thanks for the detailed report, Ryan. Will let you know if we need any more information to continue diagnosis.


Note You need to log in before you can comment on or make changes to this bug.