Bug 1575761 - During a rollout update pods continue to receive network traffic
Summary: During a rollout update pods continue to receive network traffic
Keywords:
Status: CLOSED DUPLICATE of bug 1573207
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.1
Hardware: x86_64
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ben Bennett
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-07 20:13 UTC by Vítor Corrêa
Modified: 2018-05-09 13:04 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-09 13:04:21 UTC
Target Upstream Version:
Embargoed:



Description Vítor Corrêa 2018-05-07 20:13:33 UTC
OpenShift continues to send traffic to pods even after signalling them to terminate.
During a rolling update we experienced some failing requests (customers would get an error message). After some investigation we set up a deployment with 2 pods, a service and a route. We call the route 5 times per second, and during a deployment we can see one or more requests failing, plus 5-10 requests taking 10-15 seconds. This happens when the old pods are being removed from the service.
We modified the application running in the pod to intercept SIGTERM and wait five seconds before shutting down its HTTP listener and ending its ongoing connections.
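For anyone trying to reproduce this, the probe loop described above could look roughly like the following. This is only a sketch, not the client actually used for this report (the log below shows curl). The names `probe`, `http_get` and `ProbeSummary` are made up for illustration, and the 10-second "slow" threshold is taken from the 10-15 second figure above.

```python
import time
import urllib.request
from dataclasses import dataclass


@dataclass
class ProbeSummary:
    total: int = 0
    failed: int = 0
    slow: int = 0


def http_get(url, timeout=30.0):
    """Fetch url once; raises on connection errors and HTTP error statuses."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def probe(fetch, count, rate_per_sec=5.0, slow_after=10.0,
          sleep=time.sleep, clock=time.monotonic):
    """Call fetch() `count` times, starting about rate_per_sec calls per second.

    Calls are sequential, so a slow request delays the ones after it.
    A request is counted either as failed (it raised) or, if it succeeded
    but took at least slow_after seconds, as slow.
    """
    summary = ProbeSummary()
    interval = 1.0 / rate_per_sec
    for _ in range(count):
        started = clock()
        try:
            fetch()
        except Exception:
            summary.failed += 1
            elapsed = clock() - started
        else:
            elapsed = clock() - started
            if elapsed >= slow_after:
                summary.slow += 1
        summary.total += 1
        # Sleep only for whatever is left of this request's time slot.
        remaining = interval - elapsed
        if remaining > 0:
            sleep(remaining)
    return summary
```

It would be run against the route while a new deployment rolls out, for example `probe(lambda: http_get("http://myapp.example.com/"), count=300)`, where the hostname is a placeholder for the real route.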
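To make the modification concrete, here is a minimal sketch of the same idea using Python's standard library. It is not the code of the application in this report. The handler `Hello`, the helpers `drain_then_stop` and `install_sigterm_handler`, and port 8080 are all assumptions made for illustration; only the five-second delay comes from the description above.

```python
import signal
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DRAIN_SECONDS = 5.0  # delay between SIGTERM and closing the listener


class Hello(BaseHTTPRequestHandler):
    """Trivial handler standing in for the real application."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def drain_then_stop(server, delay=DRAIN_SECONDS, sleep=time.sleep):
    """Keep serving for `delay` seconds, then stop the accept loop."""
    sleep(delay)
    server.shutdown()  # makes serve_forever() return


def install_sigterm_handler(server, delay=DRAIN_SECONDS):
    def on_sigterm(signum, frame):
        # The handler runs in the main thread, which is inside
        # serve_forever(); calling shutdown() here would deadlock,
        # so the wait-and-stop runs in a helper thread instead.
        threading.Thread(
            target=drain_then_stop, args=(server, delay), daemon=True
        ).start()

    signal.signal(signal.SIGTERM, on_sigterm)


def main(port=8080):
    server = ThreadingHTTPServer(("", port), Hello)
    install_sigterm_handler(server)
    server.serve_forever()   # returns once drain_then_stop() has run
    server.server_close()    # release the listening socket
```

Calling `main()` starts the server. After `kill -TERM <pid>` it keeps answering for about five seconds and then exits.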
This is an excerpt from the log of a pod that is shutting down:
Mon Apr 23 2018 13:48:16 GMT+0000 (UTC) ::ffff:10.38.4.72 GET /?id=471EB4F9-B9D0-4563-B625-FA178BAF4CAC curl/7.43.0
Mon Apr 23 2018 13:48:17 GMT+0000 (UTC) ::ffff:10.38.4.72 GET /?id=9FC87787-B125-4464-A016-B8065881A2FB curl/7.43.0
Mon Apr 23 2018 13:48:17 GMT+0000 (UTC) ::ffff:10.38.4.72 GET /?id=EC613AF2-88F6-4861-ABC6-54B9D6692DE0 curl/7.43.0
Mon Apr 23 2018 13:48:17 GMT+0000 (UTC) : Received SIGTERM
Mon Apr 23 2018 13:48:18 GMT+0000 (UTC) ::ffff:10.38.4.72 GET /?id=EBC90F1F-B2A6-4B93-8C11-06C3E84FDB8D curl/7.43.0
Mon Apr 23 2018 13:48:18 GMT+0000 (UTC) ::ffff:10.38.4.72 GET /?id=7B4EC873-D366-4E70-AF13-1409217B2011 curl/7.43.0
Mon Apr 23 2018 13:48:18 GMT+0000 (UTC) ::ffff:10.38.4.72 GET /?id=FA4FA1D6-E78C-4295-8CBB-B7D8D4A09110 curl/7.43.0
Mon Apr 23 2018 13:48:18 GMT+0000 (UTC) ::ffff:10.38.4.72 GET /?id=636D877A-88F9-4543-A26A-412C7FB7D2B3 curl/7.43.0
Mon Apr 23 2018 13:48:19 GMT+0000 (UTC) ::ffff:10.38.4.72 GET /?id=0D8FAA66-068D-48CD-9BD9-274D69D68224 curl/7.43.0
Mon Apr 23 2018 13:48:19 GMT+0000 (UTC) ::ffff:10.38.4.72 GET /?id=80E81885-75C2-4E7D-BE4D-14A29791099A curl/7.43.0
Mon Apr 23 2018 13:48:21 GMT+0000 (UTC) ::ffff:10.38.4.72 GET / Go-http-client/1.1
Mon Apr 23 2018 13:48:21 GMT+0000 (UTC) ::ffff:10.38.4.72 GET / Go-http-client/1.1
Mon Apr 23 2018 13:48:22 GMT+0000 (UTC) : Closing...
Mon Apr 23 2018 13:48:22 GMT+0000 (UTC) : All connections closed

As can be seen here, the pod keeps receiving requests for 1-4 seconds after SIGTERM is sent to it. This is unfortunate, as those requests will normally fail (most applications stop accepting new connections after receiving a SIGTERM).
We would expect the service to stop sending traffic to a pod as soon as the pod is sent the SIGTERM signal.
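The 1-4 second figure can be read off the log mechanically. The following sketch does that for a trimmed copy of the excerpt above. The function names are made up for illustration, and the parsing assumes every line carries the exact `GMT+0000 (UTC)` timestamp suffix seen in this log.

```python
from datetime import datetime

# Trimmed copy of the pod log excerpt from this report.
SAMPLE = """
Mon Apr 23 2018 13:48:17 GMT+0000 (UTC) ::ffff:10.38.4.72 GET /?id=EC613AF2-88F6-4861-ABC6-54B9D6692DE0 curl/7.43.0
Mon Apr 23 2018 13:48:17 GMT+0000 (UTC) : Received SIGTERM
Mon Apr 23 2018 13:48:18 GMT+0000 (UTC) ::ffff:10.38.4.72 GET /?id=EBC90F1F-B2A6-4B93-8C11-06C3E84FDB8D curl/7.43.0
Mon Apr 23 2018 13:48:19 GMT+0000 (UTC) ::ffff:10.38.4.72 GET /?id=0D8FAA66-068D-48CD-9BD9-274D69D68224 curl/7.43.0
Mon Apr 23 2018 13:48:21 GMT+0000 (UTC) ::ffff:10.38.4.72 GET / Go-http-client/1.1
Mon Apr 23 2018 13:48:22 GMT+0000 (UTC) : Closing...
"""


def parse_line(line):
    """Split a log line into (timestamp, remainder)."""
    stamp, _, rest = line.partition(" GMT+0000 (UTC) ")
    when = datetime.strptime(stamp, "%a %b %d %Y %H:%M:%S")
    return when, rest.strip()


def requests_after_sigterm(lines):
    """Return, for each GET logged after SIGTERM, its delay in seconds."""
    sigterm_at = None
    delays = []
    for line in lines:
        if not line.strip():
            continue
        when, rest = parse_line(line)
        if rest == ": Received SIGTERM":
            sigterm_at = when
        elif sigterm_at is not None and " GET " in rest:
            delays.append((when - sigterm_at).total_seconds())
    return delays


print(requests_after_sigterm(SAMPLE.splitlines()))  # [1.0, 2.0, 4.0]
```

The log has one-second resolution, so a request logged in the same second as SIGTERM but before it in the file is treated as having arrived before the signal.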

This looks like the same issue as reported here: https://github.com/openshift/origin/issues/18914







$ oc version
oc v3.7.42
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

openshift v3.7.42
kubernetes v1.7.6+a08f5eeb62

Comment 1 Ben Bennett 2018-05-09 13:04:21 UTC

*** This bug has been marked as a duplicate of bug 1573207 ***

