Bug 1573207

Summary: HTTP requests failing during deployment scaleup and scaledown
Product: OpenShift Container Platform
Component: Networking
Networking sub component: router
Version: 3.7.0
Status: CLOSED NOTABUG
Severity: medium
Priority: unspecified
Type: Bug
Reporter: Sudarshan Chaudhari <suchaudh>
Assignee: Ben Bennett <bbennett>
QA Contact: zhaozhanqi <zzhao>
CC: aos-bugs, public, vcorrea
Last Closed: 2018-04-30 17:22:08 UTC

Description Sudarshan Chaudhari 2018-04-30 13:37:14 UTC
Description of problem: 
We are seeing failed HTTP requests to the application while its pods are being scaled up and down.

How reproducible:

This issue can be reproduced easily.

Steps to Reproduce:
1. Create a new application "hello-openshift"
2. Run the loops below: one to continuously curl the route, one to repeatedly rescale and redeploy:

~~~~~~~~~~~~
# Start in separate terminal (make sure to use correct domain)
while true; do
  if ! curl --fail --silent --show-error --max-time 1 -o /dev/null "http://hello-openshift-test.apps.example.com"; then
    echo "$(date): Request failed"
  fi
done

# Repeatedly rescale deployment
while sleep 2; do
  current=$(oc -n test get dc/hello-openshift -o json | jq .spec.replicas)
  if (( current < 20 )); then
    replicas=20
  else
    replicas=10
  fi
  echo "$(date): Scale from ${current} to ${replicas} replicas"
  oc -n test scale dc/hello-openshift --replicas="$replicas"

  oc -n test deploy --follow --latest dc/hello-openshift

  echo "$(date): Wait for pods to become available"
  while [[ "$(oc -n test get dc/hello-openshift -o json | jq .status.unavailableReplicas)" -ne 0 ]]; do sleep 1; done
  echo
done

~~~~~~~~~~~~

Actual results:
---
Fre Apr 20 13:25:55 CEST 2018: Scale from 20 to 10 replicas
deploymentconfig "hello-openshift" scaled
Command "deploy" is deprecated, Use the `rollout latest` and `rollout cancel` commands instead.
Flag --latest has been deprecated, use 'oc rollout latest' instead
Started deployment #12
--> Scaling up hello-openshift-12 from 0 to 10, scaling down hello-openshift-9 from 10 to 0 (keep 10 pods available, don't exceed 11 pods)
    Scaling hello-openshift-12 up to 1
    Scaling hello-openshift-9 down to 9
    Scaling hello-openshift-12 up to 2
    Scaling hello-openshift-9 down to 8
    Scaling hello-openshift-12 up to 3
    Scaling hello-openshift-9 down to 7
    Scaling hello-openshift-12 up to 4
    Scaling hello-openshift-9 down to 6
    Scaling hello-openshift-12 up to 5
    Scaling hello-openshift-9 down to 5
    Scaling hello-openshift-12 up to 6
    Scaling hello-openshift-9 down to 4
    Scaling hello-openshift-12 up to 7
    Scaling hello-openshift-9 down to 3
    Scaling hello-openshift-12 up to 8
    Scaling hello-openshift-9 down to 2
    Scaling hello-openshift-12 up to 9
    Scaling hello-openshift-9 down to 1
    Scaling hello-openshift-12 up to 10 
    Scaling hello-openshift-9 down to 0
--> Success
Fre Apr 20 13:27:53 CEST 2018: Wait for pods to become available
---

Meanwhile:
---
Fre Apr 20 13:25:58 CEST 2018: Request failed
curl: (28) Operation timed out after 1000 milliseconds with 0 out of -1 bytes received
Fre Apr 20 13:25:59 CEST 2018: Request failed
curl: (28) Operation timed out after 1000 milliseconds with 0 out of -1 bytes received
Fre Apr 20 13:26:01 CEST 2018: Request failed
curl: (28) Operation timed out after 1000 milliseconds with 0 out of -1 bytes received
Fre Apr 20 13:26:02 CEST 2018: Request failed
curl: (28) Operation timed out after 1000 milliseconds with 0 out of -1 bytes received
Fre Apr 20 13:27:48 CEST 2018: Request failed
---


Expected results:

The route should send HTTP requests only to active pods, not to pods that are marked for termination, and there should be no request timeouts in a zero-downtime environment.

Comment 2 hansmi 2018-04-30 15:11:49 UTC
I'm the reporter of this issue via a support case.

> 1. Create a new application "hello-openshift"

This is not entirely correct. I supplied a full configuration which contains readiness and liveness probes.

If I had to guess what's happening at a high level, it's that the application router is still forwarding requests to terminating pods, or that the pods are sent the termination signal before in-flight requests have been handled. I'd expect a deployment to wait until all outstanding requests have been handled (within the termination grace period) and to ensure that no traffic is directed at a pod destined for termination.
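
One way to observe that (a diagnostic sketch, assuming the namespace and names from the reproduction below) is to watch the service endpoints during a scale-down; terminating pods should drop out of the list before the router's next reload:

  oc -n poc-rolling-updates get endpoints deployment-example -w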

My reproduction code in full (hostname replaced):

---
ns=poc-rolling-updates

oc new-project --skip-config-write "$ns"

oc -n "$ns" apply -f - <<'EOF'
apiVersion: v1
kind: List
metadata: {}
items:

- apiVersion: v1
  kind: DeploymentConfig
  metadata:
    name: deployment-example
  spec:
    replicas: 10
    revisionHistoryLimit: 2
    selector:
      app: deployment-example
      deploymentconfig: deployment-example
    strategy:
      activeDeadlineSeconds: 21600
      resources: {}
      rollingParams:
        intervalSeconds: 1
        maxSurge: 1
        maxUnavailable: 0
        timeoutSeconds: 600
        updatePeriodSeconds: 1
      type: Rolling
    template:
      metadata:
        annotations: null
        creationTimestamp: null
        labels:
          app: deployment-example
          deploymentconfig: deployment-example
      spec:
        containers:
        - image: openshift/hello-openshift:latest
          imagePullPolicy: Always
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 1
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: deployment-example
          ports:
          - containerPort: 8080
            protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 1
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
    test: false
    triggers: []

- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      openshift.io/generated-by: OpenShiftNewApp
    creationTimestamp: null
    labels:
      app: deployment-example
    name: deployment-example
  spec:
    ports:
    - name: 8080-tcp
      port: 8080
      protocol: TCP
      targetPort: 8080
    selector:
      app: deployment-example
      deploymentconfig: deployment-example
    sessionAffinity: None
    type: ClusterIP

- apiVersion: v1
  kind: Route
  metadata:
    name: deployment-example
  spec:
    port:
      targetPort: 8080-tcp
    to:
      kind: Service
      name: deployment-example
      weight: 100
    wildcardPolicy: None
EOF

# Start in separate terminal (make sure to use correct domain)
while true; do
  if ! curl --fail --silent --show-error --max-time 1 -o /dev/null "http://deployment-example-${ns}.app.example.com"; then
    echo "$(date): Request failed"
  fi
done

# Repeatedly rescale deployment
while sleep 2; do
  current=$(oc -n "$ns" get dc/deployment-example -o json | jq .spec.replicas)
  if (( current < 20 )); then
    replicas=20
  else
    replicas=10
  fi
  echo "$(date): Scale from ${current} to ${replicas} replicas"
  oc -n "$ns" scale dc/deployment-example --replicas="$replicas"

  oc -n "$ns" deploy --follow --latest dc/deployment-example

  echo "$(date): Wait for pods to become available"
  while [[ "$(oc -n "$ns" get dc/deployment-example -o json | jq .status.unavailableReplicas)" -ne 0 ]]; do sleep 1; done
  echo
done
---

Comment 3 Ben Bennett 2018-04-30 15:14:59 UTC
First, does your deployment config have liveness and readiness checks enabled?

Second, what is your router's RELOAD_INTERVAL set to? It defaults to 5 seconds, and the lowest it can go is 1s. But if you have a lot of routes, you may need to slow down the deployment to compensate (.spec.minReadySeconds).
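
For reference, these can be checked and adjusted roughly as follows (a sketch, assuming the default router deployment "router" in the "default" namespace and the dc from the original steps):

---
# Inspect the router's current reload interval (unset means the 5s default)
oc -n default set env dc/router --list | grep RELOAD_INTERVAL

# Lower it (trade-off: more frequent haproxy reloads)
oc -n default set env dc/router RELOAD_INTERVAL=1s

# Slow the rollout down so the router can catch up between steps
oc -n test patch dc/hello-openshift -p '{"spec":{"minReadySeconds":10}}'
---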

I was unable to reproduce this as soon as I enabled readiness checks on my hello-openshift pod.
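
For anyone reproducing this without probes in place, `oc set probe` can add one (a sketch against the dc from the original steps; port and path assumed from the hello-openshift image):

---
oc -n test set probe dc/hello-openshift --readiness \
  --get-url=http://:8080/ --initial-delay-seconds=1 --timeout-seconds=1
---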

Comment 4 hansmi 2018-04-30 15:16:33 UTC
Ben, I guess our comments overlapped. We do have readiness and liveness probes in the actual reproduction case.

I did try setting RELOAD_INTERVAL=2s as per Sudarshan's request in the support case.

Comment 5 Ben Bennett 2018-04-30 16:03:05 UTC
Setting:
  oc annotate route hello-openshift router.openshift.io/haproxy.health.check.interval=500ms --overwrite

Helped considerably, BUT you will be sending TCP health checks to pods every 1/2 second.


The real problem is that the pod is sent the TERM signal, which it doesn't handle, so it exits immediately.

You need to add a small delay between when TERM is received and when the pod shuts down, to cover the gap between when the pod is killed and when the router notices it and stops sending it traffic.

See: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods
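
For images that ship a shell, a minimal PID-1 wrapper along these lines could add that delay (a sketch; the binary path and the 10-second window are assumptions, and a scratch-based image like hello-openshift would need the handler built into the binary instead):

---
#!/bin/sh
# Sketch: delay shutdown on SIGTERM so the router can stop routing here first.
# Assumes the image ships /bin/sh and a sleep binary.
_term() {
  sleep 10               # grace window for the router to notice the endpoint is gone
  kill -TERM "$child"    # then forward the signal to the application
}
trap _term TERM
/hello-openshift &       # hypothetical application binary
child=$!
wait "$child"            # interrupted when the trap fires
wait "$child"            # reap the child once the handler has run
---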

If you can't or don't want to change the image to handle the signal, you can install a preStop hook that sleeps for 10 seconds. You'll need to get a sleep binary into the pod, but that should not be hard.
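
A sketch of installing such a hook on the reproduction dc (assumes the container name from the config above and that a sleep binary is present in the image):

---
oc -n "$ns" patch dc/deployment-example -p '{
  "spec": {"template": {"spec": {"containers": [{
    "name": "deployment-example",
    "lifecycle": {"preStop": {"exec": {"command": ["sleep", "10"]}}}
  }]}}}
}'
---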

Comment 6 Ben Bennett 2018-04-30 17:23:03 UTC
Closed because I believe this is functioning as designed... but if adding a termination handler or a preStop hook doesn't resolve the problem, please feel free to re-open it.

Comment 7 Ben Bennett 2018-05-09 13:04:21 UTC
*** Bug 1575761 has been marked as a duplicate of this bug. ***