Bug 1278240 - openshift doesn't wait for readiness check to succeed before attempting a liveness check
Summary: openshift doesn't wait for readiness check to succeed before attempting a liveness check
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Paul Weil
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-11-05 03:24 UTC by Erik M Jacobs
Modified: 2015-11-05 03:46 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-05 03:46:18 UTC
Target Upstream Version:
Embargoed:



Description Erik M Jacobs 2015-11-05 03:24:54 UTC
atomic-openshift-3.0.2.906-1.git.0.7bfe56b.el7aos.x86_64
atomic-openshift-clients-3.0.2.906-1.git.0.7bfe56b.el7aos.x86_64
atomic-openshift-master-3.0.2.906-1.git.0.7bfe56b.el7aos.x86_64
atomic-openshift-node-3.0.2.906-1.git.0.7bfe56b.el7aos.x86_64
atomic-openshift-sdn-ovs-3.0.2.906-1.git.0.7bfe56b.el7aos.x86_64
atomic-openshift-utils-3.0.7-1.git.48.75d357c.el7aos.noarch
openshift-ansible-3.0.7-1.git.48.75d357c.el7aos.noarch
openshift-ansible-filter-plugins-3.0.7-1.git.48.75d357c.el7aos.noarch
openshift-ansible-lookup-plugins-3.0.7-1.git.48.75d357c.el7aos.noarch
openshift-ansible-playbooks-3.0.7-1.git.48.75d357c.el7aos.noarch
openshift-ansible-roles-3.0.7-1.git.48.75d357c.el7aos.noarch
tuned-profiles-atomic-openshift-node-3.0.2.906-1.git.0.7bfe56b.el7aos.x86_64

Given the following DC:

apiVersion: v1
kind: DeploymentConfig
metadata:
  creationTimestamp: 2015-11-05T03:20:22Z
  labels:
    template: quickstart-keyvalue-application
  name: frontend
  namespace: quickstart
  resourceVersion: "11454"
  selfLink: /oapi/v1/namespaces/quickstart/deploymentconfigs/frontend
  uid: 264b0062-836c-11e5-b039-525400b33d1d
spec:
  replicas: 2
  selector:
    name: frontend
  strategy:
    resources: {}
    rollingParams:
      intervalSeconds: 1
      maxSurge: 25%
      maxUnavailable: 25%
      timeoutSeconds: 600
      updatePeriodSeconds: 1
    type: Rolling
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: frontend
    spec:
      containers:
      - env:
        - name: MYSQL_USER
          value: user3T4
        - name: MYSQL_PASSWORD
          value: 6hMkErOq
        - name: MYSQL_DATABASE
          value: root
        image: 172.30.129.155:5000/quickstart/ruby-sample@sha256:7d74e5b93ebf336afbe9e78b87fde4558d6fc2306f367d3027e255a63f946944
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            path: /
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 1
        name: ruby-helloworld
        ports:
        - containerPort: 8080
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /
            port: 8080
            scheme: HTTP
          timeoutSeconds: 1
        resources: {}
        securityContext:
          capabilities: {}
          privileged: false
        terminationMessagePath: /dev/termination-log
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30
  triggers:
  - imageChangeParams:
      automatic: true
      containerNames:
      - ruby-helloworld
      from:
        kind: ImageStreamTag
        name: ruby-sample:latest
      lastTriggeredImage: 172.30.129.155:5000/quickstart/ruby-sample@sha256:7d74e5b93ebf336afbe9e78b87fde4558d6fc2306f367d3027e255a63f946944
    type: ImageChange
  - type: ConfigChange
status:
  details:
    causes:
    - imageTrigger:
        from:
          kind: DockerImage
          name: 172.30.129.155:5000/quickstart/ruby-sample:latest
      type: ImageChange
  latestVersion: 1

The following event history occurs after the build completes and deployments start:
0s        0s        1         frontend-1-5lbxn   Pod                 Scheduled   {scheduler }   Successfully assigned frontend-1-5lbxn to ose3-node2.example.com
0s        0s        1         frontend-1   ReplicationController             SuccessfulCreate   {replication-controller }   Created pod: frontend-1-5lbxn
0s        0s        1         frontend-1-5lbxn   Pod       implicitly required container POD   Pulled    {kubelet ose3-node2.example.com}   Container image "openshift3/ose-pod:v3.0.2.906" already present on machine
0s        0s        1         frontend-1-5lbxn   Pod       implicitly required container POD   Created   {kubelet ose3-node2.example.com}   Created with docker id 9e45351f532c
0s        0s        1         frontend-1-5lbxn   Pod       implicitly required container POD   Started   {kubelet ose3-node2.example.com}   Started with docker id 9e45351f532c
0s        0s        1         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Pulling   {kubelet ose3-node2.example.com}   pulling image "172.30.129.155:5000/quickstart/ruby-sample@sha256:7d74e5b93ebf336afbe9e78b87fde4558d6fc2306f367d3027e255a63f946944"
2s        2s        1         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Pulled    {kubelet ose3-node2.example.com}   Successfully pulled image "172.30.129.155:5000/quickstart/ruby-sample@sha256:7d74e5b93ebf336afbe9e78b87fde4558d6fc2306f367d3027e255a63f946944"
1s        1s        1         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Created   {kubelet ose3-node2.example.com}   Created with docker id 697fbf535808
0s        0s        1         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Started   {kubelet ose3-node2.example.com}   Started with docker id 697fbf535808
0s        0s        1         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Unhealthy   {kubelet ose3-node2.example.com}   Readiness probe failed: Get http://10.1.0.52:8080/: dial tcp 10.1.0.52:8080: connection refused
10s       0s        2         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Unhealthy   {kubelet ose3-node2.example.com}   Readiness probe failed: Get http://10.1.0.52:8080/: dial tcp 10.1.0.52:8080: connection refused
20s       0s        3         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Unhealthy   {kubelet ose3-node2.example.com}   Readiness probe failed: Get http://10.1.0.52:8080/: dial tcp 10.1.0.52:8080: connection refused
30s       0s        4         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Unhealthy   {kubelet ose3-node2.example.com}   Readiness probe failed: Get http://10.1.0.52:8080/: dial tcp 10.1.0.52:8080: connection refused
40s       0s        5         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Unhealthy   {kubelet ose3-node2.example.com}   Readiness probe failed: Get http://10.1.0.52:8080/: dial tcp 10.1.0.52:8080: connection refused
50s       0s        6         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Unhealthy   {kubelet ose3-node2.example.com}   Readiness probe failed: Get http://10.1.0.52:8080/: dial tcp 10.1.0.52:8080: connection refused
0s        0s        1         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Unhealthy   {kubelet ose3-node2.example.com}   Liveness probe failed: Get http://10.1.0.52:8080/: dial tcp 10.1.0.52:8080: connection refused
1m        0s        7         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Unhealthy   {kubelet ose3-node2.example.com}   Readiness probe failed: Get http://10.1.0.52:8080/: dial tcp 10.1.0.52:8080: connection refused
0s        0s        1         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Killing   {kubelet ose3-node2.example.com}   Killing with docker id 697fbf535808
0s        0s        1         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Pulled    {kubelet ose3-node2.example.com}   Container image "172.30.129.155:5000/quickstart/ruby-sample@sha256:7d74e5b93ebf336afbe9e78b87fde4558d6fc2306f367d3027e255a63f946944" already present on machine
0s        0s        1         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Created   {kubelet ose3-node2.example.com}   Created with docker id 9fb9b75a3496
0s        0s        1         frontend-1-5lbxn   Pod       spec.containers{ruby-helloworld}   Started   {kubelet ose3-node2.example.com}   Started with docker id 9fb9b75a3496

You will notice that the readiness checks for the pod never succeed, yet the liveness probe starts anyway; the pod is then considered failed, killed, and restarted -- before it ever became ready in the first place.

Shouldn't the liveness probe wait until AFTER the first successful readiness check (if a readiness check is defined)?
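
For reference, here are the relevant probe settings from the DC above, annotated with when each check fires according to the event timestamps. The roughly 10-second readiness interval is what the events show; I'm assuming that is the kubelet's default probe period, and the comments are my reading of the events, not documented behavior:

        livenessProbe:
          httpGet:
            path: /
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60   # first liveness check at ~60s; in the events above a single failure is followed by Killing
          timeoutSeconds: 1
        readinessProbe:
          httpGet:
            path: /
            port: 8080
            scheme: HTTP
          timeoutSeconds: 1         # no initialDelaySeconds, so readiness is checked from container start, roughly every 10s in the events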

Comment 2 Paul Weil 2015-11-05 03:45:50 UTC
Liveness and readiness are two independent probes that control different things.

Liveness indicates that the container is running.  A failure of the liveness probe is an indication that the container is unhealthy and needs to be restarted.  If the liveness probe fails, the kubelet will kill the container and restart it.

Readiness indicates that the container is ready to receive traffic.  When a pod is ready, it is added to the endpoints of any service selecting the pod.

kube docs: https://github.com/kubernetes/kubernetes/blob/b9cfab87e33ea649bdd13a1bd243c502d76e5d22/docs/user-guide/pod-states.md#container-probes
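Since the two probes are independent, the usual way to avoid a restart during a slow startup is to give the liveness probe an initialDelaySeconds longer than the application's worst-case startup time, and let the readiness probe gate traffic on its own schedule. A minimal sketch against the DC above -- the delay values are illustrative, not recommendations:

        livenessProbe:
          httpGet:
            path: /
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 180   # illustrative: longer than the app's worst-case startup, so a slow boot is not treated as a dead container
          timeoutSeconds: 1
        readinessProbe:
          httpGet:
            path: /
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 5     # illustrative: readiness can start early; a failing readiness check only keeps the pod out of service endpoints
          timeoutSeconds: 1

With that spacing, the liveness probe never fires before the app has had a fair chance to come up, while readiness still keeps the pod out of rotation until it actually answers on 8080.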

