Bug 1474683 - template for postgress has buggy liveness probe [NEEDINFO]
Status: ASSIGNED
Product: Red Hat Software Collections
Classification: Red Hat
Component: rh-postgresql94-docker
Version: rh-postgresql94
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.1
Assigned To: Petr Kubat
QA Contact: qe-baseos-daemons
Reported: 2017-07-25 03:19 EDT by daniel
Modified: 2017-08-18 03:11 EDT

Type: Bug
dmoessne: needinfo? (jorton)


Description daniel 2017-07-25 03:19:27 EDT
Description of problem:
Templates delivered by `openshift-ansible-roles-3.5.91-1.git.0.28b3ddb.el7.noarch`, such as
- /usr/share/ansible/openshift-ansible/roles/openshift_examples/files/examples/v1.5/quickstart-templates/django-postgresql-persistent.json
- /usr/share/ansible/openshift-ansible/roles/openshift_examples/files/examples/v1.5/db-templates/postgresql-ephemeral-template.json

have the following livenessProbe included:

~~~
          livenessProbe:
            initialDelaySeconds: 30
            tcpSocket:
              port: 5432
            timeoutSeconds: 1
~~~

which leads to the following entries in the PostgreSQL log inside the container every 30 seconds:

~~~
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet
~~~

making the log file quite unusable. 

Version-Release number of selected component (if applicable):
openshift-ansible-roles-3.5.91-1.git.0.28b3ddb.el7.noarch

How reproducible:

Steps to Reproduce:

oc process -f /usr/share/ansible/openshift-ansible/roles/openshift_examples/files/examples/v1.5/db-templates/postgresql-ephemeral-template.json  -v POSTGRESQL_USER=jim -v POSTGRESQL_PASSWORD=redhat -v POSTGRESQL_DATABASE=testdb | oc create -f -
secret "postgresql" created
service "postgresql" created
deploymentconfig "postgresql" created
[root@inf150 ~]# oc get all
NAME            REVISION   DESIRED   CURRENT   TRIGGERED BY
dc/postgresql   1          1         1         config,image(postgresql:9.5)

NAME              DESIRED   CURRENT   READY     AGE
rc/postgresql-1   1         1         1         1m

NAME             CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
svc/postgresql   172.30.83.86   <none>        5432/TCP   1m

NAME                    READY     STATUS    RESTARTS   AGE
po/postgresql-1-2j6sl   1/1       Running   0          57s
[root@inf150 ~]# 
[root@inf150 ~]# oc rsh postgresql-1-2j6sl
sh-4.2$ tail -f /var/lib/pgsql/data/userdata/pg_log/postgresql-Tue.log 
LOG:  autovacuum launcher started
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet



Actual results:
"LOG:  incomplete startup packet" every 30 seconds in /var/lib/pgsql/data/userdata/pg_log/postgresql-Tue.log 


Expected results:
Our template checks should not cause additional log entries.


Additional info:
The thread at
- http://www.postgresql-archive.org/Incomplete-startup-packet-help-needed-td5199030.html

states that a plain TCP connect that sends no data at all leads to exactly this message.
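
To illustrate the connect-without-data behaviour, here is a minimal Python sketch. It is not part of the templates, and the function names `tcp_socket_probe` and `bytes_seen_by_server` are hypothetical. It assumes a tcpSocket probe amounts to opening a TCP connection and closing it again, and it uses a throwaway local listener in place of PostgreSQL to show that the server side receives zero bytes before the connection ends.

```python
import socket


def tcp_socket_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Mimic a tcpSocket-style check: connect, send nothing, close.

    Assumption: this approximates what the liveness probe does on the wire.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def bytes_seen_by_server() -> bytes:
    """Probe a throwaway local listener and return what it received."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(1.0)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    try:
        if not tcp_socket_probe("127.0.0.1", port):
            raise RuntimeError("probe could not connect to the local listener")
        conn, _ = listener.accept()
        with conn:
            conn.settimeout(1.0)
            # The peer closed without writing, so recv() reports end of stream.
            return conn.recv(8)
    finally:
        listener.close()


if __name__ == "__main__":
    print(repr(bytes_seen_by_server()))
```

A server that expects a startup packet, as PostgreSQL does, sees only the end of the stream in this situation, which matches the explanation given in the linked thread.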
The check in our templates is:

~~~
          livenessProbe:
            initialDelaySeconds: 30
            tcpSocket:
              port: 5432
            timeoutSeconds: 1
~~~
which is exactly the case described: a TCP connect that neither sends nor requests any further data.

A check like the following could prevent the issue:

~~~
       livenessProbe:
          exec:
            command:
            - /bin/sh
            - -i
            - -c
            - pg_isready -h 127.0.0.1 -p 5432
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
~~~
