Bug 1474683 - template for PostgreSQL has buggy liveness probe

Summary: template for PostgreSQL has buggy liveness probe

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Templates
Sub Component:
Version:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Ben Parees
QA Contact:	XiuJuan Wang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-07-25 07:19 UTC by daniel
Modified:	2021-09-09 12:27 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: Bad liveness probe caused excess output in container logs. Consequence: Logs contained spurious error messages. Fix: Change liveness probe to something that does not generate spurious log messages. Result: Logs no longer contain excessive spurious error messages.
Clone Of:
Environment:
Last Closed:	2018-08-28 18:31:26 UTC
Target Upstream Version:
Embargoed:

Description daniel 2017-07-25 07:19:27 UTC

Description of problem:
Templates delivered by `openshift-ansible-roles-3.5.91-1.git.0.28b3ddb.el7.noarch`, such as
- /usr/share/ansible/openshift-ansible/roles/openshift_examples/files/examples/v1.5/quickstart-templates/django-postgresql-persistent.json
- /usr/share/ansible/openshift-ansible/roles/openshift_examples/files/examples/v1.5/db-templates/postgresql-ephemeral-template.json

have the following livenessProbe included:

~~~
          livenessProbe:
            initialDelaySeconds: 30
            tcpSocket:
              port: 5432
            timeoutSeconds: 1
~~~

which leads to the following message in the PostgreSQL log inside the container every 30 seconds:

~~~
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet
~~~

making the log file quite unusable. 
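
To illustrate what the tcpSocket probe above actually checks, here is a minimal Python sketch. It assumes the probe succeeds whenever a TCP connection can be opened within `timeoutSeconds`; the helper name `tcp_socket_probe` is hypothetical, and the kubelet's own implementation (written in Go) is not reproduced here. The key point is that the connection is closed without writing a single byte, so PostgreSQL never receives the startup packet it is waiting for.

```python
import socket


def tcp_socket_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Hypothetical stand-in for a tcpSocket liveness probe.

    Succeeds if a TCP connection can be opened within `timeout` seconds.
    No data is sent: the socket is closed as soon as it is connected,
    which a PostgreSQL server sees as a client that never sent a
    startup packet.
    """
    try:
        # create_connection raises OSError (including timeouts and
        # "connection refused") if the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling this against the PostgreSQL port inside the pod (for example `tcp_socket_probe("127.0.0.1", 5432)`) would be expected to leave one `incomplete startup packet` line in the server log per call, matching the symptom reported here.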

Version-Release number of selected component (if applicable):
openshift-ansible-roles-3.5.91-1.git.0.28b3ddb.el7.noarch

How reproducible:

Steps to Reproduce:

~~~
oc process -f /usr/share/ansible/openshift-ansible/roles/openshift_examples/files/examples/v1.5/db-templates/postgresql-ephemeral-template.json -v POSTGRESQL_USER=jim -v POSTGRESQL_PASSWORD=redhat -v POSTGRESQL_DATABASE=testdb | oc create -f -
secret "postgresql" created
service "postgresql" created
deploymentconfig "postgresql" created
[root@inf150 ~]# oc get all
NAME            REVISION   DESIRED   CURRENT   TRIGGERED BY
dc/postgresql   1          1         1         config,image(postgresql:9.5)

NAME              DESIRED   CURRENT   READY     AGE
rc/postgresql-1   1         1         1         1m

NAME             CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
svc/postgresql   172.30.83.86   <none>        5432/TCP   1m

NAME                    READY     STATUS    RESTARTS   AGE
po/postgresql-1-2j6sl   1/1       Running   0          57s
[root@inf150 ~]# 
[root@inf150 ~]# oc rsh postgresql-1-2j6sl
sh-4.2$ tail -f /var/lib/pgsql/data/userdata/pg_log/postgresql-Tue.log
LOG:  autovacuum launcher started
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet
LOG:  incomplete startup packet
~~~



Actual results:
"LOG:  incomplete startup packet" every 30 seconds in /var/lib/pgsql/data/userdata/pg_log/postgresql-Tue.log 


Expected results:
The health checks in our templates should not add entries to the PostgreSQL log.


Additional info:
According to the following thread, just opening a TCP connection without sending any data leads to exactly this message:
- http://www.postgresql-archive.org/Incomplete-startup-packet-help-needed-td5199030.html

The check in our templates is:

~~~
          livenessProbe:
            initialDelaySeconds: 30
            tcpSocket:
              port: 5432
            timeoutSeconds: 1
~~~
This is exactly the case described there: the probe only opens a TCP connection and never sends a startup packet or requests any further input.
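
For comparison, the PostgreSQL frontend/backend protocol (version 3.0) expects a client to send a StartupMessage right after connecting: an Int32 total length (counting itself), the Int32 protocol version 196608, then NUL-terminated name/value pairs ending with one extra NUL byte. The Python sketch below only builds that message to show what a bare TCP connect omits; the function name `startup_message` is hypothetical, the `jim`/`testdb` values are borrowed from the reproduction steps, and real clients such as libpq typically send additional parameters.

```python
import struct

# Protocol version 3.0 encoded as (major << 16) | minor.
PROTOCOL_3_0 = (3 << 16) | 0  # == 196608


def startup_message(user: str, database: str) -> bytes:
    """Build a minimal PostgreSQL v3 StartupMessage (hypothetical helper)."""
    params = b""
    for name, value in (("user", user), ("database", database)):
        # Each parameter is a NUL-terminated name followed by a NUL-terminated value.
        params += name.encode() + b"\x00" + value.encode() + b"\x00"
    params += b"\x00"  # an extra NUL terminates the parameter list
    body = struct.pack("!I", PROTOCOL_3_0) + params
    # The leading length field (network byte order) includes its own 4 bytes.
    return struct.pack("!I", len(body) + 4) + body


if __name__ == "__main__":
    msg = startup_message("jim", "testdb")
    print(len(msg), msg.hex())
```

A tcpSocket probe sends none of these bytes before disconnecting, which is why the server reports the startup packet as incomplete.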

A check like the following could prevent the issue:

~~~
          livenessProbe:
            exec:
              command:
              - /bin/sh
              - -i
              - -c
              - pg_isready -h 127.0.0.1 -p 5432
            failureThreshold: 3
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
~~~
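
The probe parameters in this proposal interact roughly as follows: an exec probe counts as successful when the command exits with status 0 (pg_isready exits 0 only when the server is accepting connections), the first check runs `initialDelaySeconds` after the container starts, later checks run every `periodSeconds`, and the container is restarted once `failureThreshold` consecutive checks fail. Because `successThreshold` must be 1 for liveness probes, a single success resets the count. The Python sketch below is a simplified model under those assumptions; it ignores `timeoutSeconds`, probe jitter, and the restart itself, and the function name `restart_time` is hypothetical.

```python
from typing import Optional, Sequence


def restart_time(results: Sequence[bool], initial_delay: int = 30,
                 period: int = 10, failure_threshold: int = 3) -> Optional[int]:
    """Simplified model of when a liveness probe would trigger a restart.

    results[i] is the outcome of the i-th probe, assumed to run at
    initial_delay + i * period seconds after the container starts.
    Returns the time of the restart, or None if the threshold is never hit.
    """
    consecutive_failures = 0
    for i, ok in enumerate(results):
        # successThreshold is 1, so any success resets the failure counter.
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return initial_delay + i * period
    return None


if __name__ == "__main__":
    # One healthy check, after which pg_isready keeps failing.
    print(restart_time([True, False, False, False]))
```

With the values from the proposal (30 s delay, 10 s period, threshold 3), a server that stops answering after the first check would be restarted about 30 seconds later in this model.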

Comment 6 Petr Kubat 2017-10-25 07:58:29 UTC

Tracked upstream:
https://github.com/sclorg/postgresql-container/issues/195

Comment 7 Petr Kubat 2017-12-13 13:06:32 UTC

This is now fixed upstream:
https://github.com/sclorg/postgresql-container/commit/8230ed1221225ed9b9d1706810c1f1f228931856

Comment 8 Petr Kubat 2018-02-28 13:45:20 UTC

Moving the bug back to OpenShift: the templates are not released by the RHSCL team, so it should not be tracked against our components.

Comment 9 Ben Parees 2018-02-28 14:44:48 UTC

Petr, the django example template is not fixed yet:
https://github.com/sclorg/django-ex/blob/master/openshift/templates/django-postgresql.json#L387-L393

Once it is fixed, you can return this bug to me and I'll track when OpenShift releases the updated templates.

Comment 10 Petr Kubat 2018-02-28 15:04:51 UTC

Thanks for spotting this. I opened up a PR against django-ex:

https://github.com/sclorg/django-ex/pull/119

Comment 11 Petr Kubat 2018-03-12 07:06:18 UTC

The Rails and Django template changes have been merged.

Comment 12 Ben Parees 2018-03-12 17:30:31 UTC

This should be fixed by https://github.com/openshift/origin/pull/18621

Comment 14 Wenjing Zheng 2018-06-06 06:32:47 UTC

Verified with the following version:
v3.10.0-0.58.0

~~~
sh-4.2$ tail -f /var/lib/pgsql/data/userdata/pg_log/postgresql-Wed.log
LOG:  database system is ready to accept connections
LOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down
LOG:  database system was shut down at 2018-06-06 05:58:44 UTC
LOG:  MultiXact member wraparound protections are now enabled
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
~~~
