Bug 1320335 - mysql deployment config has bad readinessProbe
Summary: mysql deployment config has bad readinessProbe
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Templates
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Michal Fojtik
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Duplicates: 1320015
Depends On:
Blocks: OSOPS_V3
 
Reported: 2016-03-22 21:28 UTC by Matt Woodson
Modified: 2016-05-12 16:33 UTC
9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-12 16:33:42 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
"oc describe pod" of broken readinessProbe (4.40 KB, text/plain)
2016-03-22 21:28 UTC, Matt Woodson


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:1064 0 normal SHIPPED_LIVE Important: Red Hat OpenShift Enterprise 3.2 security, bug fix, and enhancement update 2016-05-12 20:19:17 UTC

Description Matt Woodson 2016-03-22 21:28:15 UTC
Created attachment 1139237 [details]
"oc describe pod" of broken readinessProbe

Description of problem:

I was rolling out containers today to test out PVs, using the mysql-persistent and mysql-ephemeral examples. In doing so, the container would come up but would never become ready.

After tracking down the bug, we believe the readinessProbe in the mysql DC is broken. We noticed that our older versions do NOT have a readinessProbe configured in the DC.

Here is what the readinessProbe looks like.

https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_examples/files/examples/v1.1/db-templates/mysql-ephemeral-template.json#L95


I believe the line:

 "MYSQL_PWD='$MYSQL_PASSWORD' mysql -h 127.0.0.1 -u $MYSQL_USER -D $MYSQL_DATABASE -e 'SELECT 1'"

is bad: $MYSQL_PASSWORD is surrounded by single quotes, which prevent the shell from expanding it, so the readiness check never returns a successful exit code.

From "oc describe pod <mysql-pod-name>" we see the following error:

 Readiness probe failed: sh: cannot set terminal process group (-1): Inappropriate ioctl for device
sh: no job control in this shell
ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)

I have edited the DC and removed the single quotes and things start working.
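The quoting behavior can be demonstrated outside the cluster with plain sh. A minimal sketch (the password value s3cret is a placeholder, not from the template):

```shell
#!/bin/sh
# The probe runs its command string through sh -c, so quoting decides
# whether $MYSQL_PASSWORD is expanded or passed through literally.
export MYSQL_PASSWORD=s3cret   # placeholder for the template parameter

# Broken form (as shipped): single quotes suppress expansion, so
# MYSQL_PWD becomes the literal string $MYSQL_PASSWORD.
sh -c "MYSQL_PWD='\$MYSQL_PASSWORD'; echo \$MYSQL_PWD"
# prints: $MYSQL_PASSWORD

# Fixed form: double quotes let the shell expand the variable.
sh -c "MYSQL_PWD=\"\$MYSQL_PASSWORD\"; echo \$MYSQL_PWD"
# prints: s3cret
```

With the single-quoted form, mysql tries to authenticate with the literal string $MYSQL_PASSWORD, which matches the "Access denied" ERROR 1045 seen later in this bug.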


Version-Release number of selected component (if applicable):

OpenShift 3.1.1.6

How reproducible:

Always, with the latest version of the mysql DC.

Steps to Reproduce:
1. Install the latest mysql example template
2. Deploy a mysql pod
3. Run "oc get pods" to see whether the pod is ready

Actual results:

The pod never becomes ready.

Expected results:

The pod passes the readiness check.


I have attached the output of "oc describe pod" in the failed state.

Comment 3 Michal Fojtik 2016-03-23 06:27:16 UTC
Indeed a bug, going to fix this today.

Comment 4 Michal Fojtik 2016-03-23 12:44:04 UTC
https://github.com/openshift/origin/pull/8208

Comment 5 Ben Parees 2016-03-23 19:56:23 UTC
*** Bug 1320015 has been marked as a duplicate of this bug. ***

Comment 6 Ben Parees 2016-03-23 19:57:41 UTC
I think, per the new process, this is supposed to go to MODIFIED and then be moved to ON_QA once OSE is updated.

Comment 7 Troy Dawson 2016-03-23 21:28:31 UTC
The pull request failed and still needs to be merged. Can someone make sure the pull request makes it in?

Comment 8 Ben Parees 2016-03-23 21:51:10 UTC
Looks like the merge is still running; the tests flaked.

Comment 9 openshift-github-bot 2016-03-23 23:05:16 UTC
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/38064e6fa86b42d755c8574bd08c247ffb8a6ae1
Bug 1320335: Fix quoting for mysql probes

Comment 10 Troy Dawson 2016-03-28 18:48:38 UTC
This has been merged into OSE and is in release v3.2.0.8

Comment 11 Wenjing Zheng 2016-03-29 08:41:00 UTC
Checked on openshift v3.2.0.8, still can reproduce this issue:
Events:
  FirstSeen	LastSeen	Count	From						SubobjectPath		Type		Reason		Message
  ---------	--------	-----	----						-------------		--------	------		-------
  <invalid>	<invalid>	1	{default-scheduler }							Normal		Scheduled	Successfully assigned mysql-1-ns1rc to openshift-147.lab.sjc.redhat.com
  <invalid>	<invalid>	1	{kubelet openshift-147.lab.sjc.redhat.com}	spec.containers{mysql}	Normal		Pulling		pulling image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhscl/mysql-56-rhel7:latest"
  <invalid>	<invalid>	1	{kubelet openshift-147.lab.sjc.redhat.com}	spec.containers{mysql}	Normal		Pulled		Successfully pulled image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhscl/mysql-56-rhel7:latest"
  <invalid>	<invalid>	1	{kubelet openshift-147.lab.sjc.redhat.com}	spec.containers{mysql}	Normal		Created		Created container with docker id a52ab872787f
  <invalid>	<invalid>	1	{kubelet openshift-147.lab.sjc.redhat.com}	spec.containers{mysql}	Normal		Started		Started container with docker id a52ab872787f
  <invalid>	<invalid>	1	{kubelet openshift-147.lab.sjc.redhat.com}	spec.containers{mysql}	Warning		Unhealthy	Readiness probe failed: sh: cannot set terminal process group (-1): Inappropriate ioctl for device
sh: no job control in this shell
ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)

  <invalid>	<invalid>	1	{kubelet openshift-147.lab.sjc.redhat.com}	spec.containers{mysql}	Warning	Unhealthy	Readiness probe failed: sh: cannot set terminal process group (-1): Inappropriate ioctl for device
sh: no job control in this shell
ERROR 1045 (28000): Access denied for user 'userX8B'@'127.0.0.1' (using password: YES)

Comment 13 Michal Fojtik 2016-03-30 08:21:25 UTC
The password seems right. Does mysql eventually become ready after some time, or does the liveness probe fail as well and keep mysql in a restart loop?

Comment 14 Wenjing Zheng 2016-03-30 09:21:46 UTC
After waiting for some time, the deploy pod ends in an Error state and the error below appears.
$ oc get pods
NAME             READY     STATUS    RESTARTS   AGE
mysql-1-deploy   0/1       Error     0          12m
$oc describe pods mysql-1-deploy
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Events:
  FirstSeen	LastSeen	Count	From						SubobjectPath			Type		Reason		Message
  ---------	--------	-----	----						-------------			--------	------		-------
  11m		11m		1	{default-scheduler }								Normal		Scheduled	Successfully assigned mysql-1-deploy to openshift-101.lab.sjc.redhat.com
  11m		11m		1	{kubelet openshift-101.lab.sjc.redhat.com}	spec.containers{deployment}	Normal		Pulled		Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-deployer:v3.2.0.8" already present on machine
  11m		11m		1	{kubelet openshift-101.lab.sjc.redhat.com}	spec.containers{deployment}	Normal		Created		Created container with docker id a0a1aadf92a5
  11m		11m		1	{kubelet openshift-101.lab.sjc.redhat.com}	spec.containers{deployment}	Normal		Started		Started container with docker id a0a1aadf92a5
  56s		56s		1	{kubelet openshift-101.lab.sjc.redhat.com}					Warning		FailedSync	Error syncing pod, skipping: failed to "TeardownNetwork" for "mysql-1-deploy_wzheng" with TeardownNetworkError: "Failed to teardown network for pod \"c1762227-f656-11e5-8022-fa163e4c0ffe\" using network plugins \"redhat/openshift-ovs-multitenant\": exit status 1"

Comment 15 Michal Fojtik 2016-03-30 10:30:41 UTC
I cannot reproduce this on the latest Origin master (the mysql service is ready after one failed check and everything works as expected). I used this template:

https://github.com/openshift/origin/blob/master/examples/db-templates/mysql-ephemeral-template.json

(the persistent template readiness probe is the same).

Might the networking error be related? Also, does mysql have persistent storage attached? Is this the rhel7 or the centos7 mysql image?

Comment 16 Wenjing Zheng 2016-03-30 10:49:20 UTC
I cannot reproduce with the template you sent either, but I can reproduce with the template installed by default in the OSE env. The error is the same for both mysql-ephemeral-template.json and mysql-persistent-template.json (both using the rhel7 image).

Comment 17 Michal Fojtik 2016-03-30 12:13:26 UTC
(In reply to Wenjing Zheng from comment #16)
> I cannot reproduce with the template you sent either, but can reproduce with
> the template installed by default in OSE env. The error is the same for both
> mysql-ephemeral-template.json and mysql-persistent-template.json(both using
> rhel7 image)

What is the diff between those two?

Comment 18 Wenjing Zheng 2016-03-31 06:26:22 UTC
(In reply to Michal Fojtik from comment #17)
> (In reply to Wenjing Zheng from comment #16)
> > I cannot reproduce with the template you sent either, but can reproduce with
> > the template installed by default in OSE env. The error is the same for both
> > mysql-ephemeral-template.json and mysql-persistent-template.json(both using
> > rhel7 image)
> 
> What is the diff between those two?

I just compared the persistent templates: OSE 3.0.2.9 now has the same template as origin, which has the issue above. I compared the persistent template among origin, 3.0.2.8, and 3.0.2.9; here are the differences:
[wzheng@openshiftqe tmp]$ diff persistent_origin.json persistent_329.json
8c8
<       "description": "MySQL database service, with persistent storage.  Scaling to more than one replica is not supported.  You must have persistent volumes available in your cluster to use this template.",
---
>       "description": "MySQL database service, with persistent storage. Scaling to more than one replica is not supported",
[wzheng@openshiftqe tmp]$ diff persistent_origin.json persistent_ose_328.json
8c8
<       "description": "MySQL database service, with persistent storage.  Scaling to more than one replica is not supported.  You must have persistent volumes available in your cluster to use this template.",
---
>       "description": "MySQL database service, with persistent storage. Scaling to more than one replica is not supported",
81c81
<                 "namespace": "${NAMESPACE}"
---
>                 "namespace": "openshift"
117c117
<                       "MYSQL_PWD=\"$MYSQL_PASSWORD\" mysql -h 127.0.0.1 -u $MYSQL_USER -D $MYSQL_DATABASE -e 'SELECT 1'"]
---
>                       "MYSQL_PWD='$MYSQL_PASSWORD' mysql -h 127.0.0.1 -u $MYSQL_USER -D $MYSQL_DATABASE -e 'SELECT 1'"]
180,181c180,181
<       "displayName": "Memory Limit",
<       "description": "Maximum amount of memory the container can use.",
---
>       "displayName": "Memory limit",
>       "description": "Maximum amount of memory the container can use",
185,190d184
<       "name": "NAMESPACE",
<       "displayName": "Namespace",
<       "description": "The OpenShift Namespace where the ImageStream resides.",
<       "value": "openshift"
<     },
<     {
192,193c186,187
<       "displayName": "Database Service Name",
<       "description": "The name of the OpenShift Service exposed for the database.",
---
>       "displayName": "Database service name",
>       "description": "The name of the OpenShift Service exposed for the database",
199,200c193,194
<       "displayName": "MySQL User",
<       "description": "Username for MySQL user that will be used for accessing the database.",
---
>       "displayName": "MySQL user",
>       "description": "Username for MySQL user that will be used for accessing the database",
207,208c201,202
<       "displayName": "MySQL Password",
<       "description": "Password for the MySQL user.",
---
>       "displayName": "MySQL password",
>       "description": "Password for the MySQL user",
215,216c209,210
<       "displayName": "MySQL Database Name",
<       "description": "Name of the MySQL database accessed.",
---
>       "displayName": "MySQL database name",
>       "description": "Name of the MySQL database accessed",
222,223c216,217
<       "displayName": "Volume Capacity",
<       "description": "Volume space available for data, e.g. 512Mi, 2Gi.",
---
>       "displayName": "Volume capacity",
>       "description": "Volume space available for data, e.g. 512Mi, 2Gi",

Comment 19 Michal Fojtik 2016-03-31 06:56:01 UTC
OK, this is the issue:

>                       "MYSQL_PWD='$MYSQL_PASSWORD' mysql -h 127.0.0.1 -u $MYSQL_USER -D $MYSQL_DATABASE -e 'SELECT 1'"]


We should bump the template in the OSE repo to match origin.
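For reference, the corrected probe stanza looks like this. This is a sketch based on the diff in comment 18; the delay/timeout values are illustrative and may differ from the shipped template:

```json
"readinessProbe": {
  "exec": {
    "command": [ "/bin/sh", "-i", "-c",
      "MYSQL_PWD=\"$MYSQL_PASSWORD\" mysql -h 127.0.0.1 -u $MYSQL_USER -D $MYSQL_DATABASE -e 'SELECT 1'"]
  },
  "initialDelaySeconds": 5,
  "timeoutSeconds": 1
}
```

The only functional change is the double quotes around $MYSQL_PASSWORD, which let the shell expand the variable when the kubelet executes the probe.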

Comment 20 Wenjing Zheng 2016-03-31 07:12:33 UTC
One more thing: if the string persistentVolumeClaim is replaced with PersistentVolumeClaim in the persistent template, the pod will eventually become Running even though the error still appears:
$ oc describe pods mysql-1-pakfd
 2m		2m		1	{kubelet openshift-114.lab.sjc.redhat.com}	spec.containers{mysql}	Warning		Unhealthy	Readiness probe failed: sh: cannot set terminal process group (-1): Inappropriate ioctl for device
sh: no job control in this shell
ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)
[wzheng@openshiftqe test]$ oc get pods
NAME            READY     STATUS    RESTARTS   AGE
mysql-1-pakfd   1/1       Running   0          3m

Comment 21 Wenjing Zheng 2016-03-31 07:34:10 UTC
Please disregard my comment #20: although the pod is running with that change, the pv is not attached. Sorry for the confusion.

Comment 22 Wenjing Zheng 2016-03-31 08:12:16 UTC
The templates in origin and OSE 3.0.2.9 are now the same. The pod is running and works well, although the error below still appears:
  8s		8s		1	{kubelet openshift-126.lab.sjc.redhat.com}	spec.containers{mysql}	Warning		Unhealthy	Readiness probe failed: sh: cannot set terminal process group (-1): Inappropriate ioctl for device
sh: no job control in this shell
ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)

Comment 23 Wenjing Zheng 2016-03-31 09:52:34 UTC
After confirming with Michal that the error is fine, I am verifying this bug per comment #22.

Comment 25 errata-xmlrpc 2016-05-12 16:33:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064

