Created attachment 1139237
"oc describe pod" of broken readinessProbe

Description of problem:
I was rolling out containers today to test PVs, using the mysql-persistent and mysql-ephemeral examples. The container would come up but would never become ready. After tracking down the bug, we believe the readinessProbe in the mysql DC is broken. We noticed that our older versions do NOT have a readinessProbe configured in the DC.

Here is what the readinessProbe looks like:
https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_examples/files/examples/v1.1/db-templates/mysql-ephemeral-template.json#L95

I believe the line:

"MYSQL_PWD='$MYSQL_PASSWORD' mysql -h 127.0.0.1 -u $MYSQL_USER -D $MYSQL_DATABASE -e 'SELECT 1'"

is bad, because we are surrounding $MYSQL_PASSWORD with single quotes, which stops the shell from expanding it. The readiness check never returns a successful exit code.

From "oc describe pod <mysql-pod-name>" we see the following error:

Readiness probe failed: sh: cannot set terminal process group (-1): Inappropriate ioctl for device
sh: no job control in this shell
ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)

I have edited the DC and removed the single quotes, and things started working.

Version-Release number of selected component (if applicable):
3.1.1.6 of openshift

How reproducible:
Very, with the latest version of the mysql DC.

Steps to Reproduce:
1. Install the latest mysql example
2. Deploy the mysql pod
3. Run "oc get pods" to see if it is ready

Actual results:
The pod never becomes ready.

Expected results:
The pod passes the readiness check.

I have attached the output of "oc describe pod" in the failed state.
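The quoting behaviour suspected above can be checked outside the cluster. This is a minimal sketch, assuming a POSIX shell; the password value s3cret is made up for illustration, and printenv stands in for the mysql client so that no database is needed:

```shell
#!/bin/sh
# Hypothetical password value, for illustration only.
MYSQL_PASSWORD='s3cret'
export MYSQL_PASSWORD

# Single quotes: the shell does not expand the variable, so the child
# process receives the literal text $MYSQL_PASSWORD as its password.
single=$(MYSQL_PWD='$MYSQL_PASSWORD' printenv MYSQL_PWD)

# Double quotes: the variable is expanded before the child process starts.
double=$(MYSQL_PWD="$MYSQL_PASSWORD" printenv MYSQL_PWD)

echo "single-quoted: $single"
echo "double-quoted: $double"
```

With the single-quoted form the client would try to log in with the variable's name rather than its value, which is consistent with an "Access denied" result once the server is accepting connections.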
Indeed a bug, going to fix this today.
https://github.com/openshift/origin/pull/8208
*** Bug 1320015 has been marked as a duplicate of this bug. ***
I think per the new process this is supposed to go to MODIFIED and will then get moved to ON_QA once OSE is updated.
The pull request failed and still needs to be merged. Can someone make sure the pull request gets merged?
Looks like the merge is still running. The tests flaked.
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/38064e6fa86b42d755c8574bd08c247ffb8a6ae1
Bug 1320335: Fix quoting for mysql probes
This has been merged into OSE and is in release v3.2.0.8
Checked on openshift v3.2.0.8, still can reproduce this issue:

Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Type  Reason  Message
  ---------  --------  -----  ----  -------------  ----  ------  -------
  <invalid>  <invalid>  1  {default-scheduler }    Normal  Scheduled  Successfully assigned mysql-1-ns1rc to openshift-147.lab.sjc.redhat.com
  <invalid>  <invalid>  1  {kubelet openshift-147.lab.sjc.redhat.com}  spec.containers{mysql}  Normal  Pulling  pulling image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhscl/mysql-56-rhel7:latest"
  <invalid>  <invalid>  1  {kubelet openshift-147.lab.sjc.redhat.com}  spec.containers{mysql}  Normal  Pulled  Successfully pulled image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhscl/mysql-56-rhel7:latest"
  <invalid>  <invalid>  1  {kubelet openshift-147.lab.sjc.redhat.com}  spec.containers{mysql}  Normal  Created  Created container with docker id a52ab872787f
  <invalid>  <invalid>  1  {kubelet openshift-147.lab.sjc.redhat.com}  spec.containers{mysql}  Normal  Started  Started container with docker id a52ab872787f
  <invalid>  <invalid>  1  {kubelet openshift-147.lab.sjc.redhat.com}  spec.containers{mysql}  Warning  Unhealthy  Readiness probe failed: sh: cannot set terminal process group (-1): Inappropriate ioctl for device
    sh: no job control in this shell
    ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)
  <invalid>  <invalid>  1  {kubelet openshift-147.lab.sjc.redhat.com}  spec.containers{mysql}  Warning  Unhealthy  Readiness probe failed: sh: cannot set terminal process group (-1): Inappropriate ioctl for device
    sh: no job control in this shell
    ERROR 1045 (28000): Access denied for user 'userX8B'@'127.0.0.1' (using password: YES)
I also tried with this template: https://raw.githubusercontent.com/openshift/origin/38064e6fa86b42d755c8574bd08c247ffb8a6ae1/examples/db-templates/mysql-persistent-template.json. It failed for the same reason.
The password seems right. Does mysql eventually become ready after some time? Or does the liveness probe fail as well and keep mysql in a restart loop?
After waiting for some time, the deploy pod goes into Error status and the error below appears.

$ oc get pods
NAME             READY     STATUS    RESTARTS   AGE
mysql-1-deploy   0/1       Error     0          12m

$ oc describe pods mysql-1-deploy
Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Type  Reason  Message
  ---------  --------  -----  ----  -------------  ----  ------  -------
  11m  11m  1  {default-scheduler }    Normal  Scheduled  Successfully assigned mysql-1-deploy to openshift-101.lab.sjc.redhat.com
  11m  11m  1  {kubelet openshift-101.lab.sjc.redhat.com}  spec.containers{deployment}  Normal  Pulled  Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-deployer:v3.2.0.8" already present on machine
  11m  11m  1  {kubelet openshift-101.lab.sjc.redhat.com}  spec.containers{deployment}  Normal  Created  Created container with docker id a0a1aadf92a5
  11m  11m  1  {kubelet openshift-101.lab.sjc.redhat.com}  spec.containers{deployment}  Normal  Started  Started container with docker id a0a1aadf92a5
  56s  56s  1  {kubelet openshift-101.lab.sjc.redhat.com}    Warning  FailedSync  Error syncing pod, skipping: failed to "TeardownNetwork" for "mysql-1-deploy_wzheng" with TeardownNetworkError: "Failed to teardown network for pod \"c1762227-f656-11e5-8022-fa163e4c0ffe\" using network plugins \"redhat/openshift-ovs-multitenant\": exit status 1"
I cannot reproduce this in the latest Origin master (the mysql service is ready after 1 failed check and everything works as expected). I used this template: https://github.com/openshift/origin/blob/master/examples/db-templates/mysql-ephemeral-template.json (the persistent template readiness probe is the same). Might the networking error be related? Also, does mysql have persistent storage attached? Is this the rhel7 mysql image or the centos7 mysql image?
I cannot reproduce with the template you sent either, but I can reproduce with the template installed by default in the OSE env. The error is the same for both mysql-ephemeral-template.json and mysql-persistent-template.json (both using the rhel7 image).
(In reply to Wenjing Zheng from comment #16)
> I cannot reproduce with the template you sent either, but can reproduce with
> the template installed by default in OSE env. The error is the same for both
> mysql-ephemeral-template.json and mysql-persistent-template.json(both using
> rhel7 image)

What is the diff between those two?
(In reply to Michal Fojtik from comment #17)
> (In reply to Wenjing Zheng from comment #16)
> > I cannot reproduce with the template you sent either, but can reproduce with
> > the template installed by default in OSE env. The error is the same for both
> > mysql-ephemeral-template.json and mysql-persistent-template.json(both using
> > rhel7 image)
>
> What is the diff between those two?

I just compared the persistent templates. OSE 3.0.2.9 now has the same template as origin, which had the above issue. I compared the persistent template between origin, 3.0.2.8 and 3.0.2.9; here are the differences:

[wzheng@openshiftqe tmp]$ diff persistent_origin.json persistent_329.json
8c8
< "description": "MySQL database service, with persistent storage. Scaling to more than one replica is not supported. You must have persistent volumes available in your cluster to use this template.",
---
> "description": "MySQL database service, with persistent storage. Scaling to more than one replica is not supported",

[wzheng@openshiftqe tmp]$ diff persistent_origin.json persistent_ose_328.json
8c8
< "description": "MySQL database service, with persistent storage. Scaling to more than one replica is not supported. You must have persistent volumes available in your cluster to use this template.",
---
> "description": "MySQL database service, with persistent storage. Scaling to more than one replica is not supported",
81c81
< "namespace": "${NAMESPACE}"
---
> "namespace": "openshift"
117c117
< "MYSQL_PWD=\"$MYSQL_PASSWORD\" mysql -h 127.0.0.1 -u $MYSQL_USER -D $MYSQL_DATABASE -e 'SELECT 1'"]
---
> "MYSQL_PWD='$MYSQL_PASSWORD' mysql -h 127.0.0.1 -u $MYSQL_USER -D $MYSQL_DATABASE -e 'SELECT 1'"]
180,181c180,181
< "displayName": "Memory Limit",
< "description": "Maximum amount of memory the container can use.",
---
> "displayName": "Memory limit",
> "description": "Maximum amount of memory the container can use",
185,190d184
< "name": "NAMESPACE",
< "displayName": "Namespace",
< "description": "The OpenShift Namespace where the ImageStream resides.",
< "value": "openshift"
< },
< {
192,193c186,187
< "displayName": "Database Service Name",
< "description": "The name of the OpenShift Service exposed for the database.",
---
> "displayName": "Database service name",
> "description": "The name of the OpenShift Service exposed for the database",
199,200c193,194
< "displayName": "MySQL User",
< "description": "Username for MySQL user that will be used for accessing the database.",
---
> "displayName": "MySQL user",
> "description": "Username for MySQL user that will be used for accessing the database",
207,208c201,202
< "displayName": "MySQL Password",
< "description": "Password for the MySQL user.",
---
> "displayName": "MySQL password",
> "description": "Password for the MySQL user",
215,216c209,210
< "displayName": "MySQL Database Name",
< "description": "Name of the MySQL database accessed.",
---
> "displayName": "MySQL database name",
> "description": "Name of the MySQL database accessed",
222,223c216,217
< "displayName": "Volume Capacity",
< "description": "Volume space available for data, e.g. 512Mi, 2Gi.",
---
> "displayName": "Volume capacity",
> "description": "Volume space available for data, e.g. 512Mi, 2Gi",
OK, this is the issue:

> "MYSQL_PWD='$MYSQL_PASSWORD' mysql -h 127.0.0.1 -u $MYSQL_USER -D $MYSQL_DATABASE -e 'SELECT 1'"]

We should bump the template in the OSE repo to match origin.
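One way to confirm whether an installed template still carries the old probe is to search it for the single-quoted assignment. The following is only a sketch: the function name has_single_quoted_pwd and the files old.json and new.json are hypothetical, and the two sample lines are copied from the 117c117 hunk of the diff in the previous comment rather than from a real installation.

```shell
#!/bin/sh
# Succeeds (exit 0) when the given file contains the single-quoted
# MYSQL_PWD assignment. grep -F treats the pattern as a fixed string,
# so the dollar sign and quote characters are matched literally.
has_single_quoted_pwd() {
    grep -qF "MYSQL_PWD='\$MYSQL_PASSWORD'" "$1"
}

# Hypothetical sample files holding the two variants of the probe line.
tmpdir=$(mktemp -d)
cat > "$tmpdir/old.json" <<'EOF'
"MYSQL_PWD='$MYSQL_PASSWORD' mysql -h 127.0.0.1 -u $MYSQL_USER -D $MYSQL_DATABASE -e 'SELECT 1'"]
EOF
cat > "$tmpdir/new.json" <<'EOF'
"MYSQL_PWD=\"$MYSQL_PASSWORD\" mysql -h 127.0.0.1 -u $MYSQL_USER -D $MYSQL_DATABASE -e 'SELECT 1'"]
EOF

if has_single_quoted_pwd "$tmpdir/old.json"; then old_result=stale; else old_result=ok; fi
if has_single_quoted_pwd "$tmpdir/new.json"; then new_result=stale; else new_result=ok; fi

echo "old.json: $old_result"
echo "new.json: $new_result"
rm -rf "$tmpdir"
```

Pointing the same function at the template file actually installed in an environment would show whether it needs to be bumped.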
One more thing: if I replace the string persistentVolumeClaim with PersistentVolumeClaim in the persistent template, the error still appears but the pod eventually becomes Running:

$ oc describe pods mysql-1-pakfd
  2m  2m  1  {kubelet openshift-114.lab.sjc.redhat.com}  spec.containers{mysql}  Warning  Unhealthy  Readiness probe failed: sh: cannot set terminal process group (-1): Inappropriate ioctl for device
    sh: no job control in this shell
    ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)

[wzheng@openshiftqe test]$ oc get pods
NAME            READY     STATUS    RESTARTS   AGE
mysql-1-pakfd   1/1       Running   0          3m
Please disregard my comment #20: although the pod is running with that change, the PV is not attached. Sorry for the confusion.
The templates in origin and OSE 3.0.2.9 are now the same. The pod is running and works well, although the error below still appears:

  8s  8s  1  {kubelet openshift-126.lab.sjc.redhat.com}  spec.containers{mysql}  Warning  Unhealthy  Readiness probe failed: sh: cannot set terminal process group (-1): Inappropriate ioctl for device
    sh: no job control in this shell
    ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)
After confirming with Michal, the error is fine, so I am verifying this bug per comment #22.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:1064