Bug 1281733

Summary: Not all pods get started when deploying 50 mysql concurrently
Product: OpenShift Container Platform Reporter: DeShuai Ma <dma>
Component: Documentation Assignee: Thien-Thi Nguyen <tnguyen>
Status: CLOSED CURRENTRELEASE QA Contact: Vikram Goyal <vigoyal>
Severity: low Docs Contact: Vikram Goyal <vigoyal>
Priority: medium    
Version: 3.1.0 CC: amelicha, aos-bugs, bparees, dma, hhorak, jeder, jokerman, mmccomas, mnagy, tnguyen, xtian
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2016-05-11 13:39:47 UTC Type: Bug

Description DeShuai Ma 2015-11-13 10:41:59 UTC
Description of problem:
When 50 MySQL instances are deployed concurrently, some pods fail to start. Their status is CrashLoopBackOff.

Version-Release number of selected component (if applicable):
[root@openshift-146 deploy_test]# openshift version
openshift v3.1.0.4-3-ga6353c7
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

Cluster: 1 master, 2 nodes
Every VM has 8 GB of memory and a 30 GB disk

How reproducible:
sometimes

Steps to Reproduce:
1. Create 50 projects and deploy a MySQL instance in each project.

2. Check the status of all pods.

Actual results:
2. Some pods fail with the following error:
[root@openshift-137 ~]# docker logs 158a16505dde
Starting local mysqld server ...
Waiting for MySQL to start ...
151113  5:06:56 [Note] /opt/rh/mysql55/root/usr/libexec/mysqld (mysqld 5.5.45) starting as process 9 ...
151113  5:06:56 [Warning] One can only use the --user switch if running as root

151113  5:06:56 [Note] Plugin 'FEDERATED' is disabled.
151113  5:06:56 InnoDB: The InnoDB memory heap is disabled
151113  5:06:56 InnoDB: Mutexes and rw_locks use GCC atomic builtins
151113  5:06:56 InnoDB: Compressed tables use zlib 1.2.7
151113  5:06:56 InnoDB: Using Linux native AIO
151113  5:06:56  InnoDB: Warning: io_setup() failed with EAGAIN. Will make 5 attempts before giving up.
InnoDB: Warning: io_setup() attempt 1 failed.
InnoDB: Warning: io_setup() attempt 2 failed.
Waiting for MySQL to start ...
InnoDB: Warning: io_setup() attempt 3 failed.
InnoDB: Warning: io_setup() attempt 4 failed.
Waiting for MySQL to start ...
InnoDB: Warning: io_setup() attempt 5 failed.
151113  5:06:59  InnoDB: Error: io_setup() failed with EAGAIN after 5 attempts.
InnoDB: You can disable Linux Native AIO by setting innodb_use_native_aio = 0 in my.cnf
151113  5:06:59 InnoDB: Fatal error: cannot initialize AIO sub-system
151113  5:06:59 [ERROR] Plugin 'InnoDB' init function returned error.
151113  5:06:59 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
151113  5:06:59 [ERROR] Unknown/unsupported storage engine: InnoDB
151113  5:06:59 [ERROR] Aborting

151113  5:06:59 [Note] /opt/rh/mysql55/root/usr/libexec/mysqld: Shutdown complete

Expected results:
2. All pods are running.

Additional info:
Disk usage was about 62%.
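
Editorial sketch: with 50 pods it can be tedious to tell which containers crashed for this particular reason. The snippet below is one possible way to flag a container log that contains the io_setup() failure shown above. The function name and the regular expression are illustrative assumptions, not part of any OpenShift or MySQL tooling; the sample text is copied from the log in this report.

```python
import re

# Signature of the fatal message, based on the mysqld log in this report.
AIO_FAILURE = re.compile(r"io_setup\(\) failed with EAGAIN after \d+ attempts")


def hit_aio_limit(log_text):
    """Return True if the log shows InnoDB giving up on io_setup()."""
    return any(AIO_FAILURE.search(line) for line in log_text.splitlines())


# Two lines copied from the log above.
sample = (
    "151113  5:06:59  InnoDB: Error: io_setup() failed with EAGAIN after 5 attempts.\n"
    "151113  5:06:59 InnoDB: Fatal error: cannot initialize AIO sub-system\n"
)
print(hit_aio_limit(sample))
```

The per-attempt warnings ("io_setup() attempt 1 failed.") are deliberately not matched, since InnoDB may still succeed on a later attempt.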

Comment 1 Ben Parees 2015-11-13 12:55:24 UTC
Is this 50 independent pods, or is this a master/slave scenario?  Since you say 50 projects I assume it's 50 independent ones.

which volume type are you using for the pods?

Comment 2 Martin Nagy 2015-11-13 15:07:48 UTC
DeShuai, there appear to be two possible solutions, can you please try both of them, independently?

1) Increase the fs.aio-max-nr kernel limit on the nodes. See the answer here: https://unix.stackexchange.com/questions/116520/mysql-server-wont-install-to-a-new-os-debian-ubuntu

2) Set the MYSQL_AIO environment variable to 0 for all MySQL pods that you create. This should disable native AIO, so the limit should not be hit.
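
Editorial sketch for option 1: before raising the limit, it may help to see how close a node is to it. On Linux, /proc/sys/fs/aio-nr reports the AIO events currently reserved system-wide and /proc/sys/fs/aio-max-nr reports the limit. The helper below simply reads both; its name and the proc_dir parameter are assumptions made for illustration.

```python
from pathlib import Path


def aio_headroom(proc_dir="/proc/sys/fs"):
    """Return (in_use, limit, remaining) AIO event counts for this host.

    proc_dir is a parameter only so the function can be pointed at a
    test directory; on a real node the default should be used.
    """
    base = Path(proc_dir)
    in_use = int((base / "aio-nr").read_text().strip())
    limit = int((base / "aio-max-nr").read_text().strip())
    return in_use, limit, limit - in_use
```

Calling aio_headroom() on a node while the MySQL pods are starting should show the remaining count approaching zero if this limit is the cause. The limit itself is typically raised with sysctl (for example `sysctl -w fs.aio-max-nr=<value>`); a suitable value depends on the workload and is not given in this report.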

Comment 3 DeShuai Ma 2015-11-14 01:16:19 UTC
(In reply to Ben Parees from comment #1)
> Is this 50 independent pods, or is this a master/slave scenario?  Since you
> say 50 projects I assume it's 50 independent ones.
> 
> which volume type are you using for the pods?

The 50 pods are independent. All pods are created from this file: https://github.com/openshift/origin/blob/master/examples/db-templates/mysql-ephemeral-template.json

Comment 4 DeShuai Ma 2015-11-16 02:29:22 UTC
(In reply to Martin Nagy from comment #2)
> DeShuai, there appear to be two possible solutions, can you please try both
> of them, independently?
> 
> 1) Increase the fs.aio-max-nr kernel limit on the nodes. See the answer here:
> https://unix.stackexchange.com/questions/116520/mysql-server-wont-install-to-a-new-os-debian-ubuntu
> 
> 2) Set the MYSQL_AIO environment variable to 0 for all MySQL pods that you
> create. This should disable native AIO, so the limit should not be hit.

After trying each of the two methods, the pods run well. Thanks.

Comment 5 Jeremy Eder 2015-11-17 12:57:11 UTC
Now, what to do with this ... Doubt this is something we'd want to do globally on the kernel side for systems running openshift.

My thought is that we should convert this to a documentation bugzilla and put a note in the openshift docs.

Comment 6 Martin Nagy 2015-11-18 13:02:29 UTC
I agree. Should we move this bug to the Documentation component?

Comment 7 Ben Parees 2016-01-04 15:16:56 UTC
Martin, please document this in the openshift mysql image docs.  Thanks.

Comment 8 Vikram Goyal 2016-02-05 01:20:07 UTC
Martin, did you want the docs team to add this note? If yes, I will assign it to one of the writers.

Comment 9 Thien-Thi Nguyen 2016-03-01 18:16:04 UTC
(In reply to Vikram Goyal from comment #8)
> Martin, did you want the docs team to add this note?
> If yes, I will assign it to one of the writers.

Martin and I discussed this today. The plan is to add to the MySQL images section an explanation of why MySQL fails with the above-mentioned error messages, as well as documentation on the two resolution paths:
- increase the fs.aio-max-nr kernel resource limit on the node
- set the MYSQL_AIO environment variable to 0
The latter will mention that editing /etc/my.cnf directly is not the OpenShift way, and xref the "Managing Environment Variables" section of the Dev Guide.

Martin, additions/corrections welcome.

Assigning this to myself.
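
Editorial sketch for the environment-variable path: one way to apply MYSQL_AIO=0 before deploying is to add it to the container definition of the DeploymentConfig object. The snippet assumes the object follows the usual pod-template layout (spec.template.spec.containers[].env as a list of name/value pairs). The helper name and the trimmed-down object are hypothetical; the real template contains more fields.

```python
def set_container_env(deployment_config, name, value):
    """Set an env var on every container in a DeploymentConfig-style dict.

    An existing entry with the same name is updated in place; otherwise
    a new name/value pair is appended.
    """
    containers = deployment_config["spec"]["template"]["spec"]["containers"]
    for container in containers:
        env = container.setdefault("env", [])
        for entry in env:
            if entry["name"] == name:
                entry["value"] = value
                break
        else:
            env.append({"name": name, "value": value})
    return deployment_config


# Hypothetical, trimmed-down object used only for illustration.
dc = {"spec": {"template": {"spec": {"containers": [
    {"name": "mysql", "env": [{"name": "MYSQL_USER", "value": "user"}]}
]}}}}
set_container_env(dc, "MYSQL_AIO", "0")
```

The value is passed as the string "0" because environment variable values in these objects are strings.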

Comment 10 Thien-Thi Nguyen 2016-03-08 10:58:57 UTC
PR: https://github.com/openshift/openshift-docs/pull/1701

Comment 11 Martin Nagy 2016-03-08 14:08:05 UTC
Commented in the pull, looks great!

Comment 13 Thien-Thi Nguyen 2016-03-15 21:02:32 UTC
Hi DeShuai, WDYT?

Comment 15 DeShuai Ma 2016-03-16 01:45:46 UTC
The doc PR LGTM. I will verify this bug once the PR is merged.

Comment 16 Thien-Thi Nguyen 2016-03-17 20:04:51 UTC
Hi DeShuai,

PR merged: https://github.com/openshift/openshift-docs/pull/1701#event-593982568

WDYT?

Comment 17 Thien-Thi Nguyen 2016-03-18 07:38:36 UTC
Thanks DeShuai.

Moving status to RELEASE_PENDING.