Bug 1043412

Summary: [performance] Frequent failures when trying to create a jbosseap-6 app
Product: OpenShift Container Platform
Reporter: Gaoyun Pei <gpei>
Component: ImageStreams
Assignee: Jason DeTiberus <jdetiber>
Status: CLOSED NOTABUG
QA Contact: libra bugs <libra-bugs>
Severity: high
Docs Contact:
Priority: low
Version: 2.0.0
CC: bleanhar, jpazdziora, tiwillia
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-07 15:42:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description Flags
mcollective.log none

Description Gaoyun Pei 2013-12-16 09:15:43 UTC
Description of problem:
When creating jbosseap-6 apps concurrently, creation fails with a "core dumped" error.
Sometimes this also happens when creating just one scalable jbosseap app.
So far the issue can only be reproduced in my testing environment, where it blocks the ose-2.0 performance testing.

Version-Release number of selected component (if applicable):
2.0/2013-11-26.1
openshift-origin-cartridge-jbosseap-2.11.1-2.el6op.noarch

How reproducible:
almost always

Steps to Reproduce:
1. Set ProxyTimeout to 600 in 000002_openshift_origin_broker_proxy.conf to avoid timeout errors, then create 5 scalable jbosseap-6 apps concurrently.

[root@broker ~]# for i in `seq 1 5`; do rhc app create app$i jbosseap -s -predhat& done
[1] 10099
[2] 10100
[3] 10101
[4] 10102
[5] 10103
...
...
Your application 'app2' is now available.

  URL:        http://app2-1111.stress.com/
  SSH to:     52aade1134c48ce149000001.com
  Git remote: ssh://52aade1134c48ce149000001.com/~/git/app2.git/
  Cloned to:  /root/app2

Run 'rhc show-app app2' for more details about your app.

Unable to complete the requested operation due to: The server node.stress.com that your application is running on failed to respond in time.
This may be due to a system restart..
Reference ID: 5fec8922705c05cae33c9b3e393509a5

Unable to complete the requested operation due to: The server node.stress.com that your application is running on failed to respond in time.
This may be due to a system restart..
Reference ID: a2051d7c640f37178fc53d6767ce05b1

Unable to complete the requested operation due to: The server node.stress.com that your application is running on failed to respond in time.
This may be due to a system restart..
Reference ID: 4b4c97f989d29006c907011520dfdcfe
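
The loop above backgrounds each creation without collecting its result. A minimal sketch of the same reproducer that waits for every job and reports how many failed is given below. The CREATE_CMD variable and the function name are assumptions added for illustration (they let the helper be dry-run without a broker); the rhc arguments are the ones used in the step above.

```shell
# Sketch only: CREATE_CMD is a hypothetical override, defaulting to the
# command used in this report.
CREATE_CMD=${CREATE_CMD:-rhc app create}

create_apps_concurrently() {
    count=$1
    failed=0
    pids=""
    i=1
    while [ "$i" -le "$count" ]; do
        # Same arguments as the reproducer; command output goes to stderr
        # so that stdout carries only the failure count.
        $CREATE_CMD "app$i" jbosseap -s -predhat >&2 &
        pids="$pids $!"
        i=$((i + 1))
    done
    # Wait for each background job and count non-zero exit statuses.
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

For example, `create_apps_concurrently 5` would start five creations and print the number that exited with an error.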


Actual results:
The following errors appear in mcollective.log:
------
Failed to execute: 'control start' for /var/lib/openshift/52aaad7234c48c60d60000a1/jbosseap
CLIENT_MESSAGE: Starting jbosseap cartridge
CLIENT_MESSAGE: Found 127.1.246.1:8080 listening port
CLIENT_MESSAGE: Found 127.1.246.1:9999 listening port
CLIENT_ERROR: Killed (core dumped)
------

Expected results:
Apps should be created successfully.

Additional info:
The full mcollective log from the app creation is attached.
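
When going through a log like this, a small helper can count the failed 'control start' calls and list the affected gear directories. This is only a sketch: the function names are made up for illustration, the log path is passed in as an argument, and the match strings are taken from the log excerpt above.

```shell
# Sketch only: both function names are hypothetical helpers.

# Print the number of lines reporting a failed 'control start'.
count_start_failures() {
    grep -c "Failed to execute: 'control start'" "$1"
}

# Print the unique gear cartridge paths named in those lines.
failed_gear_paths() {
    grep "Failed to execute: 'control start'" "$1" | sed "s/.* for //" | sort -u
}
```

Both helpers take the path to a saved copy of the log, for example `count_start_failures mcollective.log`.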

Comment 1 Gaoyun Pei 2013-12-16 09:16:59 UTC
Created attachment 837158 [details]
mcollective.log

Comment 4 Gaoyun Pei 2014-06-17 09:33:19 UTC
This issue can still be reproduced in my performance testing environment with puddle 2.1.z/2014-05-29.3.

Creating a jbosseap-6 application fails as follows:
...
Using jbosseap-6 (JBoss Enterprise Application Platform 6) for 'jbosseap'

Application Options
-------------------
Domain:     00
Cartridges: jbosseap-6
Gear Size:  default
Scaling:    no

Creating application 'app1' ... 
Starting jbosseap cartridge
Found 127.10.197.129:8080 listening port
Found 127.10.197.129:9999 listening port
Failed to execute: 'control start' for /var/lib/openshift/53a007cb34c48ce9ee0000ce/jbosseap
Terminated (core dumped)


After changing /etc/openshift/resource_limits.conf as suggested in BZ#1043414, the issue disappeared and jbosseap-6 apps could always be created:
limits_nproc=500
memory_limit_in_bytes=5368709120 # 5G
memory_memsw_limit_in_bytes=5473566720 # 5G + 100M (100M swap)
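
For anyone adjusting these values, the byte counts follow directly from the sizes in the comments (5 GiB, plus 100 MiB of swap). A quick sketch of the arithmetic, where GIB and MIB are helper variables introduced here and not settings from resource_limits.conf:

```shell
# Helper constants (assumed names, not part of resource_limits.conf).
GIB=$((1024 * 1024 * 1024))
MIB=$((1024 * 1024))

# 5G of memory, and memory+swap = 5G + 100M.
memory_limit_in_bytes=$((5 * GIB))
memory_memsw_limit_in_bytes=$((memory_limit_in_bytes + 100 * MIB))

echo "memory_limit_in_bytes=$memory_limit_in_bytes"             # 5368709120
echo "memory_memsw_limit_in_bytes=$memory_memsw_limit_in_bytes" # 5473566720
```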

Comment 6 Johnny Liu 2014-06-25 08:20:29 UTC
*** Bug 1110162 has been marked as a duplicate of this bug. ***

Comment 7 Jan Pazdziora 2014-06-25 08:38:15 UTC
(In reply to Gaoyun Pei from comment #4)
> 
> After making some changes in /etc/openshift/resource_limits.conf as
> BZ#1043414 said, this issue disappeared, jbosseap-6 app could always be
> created.
> limits_nproc=500
> memory_limit_in_bytes=5368709120 # 5G
> memory_memsw_limit_in_bytes=5473566720 # 5G + 100M (100M swap)

For the record, in the non-concurrent case on OSE 2.1, the limits_nproc change alone was sufficient; see bug 1110162 comment 10.