Bug 1035613

Summary: scaling up events exit with errno 11
Product: OpenShift Container Platform Reporter: Juergen Hoffmann <juhoffma>
Component: ImageStreamsAssignee: Jason DeTiberus <jdetiber>
Status: CLOSED NOTABUG QA Contact: libra bugs <libra-bugs>
Severity: high Docs Contact:
Priority: unspecified    
Version: 1.2.1CC: juhoffma
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-11-29 10:34:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
jmeter test suite to simulate lots of users none

Description Juergen Hoffmann 2013-11-28 08:27:07 UTC
Created attachment 830067 [details]
jmeter test suite to simulate lots of users

Description of problem:
I have setup OSE 1.2.1 in my lab. For testing of scaling I have created a test application from the kitchensink example.

rhc app create -a kitchenscale -t jbosseap --from-code git://github.com/openshift/kitchensink-example.git -s

then I put some load on the gear using jmeter. I an attaching the test suite to this case.

After simulating 5000 users hitting the site with a ramp up phase of 10 seconds, only one gear was created. If I understand the configuration of haproxy and the scaling code correctly, we have a maxconn setting of 256 meaning haproxy will not accept more connections than this. Within the scaling code we do have a MAX_SESSIONS_PER_GEAR setting of 16.0 (see https://github.com/openshift/origin-server/blob/openshift-origin-release-2/cartridges/openshift-origin-cartridge-haproxy/usr/bin/haproxy_ctld.rb#L99) Looking at https://github.com/openshift/origin-server/blob/openshift-origin-release-2/cartridges/openshift-origin-cartridge-haproxy/usr/bin/haproxy_ctld.rb#L159 it seems we are trying to distribute the load between all gears and make sure we have an additional gear if we have more than 16 sessions per gear.

During my tests I monitored the scaling-events log which printed the correct usage but failed to finish the scale up events:
I, [2013-11-28T02:51:08.109463 #25422]  INFO -- : GEAR_UP - capacity: 212.5% gear_count: 1 sessions: 34 up_thresh: 90.0%
D, [2013-11-28T02:51:13.314401 #25422] DEBUG -- : GEAR_UP - add-gear: exit: pid 20951 exit 1  stdout: [FATAL] Failed to create timer thread (errno: 11)

D, [2013-11-28T02:51:13.413059 #25422] DEBUG -- : GEAR_INFO - capacity: 306.25% gear_count: 1 sessions: 49 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 115 gear
_remove_thresh: 0/20
I, [2013-11-28T02:51:16.506160 #25422]  INFO -- : GEAR_UP - capacity: 762.5% gear_count: 1 sessions: 122 up_thresh: 90.0%
D, [2013-11-28T02:51:25.814215 #25422] DEBUG -- : GEAR_UP - add-gear: exit: pid 20980 exit 1  stdout: [FATAL] Failed to create timer thread (errno: 11)

D, [2013-11-28T02:51:26.206974 #25422] DEBUG -- : GEAR_INFO - capacity: 131.25% gear_count: 1 sessions: 21 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 111 gear
_remove_thresh: 0/20
I, [2013-11-28T02:51:29.515573 #25422]  INFO -- : GEAR_UP - capacity: 650.0% gear_count: 1 sessions: 104 up_thresh: 90.0%
D, [2013-11-28T02:51:35.305046 #25422] DEBUG -- : GEAR_UP - add-gear: exit: pid 21019 exit 1  stdout: [FATAL] Failed to create timer thread (errno: 11)

D, [2013-11-28T02:51:35.517902 #25422] DEBUG -- : GEAR_INFO - capacity: 700.0% gear_count: 1 sessions: 112 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 114 gear
_remove_thresh: 0/20
I, [2013-11-28T02:51:38.607511 #25422]  INFO -- : GEAR_UP - capacity: 600.0% gear_count: 1 sessions: 96 up_thresh: 90.0%
D, [2013-11-28T02:51:46.104945 #25422] DEBUG -- : GEAR_UP - add-gear: exit: pid 21039 exit 1  stdout: [FATAL] Failed to create timer thread (errno: 11)

D, [2013-11-28T02:51:46.213241 #25422] DEBUG -- : GEAR_INFO - capacity: 406.25% gear_count: 1 sessions: 65 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 113 gear_remove_thresh: 0/20
I, [2013-11-28T02:51:49.412718 #25422]  INFO -- : GEAR_UP - capacity: 550.0% gear_count: 1 sessions: 88 up_thresh: 90.0%
D, [2013-11-28T02:52:00.501189 #25422] DEBUG -- : GEAR_UP - add-gear: exit: pid 21059 exit 1  stdout: [FATAL] Failed to create timer thread (errno: 11)

D, [2013-11-28T02:52:00.703825 #25422] DEBUG -- : GEAR_INFO - capacity: 750.0% gear_count: 1 sessions: 120 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 109 gear_remove_thresh: 0/20
I, [2013-11-28T02:52:03.901944 #25422]  INFO -- : GEAR_UP - capacity: 125.0% gear_count: 1 sessions: 20 up_thresh: 90.0%
D, [2013-11-28T02:52:12.015385 #25422] DEBUG -- : GEAR_UP - add-gear: exit: pid 21542 exit 1  stdout: [FATAL] Failed to create timer thread (errno: 11)

D, [2013-11-28T02:52:12.103658 #25422] DEBUG -- : GEAR_INFO - capacity: 231.25% gear_count: 1 sessions: 37 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 112 gear_remove_thresh: 0/20
I, [2013-11-28T02:52:15.217038 #25422]  INFO -- : GEAR_UP - capacity: 706.25% gear_count: 1 sessions: 113 up_thresh: 90.0%
D, [2013-11-28T02:52:23.312704 #25422] DEBUG -- : GEAR_UP - add-gear: exit: pid 21564 exit 1  stdout: [FATAL] Failed to create timer thread (errno: 11)

D, [2013-11-28T02:52:23.407078 #25422] DEBUG -- : GEAR_INFO - capacity: 625.0% gear_count: 1 sessions: 100 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 112 gear_remove_thresh: 0/20
I, [2013-11-28T02:52:26.509022 #25422]  INFO -- : GEAR_UP - capacity: 543.75% gear_count: 1 sessions: 87 up_thresh: 90.0%
D, [2013-11-28T02:52:34.802143 #25422] DEBUG -- : GEAR_UP - add-gear: exit: pid 21588 exit 1  stdout: [FATAL] Failed to create timer thread (errno: 11)

D, [2013-11-28T02:52:35.010723 #25422] DEBUG -- : GEAR_INFO - capacity: 393.75% gear_count: 1 sessions: 63 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 112 gear_remove_thresh: 0/20
I, [2013-11-28T02:52:38.105981 #25422]  INFO -- : GEAR_UP - capacity: 381.25% gear_count: 1 sessions: 61 up_thresh: 90.0%
D, [2013-11-28T02:52:45.217444 #25422] DEBUG -- : GEAR_UP - add-gear: exit: pid 21609 exit 1  stdout: [FATAL] Failed to create timer thread (errno: 11)

D, [2013-11-28T02:52:45.218343 #25422] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 113 gear_remove_thresh: 0/20
I, [2013-11-28T02:52:48.305904 #25422]  INFO -- : GEAR_UP - capacity: 143.75% gear_count: 1 sessions: 23 up_thresh: 90.0%
D, [2013-11-28T02:52:53.910140 #25422] DEBUG -- : GEAR_UP - add-gear: exit: pid 21628 exit 1  stdout: [FATAL] Failed to create timer thread (errno: 11)

D, [2013-11-28T02:52:54.000785 #25422] DEBUG -- : GEAR_INFO - capacity: 206.25% gear_count: 1 sessions: 33 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 115 gear_remove_thresh: 0/20
I, [2013-11-28T02:53:03.424342 #25422]  INFO -- : GEAR_UP - capacity: 693.75% gear_count: 1 sessions: 111 up_thresh: 90.0%
D, [2013-11-28T02:53:09.609657 #25422] DEBUG -- : GEAR_UP - add-gear: exit: pid 22081 exit 1  stdout: [FATAL] Failed to create timer thread (errno: 11)

D, [2013-11-28T02:53:09.707919 #25422] DEBUG -- : GEAR_INFO - capacity: 587.5% gear_count: 1 sessions: 94 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 114 gear_remove_thresh: 0/20
I, [2013-11-28T02:53:12.710972 #25422]  INFO -- : GEAR_UP - capacity: 256.25% gear_count: 1 sessions: 41 up_thresh: 90.0%
D, [2013-11-28T02:53:20.902993 #25422] DEBUG -- : GEAR_UP - add-gear: exit: pid 22103 exit 1  stdout: [FATAL] Failed to create timer thread (errno: 11)

The gear count stayed at 1 although the command

09:15 $ rhc app show kitchenscale --gears
ID                       State   Cartridges               Size  SSH URL
------------------------ ------- ------------------------ ----- ------------------------------------------------------------------------------
5295eb4fe3c9c3fee1000022 started jbosseap-6.0 haproxy-1.4 small 5295eb4fe3c9c3fee1000022.gsso.redhat.com
5296f1fae3c9c3fee1000043 new     jbosseap-6.0 haproxy-1.4 small 5296f1fae3c9c3fee1000043.gsso.redhat.com

showed a new gear.

Version-Release number of selected component (if applicable):


How reproducible (Steps to reproduce):
in your OSE 1.2.1 installation run:
1. rhc app create -a kitchenscale -t jbosseap --from-code git://github.com/openshift/kitchensink-example.git -s

2. take the attached testsuite. Modify the url in HTTP Request Defaults
run the tests

3. rhc ssh kitchenscale 
4. cd haproxy/logs
5. tail -f scale-events.log

Actual results:
I, [2013-11-28T02:53:12.710972 #25422]  INFO -- : GEAR_UP - capacity: 256.25% gear_count: 1 sessions: 41 up_thresh: 90.0%
D, [2013-11-28T02:53:20.902993 #25422] DEBUG -- : GEAR_UP - add-gear: exit: pid 22103 exit 1  stdout: [FATAL] Failed to create timer thread (errno: 11)

Expected results:
Creation of new gears successfully

Additional info:

Comment 2 Juergen Hoffmann 2013-11-28 11:43:19 UTC
additional information

[kitchenscale-juhoffma.ose.gsso.redhat.com 5295eb4fe3c9c3fee1000022]\> ctl_all restart
Stopping services
Stopping gear...
can't create Thread (11)
Starting services
[FATAL] Failed to create timer thread (errno: 11)
[kitchenscale-juhoffma.ose.gsso.redhat.com 5295eb4fe3c9c3fee1000022]\> less haproxy/run/haproxy_ctld.rb.output
[kitchenscale-juhoffma.ose.gsso.redhat.com 5295eb4fe3c9c3fee1000022]\> cat haproxy/run/haproxy_ctld.rb.output
An error occurred; try again later: Failed to get information from haproxy

Comment 3 Juergen Hoffmann 2013-11-29 10:34:32 UTC
after upgrading my lab to the latest version it seems to work fine. After yum -y update on the broker and the node:

[kitchenscale-juhoffma.ose.gsso.redhat.com logs]\> ctl_all restart
Stopping services
Stopping gear...
CLIENT_RESULT: HAProxy instance is stopped
Stopping jbosseap cart
Sending SIGTERM to jboss:6957 ...
Starting services
Starting gear...
CLIENT_RESULT: HAProxy instance is started
Starting jbosseap cart
Found 127.0.252.1:8080 listening port